Re: bitmapped secondary indexes on an IOT partitioned file.

From: David Aldridge <david@xxxxxxxxxxxxxxxxxx>
To: tomday2@xxxxxxxxx
Date: 14 Aug 2006 10:41:04 -0700
Date: Mon, 14 Aug 2006 11:39:21 -0600

Here's a number of theoretical considerations.

Index-based access is fine for selecting very small numbers of rows, but
as the number of rows required rises then fine-grained partitioning +
table scans with partition pruning become a much more viable option, and
as you say you are limited in your choice of partitioning options with
IOT's.

Furthermore the IOT is forced into a particular physical row order, and
that may not be sympathetic to the order that would most benefit queries
(through reducing clustering factor on common access paths) or data
compression. It also requires more resources to maintain the structure.

Speaking of compression, with a heap table you can get compression based
on all columns of the table, not just the PK columns (which I think is
the restriction you'd face on an IOT).

I expect that access to the overflow columns would also be a significant
performance hit (not tested it though).

I'd run some very careful tests on the relative performance of accessing
the metrics through bitmap indexes -> mapping table -> IOT table ->
overflow in comparison to bitmap indexes -> heap fact table.

Thomas Day wrote:


Fact tables are supposed to be nothing but foreign keys to dimension
tables plus "facts" - numeric measurements.

 From the 10g Concepts manual, on the benefits of IOT -

"Index-organized tables provide faster access to table rows by the
primary key or any key that is a valid prefix of the primary key.
Presence of nonkey columns of a row in the B-tree leaf block itself
avoids an additional block access. Also, because rows are stored in
primary key order, range access by the primary key (or a valid prefix)
involves minimum block accesses.

In order to allow even faster access to frequently accessed columns, you
can use a row overflow segment (as described later) to push out
infrequently accessed nonkey columns from the B-tree leaf block to an
optional (heap-organized) overflow segment. This allows limiting the
size and content of the portion of a row that is actually stored in the
B-tree leaf block, which may lead to a higher number of rows in each
leaf block and a smaller B-tree.

Unlike a configuration of heap-organized table with a primary key index
where primary key columns are stored both in the table and in the index,
there is no such duplication here because primary key column values are
stored only in the B-tree index.

Because rows are stored in primary key order, a significant amount of
additional storage space savings can be obtained through the use of key
compression.

Use of primary-key based logical rowids, as opposed to physical rowids,
in secondary indexes on index-organized tables allows high availability.
This is because, due to the logical nature of the rowids, secondary
indexes do not become unusable even after a table reorganization
operation that causes movement of the base table rows. At the same time,
through the use of physical guess in the logical rowid, it is possible
to get secondary index based index-organized table access performance
that is comparable to performance for secondary index based access to an
ordinary table."

Since the fact tables will be the start point for the majority of
queries I want the access to the result set to be as fast as possible.
A current complaint is the length of time that it takes to get a
result.  I plan to move most of the measurements to the overflow area
and make the IOT portion just the foreign keys to the dimension tables
(the fact table's primary key).  That's what led me to the idea of a
fact table as an IOT.  The fact table consists of it's primary key plus
some measurements.  Since I use an index to enforce that primary key I
have a situation where most of the data that I want from the fact table
can be satisfied from the index alone.  I decided that I might as well
save the space of the table (since the table and the index will
basically duplicate each other's data) and make the table index
organized.  I'm partitioning it by date since most queries are limited
to a single fiscal year.

But I need a mechanism to speed queries that are not via the primary key
or a valid prefix thereof.  That's where the secondary indexes come in.

I'm a little dissapointed that IOT tables cannot be composite
partitioned.  My dimension tables will be range partitioned by date
(because of the date limited nature of the queries) and hash
subpartitioned to spread the I/O over as many spindles as possilbe (each
subpartition goes to its own tablespace and each tablespace will be on a
separate spindle).

However, that's just my thinking at the planning stage.  If you have
experience with the flaws of using IOT fact tables I certainly want to
learn from that experience and not repeat a known mistake.

What are the problems with IOT fact tables?


--
//www.freelists.org/webpage/oracle-l

Follow-Ups:
- Re: bitmapped secondary indexes on an IOT partitioned file.
  - From: Thomas Day

References:
- bitmapped secondary indexes on an IOT partitioned file.
  - From: Thomas Day
- Re: bitmapped secondary indexes on an IOT partitioned file.
  - From: David Aldridge
- Re: bitmapped secondary indexes on an IOT partitioned file.
  - From: Thomas Day

Re: bitmapped secondary indexes on an IOT partitioned file.

Other related posts: