Re: Mythical space savings of IOTs

  • From: Bill Coulam <bcoulam@xxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 8 Jul 2005 17:09:29 -0500

Just stumbled on my answer. Now if anyone can answer WHY this works,
I'd love to hear from you.

As you recall, my 1M row test case came to 127MB with all related
objects included. I was only able to achieve 88MB in my IOT test. No
matter what I did, the index_stats view was showing only 47-50%
utilization of the index blocks in the IOT. Which was frustrating.

There was one aspect of my test that wasn't honest with reality
though. The original table is composite partitioned, by range and then
by hash (legacy model; not mine). I simplified my tests with a regular
IOT. So I lucked out by deciding my last test would be to see what a
partitioned IOT would do. I expected at least 10% greater space usage.
Instead I got a 71% savings!

Again the sample was built on normal heap model, but this time
partitioned by range on start_date and subpartitioned by hash on
line_id (4 subs per part). Total space of objects: 140MB.

Then I rebuilt it using IOT partitioned by range on start_date.
Subparts weren't possible. Lo and behold, after analyzing the IOT,
index_stats shows me 93% utilization of the BTREE_SPACE. Total space
of IOT object: 41.25MB !!!!!

Yee-haw!

I just can't fathom why a normal IOT would reserve so much space in
the index blocks, but a partitioned IOT does not. I changed nothing
else about the COMPRESS, PCTTHRESHOLD, or PCTFREE.

Go figure.

Please.

Thanks!

bc

On 7/8/05, Bill Coulam <bcoulam@xxxxxxxxx> wrote:
> Most documents on IOTs allude to space savings that can be realized by
> eliminating the extra storage required by columns in both the table
> and the PK. They hint at more savings that can be achieved by using
> index compression. I have not been able to approach the space savings
> I was hoping for and was wondering if anyone would care to comment on
> their experiences with IOTs being 1 to 3X bigger than the object they
> were meant to replace.
> 
> I've used IOTs in the past on small, PK-driven lookup tables, but
> never on really big tables.
> 
> Background:
> In our industry, we have some entities that contain what amounts to
> order line items, and order line item details . The details table
> (called components) is in billions of rows (terabtyes) at our clients.
> The components table is modeled in a "vertical" fashion. Each row is
> composed of about 7 fields that make up the unique key, followed by a
> text, numerical or date value, followed by a handful of optional,
> nullable columns. This allows our application to be flexible as to the
> number and type of components allowed for each line item, which in
> turn allows us to resell the app within different markets with
> different business rules without modification.
> 
> On a 1Million row CTAS sample of the components table, the table
> required about 59MB (from user_segments), and the composite UK
> required about 50MB (the UK is almost all the rows in the table).
> There's another global, single column index that takes up another
> 18MB.  So 127MB in all.
> 
> Since our largest client used to have an in-house,
> specific-to-their-company, "flat" component structure that took up
> much less room (where each component is another column on a single
> line item row), they are demanding a 60-80% decrease in storage space
> required by our app.
> 
> We've looked at some other improvements that should net us about 27%
> savings. But the big win I was hoping for was rebuilding the table as
> an IOT so that 1M rows would only take up about 55MB, instead of 127MB
> (the 18MB global index would be moot and dropped for the IOT).
> 
> After playing with everything I can under the ORGANIZATION INDEX
> syntax, the best I've been able to do is an IOT that says it is using
> 88MB. So I've hit 31% savings. Not quite what I was hoping. Doing
> ANALYZE INDEX...VALIDATE STRUCTURE was key for me in that it I wasn't
> able to get the IOT under 128MB until I used the OPT_CMPR_COUNT value
> in index_stats to lower the index key compression. That's when I
> finally dropped to 88MB. Nothing else I've tried, from PCTFREE,
> PCTTHRESHOLD, INCLUDING, using /*+ APPEND */ to load vs. not, and
> PCTFREE and PCTUSED on the empty overflow segment have yielded any
> further savings. I noticed in index_stats, that my IOT is only using
> 47% of the BTREE_SPACE allocated to the PK/IOT segment. Why? I'm not
> that familiar with block internals, so feel free to school me. If
> there were a way to make the index use most of each index block, I'd
> be down to 40MB, which is exactly where I want to be.
> 
> If it's helpful, here is an altered version of the DDL I'm using in my tests:
> CREATE TABLE our_component_table
> (
>   line_id         NUMBER NOT NULL,
>   component_id    NUMBER NOT NULL,
>   point_id        NUMBER NOT NULL,
>   counterpoint_id NUMBER,
>   start_date      DATE NOT NULL,
>   end_date        DATE NOT NULL,
>   seldom_used_val VARCHAR2(100),
>   num_val         NUMBER,
>   txt_val         VARCHAR2(30),
>   dt_val          DATE,
>   optional_col_1  NUMBER, -- NULL at most clients
>   optional_col_2  VARCHAR2(100),  -- NULL at most clients
>   parent_point_id NUMBER, -- NULL at most clients
>   optional_col_3  VARCHAR2(50), -- NULL at most clients
>   optional_col_4  NUMBER, -- NULL at most clients
>   CONSTRAINT nmsc_pk PRIMARY KEY (line_id, component_id, start_date,
> point_id, counterpoint_id, seldom_used_val, optional_col_1,
> optional_col_2, optional_col_3, optional_col_4)
> )
> ORGANIZATION INDEX
> PCTFREE 5
> PCTTHRESHOLD 5
> COMPRESS 3
> TABLESPACE our_tablespace
> ;
> 
> On the real table, it is partitioned by month on start_date, and for
> this test, I'm using a client-specific version of the large UK to
> eliminate the unused columns, which is why I can get 40MB of
> USED_SPACE instead of the expected 50-55MB.
> 
> My biggest complaint is that simply switching from a regular table to
> an IOT should have saved me at least the redundant space used by the
> table, minus a little overhead for logical ROWID, etc. Instead, with
> no key compression and no other indexes but the large PK, adding
> ORGANIZATION INDEX to the same DDL, yielded a segment that was over
> twice as large as the original heap table OR the original B-TREE UK
> alone. That's not what I call eliminating the redundant data storage
> problem.
> 
> Thoughts?
> 
> --
> bill coulam
> bcoulam@xxxxxxxxx
> 


-- 
bill coulam
bcoulam@xxxxxxxxx
--
//www.freelists.org/webpage/oracle-l

Other related posts: