RE: Mythical space savings of IOTs

Bill,

        While IOT's do save you some space they can cause you to consume
space at the same time.  In my experience I've saved the most space when
the table's primary key and columns are equal.  Otherwise the extra
columns end up in the overflow area and I believe that ends up being one
block per row minimum.  Therefore whereas user_segments says your not
really saving any space a run of dbms_space.unused_space may well
provide some serious surprises.  As far as your customer, using a flat
structure may appear to save them some space their probably wasting a
more precious resource, namely time, to serially read all of that data
over & over.  I'd say that their storage goals are is need of serious
revision & reality check.  I mean your not talking about a GB of space
altogether.  All of the PC's I know of today have multi GB disks for
peanuts.  Sounds like a "penny wise, pound foolish" argument to me. 

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Bill Coulam
Sent: Friday, July 08, 2005 3:57 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: Mythical space savings of IOTs

Most documents on IOTs allude to space savings that can be realized by
eliminating the extra storage required by columns in both the table
and the PK. They hint at more savings that can be achieved by using
index compression. I have not been able to approach the space savings
I was hoping for and was wondering if anyone would care to comment on
their experiences with IOTs being 1 to 3X bigger than the object they
were meant to replace.

I've used IOTs in the past on small, PK-driven lookup tables, but
never on really big tables.

Background:
In our industry, we have some entities that contain what amounts to
order line items, and order line item details . The details table
(called components) is in billions of rows (terabtyes) at our clients.
The components table is modeled in a "vertical" fashion. Each row is
composed of about 7 fields that make up the unique key, followed by a
text, numerical or date value, followed by a handful of optional,
nullable columns. This allows our application to be flexible as to the
number and type of components allowed for each line item, which in
turn allows us to resell the app within different markets with
different business rules without modification.

On a 1Million row CTAS sample of the components table, the table
required about 59MB (from user_segments), and the composite UK
required about 50MB (the UK is almost all the rows in the table).
There's another global, single column index that takes up another
18MB.  So 127MB in all.

Since our largest client used to have an in-house,
specific-to-their-company, "flat" component structure that took up
much less room (where each component is another column on a single
line item row), they are demanding a 60-80% decrease in storage space
required by our app.

We've looked at some other improvements that should net us about 27%
savings. But the big win I was hoping for was rebuilding the table as
an IOT so that 1M rows would only take up about 55MB, instead of 127MB
(the 18MB global index would be moot and dropped for the IOT).

After playing with everything I can under the ORGANIZATION INDEX
syntax, the best I've been able to do is an IOT that says it is using
88MB. So I've hit 31% savings. Not quite what I was hoping. Doing
ANALYZE INDEX...VALIDATE STRUCTURE was key for me in that it I wasn't
able to get the IOT under 128MB until I used the OPT_CMPR_COUNT value
in index_stats to lower the index key compression. That's when I
finally dropped to 88MB. Nothing else I've tried, from PCTFREE,
PCTTHRESHOLD, INCLUDING, using /*+ APPEND */ to load vs. not, and
PCTFREE and PCTUSED on the empty overflow segment have yielded any
further savings. I noticed in index_stats, that my IOT is only using
47% of the BTREE_SPACE allocated to the PK/IOT segment. Why? I'm not
that familiar with block internals, so feel free to school me. If
there were a way to make the index use most of each index block, I'd
be down to 40MB, which is exactly where I want to be.

If it's helpful, here is an altered version of the DDL I'm using in my
tests:
CREATE TABLE our_component_table
(
  line_id         NUMBER NOT NULL,
  component_id    NUMBER NOT NULL,
  point_id        NUMBER NOT NULL,
  counterpoint_id NUMBER,
  start_date      DATE NOT NULL,
  end_date        DATE NOT NULL,
  seldom_used_val VARCHAR2(100),
  num_val         NUMBER,
  txt_val         VARCHAR2(30),
  dt_val          DATE,
  optional_col_1  NUMBER, -- NULL at most clients
  optional_col_2  VARCHAR2(100),  -- NULL at most clients
  parent_point_id NUMBER, -- NULL at most clients
  optional_col_3  VARCHAR2(50), -- NULL at most clients
  optional_col_4  NUMBER, -- NULL at most clients
  CONSTRAINT nmsc_pk PRIMARY KEY (line_id, component_id, start_date,
point_id, counterpoint_id, seldom_used_val, optional_col_1,
optional_col_2, optional_col_3, optional_col_4)
)
ORGANIZATION INDEX
PCTFREE 5
PCTTHRESHOLD 5
COMPRESS 3
TABLESPACE our_tablespace
;

On the real table, it is partitioned by month on start_date, and for
this test, I'm using a client-specific version of the large UK to
eliminate the unused columns, which is why I can get 40MB of
USED_SPACE instead of the expected 50-55MB.

My biggest complaint is that simply switching from a regular table to
an IOT should have saved me at least the redundant space used by the
table, minus a little overhead for logical ROWID, etc. Instead, with
no key compression and no other indexes but the large PK, adding
ORGANIZATION INDEX to the same DDL, yielded a segment that was over
twice as large as the original heap table OR the original B-TREE UK
alone. That's not what I call eliminating the redundant data storage
problem.

Thoughts?

-- 
bill coulam
bcoulam@xxxxxxxxx
--
http://www.freelists.org/webpage/oracle-l
--
http://www.freelists.org/webpage/oracle-l

Other related posts: