Large Tables, Bad Indexes and Fake Statistics

From: "Don Seiler" <don@xxxxxxxxx>
To: oracle_l <oracle-l@xxxxxxxxxxxxx>
Date: Tue, 19 Jun 2007 12:07:00 -0500

I may have mentioned this before, but the archives only showed me a
tangential reference [1].

The setup:
* Oracle 10.2.0.2 on RHEL 3.
* Table FOO is range partitioned by date, total of 4 monthly
partitions (rolling window)
* There are two indexes on FOO, the PK has 10 fields (say A-J), and a
second index has 5 fields (A,C,D from PK and two others K, L).
* Statistics are gathered automatically via 10g default gather_stats_job

The problem:
A partition of FOO is bulk loaded around 10 times a month.  The
partition is created a few days before it will first be loaded into.
We have a recurring problem when, after the partition is loaded into
(2-3 million records per load), queries on FOO that have A, B, C, D in
the WHERE clause will start using the second index.  The fact that
this second index doesn't use column B means it is walking over a HUGE
set of records, practically a complete partition scan.

It seems like an out-of-whack statistics issue.  After I gather stats
on the table (takes 2.5 hours) the queries then use the PK as desired.
To fight this, we had just this month started calling
dbms_stats.set_table_stats and set_index_stats to indicate a full
month's worth of stats before loading.  Then for the rest of the day
we could fool Oracle, until the nightly stats job then set the real
stats.  However this didn't work yesterday.  I'm not sure why, but I
did notice that the dba_tab_statistics table showed a sample_size for
my user-set stats, when I don't believe there was one before.

The solution:
I'm placing blame first on the PK.  A 10-field primary key is absurd,
especially when 82% of the queries (logged via FGA) only go against
those first four fields.  This starts a whole natural-vs-synthetic key
argument that only dictatorial control will resolve.  Anyway, my
theory is that the CBO recognizes that, after the load, the data is
drastically different than the statistics.  Now the touchy-feely part
of the theory is that the CBO then says all bets are off and looks at
the two indexes and sees that the second index is much, much smaller
and will be much easier to deal with.  And that's why it chooses that
one.  Obviously this isn't very technical and my developers aren't
buying into it.  What traces can I do to spell out what the CBO is
thinking and why it is making certain choices?

So, avoiding a change to the PK (as much as it kills me), the long
term proposed solution is to load data into a staging table, build
indexes and gather statistics and then use partition exchange to bring
all of that online in the "real" table.  However, that would required
a sub-partitioning of the FOO table by load units, so we can't just do
it right off.

My near term work-around is to build a non-unique index on those first
four columns.  In my opinion we'd have just this one and not that PK,
but such things are not up to me now.

I'm up for any theories or proposals at this point.  My frustration
level with this design and the refusal to fault it, is at an all-time
high.

[1] //www.freelists.org/archives/oracle-l/01-2007/msg00995.html

--
Don Seiler
oracle blog: http://ora.seiler.us
ultimate: http://www.mufc.us
--
//www.freelists.org/webpage/oracle-l

Follow-Ups:
- Re: Large Tables, Bad Indexes and Fake Statistics
  - From: Don Seiler
- Re: Large Tables, Bad Indexes and Fake Statistics
  - From: jaromir nemec

Large Tables, Bad Indexes and Fake Statistics

Other related posts: