Re: Large Tables, Bad Indexes and Fake Statistics

From: "Don Seiler" <don@xxxxxxxxx>
To: oracle_l <oracle-l@xxxxxxxxxxxxx>
Date: Fri, 17 Aug 2007 14:32:50 -0500
I just wanted to follow-up on this issue.  Thanks to Wolfgang
Breitling, I've got the issue resolved.  I wrote a summary on my blog,
http://ora.seiler.us/2007/08/dr-statslove-or-how-i-learned-to-stop.html

Hopefully it makes sense.  Try not to laugh too hard at my expense.

Thanks again to Wolfgang!

Don.

On 6/19/07, Don Seiler <don@xxxxxxxxx> wrote:
> I may have mentioned this before, but the archives only showed me a
> tangential reference [1].
>
> The setup:
> * Oracle 10.2.0.2 on RHEL 3.
> * Table FOO is range partitioned by date, total of 4 monthly
> partitions (rolling window)
> * There are two indexes on FOO, the PK has 10 fields (say A-J), and a
> second index has 5 fields (A,C,D from PK and two others K, L).
> * Statistics are gathered automatically via 10g default gather_stats_job
>
> The problem:
> A partition of FOO is bulk loaded around 10 times a month.  The
> partition is created a few days before it will first be loaded into.
> We have a recurring problem when, after the partition is loaded into
> (2-3 million records per load), queries on FOO that have A, B, C, D in
> the WHERE clause will start using the second index.  The fact that
> this second index doesn't use column B means it is walking over a HUGE
> set of records, practically a complete partition scan.
>
> It seems like an out-of-whack statistics issue.  After I gather stats
> on the table (takes 2.5 hours) the queries then use the PK as desired.
>  To fight this, we had just this month started calling
> dbms_stats.set_table_stats and set_index_stats to indicate a full
> month's worth of stats before loading.  Then for the rest of the day
> we could fool Oracle, until the nightly stats job then set the real
> stats.  However this didn't work yesterday.  I'm not sure why, but I
> did notice that the dba_tab_statistics table showed a sample_size for
> my user-set stats, when I don't believe there was one before.
>
> The solution:
> I'm placing blame first on the PK.  A 10-field primary key is absurd,
> especially when 82% of the queries (logged via FGA) only go against
> those first four fields.  This starts a whole natural-vs-synthetic key
> argument that only dictatorial control will resolve.  Anyway, my
> theory is that the CBO recognizes that, after the load, the data is
> drastically different than the statistics.  Now the touchy-feely part
> of the theory is that the CBO then says all bets are off and looks at
> the two indexes and sees that the second index is much, much smaller
> and will be much easier to deal with.  And that's why it chooses that
> one.  Obviously this isn't very technical and my developers aren't
> buying into it.  What traces can I do to spell out what the CBO is
> thinking and why it is making certain choices?
>
> So, avoiding a change to the PK (as much as it kills me), the long
> term proposed solution is to load data into a staging table, build
> indexes and gather statistics and then use partition exchange to bring
> all of that online in the "real" table.  However, that would required
> a sub-partitioning of the FOO table by load units, so we can't just do
> it right off.
>
> My near term work-around is to build a non-unique index on those first
> four columns.  In my opinion we'd have just this one and not that PK,
> but such things are not up to me now.
>
> I'm up for any theories or proposals at this point.  My frustration
> level with this design and the refusal to fault it, is at an all-time
> high.
>
> [1] //www.freelists.org/archives/oracle-l/01-2007/msg00995.html
>
> --
> Don Seiler
> oracle blog: http://ora.seiler.us
> ultimate: http://www.mufc.us
>


-- 
Don Seiler
oracle: http://ora.seiler.us
ultimate: http://www.mufc.us
--
//www.freelists.org/webpage/oracle-l
Follow-Ups:
- Re: Large Tables, Bad Indexes and Fake Statistics
  - From: Greg Rahn
Re: Large Tables, Bad Indexes and Fake Statistics

Other related posts: