RE: Very Strange Query Access Plan

  • From: "Mercadante, Thomas F (LABOR)" <Thomas.Mercadante@xxxxxxxxxxxxxxxxx>
  • To: "Greg Rahn" <greg@xxxxxxxxxxxxxxxxxx>
  • Date: Thu, 4 Oct 2007 14:55:53 -0400

Greg,

I came to the same conclusion (logically) myself just today - that the
sample taken by dbms_stats (using AUTO_SAMPLE_SIZE) is probably the
culprit, in that it is not taking a decent sample.  The bad DENSITY
value calculated by dbms_stats was .007692308 when the query ran
horribly.  If the formula for calculating DENSITY is 1/NDV, then the
NDV implied by this value is 1/.007692308 = 130.  This all points to
dbms_stats gathering a bad sample of data to calculate this number.
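The arithmetic above can be checked directly.  This is only a sketch of
the 1/NDV relationship, not Oracle's actual implementation:

```python
# If DENSITY = 1/NDV, the NDV implied by a reported density is 1/density.
density = 0.007692308          # DENSITY value reported by dbms_stats
implied_ndv = round(1 / density)
print(implied_ndv)             # -> 130
```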

Thanks again to everyone who responded.  I learned a hell-of-a-lot from
this discussion!

Tom 


-----Original Message-----
From: Greg Rahn [mailto:greg@xxxxxxxxxxxxxxxxxx] 
Sent: Thursday, October 04, 2007 2:12 PM
To: Mercadante, Thomas F (LABOR)
Cc: John Kanagaraj; oracle-l@xxxxxxxxxxxxx
Subject: Re: Very Strange Query Access Plan

Since you are on 9.2.0.7 the default METHOD_OPT did not collect
histograms.  It wasn't until 10g that the default changed to collect
them.

I think your data is a poster child for a problem that I have seen
several times in recent years.  The problem is with the sample of data
that DBMS_STATS gathers.  You don't mention what your ESTIMATE_PERCENT
is (whether you are using DBMS_STATS.AUTO_SAMPLE_SIZE or a fixed %).
If the data has significant skew - generally a high NDV with a few
values dominating the set - AUTO_SAMPLE_SIZE can have challenges
getting a representative sample.  This is due to the way
AUTO_SAMPLE_SIZE works: it uses an algorithm based on convergence of
the values observed.  With skewed data, the convergence often happens
before a representative NDV is calculated.  In the case of a fixed %,
the chosen value may be too low to get a representative NDV count.  I
went through a real-world example at Oracle OpenWorld 2006, which is
available here:
http://structureddata.org/presentations/
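The sampling effect can be illustrated with a plain random sample in
Python (this is not Oracle's convergence algorithm, and the column data
below is made up): a small fixed-percent sample of a heavily skewed
column under-counts NDV because most of the rare values are never seen.

```python
import random

random.seed(42)

# Hypothetical skewed column: two values dominate, plus many rare
# distinct values that each occur only once.
data = (["undefined"] * 50_000
        + [None] * 50_000
        + [f"ssn{i}" for i in range(10_000)])

# Nulls are excluded when counting distinct values.
true_ndv = len({v for v in data if v is not None})

# A 1% sample, analogous to a low fixed ESTIMATE_PERCENT.
sample = random.sample(data, len(data) // 100)
sample_ndv = len({v for v in sample if v is not None})

# The sampled NDV falls far short of the true NDV because a 1% sample
# catches only about 1% of the single-occurrence values.
print(true_ndv, sample_ndv)
```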

In your case you have >27% (5M of 18M) of the data having 2 values.
That is pretty significant skew.  In this case I would recommend
finding a sample size that yields representative NDV counts or, if you
feel comfortable because you know your data well, setting the NDV
manually.

The AUTO_SAMPLE_SIZE in 11g has been enhanced to eliminate the issue
around NDV and skewed data.  More on that here:
http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/

Hope this helps.

On 10/4/07, Mercadante, Thomas F (LABOR)
<Thomas.Mercadante@xxxxxxxxxxxxxxxxx> wrote:
> John,
>
> We were gathering stats with no "METHOD_OPT" option.  And according to
> an Oracle SR, the calculation for density is *not* 1/NDV, but:
>
> DENSITY = SUM(i=1..NDV)(nocc_i^2) / T^2
>
> where T is the number of elements sampled, adjusted like nocc
> (i.e. values that span histogram buckets are removed).  "Basically, for
> each distinct value (i.e. NDV) we count the number of occurrences of
> that value (the nocc value), tossing any value that spans a histogram
> bucket."
>
> The SR Tech said that the simpler calculation "is a rough
> approximation of the formula above."
>
> Funny thing - I tested several scenarios and the first calculation
> seems to hold.
>
> He suggested trying histograms with various numbers of buckets and
> testing the result, taking a 10046 trace to see what is happening.  If
> I am not satisfied with my results, I should submit a (possible) bug
> report.
>
> The skew of the data in this table is the real problem.
>
> 18,000,000 rows.
> Ssn column:
>
> 1,289,561 rows with a value of "undefined"
> 3,656,617 rows with a value of null
> 625,018 distinct values.
>
> So nearly 5 million rows of bad data.
>
> Now, try and find a time to test this without killing my users!
>
> Tom
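The two density formulas quoted above can be compared on toy data.
This sketch is a deliberate simplification - it ignores the
histogram-bucket adjustments the SR note describes - but it shows why
the two calculations agree on uniform data (where SUM(nocc_i^2)/T^2
reduces to 1/NDV) and diverge under skew:

```python
from collections import Counter

def density_exact(values):
    # DENSITY = SUM(i=1..NDV) nocc_i^2 / T^2, per the Oracle SR note
    # (histogram-bucket adjustments ignored in this simplification).
    counts = Counter(values)
    t = len(values)
    return sum(n * n for n in counts.values()) / (t * t)

def density_simple(values):
    # The rough approximation: 1/NDV.
    return 1 / len(set(values))

uniform = [1, 2, 3, 4] * 25     # every value occurs equally often
skewed = [1] * 97 + [2, 3, 4]   # one value dominates

print(density_exact(uniform), density_simple(uniform))  # both 0.25
print(density_exact(skewed), density_simple(skewed))    # 0.9412 vs 0.25
```

On uniform data each nocc_i = T/NDV, so SUM(nocc_i^2)/T^2 =
NDV*(T/NDV)^2/T^2 = 1/NDV, which matches Tom's observation that the
simpler formula "seems to hold" in his tests.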

-- 
Regards,

Greg Rahn
http://structureddata.org


--
//www.freelists.org/webpage/oracle-l
