Since you are on 9.2.0.7, the default METHOD_OPT did not collect histograms; it wasn't until 10g that the default changed to collect them. I think your data is a poster child for a problem that I have seen several times in recent years: the sample of data that DBMS_STATS is gathering with. You don't mention what your ESTIMATE_PERCENT is (whether you are using DBMS_STATS.AUTO_SAMPLE_SIZE or a fixed percentage).

If the data has significant skew (generally a high NDV with a very few values dominating the set of values), AUTO_SAMPLE_SIZE can have challenges getting a representative sample. This is due to the way AUTO_SAMPLE_SIZE works: it is an algorithm based on convergence of the values observed, and with skewed data the convergence often happens before a representative NDV is calculated. With a fixed percentage, the problem is usually that the chosen value is too low to yield a representative NDV count. I went through a real-world example last year at Oracle OpenWorld 2006, which is available here:

http://structureddata.org/presentations/

In your case >27% of the data (5M of 18M rows) holds just 2 values. That is pretty significant skew. Here I would recommend finding a sample size that yields representative NDV counts or, if you feel comfortable because you know your data well, setting the NDV manually (rough sketches of both are appended below, after the quoted message). AUTO_SAMPLE_SIZE in 11g has been enhanced to eliminate this issue with NDV and skewed data. More on that here:

http://structureddata.org/2007/09/17/oracle-11g-enhancements-to-dbms_stats/

Hope this helps.

On 10/4/07, Mercadante, Thomas F (LABOR) <Thomas.Mercadante@xxxxxxxxxxxxxxxxx> wrote:
> John,
>
> We were gathering stats with no "METHOD_OPT" option. And according to
> an Oracle SR, the calculation for density is *not* 1/NDV, but:
>
>   DENSITY = SUM(i = 1..NDV) nocc_i^2 / T^2
>
> where T is the number of elements sampled, adjusted like nocc
> (i.e. values that span histogram buckets are removed). "Basically,
> for each distinct value (i.e. each of the NDV values) we count the
> number of occurrences of that value (the nocc value), tossing any
> value that spans a histogram bucket."
>
> The SR tech said that the simpler 1/NDV calculation "is a rough
> approximation of the formula above."
>
> Funny thing - I tested several scenarios and the simpler calculation
> seems to hold.
>
> He suggested trying histograms with various numbers of buckets and
> testing the result, taking a 10046 trace to see what is happening,
> and, if I am not satisfied with my results, submitting a (possible)
> bug report.
>
> The skew of the data in this table is the real problem:
>
>   18,000,000 rows in the table
>   SSN column:
>     1,289,561 rows with a value of "undefined"
>     3,656,617 rows with a value of NULL
>       625,018 distinct values
>
> So 4.9 million rows of bad data.
>
> Now, try and find a time to test this without killing my users!
>
> Tom

--
Regards,
Greg Rahn
http://structureddata.org
--
//www.freelists.org/webpage/oracle-l
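
A minimal sketch of the sample-size testing recommended above, assuming a table named EMP_DATA owned by the current user (the table name is a placeholder; only the SSN column and the row counts come from the thread). Gather at increasing fixed percentages and watch where NUM_DISTINCT stabilizes near the true count:

    -- Gather with a fixed sample size and no histograms (the 9.2 default).
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => USER,
        tabname          => 'EMP_DATA',              -- placeholder name
        method_opt       => 'FOR ALL COLUMNS SIZE 1',
        estimate_percent => 10,                      -- repeat with 5, 25, 50, ...
        cascade          => TRUE);
    END;
    /

    -- The NDV and density that DBMS_STATS recorded from the sample:
    SELECT num_distinct, density, num_nulls
      FROM user_tab_col_statistics
     WHERE table_name = 'EMP_DATA' AND column_name = 'SSN';

    -- The true NDV for comparison (full scan of 18M rows; run off-hours):
    SELECT COUNT(DISTINCT ssn) FROM emp_data;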
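
If the true counts are already known, as they are here, the manual route is DBMS_STATS.SET_COLUMN_STATS. A sketch using the exact counts Tom quoted (same placeholder table name; density = 1/NDV matches the no-histogram approximation discussed in the thread):

    -- Override the optimizer statistics on SSN with known-good values.
    BEGIN
      DBMS_STATS.SET_COLUMN_STATS(
        ownname => USER,
        tabname => 'EMP_DATA',   -- placeholder name
        colname => 'SSN',
        distcnt => 625018,       -- known NDV from the counts above
        density => 1/625018,     -- 1/NDV, the no-histogram approximation
        nullcnt => 3656617);     -- known NULL count
    END;
    /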
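
Finally, the SR's density formula can be computed directly. With no histogram nothing spans a bucket, so nothing is tossed, and T is (I am assuming) simply the count of non-NULL rows. A sketch:

    -- nocc = occurrences of each distinct value; T = SUM(nocc).
    -- DENSITY = SUM(nocc^2) / T^2
    SELECT SUM(nocc * nocc) / (SUM(nocc) * SUM(nocc)) AS density
      FROM (SELECT ssn, COUNT(*) AS nocc
              FROM emp_data            -- placeholder name
             WHERE ssn IS NOT NULL
             GROUP BY ssn);

If every value occurred equally often (nocc = T/NDV across the board), this reduces to exactly 1/NDV, which is presumably why the SR tech called the simpler calculation a rough approximation. With 1,289,561 "undefined" rows dominating, SUM(nocc^2) is inflated and the computed density lands far above 1/625018.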