RE: Very Strange Query Access Plan

From: "Mercadante, Thomas F (LABOR)" <Thomas.Mercadante@xxxxxxxxxxxxxxxxx>
To: "John Kanagaraj" <john.kanagaraj@xxxxxxxxx>
Date: Thu, 4 Oct 2007 09:02:15 -0400

John,

We were gathering stats with no "METHOD_OPT" option.  And according to
an Oracle SR, the calculation for density is *not* 1/NDV, but:
"DENSITY = SUM(1..NDV)(nocc^2)/(T^2)
where T is the number of elements sampled, adjusted like nocc
(i.e. values that span histogram buckets are removed).
Basically, for each distinct value (i.e. NDV) we count the number of
occurrences of that value (the nocc value), tossing any value that spans
a histogram bucket."
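
To make the difference between the two calculations concrete, here is a
minimal Python sketch of the quoted formula next to the simple 1/NDV
figure. It is only an illustration of the arithmetic, not Oracle's
actual code: the function names and the "popular" argument (standing in
for "values that span histogram buckets") are made up for this example,
and NULLs are assumed to be excluded from the sample.

```python
from collections import Counter


def density_sum_of_squares(values, popular=()):
    """SUM(nocc^2) / T^2 over the distinct values that are kept.

    NULLs (None) and any value listed in `popular` are dropped first,
    so both nocc and T are "adjusted" in the sense of the quoted text.
    """
    kept = [v for v in values if v is not None and v not in popular]
    total = len(kept)  # T
    if total == 0:
        return 0.0
    counts = Counter(kept)  # nocc for each distinct value
    return sum(n * n for n in counts.values()) / (total * total)


def density_simple(values):
    """The simpler 1/NDV figure (NULLs do not count as a value)."""
    distinct = {v for v in values if v is not None}
    return 1.0 / len(distinct) if distinct else 0.0


# Toy data: one evenly spread column and one dominated by a single value.
uniform = ["a", "b", "c", "d"] * 25
skewed = ["undefined"] * 70 + ["a", "b", "c"] * 10

print(density_simple(uniform), density_sum_of_squares(uniform))
print(density_simple(skewed), density_sum_of_squares(skewed))
print(density_sum_of_squares(skewed, popular=("undefined",)))
```

On evenly spread data the two figures agree, which fits the SR Tech's
"rough approximation" remark; they only drift apart when one value
dominates and is not removed as a popular value.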

The SR Tech said that the simpler calculation "is a rough approximation
of the formula above."

Funny thing - I tested several scenarios and the first calculation seems
to hold.

He suggested trying histograms with various numbers of buckets and
testing the result, taking a 10046 trace to see what is happening.  If I
am not satisfied with the results, I should submit a (possible) bug
report.

The skew of the data in this table is the real problem.

18,000,000 rows.
Ssn column:

1,289,561 rows with a value of "undefined"
3,656,617 rows with a value of null
625,018 distinct values.

So about 4.9 million rows of bad data.
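
As a rough feel for what this skew does to the numbers, the sketch below
plugs the figures above into both density calculations. It rests on
several assumptions that are not measured facts about this table: the
18,000,000 row count is taken as exact, "undefined" is counted as one of
the 625,018 distinct values, NULL rows are left out of the sample, and
every other value is assumed to occur equally often.

```python
# Figures from the table above (row count assumed exact).
TOTAL_ROWS = 18_000_000
NULL_ROWS = 3_656_617
UNDEFINED_ROWS = 1_289_561
NDV = 625_018

bad_rows = NULL_ROWS + UNDEFINED_ROWS

# Assumption: NULLs are not sampled, and "undefined" is one of the NDV values.
non_null_rows = TOTAL_ROWS - NULL_ROWS
other_rows = non_null_rows - UNDEFINED_ROWS
other_ndv = NDV - 1

# Assumption: the remaining values are spread evenly.
avg_per_value = other_rows / other_ndv

# Simple estimate.
simple_density = 1 / NDV

# Sum-of-squares estimate with "undefined" left in the sample.
skewed_density = (UNDEFINED_ROWS**2 + other_ndv * avg_per_value**2) / non_null_rows**2

# Sum-of-squares estimate with "undefined" removed as a popular value.
trimmed_density = (other_ndv * avg_per_value**2) / other_rows**2

print(f"bad rows:        {bad_rows:,}")
print(f"1/NDV:           {simple_density:.3e}")
print(f"with undefined:  {skewed_density:.3e}")
print(f"undefined removed: {trimmed_density:.3e}")
```

Under these assumptions the density with "undefined" left in comes out
thousands of times larger than 1/NDV, while removing it brings the
figure back to roughly 1/NDV.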

Now, try and find a time to test this without killing my users!

Tom


-----Original Message-----
From: John Kanagaraj [mailto:john.kanagaraj@xxxxxxxxx] 
Sent: Thursday, October 04, 2007 12:42 AM
To: Mercadante, Thomas F (LABOR)
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Very Strange Query Access Plan

Tom,

> Thanks to Alvaro Jose Fernandez & Ric Van Dyke, this is solved.  The
> DENSITY and CLUSTER FACTOR values in the user_tab_columns for my
> database table had bad values.  These values are calculated by the
> DBMS_STATS package.  I manually set these to a much lower figure and
> my problem went away.

Keep in mind that collecting histograms on a column can affect the
DENSITY, i.e. when a histogram exists, DENSITY != 1/NDV, and that can
cause lots of issues. I think the whole thing is explained in a paper
by Wolfgang (or Alberto or Jonathan - I forgot who!)

In this case, I am guessing that this occurred because you collected
stats using DBMS_STATS with the METHOD_OPT=>'FOR ALL INDEXED COLUMNS
SIZE SKEWONLY' option, and the sample suddenly showed that there was a
skew? (Guessing with apologies to the BAAG party!)

> I'm still trying to determine what my next steps are.  One definite
> step is to stop gathering stats for awhile!

As per Dave Ensor as quoted by Wolfgang, "It is only safe to gather
statistics when to do so will make no difference". It is, however,
difficult to NOT gather stats. The safest approach is to back up
existing stats before gathering them, and that is something 10g does
automatically.

-- 
John Kanagaraj <><
DB Soft Inc
http://www.linkedin.com/in/johnkanagaraj
http://jkanagaraj.wordpress.com (Sorry - not an Oracle blog!)
** The opinions and facts contained in this message are entirely mine
and do not reflect those of my employer or customers **


--
//www.freelists.org/webpage/oracle-l
