Re: What are the gory details behind DBMS_STATS.AUTO_SAMPLE_SIZE?

  • From: "Ghassan Salem" <salem.ghassan@xxxxxxxxx>
  • To: sac@xxxxxxxxxxxxx
  • Date: Tue, 6 Jun 2006 11:49:51 +0200

dbms_stats tries first to get an idea of the volume of the tables, and so
does a first sampling, and depending on the result tries to find a suitable
sample size. Hence the multiple scans.
but beware, the calculation queries/algos keep changing.
rgds

On 6/6/06, Schultz, Charles <sac@xxxxxxxxxxxxx> wrote:

Thanks. I assume you meant a 10046 Level 4 trace. I tried a level 12 trace (just because I always regret it when I don't), and it does expose some rather interesting pieces of code, just like Niall promised. I asked a Metalink engineer and haven't heard anything official, yet. Or even unofficial for that matter.

 ------------------------------
*From:* John Kanagaraj [mailto:john.kanagaraj@xxxxxxx]
*Sent:* Monday, June 05, 2006 5:16 PM
*To:* Schultz, Charles; oracle-l
*Subject:* RE: What are the gory details behind
DBMS_STATS.AUTO_SAMPLE_SIZE?

 Charles,

I have found some interesting stuff when I turned on a level 4 trace on a
DBMS_STATS collection session using AUTO_SAMPLE_SIZE. Unfortunately, I did
not document what I saw, but I do remember that it performed multiple
scans.... Not a Good Thing (tm).

Regards,

John Kanagaraj <><
DB Soft Inc
Phone: 408-970-7002 (W)

http://www.klove.com - Positive, encouraging music 24x7 worldwide

** The opinions and facts contained in this message are entirely mine and
do not reflect those of my employer or customers **

 ------------------------------
*From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *Schultz, Charles
*Sent:* Friday, June 02, 2006 10:52 AM
*To:* oracle-l
*Subject:* What are the gory details behind DBMS_STATS.AUTO_SAMPLE_SIZE?

 I am curious how the algorithms behind DBMS_STATS.AUTO_SAMPLE_SIZE actually
pick a sample size (10.2.0.1). We have witnessed a case where the sample
size for a particular column was reduced 91.6%, causing the number of
distinct values to be inaccurate, further causing a query to choose an FTS
rather than an index. As far as I can tell, the skew has not changed that
much, if at all, and the overall volume has increased by ~1%.

I am aware of workarounds (compute stats, lock stats, use an outline,
etc), but like I said, I am really curious why the algorithm made such a
drastic jump in the first place. I could not easily locate an appropriate
white paper on metalinks, and I get too many hits on Google. Can anyone help
satiate my burning desire to know? =)

*charles schultz*
*oracle dba*
*aits - adsd*
*university of illinois*


Other related posts: