RE: DBMS_STATS [resend chomped version]

  • From: "Mercadante, Thomas F" <thomas.mercadante@xxxxxxxxxxxxxxxxx>
  • To: "'oracle-l@xxxxxxxxxxxxx'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 21 Jun 2004 07:53:31 -0400

I use the AUTO variable in all my stats gathering processes.  Call me naive,
but if the Oracle docs say to use AUTO, then I will use it.  I get the
feeling that AUTO means complete, because the SAMPLE_SIZE column in the
USER_TABLES view always shows the total number of rows.  Maybe a future
release will be different, but I'm happy with how it's working.
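
For what it's worth, this is the kind of check that claim rests on; a
minimal sketch, where SALES_HISTORY is just an illustrative table name:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'SALES_HISTORY',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE);
END;
/

-- compare what was actually sampled against the full row count
SELECT table_name, num_rows, sample_size
  FROM user_tables
 WHERE table_name = 'SALES_HISTORY';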

Tom Mercadante
Oracle Certified Professional


-----Original Message-----
From: Niall Litchfield [mailto:niall.litchfield@xxxxxxxxx] 
Sent: Monday, June 21, 2004 6:58 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: DBMS_STATS [resend chomped version]


Comments in-line and thinking aloud as well

On Sun, 20 Jun 2004 21:20:41 -0500, DENNIS WILLIAMS
<dwilliams@xxxxxxxxxxxxx> wrote:
> 
> Jonathan
>    Very provocative ideas.
> 1. I'm assuming you are referring to the AUTO sample size when you say 
> don't let the database work out a sample size? Does anybody know how 
> that works? I hate to trust something if I have no idea how it works.

I haven't done this (I tend to get a blind spot with the word AUTO; I like
automating things, but only if I can do them manually :) )

> 2. The study of statistics has a branch related to sample size. 
> Pollsters use that to figure out how many random samples can provide a 
> valid estimate. I'm searching for my old college statistics textbook 
> right now.

There are a number of factors that affect the ideal random sample size: one
is the size of the population; another is the margin of error that you want
(the +/- bit that says table SALES_HISTORY has 7 million rows +/- 5%); and a
third is the confidence level, i.e. how confident you wish to be about the
accuracy of your stats (typically 95% or 99%).
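
To make that concrete, here is the textbook formula (Cochran's, with the
finite population correction) as a quick PL/SQL sketch. The function name
and the defaults (5% margin, z = 1.96 for roughly 95% confidence) are mine,
not anything Oracle supplies:

CREATE OR REPLACE FUNCTION ideal_sample_size (
  p_population IN NUMBER,                -- N: rows in the table
  p_margin     IN NUMBER DEFAULT 0.05,   -- e: the +/- 5% bit
  p_z          IN NUMBER DEFAULT 1.96    -- z: 1.96 ~ 95%, 2.58 ~ 99%
) RETURN NUMBER IS
  l_n0 NUMBER;
BEGIN
  -- worst-case proportion p = 0.5 maximises the required sample
  l_n0 := POWER(p_z, 2) * 0.25 / POWER(p_margin, 2);
  -- finite population correction for a known population size N
  RETURN CEIL(l_n0 / (1 + (l_n0 - 1) / p_population));
END;
/

For the 7-million-row SALES_HISTORY above this returns a sample of only a
few hundred rows, which is why fixed percentages look so wasteful on large
tables.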

> 3. I like your idea of creating a table to hold the sample size for 
> each table in the schema. Other columns could hold the reanalyze 
> interval, date of next analyze, etc. This would ensure each table was 
> analyzed appropriately.

I like the idea as well, with the following caveats.

I'd be more than a bit worried that it might turn out like Mike Hordilla's
package for index reorgs that was on OTN a while back: a highly complicated,
flexible and efficient way of doing something that might well be best left
undone :(. Oh, and it might suffer in a similar way from end users not
understanding what it was doing and why.

For example, what my comments in 1 above mean (if correct; it's a while
since I was at college) is that the package would need to answer the
following question about each analyzed segment: do the current stats meet my
accuracy requirements? Are the stats 'good enough', in other words. This is
because we wouldn't wish to calculate an 'interval' for running stats on
each table, but rather the degree of confidence that we have that the stats
are 'good enough'. Unfortunately there is rather a gotcha with this approach
(unless there is a smart stats way around it), which is that in order to
answer the 'are my stats good enough?' question we need to know how large
our population is (to at least a good degree of accuracy), and in order to
do that we need to collect stats :(. We can of course project population
deltas for any given segment from past history, but frequently the past is
no good guide to the future; in particular, changes in business process,
application software or the economic environment can and do affect the rate
of change of data in our commercial databases.
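
One partial way around the gotcha, for what it's worth: 9i table monitoring
counts DML against each table since the last gather, which gives a rough
population delta without re-collecting stats. A sketch, reusing the
SALES_HISTORY example from above:

ALTER TABLE sales_history MONITORING;

-- DML counts since the last analyze (the view is flushed periodically;
-- on 9.2, DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO forces a flush)
SELECT table_name, inserts, updates, deletes, timestamp
  FROM user_tab_modifications
 WHERE table_name = 'SALES_HISTORY';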

I think what I am saying is that if we buy the argument that one should not
sample every x days (hours, years), because we care about the
appropriateness of the stats and not their 'freshness', then building an
automated system to gather appropriate stats becomes a non-trivial task.
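
That said, if one did want the control table from Dennis's point 3, the bare
bones might look something like this. All the names are hypothetical, and
the fixed seven-day interval is precisely the naive bit argued against
above:

CREATE TABLE stats_control (
  owner        VARCHAR2(30),
  table_name   VARCHAR2(30),
  sample_pct   NUMBER,   -- NULL means let DBMS_STATS.AUTO_SAMPLE_SIZE decide
  next_analyze DATE
);

BEGIN
  FOR r IN (SELECT owner, table_name, sample_pct
              FROM stats_control
             WHERE next_analyze <= SYSDATE) LOOP
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => r.owner,
      tabname          => r.table_name,
      estimate_percent => NVL(r.sample_pct, DBMS_STATS.AUTO_SAMPLE_SIZE),
      cascade          => TRUE);
    UPDATE stats_control
       SET next_analyze = SYSDATE + 7   -- the naive fixed interval
     WHERE owner = r.owner
       AND table_name = r.table_name;
  END LOOP;
  COMMIT;
END;
/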

Some googling turned up http://www.surveysystem.com/sscalc.htm, which gives
good coverage of the relevant statistics.

> 4. My boss heard some Oracle expert say that 30% was the best sample 
> size. Does anyone have an idea of the source?

Someone who wasn't a statistician :)


-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
