Re: DBMS_STATS [resend chomped version]

  • From: Niall Litchfield <niall.litchfield@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 21 Jun 2004 11:57:52 +0100

Comments in-line and thinking aloud as well

On Sun, 20 Jun 2004 21:20:41 -0500, DENNIS WILLIAMS
<dwilliams@xxxxxxxxxxxxx> wrote:
> 
> Jonathan
>    Very provocative ideas.
> 1. I'm assuming you are referring to the AUTO sample size when you say don't
> let the database work out a sample size? Does anybody know how that works? I
> hate to trust something if I have no idea how it works.

I haven't done this (I tend to get a blind spot with the word AUTO I
like automating things, but only if I can do it manually :) )

> 2. The study of statistics has a branch related to sample size. Pollsters
> use that to figure out how many random samples can provide a valid estimate.
> I'm searching for my old college statistics textbook right now.

There are a number of factors that affect the ideal random sample
size, 1 is the size of the population, 1 is the confidence interval
that you want (the +- bit that says table SALES_HISTORY has 7million
rows <b>+-</b> 5% ). and a thrid is how confident you wish to be about
the accuracy of your stats (typically 95% or 99% accurate).

> 3. I like your idea of creating a table to hold the sample size for each
> table in the schema. Other columns could hold the reanalyze interval, date
> of next analyze, etc. This would ensure each table was analyzed
> appropriately.

I like the idea as well, with the following caveats.

I'd be more than a bit worried that it might turn out like Mike
Hordilla's package for index reorgs that was on OTN a while back to be
a highly complicated, flexible and efficient way of doing something
that might well be best left undone :(. Oh and it might suffer in a
similar way from a lack of understanding by end-users as to what it
was doing and why.

For example what my comments in 1 above mean - if correct and its a
while since I was at college - is that the package would need to
answer the following question about each analyzed segment. Do the
current stats meet my accuracy requirements? Are the stats 'Good
Enough' in other words. This is because we wouldn't wish to calculate
an 'interval' for running stats for each table, but calculate the
degree of confidence that we have that the stats are 'good enough'.
Unfortunately there is rather a 'gotcha' with this approach (unless
there is a smart stats way around this) which is that in order to
answer the 'are my stats' good enough?'  question we need to know how
large our population is (to at least a good degree of accuracy) - in
order to to this we need to collect stats :(. We can of course project
population deltas for any given segment from past history, but
frequently the past is no good guide to the future - in particular
changes in busines process, application software or economic
environment can and do affect the rate of change of data in our
commercial databases.

I think what I am saying that if we buy the argument that one should
not sample every x days (hours, years), because we care about the
appropriateness of the stats and not their 'freshness' then building
an automated system to gather appropriate stats becomes a non-trivial
task.

Some googling discovered http://www.surveysystem.com/sscalc.htm which
gives a good coverage of the relevant stats.

> 4. My boss heard some Oracle expert say that 30% was the best sample size.
> Does anyone have an idea of the source?

someone who wasn't a statistician :) 


-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at //www.freelists.org/archives/oracle-l/
FAQ is at //www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------

Other related posts: