Calculating Distinct counts from a sample

From: "Barr, Stephen" <Stephen.Barr@xxxxxxxxx>
To: oracle-l@xxxxxxxxxxxxx
Date: Sun, 3 Jul 2005 15:48:32 +0100

How does oracle calculate distinct counts from a sample of data?

 

I have traced dbms_stats but it doesn't give much of a clue as to how it
does this. Initially I thought I could just use a simple calculation based
on the percentage of distincts in any sample set compared to the number of
rows in the sample set. So if the distinct count for the sample set was 30%
of the number of rows in a 10% sample, then I would just scale this
up....but I now know how obviously flawed this method is.

 

The only real way I can see of making this work is to apply some sort of
linear regression function to the natural log of a number of additional
samples from the sample set - which would allow me to come up with a
"reasonable" scaling factor.

 

Any mathematicians out there who could shed some light on this?

 

Many thanks,

 

Stephen Barr.

 




-----------------------------------------
Information in this email may be privileged, confidential and is intended
exclusively for the addressee. The views expressed may not be official
policy, but the personal views of the originator. If you have received it
in error, please notify the sender by return e-mail and delete it from your
system. You should not reproduce, distribute, store, retransmit, use or
disclose its contents to anyone.     Please note we reserve the right to
monitor all e-mail communication through our internal and external
networks.

Calculating Distinct counts from a sample

Other related posts: