Re: Dan Tow's SQL formula

  • From: Yong Huang <yong321@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 25 Feb 2009 21:49:58 -0800 (PST)

[Resend. I'm forced to trim quoted text down when posting a message. Complete 
original message can be found at 
//www.freelists.org/post/oracle-l/Dan-Tows-SQL-formula]

> To better my understanding I did a test with more familiar table which is
> scott.emp table with following query:
>
> select sum(count(mgr)*count(mgr)) / (sum(count(mgr))*sum(count(*)))
> from emp
> group by mgr
>
> SUM(COUNT(MGR)*COUNT(MGR))/(SUM(COUNT(MGR))*SUM(COUNT(*)))
> ----------------------------------------------------------
>                                                 .225274725
>
>
> So I got a selectivity of roughly 0.225. Does not sound right to me because
> there are 6 managers and 13 employees (14 but one has no manager)
>
> My question is really, does anyone understand this formula and actually use
> it?
>
> If so please throw some lights :-)))
>
>
> Thank you all
>
> Alex

I don't know the answer. But Dan Tow's formula looks very much like Jonathan 
Lewis' calculation of density when he describes histograms (see his book on 
p.172), or Ari Mozes's patent 6732085 
(http://www.freepatentsonline.com/6732085.html) where he says "density can be 
calculated as the sum of the square of the repetition counts for non-popular 
values divided by the product of the number of rows in the table and the number 
of non-popular values in the table", which is kind of beyond my knowledge.

Unfortunately, the number you get by applying Dan's formula to scott.emp, 
.225274725, is NOT the density for the mgr column, which is .038461538 on my 
database. But it's close to index selectivity (ix_sel) shown in 10053 trace 
when an index is created on mgr column and a query has "where mgr = <some 
number>" is parsed.

Yong Huang



      
--
//www.freelists.org/webpage/oracle-l


Other related posts: