Re: Density statistic calculation in case of histogram

  • From: Neeraj Bhatia <neeraj.dba@xxxxxxxxx>
  • To: Martin Klier <Martin.Klier@xxxxxxxxxx>
  • Date: Mon, 6 Apr 2009 19:26:10 +0530

Hi Martin,

Version: 10.2.0.1.0

After reading an excellent paper "Histograms - Myths and Facts" by Wolfgang
Breitling, some points get cleared.

1)  In case of width-based histograms, density= 1/ (2*number of distinct
values)

Thus come the figure 0.00005

2) Without histograms, density should be equal to 1/number of distinct
values. But in my case, it is null. I am not using method_opt while
collecting statistics. The global default setting is FOR ALL INDEXED COLUMNS
SIZE 1.

3) In case of height-based histograms, I didn't get these calculations.


density = Σ cnt2 / ( num_rows˜ * Σ cnt )

“the sum of the squared frequencies of all non-popular values divided by the
sum of the frequencies of all non-popular values times the count of rows
with not null values of the histogram column”.

Any comments will be appreciated.

Neeraj.
On Mon, Apr 6, 2009 at 5:34 PM, Martin Klier <Martin.Klier@xxxxxxxxxx>wrote:

> Hi Neeraj,
>
> no straight answer to your question, but on which DB version are you
> testing?
>
> Keep the estimate_percent default values in mind. Maybe if you force
> dbms_stats to compute stats, the result will differ.
>
> --
> Mit freundlichem Gruß
>
>
> Martin Klier
> Senior Oracle Database Administrator
>
> ------------------------------------------------------------------------------
>
> Klug GmbH integrierte Systeme
> Lindenweg 13, D-92552 Teunz
> Tel.:  +49 9671/9216-245
> Fax.: +49 9671/9216-112
> mailto: martin.klier@xxxxxxxxxx
> www.klug-is.de
>
> ------------------------------------------------------------------------------
>
> Geschäftsführer: Johann Klug, Roman Sorgenfrei
> Sitz der Gesellschaft: Teunz, USt-ID-Nr. DE175481608,
> HRB Nr. 2037, Amtsgericht Amberg
>
> oracle-l-bounce@xxxxxxxxxxxxx schrieb am 06.04.2009 08:47:30:
>
> > Von:
> >
> > Neeraj Bhatia <neeraj.dba@xxxxxxxxx>
> >
> > An:
> >
> > oracle-l@xxxxxxxxxxxxx
> >
> > Datum:
> >
> > 06.04.2009 08:48
> >
> > Betreff:
> >
> > Density statistic calculation in case of histogram
> >
> > Gesendet von:
> >
> > oracle-l-bounce@xxxxxxxxxxxxx
>  >
> > Hi,
> > Please tell me how density is calculated in case of histograms.
> > I've a table with very skewed data:
> > scott@ORADB10G> select col_skew, count(*) from tab_skew group by
> > col_skew order by 1;
> >   COL_SKEW   COUNT(*)
> > ---------- ----------
> >          1         10
> >          2         10
> >          3         10
> >          4         10
> >          5         10
> >          6         10
> >          7         10
> >          8         10
> >          9         10
> >         10       9910
> > 10 rows selected.
> > scott@ORADB10G> exec dbms_stats.delete_table_stats(user,'TAB_SKEW');
> > PL/SQL procedure successfully completed.
> > -- Creating width-based histograms, #buckets = #distinct values
> > scott@ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 10');
> > PL/SQL procedure successfully completed.
> > scott@ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER  TABLE_NA COLUMN_NAME          NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B
> > .00005          0          10 06-APR-09       10000 YES
> > NO            3 FREQUENCY
> > Question: How density is calculated here? Also it's equal to
> > Selectivity of non-popular values.
> > scott@ORADB10G> exec dbms_stats.gather_table_stats(user, 'TAB_SKEW',
> > method_opt=>'FOR COLUMNS COL_SKEW size 5');
> > PL/SQL procedure successfully completed.
> > -- Creating Height-based histograms, #buckets < #distinct values
> > scott@ORADB10G> select * from dba_tab_col_statistics where
> > table_name='TAB_SKEW' and column_name='COL_SKEW';
> > OWNER  TABLE_NA COLUMN_NAME          NUM_DISTINCT LOW_VA HIGH_V
> > DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE
> > AVG_COL_LEN HISTOGRAM
> > ------ -------- -------------------- ------------ ------ ------
> > ---------- ---------- ----------- --------- ----------- --- ---
> > ----------- ---------------
> > SCOTT  TAB_SKEW COL_SKEW                       10 C102   C10B
> > .98209          0           5 06-APR-09       10000 YES
> > NO            3 HEIGHT BALANCED
> > Question: How density is calculated here and how selectivity will be
> > estimated for popolar as well as non-popular values?
> >
> > Any points/advice will be appreciated.
> > Neeraj.
>
>
>

Other related posts: