Density calculation. Was: Incorrect cardinality estimate

  • From: "Laimutis Nedzinskas" <Laimutis.Nedzinskas@xxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 24 Jan 2007 16:26:20 -0000

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Wolfgang Breitling
>Ergo, in order to get the cardinality estimate down you have to change 
>either num_rows or density. I suggest you whack density to
>2.5e-6 or even less. Because you did gather the histogram this won't 
>affect any sql which go after popular values. It will affect sql where 
>you have a join on the ods_process_date column.


It seems if Oracle is doing a natural thing when calculating density. 
If histogram is provided then Oracle tries to remove "noise" from
density calculation.
A small test reveals that Oracle is able to effectively remove very
popular values.
The formula is (kind of) following:

Density=Cardinality_for_Unpopular_Values/Number_of_Rows

For example:

create table test_dens
(id number
, nh number --number column with histogram
, n number  --number column w/o histogram
);

---Case 3
insert into test_dens
select rownum, rownum, rownum from all_objects where rownum<=5000;

insert into test_dens
select rownum, 0, 0 from all_objects where rownum<=2500;

insert into test_dens
select rownum, 0.5, 0.5 from all_objects where rownum<=2500;
--
begin sys.dbms_stats.gather_table_stats('LTDLNE', 'TEST_DENS',
method_opt=>'FOR COLUMNS NH SIZE 254, N SIZE 1'); end;
/

select t.table_name, t.column_name, t.num_distinct, t.density,
t.num_nulls, t.num_buckets, t.histogram 
from user_tab_col_statistics  t where table_name='TEST_DENS';

Table           column  num_rows        density
num_nulls       buckets histogram type  
TEST_DENS       NH              5002            0,0001
0               254             HEIGHT BALANCED
TEST_DENS       N               5002            0,000199920031987205
0               1               NONE

Density for column N is exactly 1/5002
Density for column NH is:

Density=Cardinality_for_Unpopular_Values/Number_of_Rows = 1/10000 =
0,0001
, where Cardinality_for_Unpopular_Values=1 
because we have 5002(total distinct values)-2(popular
values)=5000(unpopular distinct values) for 5000 unpopular rows yielding
cardinality 5000(unpopular rows)/5000(unpopular distinct values)=1

--- 
Another test with Cardinality_for_Unpopular_Values=2: 

insert into test_dens
select rownum, mod(rownum,1000), mod(rownum,1000) from all_objects where
rownum<=5000;

insert into test_dens
select rownum, 10001, 10001 from all_objects where rownum<=2500;

insert into test_dens
select rownum, 10002, 10002 from all_objects where rownum<=2500;

And we have:

Density_for_NH_columns is 0.0005 = (5000/(1002-2))/10000,
where  
5000 - unpopular rows, 
1002 - total distinct values, 
2 - popular rows
10000 - total rows

---

The problem is how Oracle cuts this noise. It may use several techniques
but all of them are heuristic anyway. A guess.



Fyrirvari/Disclaimer
http://www.landsbanki.is/disclaimer
--
//www.freelists.org/webpage/oracle-l


Other related posts: