I think if you apply a predicate to null_col then you'll get good estimates, so *probably* this mis-estimate is limited to counting. You've got the test case handy, so give it a whirl. Something like where null_col > 0 should do it. -----Original Message----- From: Martijn Bos [mailto:maboc@xxxxxxxx] Sent: Monday, November 18, 2013 12:48 PM To: Mark W. Farnham Cc: Oracle-L Subject: Re: null values Please find my comments inline Mark, thanks for allready spending time on my problems. Martijn On Mon, Nov 18, 2013 at 06:52:21AM -0500, Mark W. Farnham wrote: > I missed the input column that your stats included 900 num_rows. > > You may be on to something. If you add two rows, one null for that > column and the other non-null, that num_rows stat should move to 901; > does the estimate move to 1002? > insert into cbo_tab_1 values (1001,1,1,null,'filler'); insert into cbo_tab_1 values (1002,1,1,1,'filler'); commit; ***** Table Stats ***** TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN --------------- -------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------- CBO_TAB_1 VALID 99 1002 1126 0 2013-NOV-18 15:06:53 515 ***** Table Column Stats ***** TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM --------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- -------------------- ----------- --- --- ----------- --------------- CBO_TAB_1 NULL_COL 900 C102 C20A64 .001111111 101 1 2013-NOV-18 15:06:53 901 YES NO 4 NONE ***** Index Stats ***** INDEX_NAME INDEX_TYPE UNIQUENES BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZED ------------------------------ --------------------------- --------- ---------- ----------- ------------- ----------------- ---------- -------------------- CBO_TAB_1_NULL_COL NORMAL NONUNIQUE 1 2 900 901 901 2013-NOV-18 15:06:54 ***** Index Columns **** INDEX_NAME TABLE_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- ------------------------- --------------- CBO_TAB_1_NULL_COL CBO_TAB_1 NULL_COL 1 SQL> explain plan for select count(null_col) from cbo_tab_1; @xpln ID OPERATION OBJECT ALIAS CARDINALITY COST OPTIMIZER BYTES ---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ---------- --- 0 SELECT STATEMENT() 1 2 ALL_ROWS 4 --P 1 SORT(AGGREGATE) 1 4 --P 2 INDEX(FAST FULL SCAN) (INDEX) MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1@SEL$1 1002 2 ANALYZED 4008 ID PROJECTION ------ --------------------------------------------------------------------------- 1 (#keys=0) COUNT("NULL_COL")[22] 2 "NULL_COL"[NUMBER,22] SQL> As you can see, the index_stas display 901 rows in the index. (which I think is correct). However, the CBO estimates 1002 rows now. So the problem persits > Perhaps the CBO is (incorrectly) just using the table row stat for > unrestricted query cardinality estimates, still getting the correct > index choice because it is smaller by blocks than the table and no > other index contains the column. > > Of course for full scans the size in blocks drives the relevant cost, > but if the CBO were considering nested loop amounts and multiplying by > a number far off, it could produce sub optimal plans. > Exactly. (See also below) > Hmm. Try your test with 900 nulls and 100 not-null (easiest test from > your existing script by reversing logic of mod to insert null). That > should show us if it is just off by 10% or completely using the table > stats instead of the index stats for cardinality. > The insert now is done as follows: insert into cbo_tab_1 select level, mod(level,10), round(level/100), -- decode(mod(level, 10), 0, null, level), decode(mod(level, 10), 0, level, null), 'filler' from dual connect by level<=1000; ***** Table Stats ***** TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN --------------- -------- ---------- ---------- ---------- ---------- ------------ -------------------- ----------- CBO_TAB_1 VALID 99 1000 1000 0 2013-NOV-18 15:14:06 512 ***** Table Column Stats ***** TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE GLO USE AVG_COL_LEN HISTOGRAM --------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- -------------------- ----------- --- --- ----------- --------------- CBO_TAB_1 NULL_COL 100 C10B C20B .01 900 1 2013-NOV-18 15:14:06 100 YES NO 2 NONE ***** Index Stats ***** INDEX_NAME INDEX_TYPE UNIQUENES BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS LAST_ANALYZED ------------------------------ --------------------------- --------- ---------- ----------- ------------- ----------------- ---------- -------------------- CBO_TAB_1_NULL_COL NORMAL NONUNIQUE 0 1 100 100 100 2013-NOV-18 15:14:06 The statistics show that the index has 100 rows. Again, I think that that should be enough information for the CBO, given the query. And the plan output: explain plan for select count(null_col) from cbo_tab_1; @xpln ID OPERATION OBJECT ALIAS CARDINALITY COST OPTIMIZER BYTES ---------- ---------------------------------------- ---------------------------------------- -------------------- ----------- ---------- --------------- ---------- --- 0 SELECT STATEMENT() 1 1 ALL_ROWS 2 --P 1 SORT(AGGREGATE) 1 2 --P 2 INDEX(FULL SCAN) (INDEX) MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1@SEL$1 1000 1 ANALYZED 2000 ID PROJECTION ------ --------------------------------------------------------------------------- 1 (#keys=0) COUNT("NULL_COL")[22] 2 "NULL_COL"[NUMBER,22] SQL> Now the cardianlity estimate is way off. So potentially creating sub-optimal exceution-paths for more complex query's. I do notice that the excecution path changed from Fast Full Scan to Full Scan. > "Thought" is the spelling you were wondering about. > Thank you. Somehow this is a very difficult word to spell for me. > -----Original Message----- > From: Martijn Bos [mailto:maboc@xxxxxxxx] > Sent: Monday, November 18, 2013 1:22 AM > To: Mark W. Farnham > Subject: Re: null values > > Hi Mark, > > Thanks for your response. > Please consider my remarks (inline) > > Best regards > Martijn > > On Sun, Nov 17, 2013 at 09:57:05PM -0500, Mark W. Farnham wrote: > > I think it is because there are 900 distinct keys, but the index is > > non-unique, so it cannot know merely from the stats there actually > > are no duplicates and the rest as nulls. > > whether there are duplicates or not is, as far as I understand, not so > interesting. > > if my data is: > > 1 > 2 > null > 1 > 2 > null > > then count(column) == 4 > > So duplicates are not so important, in this case I think. > > > But it does get selected because it will be fewer total blocks for > > the fast full scan than the table. > > > > In the index_stats is a column num_rows which also says 900. I would > have thaught (not sure how to spell the past tense of think :-)) that > that statistics is sufficient. > > > > mwf > > > > -----Original Message----- > > From: oracle-l-bounce@xxxxxxxxxxxxx > > [mailto:oracle-l-bounce@xxxxxxxxxxxxx] > > On Behalf Of Martijn Bos > > Sent: Sunday, November 17, 2013 4:51 PM > > To: Oracle-L > > Subject: null values > > > > Hi List, > > > > On a play-around sytem I do the following: > > (This is a 11.2.0.4 RDBMS on linux x86 (32 bits)) > > > > drop table cbo_tab_1; > > create table cbo_tab_1 (id number, sca number(10,0), clu > > number(10,0), null_col number(10,0), filler char(500)) pctfree 99 > > pctused 1; > > > > -- Following statement makes sure that ther actually are nulls in > > column null_col insert into cbo_tab_1 select level, mod(level,10), > > round(level/100), > > decode(mod(level, 10), 0, null, level), 'filler' from dual connect by > > level<=1000; > > commit; > > > > create unique index CBO_TAB_1_ID on CBO_TAB_1(id); > > -- create index cbo_tab_1_id on cbo_tab_1(id); create index > > cbo_tab_1_sca on cbo_tab_1(sca); create index cbo_tab_1_clu on > > cbo_tab_1(clu); create index cbo_tab_1_null_col on > > cbo_tab_1(null_col); > > > > exec dbms_stats.gather_table_stats('MARTIJN','CBO_TAB_1', > > block_sample=>false, cascade=>true, method_opt=>'for all columns > > size 1'); alter system flush shared_pool; --Since I try it a lot I > > need it to parse over and over again. > > > > explain plan for select count(null_col) from cbo_tab_1; --HERE IS > > THE SQL OF WHICH I DON'T UNDERSTAND THE EXPLAIN PLAN > > > > SQL> @xpln -- A script I created just to get a little more > > SQL> understanding of > > explaining statement. > > > > ID OPERATION OBJECT > > ALIAS CARDINALITY COST OPTIMIZER BYTES > > ---------- ---------------------------------------- > > ---------------------------------------- -------------------- > > ----------- > > ---------- --------------- ---------- > > --- 0 SELECT STATEMENT() > > 1 2 ALL_ROWS 4 > > --P 1 SORT(AGGREGATE) > > 1 4 > > --P 2 INDEX(FAST FULL SCAN) (INDEX) > > MARTIJN.CBO_TAB_1_NULL_COL CBO_TAB_1@SEL$1 1000 > 2 > > ANALYZED 4000 > > > > ID PROJECTION > > ------ > > > ---------------------------------------------------------------------- > ----- > > 1 (#keys=0) COUNT("NULL_COL")[22] > > 2 "NULL_COL"[NUMBER,22] > > SQL> > > > > (if you find that suspect, I also added a more documented way to > > display explain plans) > > > > SQL> select * from table(dbms_xplan.display(format=>'all')); > > > > PLAN_TABLE_OUTPUT > > -------------------------------------------------------------------- > > -- > > ------ > > -------------------------------------------------------------------- > > -- > > ------ > > ------------------------------------------------ > > Plan hash value: 2862360477 > > > > -------------------------------------------------------------------- > > -- > > ------ > > ---------------- > > | Id | Operation | Name | Rows | Bytes | Cost > > (%CPU)| Time | > > -------------------------------------------------------------------- > > -- > > ------ > > ---------------- > > | 0 | SELECT STATEMENT | | 1 | 4 | 2 > > (0)| 00:02:01 | > > | 1 | SORT AGGREGATE | | 1 | 4 | > > | | > > | 2 | INDEX FAST FULL SCAN| CBO_TAB_1_NULL_COL | 1000 | 4000 | 2 > > (0)| 00:02:01 | > > -------------------------------------------------------------------- > > -- > > ------ > > ---------------- > > > > Query Block Name / Object Alias (identified by operation id): > > ------------------------------------------------------------- > > > > 1 - SEL$1 > > 2 - SEL$1 / CBO_TAB_1@SEL$1 > > > > Column Projection Information (identified by operation id): > > ----------------------------------------------------------- > > > > 1 - (#keys=0) COUNT("NULL_COL")[22] > > 2 - "NULL_COL"[NUMBER,22] > > SQL> > > > > > > > > Now the thing that is troubling me is the cardinality estimate at id > > 2 (1000). I would say that the CBO could esitmate 900 rows. > > Where does the CBO get's that idea of 1000? > > > > > > SQL> select TABLE_NAME, STATUS, PCT_FREE, PCT_USED, NUM_ROWS, > > SQL> BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED, AVG_ROW_LEN > > from user_tables > > where table_name='CBO_TAB_1'; > > > > TABLE_NAME STATUS PCT_FREE PCT_USED NUM_ROWS BLOCKS > > EMPTY_BLOCKS LAST_ANALYZED AVG_ROW_LEN > > --------------- -------- ---------- ---------- ---------- ---------- > > ------------ -------------------- ----------- > > CBO_TAB_1 VALID 99 1000 1000 > > 0 2013-NOV-17 22:27:10 515 > > > > SQL> select index_name, index_type, uniqueness, blevel, leaf_blocks, > > SQL> distinct_keys, clustering_factor, num_rows, last_analyzed > > from user_indexes > > where table_name='CBO_TAB_1'; > > > > INDEX_NAME INDEX_TYPE UNIQUENES > > BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS > LAST_ANALYZED > > ------------------------------ --------------------------- --------- > > ---------- ----------- ------------- ----------------- ---------- > > -------------------- > > CBO_TAB_1_ID NORMAL UNIQUE > > 1 2 1000 1000 1000 2013-NOV-17 > > 22:27:10 > > CBO_TAB_1_SCA NORMAL NONUNIQUE > > 1 2 10 1000 1000 2013-NOV-17 > > 22:27:10 > > CBO_TAB_1_CLU NORMAL NONUNIQUE > > 1 2 11 1000 1000 2013-NOV-17 > > 22:27:10 > > CBO_TAB_1_NULL_COL NORMAL NONUNIQUE > > 1 2 900 900 900 2013-NOV-17 > > 22:27:10 > > SQL> > > > > As far as I'm concerned/understand the CBO chooses the right index > > (CBO_TAB_1_NULL_COL). CBO_TAB_1_NULL_COL has 900 num rows, since the > > other > > 100 rows are null. > > I would say that in this specific case it would be easy for the CBO > > to estimate 900 rows since it only visits this index (all > > information is in the index). > > > > Now I'm thinking that I might oversee something very simple (or that > > i'm flat out wrong), so would anyone be so kind as to point me in > > the right direction for some guidance, or (better yet :-)) explain > > where my thinking is wrong. > > > > > > Best Regards, > > Martijn > > > -- //www.freelists.org/webpage/oracle-l