RE: null values

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: "'Martijn Bos'" <maboc@xxxxxxxx>
Date: Mon, 18 Nov 2013 22:52:39 -0500
I think if you apply a predicate to null_col then you'll get good estimates,
so *probably* this mis-estimate is limited to counting. You've got the test
case handy, so give it a whirl. Something like
where null_col > 0
should do it.

-----Original Message-----
From: Martijn Bos [mailto:maboc@xxxxxxxx] 
Sent: Monday, November 18, 2013 12:48 PM
To: Mark W. Farnham
Cc: Oracle-L
Subject: Re: null values

Please find my comments inline

Mark, thanks for allready spending time on my problems.

Martijn


On Mon, Nov 18, 2013 at 06:52:21AM -0500, Mark W. Farnham wrote:
> I missed the input column that your stats included 900 num_rows.
> 
> You may be on to something. If you add two rows, one null for that 
> column and the other non-null, that num_rows stat should move to 901; 
> does the estimate move to 1002?
> 

insert into cbo_tab_1 values (1001,1,1,null,'filler'); insert into cbo_tab_1
values (1002,1,1,1,'filler'); commit;

***** Table Stats *****
TABLE_NAME      STATUS     PCT_FREE   PCT_USED   NUM_ROWS     BLOCKS
EMPTY_BLOCKS LAST_ANALYZED        AVG_ROW_LEN
--------------- -------- ---------- ---------- ---------- ----------
------------ -------------------- -----------
CBO_TAB_1       VALID            99                  1002       1126
0 2013-NOV-18 15:06:53         515


***** Table Column Stats *****
TABLE_NAME      COLUMN_NAME     NUM_DISTINCT LOW_VALUE  HIGH_VALUE
DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED        SAMPLE_SIZE GLO USE
AVG_COL_LEN HISTOGRAM
--------------- --------------- ------------ ---------- ----------
---------- ---------- ----------- -------------------- ----------- --- ---
----------- ---------------
CBO_TAB_1       NULL_COL                 900 C102       C20A64
.001111111        101           1 2013-NOV-18 15:06:53         901 YES NO
4 NONE


***** Index Stats *****
INDEX_NAME                     INDEX_TYPE                  UNIQUENES
BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED
------------------------------ --------------------------- ---------
---------- ----------- ------------- ----------------- ----------
--------------------
CBO_TAB_1_NULL_COL             NORMAL                      NONUNIQUE
1           2           900               901        901 2013-NOV-18
15:06:54


***** Index Columns ****
INDEX_NAME                     TABLE_NAME      COLUMN_NAME
COLUMN_POSITION
------------------------------ --------------- -------------------------
---------------
CBO_TAB_1_NULL_COL             CBO_TAB_1       NULL_COL
1

SQL> explain plan for select count(null_col) from cbo_tab_1; @xpln

ID         OPERATION                                OBJECT
ALIAS                CARDINALITY       COST OPTIMIZER            BYTES
---------- ----------------------------------------
---------------------------------------- -------------------- -----------
---------- --------------- ----------
--- 0       SELECT STATEMENT()
1          2 ALL_ROWS                 4
--P 1        SORT(AGGREGATE)
1                                     4
--P 2         INDEX(FAST FULL SCAN)                 (INDEX)
MARTIJN.CBO_TAB_1_NULL_COL       CBO_TAB_1@SEL$1             1002          2
ANALYZED              4008

    ID PROJECTION
------
---------------------------------------------------------------------------
     1 (#keys=0) COUNT("NULL_COL")[22]
     2 "NULL_COL"[NUMBER,22]
SQL>

As you can see, the index_stas display 901 rows in the index. (which I think
is correct). 
However, the CBO estimates 1002 rows now.
So the problem persits

> Perhaps the CBO is (incorrectly) just using the table row stat for 
> unrestricted query cardinality estimates, still getting the correct 
> index choice because it is smaller by blocks than the table and no 
> other index contains the column.
> 
> Of course for full scans the size in blocks drives the relevant cost, 
> but if the CBO were considering nested loop amounts and multiplying by 
> a number far off, it could produce sub optimal plans.
> 

Exactly. (See also below)


> Hmm. Try your test with 900 nulls and 100 not-null (easiest test from 
> your existing script by reversing logic of mod to insert null). That 
> should show us if it is just off by 10% or completely using the table 
> stats instead of the index stats for cardinality.
> 

The insert now is done as follows:
insert into cbo_tab_1 select level, 
                             mod(level,10), 
                             round(level/100), 
--                             decode(mod(level, 10), 0, null, level), 
                             decode(mod(level, 10), 0, level, null), 
                             'filler' 
                      from   dual 
                      connect by level<=1000;

***** Table Stats *****
TABLE_NAME      STATUS     PCT_FREE   PCT_USED   NUM_ROWS     BLOCKS
EMPTY_BLOCKS LAST_ANALYZED        AVG_ROW_LEN
--------------- -------- ---------- ---------- ---------- ----------
------------ -------------------- -----------
CBO_TAB_1       VALID            99                  1000       1000
0 2013-NOV-18 15:14:06         512


***** Table Column Stats *****
TABLE_NAME      COLUMN_NAME     NUM_DISTINCT LOW_VALUE  HIGH_VALUE
DENSITY  NUM_NULLS NUM_BUCKETS LAST_ANALYZED        SAMPLE_SIZE GLO USE
AVG_COL_LEN HISTOGRAM
--------------- --------------- ------------ ---------- ----------
---------- ---------- ----------- -------------------- ----------- --- ---
----------- ---------------
CBO_TAB_1       NULL_COL                 100 C10B       C20B
.01        900           1 2013-NOV-18 15:14:06         100 YES NO
2 NONE


***** Index Stats *****
INDEX_NAME                     INDEX_TYPE                  UNIQUENES
BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED
------------------------------ --------------------------- ---------
---------- ----------- ------------- ----------------- ----------
--------------------
CBO_TAB_1_NULL_COL             NORMAL                      NONUNIQUE
0           1           100               100        100 2013-NOV-18
15:14:06

The statistics show that the index has 100 rows.
Again, I think that that should be enough information for the CBO, given the
query. 

And the plan output:
explain plan for select count(null_col) from cbo_tab_1; @xpln

ID         OPERATION                                OBJECT
ALIAS                CARDINALITY       COST OPTIMIZER            BYTES
---------- ----------------------------------------
---------------------------------------- -------------------- -----------
---------- --------------- ----------
--- 0       SELECT STATEMENT()
1          1 ALL_ROWS                 2
--P 1        SORT(AGGREGATE)
1                                     2
--P 2         INDEX(FULL SCAN)                      (INDEX)
MARTIJN.CBO_TAB_1_NULL_COL       CBO_TAB_1@SEL$1             1000          1
ANALYZED              2000

    ID PROJECTION
------
---------------------------------------------------------------------------
     1 (#keys=0) COUNT("NULL_COL")[22]
     2 "NULL_COL"[NUMBER,22]
SQL>

Now the cardianlity estimate is way off. So potentially creating sub-optimal
exceution-paths for more complex query's.

I do notice that the excecution path changed from Fast Full Scan to Full
Scan.




> "Thought" is the spelling you were wondering about.
> 

Thank you. Somehow this is a very difficult word to spell for me. 



> -----Original Message-----
> From: Martijn Bos [mailto:maboc@xxxxxxxx]
> Sent: Monday, November 18, 2013 1:22 AM
> To: Mark W. Farnham
> Subject: Re: null values
> 
> Hi Mark,
> 
> Thanks for your response.
> Please consider my remarks (inline)
> 
> Best regards
> Martijn
> 
> On Sun, Nov 17, 2013 at 09:57:05PM -0500, Mark W. Farnham wrote:
> > I think it is because there are 900 distinct keys, but the index is 
> > non-unique, so it cannot know merely from the stats there actually 
> > are no duplicates and the rest as nulls.
> 
> whether there are duplicates or not is, as far as I understand, not so 
> interesting.
> 
> if my data is:
> 
> 1
> 2
> null
> 1
> 2
> null
> 
> then count(column) == 4
> 
> So duplicates are not so important, in this case I think.
> 
> > But it does get selected because it will be fewer total blocks for 
> > the fast full scan than the table.
> > 
> 
> In the index_stats is a column num_rows which also says 900. I would 
> have thaught (not sure how to spell the past tense of think :-)) that 
> that statistics is sufficient.
> 
> 
> > mwf
> > 
> > -----Original Message-----
> > From: oracle-l-bounce@xxxxxxxxxxxxx 
> > [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Martijn Bos
> > Sent: Sunday, November 17, 2013 4:51 PM
> > To: Oracle-L
> > Subject: null values
> > 
> > Hi List,
> > 
> > On a play-around sytem I do the following:
> > (This is a 11.2.0.4 RDBMS on linux x86 (32 bits))
> > 
> > drop table cbo_tab_1;
> > create table cbo_tab_1 (id number, sca number(10,0), clu 
> > number(10,0), null_col number(10,0), filler char(500)) pctfree 99 
> > pctused 1;
> > 
> > -- Following statement makes sure that ther actually are nulls in 
> > column null_col insert into cbo_tab_1 select level, mod(level,10), 
> > round(level/100),
> > decode(mod(level, 10), 0, null, level), 'filler' from   dual connect by
> > level<=1000;
> > commit;
> > 
> > create unique index CBO_TAB_1_ID on CBO_TAB_1(id);
> > -- create index cbo_tab_1_id on cbo_tab_1(id); create index 
> > cbo_tab_1_sca on cbo_tab_1(sca); create index cbo_tab_1_clu on 
> > cbo_tab_1(clu); create index cbo_tab_1_null_col on 
> > cbo_tab_1(null_col);
> > 
> > exec dbms_stats.gather_table_stats('MARTIJN','CBO_TAB_1',
> > block_sample=>false, cascade=>true, method_opt=>'for all columns 
> > size 1'); alter system flush shared_pool; --Since I try it a lot I 
> > need it to parse over and over again.
> > 
> > explain plan for select count(null_col) from cbo_tab_1;  --HERE IS 
> > THE SQL OF WHICH I DON'T UNDERSTAND THE EXPLAIN PLAN
> > 
> > SQL> @xpln -- A script I created just to get a little more 
> > SQL> understanding of
> > explaining statement.  
> > 
> > ID         OPERATION                                OBJECT
> > ALIAS                CARDINALITY       COST OPTIMIZER            BYTES
> > ---------- ----------------------------------------
> > ---------------------------------------- --------------------
> > -----------
> > ---------- --------------- ----------
> > --- 0       SELECT STATEMENT()
> > 1          2 ALL_ROWS                 4
> > --P 1        SORT(AGGREGATE)
> > 1                                     4
> > --P 2         INDEX(FAST FULL SCAN)                 (INDEX)
> > MARTIJN.CBO_TAB_1_NULL_COL       CBO_TAB_1@SEL$1             1000
> 2
> > ANALYZED              4000
> > 
> >     ID PROJECTION
> > ------
> >
> ----------------------------------------------------------------------
> -----
> >      1 (#keys=0) COUNT("NULL_COL")[22]
> >      2 "NULL_COL"[NUMBER,22]
> > SQL>
> > 
> > (if you find that suspect, I also added a more documented way to 
> > display explain plans)
> > 
> > SQL> select * from table(dbms_xplan.display(format=>'all'));
> > 
> > PLAN_TABLE_OUTPUT
> > --------------------------------------------------------------------
> > --
> > ------
> > --------------------------------------------------------------------
> > --
> > ------
> > ------------------------------------------------
> > Plan hash value: 2862360477
> > 
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> > | Id  | Operation             | Name               | Rows  | Bytes |
Cost
> > (%CPU)| Time     |
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> > |   0 | SELECT STATEMENT      |                    |     1 |     4 |
2
> > (0)| 00:02:01 |
> > |   1 |  SORT AGGREGATE       |                    |     1 |     4 |
> > |          |
> > |   2 |   INDEX FAST FULL SCAN| CBO_TAB_1_NULL_COL |  1000 |  4000 |
2
> > (0)| 00:02:01 |
> > --------------------------------------------------------------------
> > --
> > ------
> > ----------------
> > 
> > Query Block Name / Object Alias (identified by operation id):
> > -------------------------------------------------------------
> > 
> >    1 - SEL$1
> >    2 - SEL$1 / CBO_TAB_1@SEL$1
> > 
> > Column Projection Information (identified by operation id):
> > -----------------------------------------------------------
> > 
> >    1 - (#keys=0) COUNT("NULL_COL")[22]
> >    2 - "NULL_COL"[NUMBER,22]
> > SQL>
> > 
> > 
> > 
> > Now the thing that is troubling me is the cardinality estimate at id 
> > 2 (1000). I would say that the CBO could esitmate 900 rows.
> > Where does the CBO get's that idea of 1000?
> > 
> > 
> > SQL> select TABLE_NAME, STATUS, PCT_FREE, PCT_USED, NUM_ROWS, 
> > SQL> BLOCKS, EMPTY_BLOCKS, LAST_ANALYZED, AVG_ROW_LEN
> > from   user_tables
> > where  table_name='CBO_TAB_1';
> > 
> > TABLE_NAME      STATUS     PCT_FREE   PCT_USED   NUM_ROWS     BLOCKS
> > EMPTY_BLOCKS LAST_ANALYZED        AVG_ROW_LEN
> > --------------- -------- ---------- ---------- ---------- ----------
> > ------------ -------------------- -----------
> > CBO_TAB_1       VALID            99                  1000       1000
> > 0 2013-NOV-17 22:27:10         515
> > 
> > SQL> select index_name, index_type, uniqueness, blevel, leaf_blocks, 
> > SQL> distinct_keys, clustering_factor, num_rows, last_analyzed
> > from   user_indexes
> > where  table_name='CBO_TAB_1';
> > 
> > INDEX_NAME                     INDEX_TYPE                  UNIQUENES
> > BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS
> LAST_ANALYZED
> > ------------------------------ --------------------------- ---------
> > ---------- ----------- ------------- ----------------- ----------
> > --------------------
> > CBO_TAB_1_ID                   NORMAL                      UNIQUE
> > 1           2          1000              1000       1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_SCA                  NORMAL                      NONUNIQUE
> > 1           2            10              1000       1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_CLU                  NORMAL                      NONUNIQUE
> > 1           2            11              1000       1000 2013-NOV-17
> > 22:27:10
> > CBO_TAB_1_NULL_COL             NORMAL                      NONUNIQUE
> > 1           2           900               900        900 2013-NOV-17
> > 22:27:10
> > SQL>
> > 
> > As far as I'm concerned/understand the CBO chooses the right index 
> > (CBO_TAB_1_NULL_COL). CBO_TAB_1_NULL_COL has 900 num rows, since the 
> > other
> > 100 rows are null.
> > I would say that in this specific case it would be easy for the CBO 
> > to estimate 900 rows since it only visits this index (all 
> > information is in the index).
> > 
> > Now I'm thinking that I might oversee something very simple (or that 
> > i'm flat out wrong), so would anyone be so kind as to point me in 
> > the right direction for some guidance, or (better yet :-)) explain 
> > where my thinking is wrong.
> > 
> > 
> > Best Regards,
> > Martijn
> > 
> 

--
//www.freelists.org/webpage/oracle-l
Follow-Ups:
- RE: null values
  - From: Mark W. Farnham
References:
- null values
  - From: Martijn Bos
- Re: null values
  - From: Martijn Bos
RE: null values

Other related posts: