Re: A question about huge difference in cardinality of a query with 3 predicates

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>, "shivam71@xxxxxxxxxxx" <shivam71@xxxxxxxxxxx>
Date: Sun, 29 Oct 2017 14:55:06 +0000

Nirav,

Thanks for posting so much relevant information.
There are several possible effects visible here. The first is simply that to
the optimizer the selectivity of a

combination of columns is the product of the individual selectivities. This is
why you get an estimate of 142 when you drop the histograms:
        143 = 2231714 * (1/3 * 1/50 * 1/104)

Secondly (as Andrew says), if one of your predicates goes out of range, Oracle
scales down its selectivity by a measure of how far out of range it is.

Third, when a column has a frequency histogram on it and you ask for a value
that doesn't seem to be in the histogram then Oracle uses "half the least
popular" value as the selectivity (and then scales that if you're also out of
range).  You can see that two of your frequency histograms have "num_buckets"
less than "num_distinct", so the gather for the histogram must have missed some
values (or num_buckets would equal num_distinct). Notice that the sample for
the histograms is only 5,548 rows, so with a heavy skew (which I think you must
have to get 10,000 rows as the final result) it's not surprising that you've
missed a few values somewhere. You may do better if you gather histograms on
these columns with a sample size of 100% - but you'd still run into the
"combination of columns" problem.

So - you need to create a column group (dbms_stats.create_extended_stats)
across all three columns or (possibly) on the two columns that are most closely
correlated; and you'll need a histogram on the extended stats or the individual
histograms will make the optimizer ignore the benefit of the column group. Even
then, unfortunately, you may run into problems because you really need the
column group histogram to be a frequency histogram and you may have too many
combinations across the three columns to get one (hence my comment about
picking two out of three).

Here's a search link to a few notes that may be useful:
https://jonathanlewis.wordpress.com/?s=column+group+histogram.

Regards
Jonathan Lewis

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx> on behalf
of Nirav A Shah <shivam71@xxxxxxxxxxx>
Sent: 28 October 2017 16:52:23
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: A question about huge difference in cardinality of a query with 3
predicates

Hi All,

I am having an issue of a query getting a bad plan  -which I have narrowed down
to major mismatch between estimated and actual cardinality of one of the tables
involved in the query. I need

help from experts on understanding why the cardinality estimate is wrong and
what can be done to rectify.  For this purpose I have created a test table with
required columns and tried to

create same situation (of cardinality mismatch) and following are the details
on it. (Oracle version is : 11.2.0.4 on Linux.)

Table structure: CREATE TABLE processes

(

   process_id    NUMBER (12) NOT NULL PRIMARY KEY,

   clientid      VARCHAR2 (20) NOT NULL,

   evt_type_id   NUMBER (12) NOT NULL,

   status        VARCHAR2 (20) NOT NULL

)

Following are the stats about the table (Thanks to the print_table utility of
Tom Kyte Sir - the below is output from "select * from all_tables where
table_name='PROCESSES'" from

which several default stuff is remove to keep this a little brief)

OWNER                         : TEST

TABLE_NAME                    : PROCESSES

PCT_FREE                      : 10

NUM_ROWS                      : 2231714

BLOCKS                        : 12137

AVG_ROW_LEN                   : 31

SAMPLE_SIZE                   : 2231714

Here is the query and test data :

variable v_coid varchar2(16);

variable v_status varchar2(16);

variable v_evt_type_id number;

exec :v_coid := 'G31MQQNM99ABCA9V';

exec :v_status := 'INP';

exec :v_event_type_id :=5014;

SQL> SELECT COUNT (*) FROM processes wpi  WHERE    WPI.CLIENTID = :v_coid
AND wpi.status = :v_status AND WPI.evt_type_id = :v_event_type_id;

  COUNT(*)

----------

     10404

1 row selected.

Following is the Execution plsn:

Plan hash value: 3304333532

------------------------------------------------------------------------------------------

| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time
| Buffers |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.08
|   11406 |

|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.08
|   11406 |

|*  2 |   TABLE ACCESS FULL| PROCESSES |      1 |     62 |  10404 |00:00:00.08
|   11406 |

------------------------------------------------------------------------------------------

Peeked Binds (identified by position):

--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'G31MQQNM99ABCA9V'

   2 - (VARCHAR2(30), CSID=873): 'INP'

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   3 - (NUMBER): 3004

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."STATUS"=:V_STATUS AND

              "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID))

29 rows selected.

Here we can see the Huge difference - the estimated cardinality of just 62
versus the acutal cardinality of 10404. Note that all these 3 columns involved
in the query have frequency histograms

on them and following are details on it...and this is the issue - the
difference between estimated cardinality of 62 versus actual value of 10404
that I need help to sort out.

exec print_table('select * from all_tab_col_statistics where owner=''TEST'' AND
TABLE_NAME=''PROCESSES''');

OWNER                         : TEST -- same for all the columns below

TABLE_NAME                    : PROCESSES -- same for all the columns below

COLUMN_NAME                   : PROCESS_ID

NUM_DISTINCT                  : 2231714

LOW_VALUE                     : C50D01015C04

HIGH_VALUE                    : C5170B1A5218

DENSITY                       : .000000444209506971868

NUM_NULLS                     : 0

NUM_BUCKETS                   : 254

LAST_ANALYZED                 : 20-oct-2017 04:20:20 --for all columns below
too...the stats are current.

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 7

HISTOGRAM                     : HEIGHT BALANCED

-----------------

OWNER                         : TEST

TABLE_NAME                    : PROCESSES

COLUMN_NAME                   : CLIENTID

NUM_DISTINCT                  : 104

LOW_VALUE                     : 30324652425132433157563030303936

HIGH_VALUE                    : 47354D384A4D54394545484A51475859

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 87

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 17

HISTOGRAM                     : FREQUENCY

-----------------

COLUMN_NAME                   : EVT_TYPE_ID

NUM_DISTINCT                  : 50

LOW_VALUE                     : C102

HIGH_VALUE                    : C21F07

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 38

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 4

HISTOGRAM                     : FREQUENCY

-----------------

COLUMN_NAME                   : STATUS

NUM_DISTINCT                  : 3

LOW_VALUE                     : 434F4D

HIGH_VALUE                    : 494E50

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 3

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 4

HISTOGRAM                     : FREQUENCY

-----------------

Please let me know if I should provide any other details. One thing I found was
that if I do dynamic sampling level 8 then the cardinality estimate improved to
nearly 7000 (don't recall that exact figure),but I would not like to depend on
it. The other thing is if I drop the histograms then the cardinality improved
slightly - to 143 as shown in the plan below but that is still way off the
actual value which is more than 10K (10404 to be exact).

------------------------------------------------------------------------------------------

| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time
| Buffers |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.08
|   11406 |

|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.08
|   11406 |

|*  2 |   TABLE ACCESS FULL| PROCESSES |      1 |    143 |  10404 |00:00:00.08
|   11406 |

------------------------------------------------------------------------------------------

Peeked Binds (identified by position):

--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'G31MQQNMH3TW7A9V'

   2 - (VARCHAR2(30), CSID=873): 'INP'

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   3 - (NUMBER): 3004

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID

              AND "WPI"."STATUS"=:V_STATUS))

Regards, Nirav

--
//www.freelists.org/webpage/oracle-l

Follow-Ups:
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Nirav A Shah

References:
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Nirav A Shah

Re: A question about huge difference in cardinality of a query with 3 predicates

Other related posts: