Re: A question about huge difference in cardinality of a query with 3 predicates

From: Nirav A Shah <shivam71@xxxxxxxxxxx>
To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Date: Sat, 28 Oct 2017 15:52:23 +0000

Hi All,

I am having an issue of a query getting a bad plan  -which I have narrowed down
to major mismatch between estimated and actual cardinality of one of the tables
involved in the query. I need

help from experts on understanding why the cardinality estimate is wrong and
what can be done to rectify.  For this purpose I have created a test table with
required columns and tried to

create same situation (of cardinality mismatch) and following are the details
on it. (Oracle version is : 11.2.0.4 on Linux.)

Table structure: CREATE TABLE processes

(

   process_id    NUMBER (12) NOT NULL PRIMARY KEY,

   clientid      VARCHAR2 (20) NOT NULL,

   evt_type_id   NUMBER (12) NOT NULL,

   status        VARCHAR2 (20) NOT NULL

)

Following are the stats about the table (Thanks to the print_table utility of
Tom Kyte Sir - the below is output from "select * from all_tables where
table_name='PROCESSES'" from

which several default stuff is remove to keep this a little brief)

OWNER                         : TEST

TABLE_NAME                    : PROCESSES

PCT_FREE                      : 10

NUM_ROWS                      : 2231714

BLOCKS                        : 12137

AVG_ROW_LEN                   : 31

SAMPLE_SIZE                   : 2231714

Here is the query and test data :

variable v_coid varchar2(16);

variable v_status varchar2(16);

variable v_evt_type_id number;

exec :v_coid := 'G31MQQNM99ABCA9V';

exec :v_status := 'INP';

exec :v_event_type_id :=5014;

SQL> SELECT COUNT (*) FROM processes wpi  WHERE    WPI.CLIENTID = :v_coid
AND wpi.status = :v_status AND WPI.evt_type_id = :v_event_type_id;

  COUNT(*)

----------

     10404

1 row selected.

Following is the Execution plsn:

Plan hash value: 3304333532

------------------------------------------------------------------------------------------

| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time
| Buffers |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.08
|   11406 |

|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.08
|   11406 |

|*  2 |   TABLE ACCESS FULL| PROCESSES |      1 |     62 |  10404 |00:00:00.08
|   11406 |

------------------------------------------------------------------------------------------

Peeked Binds (identified by position):

--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'G31MQQNM99ABCA9V'

   2 - (VARCHAR2(30), CSID=873): 'INP'

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   3 - (NUMBER): 3004

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."STATUS"=:V_STATUS AND

              "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID))

29 rows selected.

Here we can see the Huge difference - the estimated cardinality of just 62
versus the acutal cardinality of 10404. Note that all these 3 columns involved
in the query have frequency histograms

on them and following are details on it...and this is the issue - the
difference between estimated cardinality of 62 versus actual value of 10404
that I need help to sort out.

exec print_table('select * from all_tab_col_statistics where owner=''TEST'' AND
TABLE_NAME=''PROCESSES''');

OWNER                         : TEST -- same for all the columns below

TABLE_NAME                    : PROCESSES -- same for all the columns below

COLUMN_NAME                   : PROCESS_ID

NUM_DISTINCT                  : 2231714

LOW_VALUE                     : C50D01015C04

HIGH_VALUE                    : C5170B1A5218

DENSITY                       : .000000444209506971868

NUM_NULLS                     : 0

NUM_BUCKETS                   : 254

LAST_ANALYZED                 : 20-oct-2017 04:20:20 --for all columns below
too...the stats are current.

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 7

HISTOGRAM                     : HEIGHT BALANCED

-----------------

OWNER                         : TEST

TABLE_NAME                    : PROCESSES

COLUMN_NAME                   : CLIENTID

NUM_DISTINCT                  : 104

LOW_VALUE                     : 30324652425132433157563030303936

HIGH_VALUE                    : 47354D384A4D54394545484A51475859

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 87

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 17

HISTOGRAM                     : FREQUENCY

-----------------

COLUMN_NAME                   : EVT_TYPE_ID

NUM_DISTINCT                  : 50

LOW_VALUE                     : C102

HIGH_VALUE                    : C21F07

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 38

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 4

HISTOGRAM                     : FREQUENCY

-----------------

COLUMN_NAME                   : STATUS

NUM_DISTINCT                  : 3

LOW_VALUE                     : 434F4D

HIGH_VALUE                    : 494E50

DENSITY                       : .000000222104676853238

NUM_NULLS                     : 0

NUM_BUCKETS                   : 3

SAMPLE_SIZE                   : 5548

AVG_COL_LEN                   : 4

HISTOGRAM                     : FREQUENCY

-----------------

Please let me know if I should provide any other details. One thing I found was
that if I do dynamic sampling level 8 then the cardinality estimate improved to
nearly 7000 (don't recall that exact figure),but I would not like to depend on
it. The other thing is if I drop the histograms then the cardinality improved
slightly - to 143 as shown in the plan below but that is still way off the
actual value which is more than 10K (10404 to be exact).

------------------------------------------------------------------------------------------

| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |   A-Time
| Buffers |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |      1 |        |      1 |00:00:00.08
|   11406 |

|   1 |  SORT AGGREGATE    |           |      1 |      1 |      1 |00:00:00.08
|   11406 |

|*  2 |   TABLE ACCESS FULL| PROCESSES |      1 |    143 |  10404 |00:00:00.08
|   11406 |

------------------------------------------------------------------------------------------

Peeked Binds (identified by position):

--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'G31MQQNMH3TW7A9V'

   2 - (VARCHAR2(30), CSID=873): 'INP'

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   3 - (NUMBER): 3004

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID

              AND "WPI"."STATUS"=:V_STATUS))

Regards, Nirav

Follow-Ups:
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Andy Sayer
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Jonathan Lewis

Re: A question about huge difference in cardinality of a query with 3 predicates

Other related posts: