Nirav,
Thanks for posting so much relevant information.
There are several possible effects visible here. The first is simply that to
the optimizer the selectivity of a
combination of columns is the product of the individual selectivities. This is
why you get an estimate of 142 when you drop the histograms:
143 = 2231714 * (1/3 * 1/50 * 1/104)
Secondly (as Andrew says), if one of your predicates goes out of range, Oracle
scales down its selectivity by a measure of how far out of range it is.
Third, when a column has a frequency histogram on it and you ask for a value
that doesn't seem to be in the histogram then Oracle uses "half the least
popular" value as the selectivity (and then scales that if you're also out of
range). You can see that two of your frequency histograms have "num_buckets"
less than "num_distinct", so the gather for the histogram must have missed some
values (or num_buckets would equal num_distinct). Notice that the sample for
the histograms is only 5,548 rows, so with a heavy skew (which I think you must
have to get 10,000 rows as the final result) it's not surprising that you've
missed a few values somewhere. You may do better if you gather histograms on
these columns with a sample size of 100% - but you'd still run into the
"combination of columns" problem.
So - you need to create a column group (dbms_stats.create_extended_stats)
across all three columns or (possibly) on the two columns that are most closely
correlated; and you'll need a histogram on the extended stats or the individual
histograms will make the optimizer ignore the benefit of the column group. Even
then, unfortunately, you may run into problems because you really need the
column group histogram to be a frequency histogram and you may have too many
combinations across the three columns to get one (hence my comment about
picking two out of three).
Here's a search link to a few notes that may be useful:
https://jonathanlewis.wordpress.com/?s=column+group+histogram.
Regards
Jonathan Lewis
________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx> on behalf
of Nirav A Shah <shivam71@xxxxxxxxxxx>
Sent: 28 October 2017 16:52:23
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: A question about huge difference in cardinality of a query with 3
predicates
Hi All,
I am having an issue of a query getting a bad plan -which I have narrowed down
to major mismatch between estimated and actual cardinality of one of the tables
involved in the query. I need
help from experts on understanding why the cardinality estimate is wrong and
what can be done to rectify. For this purpose I have created a test table with
required columns and tried to
create same situation (of cardinality mismatch) and following are the details
on it. (Oracle version is : 11.2.0.4 on Linux.)
Table structure: CREATE TABLE processes
(
process_id NUMBER (12) NOT NULL PRIMARY KEY,
clientid VARCHAR2 (20) NOT NULL,
evt_type_id NUMBER (12) NOT NULL,
status VARCHAR2 (20) NOT NULL
)
Following are the stats about the table (Thanks to the print_table utility of
Tom Kyte Sir - the below is output from "select * from all_tables where
table_name='PROCESSES'" from
which several default stuff is remove to keep this a little brief)
OWNER : TEST
TABLE_NAME : PROCESSES
PCT_FREE : 10
NUM_ROWS : 2231714
BLOCKS : 12137
AVG_ROW_LEN : 31
SAMPLE_SIZE : 2231714
Here is the query and test data :
variable v_coid varchar2(16);
variable v_status varchar2(16);
variable v_evt_type_id number;
exec :v_coid := 'G31MQQNM99ABCA9V';
exec :v_status := 'INP';
exec :v_event_type_id :=5014;
SQL> SELECT COUNT (*) FROM processes wpi WHERE WPI.CLIENTID = :v_coid
AND wpi.status = :v_status AND WPI.evt_type_id = :v_event_type_id;
COUNT(*)
----------
10404
1 row selected.
Following is the Execution plsn:
Plan hash value: 3304333532
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08
| 11406 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08
| 11406 |
|* 2 | TABLE ACCESS FULL| PROCESSES | 1 | 62 | 10404 |00:00:00.08
| 11406 |
------------------------------------------------------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
1 - (VARCHAR2(30), CSID=873): 'G31MQQNM99ABCA9V'
2 - (VARCHAR2(30), CSID=873): 'INP'
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 - (NUMBER): 3004
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."STATUS"=:V_STATUS AND
"WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID))
29 rows selected.
Here we can see the Huge difference - the estimated cardinality of just 62
versus the acutal cardinality of 10404. Note that all these 3 columns involved
in the query have frequency histograms
on them and following are details on it...and this is the issue - the
difference between estimated cardinality of 62 versus actual value of 10404
that I need help to sort out.
exec print_table('select * from all_tab_col_statistics where owner=''TEST'' AND
TABLE_NAME=''PROCESSES''');
OWNER : TEST -- same for all the columns below
TABLE_NAME : PROCESSES -- same for all the columns below
COLUMN_NAME : PROCESS_ID
NUM_DISTINCT : 2231714
LOW_VALUE : C50D01015C04
HIGH_VALUE : C5170B1A5218
DENSITY : .000000444209506971868
NUM_NULLS : 0
NUM_BUCKETS : 254
LAST_ANALYZED : 20-oct-2017 04:20:20 --for all columns below
too...the stats are current.
SAMPLE_SIZE : 5548
AVG_COL_LEN : 7
HISTOGRAM : HEIGHT BALANCED
-----------------
OWNER : TEST
TABLE_NAME : PROCESSES
COLUMN_NAME : CLIENTID
NUM_DISTINCT : 104
LOW_VALUE : 30324652425132433157563030303936
HIGH_VALUE : 47354D384A4D54394545484A51475859
DENSITY : .000000222104676853238
NUM_NULLS : 0
NUM_BUCKETS : 87
SAMPLE_SIZE : 5548
AVG_COL_LEN : 17
HISTOGRAM : FREQUENCY
-----------------
COLUMN_NAME : EVT_TYPE_ID
NUM_DISTINCT : 50
LOW_VALUE : C102
HIGH_VALUE : C21F07
DENSITY : .000000222104676853238
NUM_NULLS : 0
NUM_BUCKETS : 38
SAMPLE_SIZE : 5548
AVG_COL_LEN : 4
HISTOGRAM : FREQUENCY
-----------------
COLUMN_NAME : STATUS
NUM_DISTINCT : 3
LOW_VALUE : 434F4D
HIGH_VALUE : 494E50
DENSITY : .000000222104676853238
NUM_NULLS : 0
NUM_BUCKETS : 3
SAMPLE_SIZE : 5548
AVG_COL_LEN : 4
HISTOGRAM : FREQUENCY
-----------------
Please let me know if I should provide any other details. One thing I found was
that if I do dynamic sampling level 8 then the cardinality estimate improved to
nearly 7000 (don't recall that exact figure),but I would not like to depend on
it. The other thing is if I drop the histograms then the cardinality improved
slightly - to 143 as shown in the plan below but that is still way off the
actual value which is more than 10K (10404 to be exact).
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.08
| 11406 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.08
| 11406 |
|* 2 | TABLE ACCESS FULL| PROCESSES | 1 | 143 | 10404 |00:00:00.08
| 11406 |
------------------------------------------------------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
1 - (VARCHAR2(30), CSID=873): 'G31MQQNMH3TW7A9V'
2 - (VARCHAR2(30), CSID=873): 'INP'
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 - (NUMBER): 3004
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("WPI"."CLIENTID"=:V_COID AND "WPI"."EVT_TYPE_ID"=:V_EVT_TYPE_ID
AND "WPI"."STATUS"=:V_STATUS))
Regards, Nirav
--
//www.freelists.org/webpage/oracle-l