Re: A question about huge difference in cardinality of a query with 3 predicates

From: Nirav A Shah <shivam71@xxxxxxxxxxx>
To: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
Date: Sun, 29 Oct 2017 16:52:16 +0000

Thanks a lot Jonathan and Andrew!! I am now getting convinced that it is
histogram that is causing the issue but would love to see how in a test setup
the issue could get resolved. Meaning , how can I differently create a
histogram in a test setup that gets the cardinality in a nearly ballpark
range...

I will go over the suggestions you have mentioned and post back in next two
days time.

Thanks again,

Nirav

________________________________
From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
Sent: Sunday, October 29, 2017 8:25 PM
To: oracle-l@xxxxxxxxxxxxx; shivam71@xxxxxxxxxxx
Subject: Re: A question about huge difference in cardinality of a query with 3
predicates

Nirav,

Thanks for posting so much relevant information.
There are several possible effects visible here. The first is simply that to
the optimizer the selectivity of a

combination of columns is the product of the individual selectivities. This is
why you get an estimate of 142 when you drop the histograms:
143 = 2231714 * (1/3 * 1/50 * 1/104)

Secondly (as Andrew says), if one of your predicates goes out of range, Oracle
scales down its selectivity by a measure of how far out of range it is.

Third, when a column has a frequency histogram on it and you ask for a value
that doesn't seem to be in the histogram then Oracle uses "half the least
popular" value as the selectivity (and then scales that if you're also out of
range). You can see that two of your frequency histograms have "num_buckets"
less than "num_distinct", so the gather for the histogram must have missed some
values (or num_buckets would equal num_distinct). Notice that the sample for
the histograms is only 5,548 rows, so with a heavy skew (which I think you must
have to get 10,000 rows as the final result) it's not surprising that you've
missed a few values somewhere. You may do better if you gather histograms on
these columns with a sample size of 100% - but you'd still run into the
"combination of columns" problem.

So - you need to create a column group (dbms_stats.create_extended_stats)
across all three columns or (possibly) on the two columns that are most closely
correlated; and you'll need a histogram on the extended stats or the individual
histograms will make the optimizer ignore the benefit of the column group. Even
then, unfortunately, you may run into problems because you really need the
column group histogram to be a frequency histogram and you may have too many
combinations across the three columns to get one (hence my comment about
picking two out of three).

Here's a search link to a few notes that may be useful:
https://jonathanlewis.wordpress.com/?s=column+group+histogram.
Oracle Scratchpad on
WordPress.com<https://jonathanlewis.wordpress.com/?s=column+group+histogram>
jonathanlewis.wordpress.com
Just another Oracle weblog

References:
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Nirav A Shah
- Re: A question about huge difference in cardinality of a query with 3 predicates
  - From: Jonathan Lewis

Re: A question about huge difference in cardinality of a query with 3 predicates

Other related posts: