Re: Optimizer not using histogram to calculate estimated number of rows?

Assuming SIZE=254 and estimate_percent = null

ndv=52999 > size=254 => height-balanced histogram

Only if 'HVA_FUNC' is recorded as a popular value the histogram
data will be used; if either  'HVA_FUNC' is not popular or it is not
recorded in the histogram, num_rows*density is used.

To build the histogram, rows are sorted and then sampled over
an grid spaced approx num_rows / size = 166067 / 254 = 653.

So a value most occur at least 2*653 times to be recorded as
popular (= sampled at least two times), it might be recorded as
popular if it occurs at least  653 times (depending on the exact sampling grid),
it is never recorded as popular if it occurs less than  653 times.

In the other case 26000 / 254 = 102, hence it is more likely that some
values get recorded as popular (since you mention "60 values have over
100 rows each",
some of them might be recorded as popular).

hth
Alberto

On Thu, Jun 11, 2009 at 00:20, David Kurtz<info@xxxxxxxxxxxxxxx> wrote:
> I am working on 10.2.0.4 on Linux.  I have a table with a highly skewed set
> of data.
>
> 166067 rows, 52999 distinct values in an indexed column.
> For 3000+ values have 3 rows each, about 60 values have over 100 rows each.
> So I thought this is a good candidate for a histogram.
(snip)

>
> However, the optimizer doesn't seem to use the histogram in calculating the
> cost
> (That is causing problems in a more complex query, because the is causing
> the plan not to change).
>
> select /*+GATHER_PLAN_STATISTICS*/ count(*) from psroleuser where roleuser
> ='HVA_FUNC'
>
> ---------------------------------------------------------------------------------------------
> | Id  | Operation         | Name          | Starts | E-Rows | A-Rows |
> A-Time   | Buffers |
> ---------------------------------------------------------------------------------------------
> |   1 |  SORT AGGREGATE   |               |      1 |      1 |      1
> |00:00:00.01 |       4 |
> |*  2 |   INDEX RANGE SCAN| PS_PSROLEUSER |      1 |      3 |    169
> |00:00:00.01 |       4 |
> ---------------------------------------------------------------------------------------------
>
> I have another version of this table on a test database, but the volume is
> slightly smaller (26000 rows, 3300 distinct values)
>
> That uses the histogram exactly as I would expect, the estimated number of
> rows is close to reality, and the execution plan of the complex query
> changes.





-- 
Alberto Dell'Era
"the more you know, the faster you go"
--
http://www.freelists.org/webpage/oracle-l


Other related posts: