RE: Which plan is better - what COST really means ?

At 10:00 AM 1/15/2005, you wrote:

>Which type of IO by your opinion optimizer takes as base type for cost
>As far as I know different types of operations (FS, IRS, IQS, IFS, FILTER)
>execute IO differently. I mean CPU utilization is different for single IO
>unit in case of different operations.
>Another question is If cost is IO, how optimizer estimates sorting/hash
>etc. actions in IO units?

The "cost" in the CBO's calculations is basically the count of the number 
of blocks that need to be examined in order to get the result. For example, 
for a fully qualified primary key access to a row one block in each of the 
index levels needs to be examined from the root over the branch blocks to 
the leaf and then the data blocks needs to be examined, ergo the cost is 
"LVLS +1", the height of the index plus one. For a full table scan, each 
block up to the hwm needs to be examined. The CBO is aware of multiblock 
reads, so the number of blocks is divided be db_file_multi_block_read_count 
to come up with a basic cost for the FTS. However, the CBO is also aware 
that not every multiblock read will be set up to read dfmrc blocks so it 
applies a fudge factor to the basic FTS cost => table-scan-cost (tsc) = 
These were the easy cases. The CBO needed 1 or all blocks of the table. If 
you have a predicate where an unknown number of rows qualify, the CBO must 
first estimate how many rows will qualify. That's the selectivity of a 
predicate and this is where the first assumption comes in, that each of the 
possible num_distinct predicate values will yield the same 
num_rows/num_distinct (= FilterFactor = Selectivity) rows - unless you 
collected a frequency histogram then the CBO has the exact number. Once it 
has the row estimate, the CBO needs to convert that into a block estimate, 
its cost "currency". If the base access method is a FTS, the cost is the 
FTS cost as explained above, else it uses the clustering_factor of the 
index as a measure of how clustered vs scattered the rows found via the 
index are. If they are clustered (a low clustering factor) the row estimate 
translates into a lower block estimate than if they are scattered. But 
remember that this is a statistic and therefore an aggregate. The 
clustering factor measures how many block transitions there are when you 
progress from one index leaf value to the next one. But if your predicates 
don't include all the trailing index columns then the rows you're seeking 
may be far more scattered than the clustering_factor for the full index 
indicates. Even worse if you are missing leading index columns (skip_scan).

To answer Juri's question about the costs of sorting/hashing. As long as 
from the size of the sort/hash the CBO can assert that no spill to the temp 
tablespace is necessary (that is where sort/hash area size play a role), 
the sort and hash cost seem to be constant at 2 and 1 respectively (from 
what I see in 9.2). Beyond that the number of blocks to be written/read 
(i.e. the cost "currency") get estimated from the number of sort and merge 
runs required or hash probes required and then tacked on to the cost.

I hope you appreciate by now how reality can differ from even the best 
attempts by the CBO of estimating the cost (ie single block IOs) of a 
query. That is why it is virtually impossible to compare the costs of two 
plans even for the same query, even though the cost is by design not a 
meaningless number. You have to know the details of how those costs were 
derived. Assume just for a moment you have a single table sql with a single 
range predicate which results in an index range scan with cost N. If I set 
optimizer_index_cost_adj=50, the cost will come out as N/2. Would you 
assume that just by doing that the query now runs twice as fast?


Wolfgang Breitling
Centrex Consulting Corporation 


Other related posts: