Go to the FreeLists Home Page Home Signup Help Login
 



Browse oracle-l: This Month's ArchiveMain Archive PageRelated postsPrevious by DateNext by Date

Re: Dynamic Sampling: some questions about the guts

  • From: "Alberto Dell'Era" <alberto.dellera@xxxxxxxxx>
  • To: sacrophyte@xxxxxxxxx
  • Date: Sun, 2 Dec 2007 02:18:13 +0100
> 2) How does the Optimizer use these sample counts to influence the costs and
> subsequent access/join methods?

If you try this scenario, and collect a 10053 trace:

exec dbms_random.seed(0);

create table t (x int, y int, z int);
create index t_x_idx on t(x);
create index t_y_idx on t(y);
create index t_x_y_idx on t(x,y);
insert into t (x,y,z)
select mod(rownum-1,10) x, mod(rownum-1,10) y, mod(rownum-1,10) z from
dual connect by level <= 10000
order by dbms_random.random;

alter session set optimizer_dynamic_sampling=5;
-- just to have pretty round numbers:
alter session set "_optimizer_cost_model"=io;
alter session set events '10053 trace name context forever, level 1';
set autotrace traceonly explain
select /*+ dynamic_sampling */ * from t where x = 1 and y = 1 and z = 1;
set autotrace off
alter session set events '10053 trace name context off';

The sampling query (reformatted for clarity) is, in 10.2.0.3:

SELECT /* OPT_DYN_SAMP */ ...
FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T") FULL("T")
NO_PARALLEL_INDEX("T") */
1                                                                  AS C1,
CASE WHEN "T"."X"=1 AND "T"."Y"=1 AND "T"."Z"=1  THEN 1 ELSE 0 END AS C2,
CASE WHEN "T"."Y"=1                              THEN 1 ELSE 0 END AS C3,
CASE WHEN "T"."Y"=1 AND "T"."X"=1                THEN 1 ELSE 0 END AS C4,
CASE WHEN "T"."X"=1                              THEN 1 ELSE 0 END AS C5

C1 estimates num_rows, C2 the query filtered cardinality, and the others
how much index keys are going to be visited if the index is chosen, and hence,
the index selectivity - for each index eligible to be used.

** Executed dynamic sampling query:
    level : 5
    sample pct. : 100.000000
    actual sample size : 10000
    filtered sample card. : 1000  -- Alberto: from C2
    filtered sample card. (index T_Y_IDX): 1000 -- Alberto: from C3
    filtered sample card. (index T_X_Y_IDX): 1000  -- Alberto: from C4
    filtered sample card. (index T_X_IDX): 1000 -- Alberto: from C5
    orig. card. : 82
    block cnt. table stat. : 20
    block cnt. for sampling: 20
    max. sample block cnt. : 64
    sample block cnt. : 20
    min. sel. est. : 0.00000100
    index T_Y_IDX selectivity est.: 0.10000000
    index T_X_Y_IDX selectivity est.: 0.10000000
    index T_X_IDX selectivity est.: 0.10000000

Of course the numbers returned by the query are in general corrected
by the sampling factor - if you estimated by 1%, they are multiplied
by 100. Here the sampling was 100%.

The numbers are then plugged into the usual formulae; for example,
the estimated cost of the index access for T_X_Y_IDX is

  Access Path: index (AllEqRange)
    Index: T_X_Y_IDX
    resc_io: 85.00  resc_cpu: 0
    ix_sel: 0.1  ix_sel_with_filters: 0.1
    Cost: 85.00  Resp: 85.00  Degree: 1

From Jonathan's "Cost Based Oracle" or Wolfgang's famous paper,
we get that the usual formula is
blevel + ix_sel * leaf_blocks + ix_sel_with_filters * clustering_factor

that is "blevel blocks are visited going from the root to the leaves,
ix_sel * leaf_blocks are the leaf block got (visited, range scan),
and ix_sel_with_filters * clustering_factor are the table blocks got".

For this scenario, ix_sel  = ix_sel_with_filters, since all index columns
are used for access, none for (pre)filtering before following the rowid.

blevel and clustering_factor are taken from the index stats (here
clustering_factor is defaulted to 800):

  Index: T_X_Y_IDX  Col#: 1 2    (NOT ANALYZED)
    LVLS: 1  #LB: 25  #DK: 100  LB/K: 1.00  DB/K: 1.00  CLUF: 800.00

The leaf_blocks is AFAIK taken from the index segment header
and set equal to the number of index blocks (in an index, almost
all blocks are leaves):

** Dynamic sampling updated index stats.: T_X_Y_IDX, blocks=43

so 1 + 0.1 * 43 + 0.1 * 800 = 85.3

Notice that the clustering factor is not dynamically estimated, and since it
is very frequently the major contributor, it is well worth to have
the index stats in place.

Caveat: this is how I understand it, I haven't made any exhaustive
investigation. In fact most of this test case comes from my fuzzy recalls
of a discussion I had with Wolfgang eons ago.
It should be enough to answer much of your question anyway ...

HTH
Alberto

-- 
Alberto Dell'Era
"the more you know, the faster you go"
--
http://www.freelists.org/webpage/oracle-l


Other related posts:

  • Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts
  • Re: Dynamic Sampling: some questions about the guts




  • [ Home | Signup | Help | Login | Archives | Lists ]

    All trademarks and copyrights within the FreeLists archives are owned by their respective owners.
    Everything else ©2008 Avenir Technologies, LLC.