Re: Dynamic Sampling: some questions about the guts

  • From: David Aldridge <david@xxxxxxxxxxxxxxxxxx>
  • To: sacrophyte@xxxxxxxxx, oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Sat, 1 Dec 2007 14:05:38 -0800 (PST)

I wouold guess that this might be to do with the optimizer looking at different 
access paths to the table. For example, it might be not be much more selective 
to access the table via 
   
  "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1 
 AND "RCRAPP1"."RCRAPP1_INFC_CODE"=:bind2 
AND "RCRAPP1"."RCRAPP1_CURR_REC_IND"=:bind3 
 AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4 

  than it is to access it with:
   
  "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4

If there is one index on all four columns and one index on just the two columns 
then accessing via the two column index might be more efficient than using the 
four-column one.
   
  Just speculation though.

  In one of our reporting environments (OSEE 10.2.0.2) we have 
optimizer_dynamic_sampling cranked up to 5. We have a case where dynamic 
sampling is pushing the CBO to choose a poor plan. Our situation highlights 
Level 4, specifically "tables that have single-table predicates that reference 
2 or more columns". (from the documentation)

Looking at a 10046 trace, I see a OPT_DYN_SAMP query that counts a sample of 
rows that match variations on the predicate. I have found that a 10046 can 
always be good grounds for finding more questions than it answers. *grin* 

To avoid the bandwidth, I will show just the relevant parts of the sampling 
query:
SELECT /* OPT_DYN_SAMP */
 ...
FROM
 (SELECT /* ... */
   1 AS C1,
   CASE
    WHEN     "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1 
         AND "RCRAPP1"."RCRAPP1_INFC_CODE"=:bind2 
         AND "RCRAPP1"."RCRAPP1_CURR_REC_IND"=:bind3 
         AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4 
    THEN 1
    ELSE 0
   END AS C2,
   CASE
    WHEN     "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
         AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4
    THEN 1
    ELSE 0
   END AS C3,
   CASE
    WHEN     "RCRAPP1"."RCRAPP1_AIDY_CODE"=:bind1
         AND "RCRAPP1"."RCRAPP1_VPDI_CODE"=:bind4
    THEN 1
    ELSE 0
   END AS C4
 FROM ...
 SAMPLE BLOCK (.037704 , 1) SEED (1) "RCRAPP1" 
) SAMPLESUB
/

First question: Why the predicate variations? I am assuming that sometimes C4 
is different than C3, so I am going to ignore C4 for now. 
2) How does the Optimizer use these sample counts to influence the costs and 
subsequent access/join methods?

I believe what is happening in our case is that sampling is "just unlucky" and 
it happens to come up with a inaccurate picture of how many rows match (an 
order of magnitude too many). I can prove this by using this query directly 
with different sample sizes. 

I am hoping that by understanding dynamic sampling a little better, I can 
understand how to resolve this issue. I realize that one can add the 
dynamic_sampling query hint, and perhaps that is the "best" solution for this. 
Aside from adjusting dynamic sampling (whether it be at the system level or via 
a hint), is there any other way to address this situation? 

TIA,

-- 
Charles Schultz 

Other related posts: