RE: why is the optimizer doing this?

From: <Dominic.Brooks@xxxxxxxxxxxxxxxxxxx>
To: <MATT.ADAMS@xxxxxx>, <oracle-l@xxxxxxxxxxxxx>
Date: Fri, 11 Feb 2011 14:39:32 +0000
Optimizer mode of FIRST_ROWS was deprecated in 10g, no?
Or was it 10gR2 when it was officially deprecated?

Anyway, on page 76 of "Cost-based Oracle Fundamentals", Jonathan Lewis says
"One of the critical features of first_rows optimization ... was the existence 
of a few rules for overriding normal costing behaviour.
One such rule was that if there was an index that could be used to avoid a 
sort, then the optimizer would use it - apparently, regardless of how expensive 
the path might be".


So, one of the reasons why FIRST_ROWS is now deprecated in favour of 
FIRST_ROWS_N (where n = 1,10,100 or 1000).


See also 10gR1 Performance Tuning Guide
http://download.oracle.com/docs/cd/B14117_01/server.101/b10752/optimops.htm#38218

" FIRST_ROWS
The optimizer uses a mix of cost and heuristics to find a best plan for fast 
delivery of the first few rows. 

Note: Using heuristics sometimes leads the query optimizer to generate a plan 
with a cost that is significantly larger than the cost of a plan without 
applying the heuristic. FIRST_ROWS is available for backward compatibility and 
plan stability; use FIRST_ROWS_n instead."


Hope this helps.

Cheers,
Dominic

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Adams, Matthew (GE, Appl & Light)
Sent: 11 February 2011 14:00
To: oracle-l@xxxxxxxxxxxxx
Subject: why is the optimizer doing this?

If anyone who understand the optimizer better than me wants to wade
through this and give me their opinion, I would be very grateful.
Fair warning...this kind of long and involved

Platform:  Oracle 10.2.0.4  on HP-UX (Itanium) (I've applied the
10.2.0.5 patchset thinking this might bug 7430474.  Problem remained)
Optimizer Mode:  FIRST_ROWS

Table  TEST1 has three columns  
A - CHAR(8) - NOT NULL
B - number(4) - NOT NULL
C - CHAR(8) - NOT NULL

The primary key "TEST1_PK"  is   (A,B)
there is also a unique index "TEST1_I2" on (B,A)

The data in column A is a char representation of a date  in 'YYYYMMDD'
format.  There are 10862 distinct values of A.   For EACH value of A,
there are  317 values of B.  The data in C is irrelevant to this issue.
So the data looks like.

A         B
--------  -------
20110201        1
20110201        2
20110201        3
....
20110201      316
20110201      317
20110202        1
20110202        2
20110202        3
.....
20110202      316
20110202      317
etc

Given the query

select a, b, c from test1 
where a = '20110210' and b > 10   (this query will return 307 rows)

I would expect range scan on the primary key (a,b) to be the optimal
plan.  And indeed, if I force the execution plan to use that index
(which I can do by using a ALL_ROWS hint, a RULE hint, and INDEX hint or
by removing the ORDER BY clause), it uses the PK and returns in about .2
seconds.   

However, as executed with no hints, it returns in about 1.06 seconds,
which would not be a big deal if we weren't running it somewhere between
3 to 5 times per second (with different parameters and using bind
variables).

In order to try to understand why the optimizer was doing this I
generated an EVENT 10053 trace file.  Here are some relevant portions

BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST1  Alias: TEST1
    #Rows: 3413297  #Blks:  12585  AvgRowLen:  21.00
Index Stats::
  Index: TEST1_I2  Col#: 2 1
    LVLS: 2  #LB: 10822  #DK: 3413297  LB/K: 1.00  DB/K: 1.00  CLUF:
3408068.00
  Index: TEST1_PK  Col#: 1 2
    LVLS: 2  #LB: 10834  #DK: 3413297  LB/K: 1.00  DB/K: 1.00  CLUF:
23542.00

OK, this looks about like I expected.  The clustering factor is much
better for the primary key, as I more or less expected based on the way
the table is loaded.


Access Path: TableScan
    Cost:  1145.44  Resp: 1145.44  Degree: 0
     Cost_io: 987.00  Cost_cpu: 1523229875
      Resp_io: 987.00  Resp_cpu: 1523229875
kkofmx: index filter:"TEST1"."A"='20110210' AND "TEST1"."B">10
  Using density: 9.2064e-05 of col #1 as selectivity of unpopular value
pred
  Access Path: index (RangeScan)
    Index: TEST1_I2
    resc_io: 10805.00  resc_cpu: 739184909
    ix_sel: 0.97  ix_sel_with_filters: 8.9302e-05
    Cost: 10881.89  Resp: 10881.89  Degree: 1
  Using density: 9.2064e-05 of col #1 as selectivity of unpopular value
pred
  Access Path: index (RangeScan)
    Index: TEST1_PK
    resc_io: 6.00  resc_cpu: 161679
    ix_sel: 8.9302e-05  ix_sel_with_filters: 8.9302e-05
    Cost: 6.02  Resp: 6.02  Degree: 1 
****** trying bitmap/domain indexes ******
****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: IndexRange  Index: TEST1_PK
         Cost: 6.02  Degree: 1  Resp: 6.02  Card: 304.81  Bytes: 0


This looks good.  It evaluated the Table Scan with a cost of 1145, a
Range Scan of the TEST1_I2 index with a cost of 10881 and a Range Scan
of the primary key with a cost of 6, and decided that the best path was
a range scan of the PK.


Here's the part I don't understand.  Immediately after the section
above, it does this.

***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
***********************
Join order[1]:  TEST1[TEST1]#0
***********************
Best so far: Table#: 0  cost: 6.0168  card: 304.8148  bytes: 6405
****** Recost for ORDER BY (using index) ************
***************************************
SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Using density: 9.2064e-05 of col #1 as selectivity of unpopular value
pred
  Table: TEST1  Alias: TEST1     
    Card: Original: 3413297  Rounded: 305  Computed: 304.81  Non
Adjusted: 304.81
  -----------------------------------------
  END   Single Table Cardinality Estimation
  -----------------------------------------
  Access Path: TableScan
    Cost:  1145.44  Resp: 1145.44  Degree: 0
      Cost_io: 987.00  Cost_cpu: 1523229875
      Resp_io: 987.00  Resp_cpu: 1523229875
kkofmx: index filter:"TEST1"."A"='20110210' AND "TEST1"."B">10
  Using density: 9.2064e-05 of col #1 as selectivity of unpopular value
pred
  Access Path: index (RangeScan)
    Index: TEST1_I2
    resc_io: 10805.00  resc_cpu: 739184909
    ix_sel: 0.97  ix_sel_with_filters: 8.9302e-05
    Cost: 10881.89  Resp: 10881.89  Degree: 1
  Best:: AccessPath: IndexRange  Index: TEST1_I2
         Cost: 10881.89  Degree: 1  Resp: 10881.89  Card: 304.81  Bytes:
21


Notice that line up there that says 
'****** Recost for ORDER BY (using index) ************' ??

It apparently is going to go through the calculations again and adjust
for the cost of the 'order by' clause. But it doesn't even consider
using the Primary Key!!  It only evaluates the Table Scan and Range scan
of TEST_I2.
And chooses the more expensive of the two!!!


Are there other parts of the 10053 trace that I should be examining more
closely and am just not seeing?  Can anyone explain why it's doing this?


--
//www.freelists.org/webpage/oracle-l


_______________________________________________

This e-mail may contain information that is confidential, privileged or 
otherwise protected from disclosure. If you are not an intended recipient of 
this e-mail, do not duplicate or redistribute it by any means. Please delete it 
and any attachments and notify the sender that you have received it in error. 
Unless specifically indicated, this e-mail is not an offer to buy or sell or a 
solicitation to buy or sell any securities, investment products or other 
financial product or service, an official confirmation of any transaction, or 
an official statement of Barclays. Any views or opinions presented are solely 
those of the author and do not necessarily represent those of Barclays. This 
e-mail is subject to terms available at the following link: 
www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the 
foregoing.  Barclays Capital is the investment banking division of Barclays 
Bank PLC, a company registered in England (number 1026167) with its registered 
office at 1 Churchill Place, London, E14 5HP.  This email may relate to or be 
sent from other members of the Barclays Group.
_______________________________________________
--
//www.freelists.org/webpage/oracle-l
References:
- why is the optimizer doing this?
  - From: Adams, Matthew (GE, Appl & Light)
RE: why is the optimizer doing this?

Other related posts: