Unexplained Plan Change

  • From: LB <moabrivers@xxxxxxxxx>
  • To: oracle-l <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 21 Jun 2010 10:53:36 -0600

I've got a query that ran this morning that for some reason decided to jump
ship off its normal plan and use a different plan.  This is a very
straightforward single table query.  When the query runs normally it uses
the correct index and returns results very quickly.  This table and its 2
indexes (one PK and one non-unique) have their respective statistics
gathered nightly at 1AM.  At 8AM this morning, Oracle decided that using the
PK index was the best choice and so it gobbled up 22 seconds doing single
block I/O.  Now I understand that gathering statistics can cause a SQL plan
to change but that plan change should remain effect all things being equal.
 If it jump shipped because of a statistical gathering change then it should
stay the course on the new plan.  It doesn't.  Explaining the query from
several different clients all result in the correct explain plan that uses
the non-unique index and not the primary key index.  I am unable to
reproduce any situation where the PK should be used for this query so I am a
bit confused as to why Oracle decided it was the best choice at that moment
in time.  I also understand that bind peeking was an issue in 10.2.0.3
resolved in 10.2.0.4.  Even explaining the plan with just the bind variables
results in the correct plan again being chosen.  The only recorded wait time
was for db file sequential reads (8 seconds on the index datafile, 14
seconds on the table datafile).

The optimizer settings haven't changed and its impossible for the driving
application to change its optimizer settings at the session level.  The
query uses bind variables (we are 10.2.0.4 64-bit on Linux) and I've
reconstructed the query's variables using v$sql_bind_capture.   As I'm not
the code architect behind the generation of this query, I cannot change the
query itself.


Here's an example of the query:

The PK index is on COL_A_I_PK and COL_F_PK
The non-unique index is on COL_A_I_PK, COL_B_I, COL_C_I, COL_D_I, COL_E_I

You will notice the bulk of predicate filter is repetitive with OR
statements on columns B, C, D, and E.

SELECT COUNT(*)
FROM
   (
   SELECT COL_A_I_PK,
      COL_B_I,
      COL_C_I,
      COL_D_I,
      COL_E_I,
      COL_F_PK,
    other_non_indexed_columns....
   FROM T
   WHERE COL_A_I_PK = :1
  AND other_non_indexed_column = :"SYS_B_0"
  AND ((COL_B_I = :2
  AND COL_C_I = :3
  AND COL_D_I = :4
  AND COL_E_I=:5)
   OR (COL_B_I = :6
  AND COL_C_I = :7
  AND COL_D_I = :8
  AND COL_E_I=:9)
   OR (COL_B_I = :10
  AND COL_C_I = :11
  AND COL_D_I = :12
  AND COL_E_I=:13)
   OR (COL_B_I = :14
  AND COL_C_I = :15
  AND COL_D_I = :16
  AND COL_E_I=:17)
   OR (COL_B_I = :18
  AND COL_C_I = :19
  AND COL_D_I = :20
  AND COL_E_I=:21)
   OR (COL_B_I = :22
  AND COL_C_I = :23
  AND COL_D_I = :24
  AND COL_E_I=:25)
   OR (COL_B_I = :26
  AND COL_C_I = :27
  AND COL_D_I = :28
  AND COL_E_I=:29)
   OR (COL_B_I = :30
  AND COL_C_I = :31
  AND COL_D_I = :32
  AND COL_E_I=:33)
   OR (COL_B_I = :34
  AND COL_C_I = :35
  AND COL_D_I = :36
  AND COL_E_I=:37)
   OR (COL_B_I = :38
  AND COL_C_I = :39
  AND COL_D_I = :40
  AND COL_E_I=:41)
   OR (COL_B_I = :42
  AND COL_C_I = :43
  AND COL_D_I = :44
  AND COL_E_I=:45) )
   )

Any thoughts on other places I might look to find out why it changed.
 Running a 10053 event trace seems a little too late at this point because
the correct plan is again being chosen.  This is the first time this query
ran today.


Thanks.

Other related posts: