RE: Enforcing SQL execution plan

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: "fmhabash@xxxxxxxxx" <fmhabash@xxxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 2 Sep 2016 19:01:49 +0000



If you check through the plan you'll see that it's a concatenation - i.e. 
executed as two disjoint subqueries; moreover the concatenation is based on the 
"or_expand_nvl_predicate" mechanism so one subquery is protected by ":bind is 
null" and the other by ":bind is not null" - which means only one of the 
subqueries will actually run on any specific execution.

When the query runs quickly only the first subquery runs - and the first 
subquery plan uses a suitable index
When the query runs slowly only the second subquery runs - and the second 
subquery plan specifies a very bad choice of index



Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: fmhabash@xxxxxxxxx [fmhabash@xxxxxxxxx]
Sent: 02 September 2016 20:02
To: Jonathan Lewis; oracle-l@xxxxxxxxxxxxx
Subject: RE: Enforcing SQL execution plan

I saw that for the bad run, but I saw none for the good one. Hence, the inquiry.

Isn’t comparing the breakdown of the ET between the 2 runs a helpful direction 
in that it is going to hint for a potential cause for why an ET changes when 
the xplan does  not. It could be an increase in DBSF as a result of undo reads, 
session contention ‘read by other session’, slow disk IO, changes in CPU 
(internal to external to the DB), etc.


This is how I resolved similar issues where xplans do not change, but ET does …
Query with Unchanged Execution Plan is Slower than Previously (Doc ID 223806.1)
Why Is My Query Sometimes Slower Than Other Times with Higher Consistent Gets 
Although No Change in Execution Plan? (Doc ID 1558349.1)

This is based on Hameed’s assertion that the xplan did not change, just the ET 
did.

And I’m also trying to reconcile your statement with Hameed’s assertion above …

‘The problem appears to be an awful choice of index’. How would this happen 
without the xplan changing?

Also, Chris kept referring to a ‘bad plan’. Again, it is not, instead, it is a 
bad run.

This difference will dictate a different troubleshooting path.

Feedback appreciated

Thanks


----------------------------------------
Thank you

From: Jonathan Lewis<mailto:jonathan@xxxxxxxxxxxxxxxxxx>
Sent: Friday, September 2, 2016 2:34 PM
To: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: RE: Enforcing SQL execution plan


There shouldn't really be any need to ask that question since the tkprof 
summary of the trace file from a bad run is a few lines further down the email.

The problem appears to be an awful choice of index for the second half of a 
concatenation that is (presumably) only triggered occasionally, and perhaps 
requires an out of range (above high value) bind variable to cause it to be 
considered a good choice.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle

From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf 
of fmhabash@xxxxxxxxx [fmhabash@xxxxxxxxx]
Sent: 02 September 2016 18:49
To: Hameed, Amir; oracle-l@xxxxxxxxxxxxx
Subject: RE: Enforcing SQL execution plan
So, how do the 2 runs compare when broken down by wait event class? What class 
is contributing the most to the bad run?


select st.SNAP_ID, st.sql_id, st.plan_hash_value, ss.begin_interval_time 
,EXECUTIONS_DELTA execs

, IOWAIT_DELTA

, CLWAIT_TOTAL

, CCWAIT_DELTA

, APWAIT_DELTA

, CPU_TIME_DELTA

, ELAPSED_TIME_DELTA

, round((IOWAIT_DELTA/ELAPSED_TIME_DELTA)*100,2) io_wt_perc

, round((CLWAIT_TOTAL/ELAPSED_TIME_DELTA)*100,2) clust_wt_perc

, round((CCWAIT_DELTA/ELAPSED_TIME_DELTA)*100,2) concur_wt_perc

, round((APWAIT_DELTA/ELAPSED_TIME_DELTA)*100,2) app_wt_perc

, round((CPU_TIME_DELTA/ELAPSED_TIME_DELTA)*100) cpu_wt_perc

from dba_hist_sqlstat st, dba_hist_snapshot ss

where

ss.snap_id = st.snap_id

and st.instance_number = (select instance_number from v$instance )

and EXECUTIONS_DELTA > 0

--and sql_id = '&sql_id'

and rownum < 1000

and begin_interval_time >= sysdate - 7

order by snap_id

;



----------------------------------------
Thank you

From: Hameed, Amir<mailto:Amir.Hameed@xxxxxxxxx>
Sent: Friday, September 2, 2016 1:17 PM
To: fmhabash@xxxxxxxxx<mailto:fmhabash@xxxxxxxxx>; 
oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: RE: Enforcing SQL execution plan

Thanks
The problem is that the plan is the same for the good run and the bad run, 
which might mean that because of bind-peeking, the skewing of data could 
potentially be causing the issue.

From: fmhabash@xxxxxxxxx [mailto:fmhabash@xxxxxxxxx]
Sent: Friday, September 02, 2016 1:12 PM
To: Hameed, Amir <Amir.Hameed@xxxxxxxxx>; oracle-l@xxxxxxxxxxxxx
Subject: RE: Enforcing SQL execution plan

Hameed …

From your description, it looks like this job did run acceptably in the past. 
If this is the case, why not grab one of the known good xplans and use it to 
create a baseline.

We stabilized most our EBS critical tasks using plan baselines. However, for 
some SQL, the baselines are becoming UNREPRODUCIBLE. Seeking MOS support, we 
are advised to apply 21463894.

Hope this helps as well.

----------------------------------------
Thank you

From: Hameed, Amir<mailto:Amir.Hameed@xxxxxxxxx>
Sent: Thursday, September 1, 2016 9:13 PM
To: oracle-l@xxxxxxxxxxxxx<mailto:oracle-l@xxxxxxxxxxxxx>
Subject: Enforcing SQL execution plan

Hi,
We run a certain job in the Oracle E-Business Suite (EBS) through a concurrent 
program. The environment information is listed below:
DB version – 11.2.0.4
EBS version – 11.5.10.2
OS – Solaris

The statement uses bind variables that are supplied as part of running the 
concurrent program. As far as I can tell, the statement always produces the 
following execution plan:

---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                        | Starts 
| E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                             |     17 
|        |       |    12 (100)|          |     17 |00:00:00.01 |     145 |
|   1 |  CONCATENATION                  |                             |     17 
|        |       |            |          |     17 |00:00:00.01 |     145 |
|*  2 |   FILTER                        |                             |     17 
|        |       |            |          |     17 |00:00:00.01 |     145 |
|   3 |    NESTED LOOPS                 |                             |     17 
|      1 |   197 |     6   (0)| 00:00:01 |     17 |00:00:00.01 |     145 |
|*  4 |     TABLE ACCESS BY INDEX ROWID | WSH_DELIVERY_DETAILS        |     17 
|      1 |   184 |     3   (0)| 00:00:01 |     17 |00:00:00.01 |      73 |
|*  5 |      INDEX UNIQUE SCAN          | WSH_DELIVERY_DETAILS_U1     |     17 
|      1 |       |     2   (0)| 00:00:01 |     17 |00:00:00.01 |      51 |
|*  6 |     TABLE ACCESS BY INDEX ROWID | WSH_DELIVERY_ASSIGNMENTS    |     17 
|      1 |    13 |     3   (0)| 00:00:01 |     17 |00:00:00.01 |      72 |
|*  7 |      INDEX RANGE SCAN           | WSH_DELIVERY_ASSIGNMENTS_N3 |     17 
|      1 |       |     2   (0)| 00:00:01 |     17 |00:00:00.01 |      53 |
|*  8 |   FILTER                        |                             |      0 
|        |       |            |          |      0 |00:00:00.01 |       0 |
|   9 |    NESTED LOOPS                 |                             |      0 
|      1 |   197 |     6   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|  10 |     NESTED LOOPS                |                             |      0 
|      1 |   197 |     6   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 11 |      TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_DETAILS        |      0 
|      1 |   184 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 12 |       INDEX RANGE SCAN          | WSH_DELIVERY_DETAILS_N1     |      0 
|      1 |       |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 13 |      INDEX RANGE SCAN           | WSH_DELIVERY_ASSIGNMENTS_N3 |      0 
|      1 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 14 |     TABLE ACCESS BY INDEX ROWID | WSH_DELIVERY_ASSIGNMENTS    |      0 
|      1 |    13 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
---------------------------------------------------------------------------------------------------------------------------------------------------------

For the past few weeks, the concurrent program has been intermittently taking 
time to complete. When traced, one of the “bad” runs produced the following 
information:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    614.23   11147.25     941317   18000438          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    614.23   11147.25     941317   18000438          0           1

Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    941317        1.00       3563.29
  Disk file operations I/O                       74        0.00          0.00
  gc current grant busy                           3        0.00          0.00
  gc cr grant 2-way                              44        0.08          0.09
  read by other session                     1706610        1.49       6482.34
  gc buffer busy acquire                      83710        1.00        431.39
  latch: cache buffers chains                173199        0.01          9.25

(only the main wait events are shown above)

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  
---------------------------------------------------
         1          1          1  CONCATENATION  (cr=18000438 pr=941317 pw=0 
time=2557323227 us)
         0          0          0   FILTER  (cr=0 pr=0 pw=0 time=2 us)
         0          0          0    NESTED LOOPS  (cr=0 pr=0 pw=0 time=0 us 
cost=6 size=197 card=1)
         0          0          0     TABLE ACCESS BY INDEX ROWID 
WSH_DELIVERY_DETAILS (cr=0 pr=0 pw=0 time=0 us cost=3 size=184 card=1)
         0          0          0      INDEX UNIQUE SCAN WSH_DELIVERY_DETAILS_U1 
(cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 165859)
         0          0          0     TABLE ACCESS BY INDEX ROWID 
WSH_DELIVERY_ASSIGNMENTS (cr=0 pr=0 pw=0 time=0 us cost=3 size=13 card=1)
         0          0          0      INDEX RANGE SCAN 
WSH_DELIVERY_ASSIGNMENTS_N3 (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 
card=1)(object id 165871)
         1          1          1   FILTER  (cr=18000438 pr=941317 pw=0 
time=2557323217 us)
         1          1          1    NESTED LOOPS  (cr=18000438 pr=941317 pw=0 
time=2557323212 us cost=6 size=197 card=1)
         1          1          1     NESTED LOOPS  (cr=18000361 pr=941296 pw=0 
time=2557119556 us cost=6 size=197 card=1)
         1          1          1      TABLE ACCESS BY INDEX ROWID 
WSH_DELIVERY_DETAILS (cr=18000336 pr=941295 pw=0 time=2557112429 us cost=3 
size=184 card=1)
  42599164   42599164   42599164       INDEX RANGE SCAN WSH_DELIVERY_DETAILS_N1 
(cr=1798095 pr=99233 pw=0 time=600382285 us cost=3 size=0 card=1)(object id 
917318)
         1          1          1      INDEX RANGE SCAN 
WSH_DELIVERY_ASSIGNMENTS_N3 (cr=25 pr=1 pw=0 time=7116 us cost=2 size=0 
card=1)(object id 165871)
         1          1          1     TABLE ACCESS BY INDEX ROWID 
WSH_DELIVERY_ASSIGNMENTS (cr=77 pr=21 pw=0 time=203643 us cost=3 size=13 card=1)


It seems that when it runs longer, it scans a lot of rows from index 
WSH_DELIVERY_DETAILS_N1. This index is defined on a date column and is not very 
selective.

As part of the troubleshooting exercise, I captured bind variables of the bad 
run from DBA_HIST_SQLBIND, defined them as variables with appropriate data type 
and then ran them through SQL*Plus. The problem that I am facing is that from 
SQL*Plus, by using this technique, it always produces the following plan, 
regardless of whether I use values from the good run or the bad run:
--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                        | Starts | 
E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                             |      1 | 
       |       |    12 (100)|          |      0 |00:00:00.01 |       4 |
|   1 |  CONCATENATION                 |                             |      1 | 
       |       |            |          |      0 |00:00:00.01 |       4 |
|*  2 |   FILTER                       |                             |      1 | 
       |       |            |          |      0 |00:00:00.01 |       0 |
|   3 |    NESTED LOOPS                |                             |      0 | 
     1 |   197 |     6   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_DETAILS        |      0 | 
     1 |   184 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  5 |      INDEX UNIQUE SCAN         | WSH_DELIVERY_DETAILS_U1     |      0 | 
     1 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  6 |     TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_ASSIGNMENTS    |      0 | 
     1 |    13 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  7 |      INDEX RANGE SCAN          | WSH_DELIVERY_ASSIGNMENTS_N3 |      0 | 
     1 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|*  8 |   FILTER                       |                             |      1 | 
       |       |            |          |      0 |00:00:00.01 |       4 |
|   9 |    NESTED LOOPS                |                             |      1 | 
     1 |   197 |     6   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|* 10 |     TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_DETAILS        |      1 | 
     1 |   184 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       4 |
|* 11 |      INDEX UNIQUE SCAN         | WSH_DELIVERY_DETAILS_U1     |      1 | 
     1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 |
|* 12 |     TABLE ACCESS BY INDEX ROWID| WSH_DELIVERY_ASSIGNMENTS    |      0 | 
     1 |    13 |     3   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
|* 13 |      INDEX RANGE SCAN          | WSH_DELIVERY_ASSIGNMENTS_N3 |      0 | 
     1 |       |     2   (0)| 00:00:01 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------------------------------------------------------------

I have tried to capture hints through outline from the bad run and added them 
as hints to the statement that I am running from SQL*Plus but I am still not 
able to enforce the “bad” plan. The outline hints are shown below:

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      OPT_PARAM('_b_tree_bitmap_plans' 'false')
      OPT_PARAM('_fast_full_scan_enabled' 'false')
      OPT_PARAM('_index_join_enabled' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$1_1")
      USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(6) PREDICATE_REORDERS((23 22) (24 23) 
(26 24) (22 26)))
      OUTLINE_LEAF(@"SEL$1_2")
      OUTLINE(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1_1" "WDD"@"SEL$1" 
("WSH_DELIVERY_DETAILS"."DELIVERY_DETAIL_ID"))
      INDEX_RS_ASC(@"SEL$1_1" "WDA"@"SEL$1" 
("WSH_DELIVERY_ASSIGNMENTS"."DELIVERY_DETAIL_ID"))
      INDEX_RS_ASC(@"SEL$1_2" "WDD"@"SEL$1_2" 
("WSH_DELIVERY_DETAILS"."DATE_SCHEDULED"))
      INDEX(@"SEL$1_2" "WDA"@"SEL$1_2" 
("WSH_DELIVERY_ASSIGNMENTS"."DELIVERY_DETAIL_ID"))
      LEADING(@"SEL$1_1" "WDD"@"SEL$1" "WDA"@"SEL$1")
      LEADING(@"SEL$1_2" "WDD"@"SEL$1_2" "WDA"@"SEL$1_2")
      USE_NL(@"SEL$1_1" "WDA"@"SEL$1")
      USE_NL(@"SEL$1_2" "WDA"@"SEL$1_2")
      NLJ_BATCHING(@"SEL$1_2" "WDA"@"SEL$1_2")
      END_OUTLINE_DATA
  */

The SQL statement is also pasted below:
SELECT WDD.SOURCE_LINE_ID, WDD.SOURCE_HEADER_ID, WDD.INVENTORY_ITEM_ID,
  WDD.SHIPMENT_PRIORITY_CODE, WDD.SOURCE_CODE SOURCE_CODE,
  WDD.SOURCE_HEADER_NUMBER SOURCE_HEADER_NUMBER, WDD.ORGANIZATION_ID,
  WDD.MOVE_ORDER_LINE_ID, WDD.SHIP_FROM_LOCATION_ID, WDD.SHIP_METHOD_CODE,
  WDD.SHIPMENT_PRIORITY_CODE, WDD.DATE_SCHEDULED, WDD.REQUESTED_QUANTITY,
  WDD.REQUESTED_QUANTITY_UOM, WDD.PREFERRED_GRADE, WDD.REQUESTED_QUANTITY2,
  WDD.REQUESTED_QUANTITY_UOM2, WDD.PROJECT_ID, WDD.TASK_ID, WDD.SUBINVENTORY,
  WDD.SUBINVENTORY, WDD.RELEASED_STATUS, WDD.SHIP_MODEL_COMPLETE_FLAG,
  WDD.TOP_MODEL_LINE_ID, WDD.SHIP_SET_ID, WDA.DELIVERY_ID,
  WDD.LAST_UPDATE_DATE
FROM
WSH_DELIVERY_DETAILS WDD, WSH_DELIVERY_ASSIGNMENTS WDA WHERE
  WDD.DELIVERY_DETAIL_ID = :B15 AND WDA.DELIVERY_DETAIL_ID =
  WDD.DELIVERY_DETAIL_ID AND WDD.DATE_SCHEDULED IS NOT NULL AND
  WDD.DATE_REQUESTED >= NVL(:B14 , WDD.DATE_REQUESTED) AND WDD.DATE_REQUESTED
  <= NVL(:B13 , WDD.DATE_REQUESTED) AND WDD.DATE_SCHEDULED >= NVL(:B12 ,
  WDD.DATE_SCHEDULED) AND WDD.DATE_SCHEDULED <= NVL(:B11 , WDD.DATE_SCHEDULED)
   AND NVL(WDD.REQUESTED_QUANTITY,0) > 0 AND WDD.RELEASED_STATUS IN ('R','B',
  'X') AND NVL(WDD.SUBINVENTORY, -99) = DECODE(:B10 , NULL,
  NVL(WDD.SUBINVENTORY, -99), :B10 ) AND NVL(WDD.PROJECT_ID,0) = DECODE(:B9 ,
  0, NVL(WDD.PROJECT_ID,0), :B9 ) AND NVL(WDD.TASK_ID,0) = DECODE(:B8 , 0,
  NVL(WDD.TASK_ID,0), :B8 ) AND NVL(WDD.SHIP_SET_ID,0) = DECODE(:B7 , 0,
  NVL(WDD.SHIP_SET_ID,0), :B7 ) AND NVL(WDD.SHIPMENT_PRIORITY_CODE,-99) =
  DECODE(:B6 , NULL, NVL(WDD.SHIPMENT_PRIORITY_CODE,-99), :B6 ) AND
  WDD.ORGANIZATION_ID = NVL(:B5 , WDD.ORGANIZATION_ID) AND
  WDD.SHIP_FROM_LOCATION_ID = DECODE(:B4 , -1, WDD.SHIP_FROM_LOCATION_ID, :B4
  ) AND (( WDA.DELIVERY_ID IS NOT NULL AND ( :B3 <> 'N' OR WDA.DELIVERY_ID =
  :B2 OR :B1 <> 0 ) ) OR ( WDA.DELIVERY_ID IS NULL AND :B2 = 0 AND :B1 = 0 ) )

How can I enforce the bad execution plan from SQL*Plus? If I am able to enforce 
it then I already have bind values from that run and I will be able to 
reproduce the issue.

Thanks,
Amir




PNG image

Other related posts: