Re: Bad Execution Plan with Left Join and predicates in both tables

  • From: Lothar Flatz <l.flatz@xxxxxxxxxx>
  • To: thomas.aregger@xxxxxxxxx, oracle-l@xxxxxxxxxxxxx
  • Date: Wed, 12 Oct 2016 06:21:30 +0200

Hi Thomas,

I am afraid the two queries are not identical.
A left join means that for any customer i will have a result, wether or not a detail exists.
For that reason the plan needs to start with customer. An outer join always determines a direction.
Think about what your first query means:
I want to see *all *customers that have an last order of x or a detail with a last order of x. Which is kind of a contradiction. You don't want to see all customers.
Your second means:
I want to see customers that have an last order of x or a detail with a last order of x.
That is clearer and probably reflect what you really want.

I accept that the difference is subtle and I hope I could make it understandable enough.
It helps to look at it from the point of three valued logic.
An outer join to customer detail, if the detail does not exists would evaluate to null for cd.last_order. Null = 443867 does not evaluate to false but to null, which is just treated as false.
A small but important difference.

Bottom line:
You should probably stick with the second variation of your query.

Regards

Lothar


On 12.10.2016 00:29, Thomas Aregger wrote:

Hello

I have a simple query (running on Oracle 12.1.0.2) with a left join and a where clause checking that either a field in the left table is set or a field in the right table is set.

That's how the query looks like:

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://C.ID> = CD.ID <http://CD.ID>
 WHERE (C.LAST_ORDER = 443867 OR CD.LAST_ORDER = 443867)
   AND C.STATUS <> 1
/

For some reason Oracle is not able to avoid a full table scan on CUSTOMER, which can be seen in the following execution plan (line 4):

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4780 (100)| 1 |00:00:01.10 | 33302 |
| 1 | CONCATENATION | | 1 | | | 1 |00:00:01.10 | 33302 |
|* 2 | FILTER | | 1 | | | 0 |00:00:01.10 | 33295 |
| 3 | NESTED LOOPS OUTER | | 1 | 253K| 4778 (1)| 255K|00:00:01.06 | 33295 |
|* 4 | TABLE ACCESS FULL | CUSTOMER | 1 | 253K| 4776 (1)| 255K|00:00:00.32 | 16692 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 255K| 1 | 1 (0)| 3481 |00:00:00.53 | 16603 |
|* 6 | INDEX RANGE SCAN | I_CUSTDET_ID | 255K| 1 | 1 (0)| 3481 |00:00:00.34 | 15318 |
|* 7 | FILTER | | 1 | | | 1 |00:00:00.01 | 7 |
| 8 | NESTED LOOPS OUTER | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 7 |
|* 9 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 5 |
|* 10 | INDEX RANGE SCAN | I_CUST_LAST_ORDER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 4 |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 12 | INDEX RANGE SCAN | I_CUSTDET_ID | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CD"."LAST_ORDER"=443867)
   4 - filter("C"."STATUS"<>3)
   6 - access("C"."ID"="CD"."ID")
   7 - filter(LNNVL("CD"."LAST_ORDER"=443867))
   9 - filter("C"."STATUS"<>3)
  10 - access("C"."LAST_ORDER"=443867)
  12 - access("C"."ID"="CD"."ID")


In the first half of the plan (under the first FILTER operation) I would expect Oracle to use the index on CUSTOMER_DETAIL.LAST_ORDER to access CUSTOMER_DETAIL and then join to the CUSTOMER table. For some reason Oracle is not doing that.

I rewrote the query a little bit:

SELECT *
  FROM CUSTOMER C
  LEFT JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://C.ID> = CD.ID <http://CD.ID>
 WHERE (C.LAST_ORDER = 443867)
   AND C.STATUS <> 3
UNION
SELECT *
  FROM CUSTOMER C
  JOIN CUSTOMER_DETAIL CD
    ON C.ID <http://C.ID> = CD.ID <http://CD.ID>
 WHERE (CD.LAST_ORDER = 443867)
   AND C.STATUS <> 3
/


The rewritten query leads to the following execution plan:

------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Cost (%CPU)| A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 6 (100)| 1 |00:00:00.01 | 8 |
| 1 | SORT UNIQUE | | 1 | 2 | 6 (34)| 1 |00:00:00.01 | 8 |
| 2 | UNION-ALL | | 1 | | | 1 |00:00:00.01 | 8 |
| 3 | NESTED LOOPS OUTER | | 1 | 1 | 2 (0)| 1 |00:00:00.01 | 6 |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 4 |
|* 5 | INDEX RANGE SCAN | I_CUST_LAST_ORDER | 1 | 1 | 1 (0)| 1 |00:00:00.01 | 3 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 7 | INDEX RANGE SCAN | I_CUSTDET_ID | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
| 8 | NESTED LOOPS | | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
| 9 | NESTED LOOPS | | 1 | 1 | 2 (0)| 0 |00:00:00.01 | 2 |
| 10 | TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER_DETAIL | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 11 | INDEX RANGE SCAN | I_CUSTDET_LAST_ORDER | 1 | 1 | 1 (0)| 0 |00:00:00.01 | 2 |
|* 12 | INDEX UNIQUE SCAN | I_CUST_ID | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 |
|* 13 | TABLE ACCESS BY INDEX ROWID | CUSTOMER | 0 | 1 | 1 (0)| 0 |00:00:00.01 | 0 |
------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("C"."STATUS"<>3)
   5 - access("C"."LAST_ORDER"=443867)
   7 - access("C"."ID"="CD"."ID")
  11 - access("CD"."LAST_ORDER"=443867)
  12 - access("C"."ID"="CD"."ID")
  13 - filter("C"."STATUS"<>3)


The second plan is obviously a lot more efficient. Am I missing something, or is it just an opimizer limitation in the first version of the query?

Regards
Thomas


--




Other related posts: