Re: How does the FILTER operation work?

From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
To: Oracle Mailinglist <oracle-l@xxxxxxxxxxxxx>
Date: Thu, 24 May 2018 12:32:34 +0000

The "not exists" subquery is correlated - it may have to run once for each row
in photo, which is why the final cost of the query is roughly 21,211 * 380.  (I
don't know why it's not a lot closer, but there are indications that the
optimizer has a "self-caching" adjustment in the calculation in recent versions
of Oracle.)

It looks as if you have a uniqueness constraint on emplid since your "distinct"
hasn't shown up as a "/hash distinct" at any point in the plan.

The speed of execution may also be affected by the fact that your tablescan has
to scan a table which is growing because of the insert, so you may be applying
a lot of undo as you go so that you don't see the changes you have previously
made in the insert.

I can't explain the difference in the cost of the tablescan; is there anything
in the slightest bit "non-standard" about the table ?  (e.g. includes a LOB,
nested-table, has more than 255 columns).

For reference here's an old article I wrote about the effect of filter
subqueries - I don't think the workaround will help you though (that's assuming
my comment about uniqueness is correct). You might want try forcing the
optimizer to unnest the subquery (using the /*+ unnest */ hint in the
subquery), but the table definitions may make it illegal.

Regards
Jonathan Lewis

________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx <oracle-l-bounce@xxxxxxxxxxxxx> on behalf
of Paul Houghton <Paul.Houghton@xxxxxxxxxxxxx>
Sent: 24 May 2018 13:06
To: Oracle Mailinglist
Subject: How does the FILTER operation work?

Oracle 12.1.0.2. Linux x86_64.

I was asked why the following SQL was running for so long (~6 minutes). I can
see a number of problems with it, and suggested that a way to tune it would be
only to select the required data. For my own interest I want to understand why
the cost of the insert is so high compared to the cost of the table scans.

If I remove the "not exists" (see below), the access of the temp table
disappears (as you would expect) and the insert cost goes right down, so it
must be the filter that is the issue. However, I don't understand why it is so
expensive. I can't find documentation about what it is doing. Surely it has to
compare 21211 rows against the 1 row (Actually 0, but I understand the
optimizer never says 0) returned by the not exists sub query. I would have
thought this means that the cost should be more like the cost of accessing the
data in the first place? Clearly the optimizer is correct as the query does
take a long time to run.

Also, why is the cost of the full table scan of the photo table so different
between the two queries?

The queries are below. I edited the table names, (There are two, photo and
temp) and the id, hopefully consistently.

Photo.photo is a blob. Emplid and emplid_search are varchar2(11). Oprid is
varchar2(30). TEMP has no indexes. Photo has an index on emplid.

Thanks in advance for your insights.

  INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1               =1
  AND PHT.PHOTO IS NOT NULL
  AND NOT EXISTS
    (SELECT 'x'
    FROM TEMP TMP
    WHERE PHT.EMPLID=TMP.EMPLID_SRCH
    AND TMP.OPRID  = 'id'
    )
  ;


-------------------------------------------------------------------------------------
  | Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)|
Time     |

-------------------------------------------------------------------------------------
  |   0 | INSERT STATEMENT         |          | 21210 |  3334K|  5802K  (2)|
00:03:47 |
  |   1 |  LOAD TABLE CONVENTIONAL | TEMP     |       |       |            |
      |
  |*  2 |   FILTER                 |          |       |       |            |
      |
  |*  3 |    TABLE ACCESS FULL     | PHOTO    | 21211 |  3334K|   313   (1)|
00:00:01 |
  |*  4 |    TABLE ACCESS FULL     | TEMP     |     1 |    17 |   380   (2)|
00:00:01 |

-------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT 0 FROM " TEMP" "TMP" WHERE
              "TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id'))
   3 - filter("PHT"."PHOTO" IS NOT NULL)
   4 - filter("TMP"."EMPLID_SRCH"=:B1 AND "TMP"."OPRID"='id')

...

  INSERT INTO TEMP
  SELECT DISTINCT 'id',
    PHT.emplid
  FROM PHOTO PHT
  WHERE 1               =1
  AND PHT.UC_PBI_PHOTO IS NOT NULL;


-------------------------------------------------------------------------------------
  | Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)|
Time     |

-------------------------------------------------------------------------------------
  |   0 | INSERT STATEMENT         |          | 21211 |  3334K|  3659   (1)|
00:00:01 |
  |   1 |  LOAD TABLE CONVENTIONAL | TEMP     |       |       |            |
      |
  |*  2 |   TABLE ACCESS FULL      | PHOTO    | 21211 |  3334K|  3659   (1)|
00:00:01 |

-------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------

     2 - filter("PHT"."PHOTO" IS NOT NULL)
--
//www.freelists.org/webpage/oracle-l

--
//www.freelists.org/webpage/oracle-l

References:
- How does the FILTER operation work?
  - From: Paul Houghton

Re: How does the FILTER operation work?

Other related posts: