Re: Why index access is not happening in parallel

  • From: yudhi s <learnerdatabase99@xxxxxxxxx>
  • To: Pap <oracle.developer35@xxxxxxxxx>
  • Date: Sun, 25 Jun 2023 20:16:46 +0530

As Lothar also mentioned , you may see the parallelism for the index access
path along for your INSERT once you set the enable_parallel_dml hint for
that INSERT query. However I can't think of a reason why your standalone
"select" query is not doing the Index scan in parallel?

Tried with the exact same setup as I posted in my earlier example and I
have added Decode and a SUM function along with group by and order by
clause as its there in your query, still I see it's doing parallel
processing for the index range scan and table access by index rowid. See
below plan..

Do you see anything special in your outline section of the SELECT query
plan?

explain plan for select /*+parallel(10)  */ decode(object_name, 'abc','D',
'E'), object_name, sum(object_id)
  from x where object_id=123
  group by decode(object_name, 'abc','D', 'E'), object_name order by
object_name;


select plan_table_output  from
table(dbms_xplan.display('plan_table',null,'ADVANCED'));

Plan hash value: 1269364911

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name     | Rows  |
Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |          |     1 |
 30 |     5  (20)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                           |          |       |
  |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)                      | :TQ10002 |     1 |
 30 |     5  (20)| 00:00:01 |  Q1,02 | P->S | QC (ORDER) |
|   3 |    SORT GROUP BY                          |          |     1 |
 30 |     5  (20)| 00:00:01 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE                            |          |     1 |
 30 |     5  (20)| 00:00:01 |  Q1,02 | PCWP |            |
|   5 |      PX SEND RANGE                        | :TQ10001 |     1 |
 30 |     5  (20)| 00:00:01 |  Q1,01 | P->P | RANGE      |
|   6 |       HASH GROUP BY                       |          |     1 |
 30 |     5  (20)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| X        |     1 |
 30 |     4   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   8 |         PX RECEIVE                        |          |     1 |
  |     3   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   9 |          PX SEND HASH (BLOCK ADDRESS)     | :TQ10000 |     1 |
  |     3   (0)| 00:00:01 |  Q1,00 | S->P | HASH (BLOCK|
|  10 |           PX SELECTOR                     |          |       |
  |            |          |  Q1,00 | SCWC |            |
|* 11 |            INDEX RANGE SCAN               | IDX11    |     1 |
  |     3   (0)| 00:00:01 |  Q1,00 | SCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   7 - SEL$1 / X@SEL$1
  11 - SEL$1 / X@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH_GBY_FOR_PUSHDOWN(@"SEL$1")
      GBY_PUSHDOWN(@"SEL$1")
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$1" "X"@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "X"@"SEL$1" ("X"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
      OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
      OPT_PARAM('_px_adaptive_dist_method' 'off')
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

  11 - access("OBJECT_ID"=123)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128], DECODE("OBJECT_NAME",'abc','D','E')[1],
SUM()[22]
   2 - (#keys=0) "OBJECT_NAME"[VARCHAR2,128],
DECODE("OBJECT_NAME",'abc','D','E')[1], SUM()[22]
   3 - (#keys=2; rowset=256) "OBJECT_NAME"[VARCHAR2,128],
DECODE("OBJECT_NAME",'abc','D','E')[1], SUM()[22]
   4 - (rowset=256) "OBJECT_NAME"[VARCHAR2,128],
DECODE("OBJECT_NAME",'abc','D','E')[1], SYS_OP_MSR()[25]
   5 - (#keys=2) "OBJECT_NAME"[VARCHAR2,128],
DECODE("OBJECT_NAME",'abc','D','E')[1], SYS_OP_MSR()[25]
   6 - (#keys=2; rowset=256) "OBJECT_NAME"[VARCHAR2,128],
DECODE("OBJECT_NAME",'abc','D','E')[1], SYS_OP_MSR()[25]
   7 - "OBJECT_ID"[NUMBER,22], "OBJECT_NAME"[VARCHAR2,128]
   8 - "X".ROWID[ROWID,10], "OBJECT_ID"[NUMBER,22]
   9 - (#keys=2) "X".ROWID[ROWID,10], "OBJECT_ID"[NUMBER,22]
  10 - "X".ROWID[ROWID,10], "OBJECT_ID"[NUMBER,22]
  11 - "X".ROWID[ROWID,10], "OBJECT_ID"[NUMBER,22]

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------

   0 -  STATEMENT
           -  parallel(10)

Note
-----
   - Degree of Parallelism is 10 because of hint

Query Block Registry:
---------------------

  <q o="2"
f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[X]]></t><s><![CDATA[SEL$1]]></s></h></f></q>

On Sun, Jun 25, 2023 at 2:00 PM Pap <oracle.developer35@xxxxxxxxx> wrote:

Below is the Note section of the INSERT and SELECT query. Both the tables
used in the SELECT query are Global temporary tables , but I believe that
should not cause this oddity in the parallel execution plan behaviour. and
Also "Select" query won't need any such parameter as 'enable parallel dml'
to be set to the execution to happen in parallel. Correct me if I'm wrong.
So, I am still unable to understand, why don't we have Index access being
parallelized here and how could we fix that?


*Insert query:- *
 Note
-----
   - PDML is disabled in current session
   - parallel query server generated this plan using optimizer hints from
coordinator


*Select query:- *
   Note
-----
   - dynamic statistics used: dynamic sampling (level=5)
   - Degree of Parallelism is 4 because of hint
   - Global temporary table session private statistics used

Also Lothar as you pointed out few key things, i was trying to relook into
the sql monitoring reports of both of my posted queries:-

In the select query too the "parallel execution" section of the sql
monitoring report(as shown one below and detail plan is here :-
https://gist.github.com/oracle9999/ff2073c222398416c8095d109c233765) , It
shows majority of time being spent on the "PX Coordinator" step(same as
INSERT query case). However if i see the  "SQL Plan Monitoring Details"
section it shows top activity is on the Index access and table access
section of TAB1. Whereas in case of the sql monitoring report of the
"INSERT query" both the sections were showing top contributors as "PX
Coordinator". Isn't this discrepancy?

 and thus I was thinking the figures which were collected in the
gv$active_session_history would be reliable. And that was showing the
majority of the samples were spending time on the "index access" and "table
access by index rowid" part of the plan in both the Insert and select query
cases. Correct me if I'm wrong.


Parallel Execution Details (DOP=8 , Servers Allocated=8)

=======================================================================================================================================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |
Application | Concurrency | Cluster  | Buffer | Read | Read  |
       Wait Events                     |
|                |       |         | Time(s) | Time(s) | Waits(s) |
 Waits(s)   |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes |
        (sample #)                     |

=======================================================================================================================================================================================
| PX Coordinator | QC    |         |     849 |     218 |      533 |
 2.41 |        0.11 |       96 |     1M |   1M |   9GB | enq: RO - fast
object reuse (2)                    |
|                |       |         |         |         |          |
      |             |          |        |      |       | gc cr grant 2-way
(25)                             |
|                |       |         |         |         |          |
      |             |          |        |      |       | gc cr multi block
request (1)                      |
|                |       |         |         |         |          |
      |             |          |        |      |       | gc cr request (1)
                                 |
|                |       |         |         |         |          |
      |             |          |        |      |       | gc current grant
2-way (23)                        |
|                |       |         |         |         |          |
      |             |          |        |      |       | gc current request
(1)                             |
|                |       |         |         |         |          |
      |             |          |        |      |       | cell list of
blocks physical read (3)              |
|                |       |         |         |         |          |
      |             |          |        |      |       | cell multiblock
physical read (5)                  |
|                |       |         |         |         |          |
      |             |          |        |      |       | cell single block
physical read (114)              |
|                |       |         |         |         |          |
      |             |          |        |      |       | cell single block
physical read: flash cache (424) |
|                |       |         |         |         |          |
      |             |          |        |      |       | cell single block
read request (3)                 |
| p000           | Set 1 |       1 |    0.01 |    0.00 |     0.00 |
      |             |          |     12 |    5 | 40960 |
                                 |
| p001           | Set 1 |       2 |    0.01 |    0.00 |     0.00 |
      |             |     0.00 |     12 |    4 | 32768 |
                                 |
| p002           | Set 1 |       3 |    0.00 |    0.00 |     0.00 |
      |        0.00 |          |      9 |    3 | 24576 |
                                 |
| p003           | Set 1 |       4 |    0.01 |    0.00 |     0.00 |
      |        0.00 |     0.00 |     12 |    4 | 32768 |
                                 |
| p004           | Set 1 |       5 |    0.01 |    0.00 |     0.00 |
      |        0.00 |          |     12 |    4 | 32768 |
                                 |
| p005           | Set 1 |       6 |    0.01 |    0.00 |     0.00 |
      |             |     0.00 |     12 |    4 | 32768 |
                                 |
| p006           | Set 1 |       7 |    0.01 |    0.00 |     0.00 |
      |             |          |     12 |    4 | 32768 |
                                 |
| p007           | Set 1 |       8 |    0.01 |    0.00 |     0.00 |
      |             |          |     12 |    4 | 32768 |
                                 |

=======================================================================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2978234190)

======================================================================================================================================================================================================================
| Id |                Operation                |        Name        |
 Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |
 Mem  | Activity |                  Activity Detail                   |
|    |                                         |                    |
(Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |
(Max) |   (%)    |                    (# samples)                     |

======================================================================================================================================================================================================================
|  0 | SELECT STATEMENT                        |                    |
    |       |       746 |     +4 |     1 |       23 |      |       |     .
|          |                                                    |
|  1 |   SORT GROUP BY                         |                    |
 33 |    2M |       746 |     +4 |     1 |       23 |      |       | 10240
|          |                                                    |
|  2 |    NESTED LOOPS                         |                    |
28007 |    2M |       746 |     +4 |     1 |    42428 |      |       |
. |          |                                                    |
|  3 |     PX COORDINATOR                      |                    |
    |       |       749 |     +1 |     9 |      699 |      |       |     .
|     0.27 | enq: RO - fast object reuse (2)                    |
|  4 |      PX SEND QC (RANDOM)                | :TQ10000           |
699 |     2 |        16 |     +2 |     8 |      699 |      |       |     .
|          |                                                    |
|  5 |       PX BLOCK ITERATOR                 |                    |
699 |     2 |        16 |     +2 |     8 |      699 |      |       |     .
|          |                                                    |
|  6 |        TABLE ACCESS STORAGE FULL        | RTNI               |
699 |     2 |        16 |     +2 |    31 |      699 |   32 | 256KB |     .
|          |                                                    |
|  7 |     TABLE ACCESS BY INDEX ROWID BATCHED | TAB1               |
 40 | 21849 |       747 |     +3 |   699 |    42428 | 791K |   6GB |     .
|    57.40 | gc cr grant 2-way (17)                             |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | gc cr multi block request (1)                      |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | gc cr request (1)                                  |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | gc current grant 2-way (16)                        |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | Cpu (86)                                           |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell list of blocks physical read (3)              |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell multiblock physical read (5)                  |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell single block physical read: flash cache (289) |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell single block read request (1)                 |
|  8 |      INDEX RANGE SCAN                   | TAB1_IX1           |
 272K | 16266 |       746 |     +4 |   699 |      56M | 366K |   3GB |
. |    42.33 | gc cr grant 2-way (8)                              |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | gc current grant 2-way (7)                         |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | gc current request (1)                             |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | Cpu (42)                                           |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell single block physical read (114)              |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell single block physical read: flash cache (135) |
|    |                                         |                    |
    |       |           |        |       |          |      |       |
|          | cell single block read request (2)                 |

======================================================================================================================================================================================================================

On Sun, Jun 25, 2023 at 1:36 PM Pap <oracle.developer35@xxxxxxxxx> wrote:

Thank you so much Lothar, Yudhi.

I think you guys are spot on. I don't see 'enable parallel dml' in the
procedure in which this query is getting executed from and also no such
hints.

And another thing which you posted and also i am seeing (even i ran your
sample SELECT query independently without INSERT). is also doing index
access in parallel( "PX SELECTOR" above "index range scan" and "PX SEND QC
(RANDOM)" above the "TABLE ACCESS BY INDEX ROWID BATCHED" line in the plan)
as its showing in the plan as below. So it's clear that the INSERT can be
made parallel. But I am still unable to understand why in the case of my
independent SELECT query which I posted in the thread initially , the index
access is getting serialized and if there is any way to make that
parallelized?

explain plan for select /*+parallel(10)  */ * from x where object_id=123 ;
select plan_table_output  from
table(dbms_xplan.display('plan_table',null,'ADVANCED'));

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2707744295


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes
| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |     1 |    30
|     4   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                       |          |       |
|            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001 |     1 |    30
|     4   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| X        |     1 |    30
|     4   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     BUFFER SORT                       |          |       |
|            |          |  Q1,01 | PCWC |            |
|   5 |      PX RECEIVE                       |          |     1 |
|     3   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000 |     1 |
|     3   (0)| 00:00:01 |  Q1,00 | S->P | HASH (BLOCK|
|   7 |        PX SELECTOR                    |          |       |
|            |          |  Q1,00 | SCWC |            |
|*  8 |         INDEX RANGE SCAN              | IDX11    |     1 |
|     3   (0)| 00:00:01 |  Q1,00 | SCWP |            |

-------------------------------------------------------------------------------------------------------------------------------

On Sun, Jun 25, 2023 at 1:21 PM Lothar Flatz <l.flatz@xxxxxxxxxx> wrote:

Hi,

true, that the most obvious reason. Did not think about it.
Parallel_dml could be enabled in the session too.

Thanks

Lothar

Am 25.06.2023 um 09:41 schrieb yudhi s:

I see , your hint is not having 'Enable_parallel_DML' in it . Did you
check if the INSERT is happening in the session which is having 'enable
parallel dml' set? I believe it has to be set explicitly for any DML to go
for parallel processing. See below example..

Also I see "TABLE ACCESS BY INDEX ROWID BATCHED '' is happening and is
showing parallel executions in the plan, so the thought which you initially
posted as "batched index access" (which is new in 12c) can not be
parallelized seems not right. Others can comment on it.

create  table t ( id number , c1 varchar2(130) )  ;
create table x as select object_id , object_name from dba_objects ;
create index indx11 on x(object_id);
explain plan for  insert into t select * from x ;
  select * from table(dbms_xplan.display) ;

  explain plan for insert /*+APPEND enable_parallel_dml */ into t select
/*+parallel(10)  */ * from x where object_id=123 ;
  select plan_table_output  from
table(dbms_xplan.display('plan_table',null,'ADVANCED'));



---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name     | Rows  |
Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

---------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                        |          |     1 |
 30 |     4   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                         |          |       |
  |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                   | :TQ10001 |     1 |
 30 |     4   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)     | T        |       |
  |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING      |          |     1 |
 30 |     4   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| X        |     1 |
 30 |     4   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                        |          |     1 |
  |     3   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND HASH (BLOCK ADDRESS)     | :TQ10000 |     1 |
  |     3   (0)| 00:00:01 |  Q1,00 | S->P | HASH (BLOCK|
|   8 |         PX SELECTOR                     |          |       |
  |            |          |  Q1,00 | SCWC |            |
|*  9 |          INDEX RANGE SCAN               | IDX11    |     1 |
  |     3   (0)| 00:00:01 |  Q1,00 | SCWP |            |

---------------------------------------------------------------------------------------------------------------------------------

On Sun, Jun 25, 2023 at 2:13 AM Mladen Gogala <gogala.mladen@xxxxxxxxx>
wrote:

On 6/24/23 10:43, Pap wrote:

Hello Listers,
This Oracle database version 19C and its exadata. We have a customer
query in which one of the executions is happening in parallel(2) and
another execution is happening in parallel(4) in two different places. The
query is exactly the same , just that the place in which it's getting used
with PARALLEL(4) it's also happening with the APPEND hint.

People here are expecting the query which runs using parallel(4) with
Append should be faster as compared to the other one (i.e. with just
parallel(2)). But it's not happening that way. Want to understand why?

And also in this query the step no which is coming as top contributor
in ASH is table access RFFT using index access path - RFFT_IX7. This is a
big table partitioned on column C_KEY.  Other tables are small tables. So
just wondering if the higher parallel threads were not helping the query
because the index access is happening using BATCHED index range scan (i.e.
TABLE ACCESS BY LOCAL INDEX ROWID BATCHED). or are we misinterpreting the
execution plan here and something in the query is not making the index scan
to happen in parallel?

Below is the sql monitor details of the two plans. Below  are two sql
monitors, in one case it has processed 2million vs other 3million , however
if we see the processing speed i.e. ~67 seconds for 3million vs 38 seconds
for 2million. The throughput seems to be more or less the same only. The
Parallel-8+ Append seems to be comparatively slower though, which is odd.

https://gist.github.com/oracle9999/618251c1e48b315dc70c73e157443773

Regards
Pap

Well, the only way to find out why is the optimizer choosing
parallel(2) instead of parallel(4) is to take the 10053 trace. My guess
would be that some of the objects involved have a non-zero degree of
parallelism in their definition. That happens when indexes are rebuilt in
parallel. The best place to start reading about the 10053 trace is the
following:

https://jonathanlewis.wordpress.com/2014/05/23/10053-trace/

Second, range scan *never* happens in parallel. Maybe not even then.
If a range scan should retrieve so much data that parallel execution would
be beneficial, you're doing it wrong.

Lastly, Exadata is a data warehouse machine, so I assume that your DB
is a data warehouse. If so, you should check stuff like star schema and
snowflake schema (not to be confused with the RDBMS brand sold on the major
clouds) and maybe even read Ralph Kimball's  "Data Warehouse Toolkit".
Data warehousing philosophy is not the same as the OLTP philosophy. There
are tricks of the trade. I strongly advise Kimball's book.

Regards

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217https://dbwhisperer.wordpress.com



Other related posts: