Re: Is Parallelism happening at INSERT level?

  • From: Jonathan Lewis <jlewisoracle@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 6 May 2020 08:05:05 +0100

Ram,

As Lothar says, you are doing a parallel insert as well as a parallel
select.
You need only 4 slaves to run this one parallel 4 because of the
optimizer's choice for the plan, which is taking advantage of the 12c
PQ_REPLICATE() mechanism.

Each slave reads the WHOLE efgh table into an in-memory hash table and then
scans sections of the sce table to do the probe into its copy of the dfgh
table, then inserts surviving rows in batches.

If you were on 11g the plan would look more like:

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  | Bytes |
Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |          |       |       |
487 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                    |          |       |       |
       |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001 |   564K|    75M|
487  (11)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| T2       |       |       |
       |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |          |   564K|    75M|
487  (11)| 00:00:01 |  Q1,01 | PCWP |            |
|*  5 |      HASH JOIN RIGHT SEMI          |          |   564K|    75M|
487  (11)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                   |          | 56764 |   277K|
 42   (5)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND BROADCAST           | :TQ10000 | 56764 |   277K|
 42   (5)| 00:00:01 |  Q1,00 | P->P | BROADCAST  |
|   8 |         PX BLOCK ITERATOR          |          | 56764 |   277K|
 42   (5)| 00:00:01 |  Q1,00 | PCWC |            |
|*  9 |          TABLE ACCESS FULL         | T1       | 56764 |   277K|
 42   (5)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |       PX BLOCK ITERATOR            |          |   567K|    73M|
438  (10)| 00:00:01 |  Q1,01 | PCWC |            |
|* 11 |        TABLE ACCESS FULL           | T0       |   567K|    73M|
438  (10)| 00:00:01 |  Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------

And this WOULD run with 8 slaves - 2 sets of 4.

As for timing - your figures report 536 GB for sce, to be scanned in 1:51.
That's roughly 5 GB per second - with each slave doing 1.25GB per second
and allowing no time for any of the CPU required to do the hash join and
write the data.  Is that realistic ?  It seems a little unlikely.
Are any of the direct path reads and write also "temp" - since you've got 4
copies of 10GB (estimated) data, it seems likely that the hash join is
going to spill and end up doing a lot of temp space writes.


Regards
Jonathan Lewis





On Tue, May 5, 2020 at 11:34 PM Ram Raman <veeeraman@xxxxxxxxx> wrote:

Hi,

We are trying to tune a query to use parallelism at multiple levels. Here
is the query along with the plan gotten by using dbms_xplan:

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1qrcwrva8jgba, child number 0
-------------------------------------
insert /*+ ENABLE_PARALLEL_DML APPEND PARALLEL(4) */ into abcd select
/*+ parallel(4) */ * from user1.sce where esk in
(select esk from efgh)

Plan hash value: 1996703413


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name       | Rows  | Bytes |
Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |

----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                   |            |       |       |
2840K(100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                    |            |       |       |
          |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10000   |    30M|    10G|
2840K  (1)| 00:01:51 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)|            |       |       |
          |          |       |       |  Q1,00 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |            |    30M|    10G|
2840K  (1)| 00:01:51 |       |       |  Q1,00 | PCWP |            |
|*  5 |      HASH JOIN RIGHT SEMI          |            |    30M|    10G|
2840K  (1)| 00:01:51 |       |       |  Q1,00 | PCWP |            |
|   6 |       TABLE ACCESS FULL            | efgh       |   193K|  1131K|
  36   (3)| 00:00:01 |       |       |  Q1,00 | PCWP |            |
|   7 |       PX BLOCK ITERATOR            |            |  1547M|   536G|
2839K  (1)| 00:01:51 |     1 |    32 |  Q1,00 | PCWC |            |
|*  8 |        TABLE ACCESS FULL           | sce |  1547M|   536G|
2839K  (1)| 00:01:51 |     1 |    32 |  Q1,00 | PCWP |            |

----------------------------------------------------------------------------------------------------------------------------------------------
......
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   - Degree of Parallelism is 4 because of hint


Given that P->S happens in line id 2, is this statement using parallelism
for the INSERT part of it? I can see a parallelism of 4 reported at the
v$px_session table for this SQL. Should I see 8 rows if there is
parallelism of 4 for SELECT and INSERT each?

Dont know why it is saying the statement will complete in 1:51. It has
been running for an hour.  The waits are all on 'direct path read/writes'.
we have some bad IO, but still finding ways to tune the statement. ver is
12c.

Thanks
Ram

--



Other related posts: