RE: PDML and PQ tuning

  • From: Herring Dave - dherri <Dave.Herring@xxxxxxxxxx>
  • To: "Radoulov, Dimitre" <cichomitiko@xxxxxxxxx>
  • Date: Mon, 21 Nov 2011 16:41:51 +0000

In an ideal world you'd have some sort of test system where you could run the 
same query over and over, capturing statistics and checking the impact of 
various parameter changes to see exactly what is happening and why.  I've had 
clients before where there weren't any test systems, so I understand if you're 
stuck.

When you captured statistics below with Tanel's "snapper" script, did you limit 
the SIDs in any way?  It's important that with parallel operations you include 
all the PX slaves, not just the QC.
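One way to grab the full set is V$PX_SESSION, which maps each slave back to 
its coordinator. A quick sketch (the QCSID of 123 is just a placeholder); you 
can then feed the resulting SID list to snapper:

  -- QC plus all of its PX slave sessions for a given coordinator SID
  SELECT qcsid, sid, serial#, server_group, server_set, degree
  FROM   v$px_session
  WHERE  qcsid = 123
  ORDER  BY server_group, server_set, sid;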

Can you run any of the queries on your own?  If so, enable event 10046 level 12 
at the session level beforehand; the PX slaves inherit the event, so you'll 
capture all waits for both your session (QC) and the slaves.  Also query 
V$PQ_TQSTAT immediately after the query finishes, in the same session.  If the 
system is idle outside of these queries you could also perform manual statspack 
snapshots before and after.  The xplans would be helpful too, to see the 
details in the parallelism columns (TQ | IN-OUT | PQ Distrib).  I believe under 
9.2.0.6 we ran into a CPU patch that changed the PQ distribution for one 
particular query, causing a huge performance problem.
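
Roughly the sequence I have in mind, as a sketch (double-check the exact 
syntax on your version):

  -- In your own session (the QC), before running the query;
  -- the PX slaves inherit the event from the QC:
  ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

  -- (If the system is otherwise idle, EXECUTE statspack.snap here as well.)

  -- ... run the parallel query ...

  -- Immediately after it finishes, in the same session:
  SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
  FROM   v$pq_tqstat
  ORDER  BY dfo_number, tq_id, server_type DESC, process;

  -- Then switch tracing back off:
  ALTER SESSION SET EVENTS '10046 trace name context off';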

It'd be nice to know why there are waits on the PX qref latch, meaning there's 
potentially an imbalance in the speed of producers vs. consumers, but if you 
can't get any details or do any testing then you might need to rely on 
increasing PEMS (PARALLEL_EXECUTION_MESSAGE_SIZE).  Just note that I believe 
under 9i if you don't set a value for the large pool then PEMS buffers will 
come out of the shared pool.  PARALLEL_AUTOMATIC_TUNING may need to be set as 
well, I can't remember.
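
If you do end up raising it, the change itself would look something like this 
(a sketch: the sizes are placeholders, the parameter is static so it needs a 
bounce, and I'm assuming you're running on an spfile):

  -- Check the current values first:
  SHOW PARAMETER parallel_execution_message_size
  SHOW PARAMETER large_pool_size

  -- Raise the PX message size (placeholder value; static parameter):
  ALTER SYSTEM SET parallel_execution_message_size = 16384 SCOPE = SPFILE;

  -- Give the PX buffers a home outside the shared pool:
  ALTER SYSTEM SET large_pool_size = 64M SCOPE = SPFILE;

  -- ... then restart the instance.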

DAVID HERRING
DBA
Acxiom Corporation
EML   dave.herring@xxxxxxxxxx
TEL    630.944.4762
MBL   630.430.5988 
1501 Opus Pl, Downers Grove, IL 60515, USA
WWW.ACXIOM.COM  


-----Original Message-----
From: Radoulov, Dimitre [mailto:cichomitiko@xxxxxxxxx] 
Sent: Sunday, November 20, 2011 5:09 AM
To: Herring Dave - dherri
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: PDML and PQ tuning


Hi Dave,
thanks for replying!

Details in-line.


On 19/11/2011 17:36, Herring Dave - dherri wrote:
>
> As for your specific situation, you haven't shared a lot of detail.

I know; I did that on purpose because I suspect there is a bug involved, so 
I wanted to share only what seemed sufficient to briefly describe the issue.
For a full-blown analysis I'll need to share everything.

> For PQs, are the underlying tables large enough to warrant parallelization?  
> Smaller tables can actually run slower with parallelization, due to all the 
> overhead of starting x number of PX slaves, dividing up the workload, 
> communication between PX slaves and the QC, then producing the results and 
> cleanup.  "Smaller" is obviously relative but from what I've seen in working 
> on DWs, any table smaller than 150 MB to 200 MB usually should be left at 
> degree 1.

Yes, this is a rather *small* DWH environment (the whole database is 
about 200 GB) running on old machines.
The statements I'm currently analyzing join tables that are about 3-4 GB 
each, and the joins are rather simple (mostly 2 tables at a time).
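
As a sanity check against your size rule of thumb, something like this should 
show whether any small tables still have a degree set (a rough sketch; 
non-partitioned tables only, with your 200 MB figure as the cutoff):

  SELECT t.owner, t.table_name, TRIM(t.degree) degree,
         ROUND(s.bytes/1024/1024) mb
  FROM   dba_tables t, dba_segments s
  WHERE  s.owner        = t.owner
  AND    s.segment_name = t.table_name
  AND    s.segment_type = 'TABLE'
  AND    TRIM(t.degree) <> '1'
  AND    s.bytes        < 200 * 1024 * 1024;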

> Have you compared xplans between serial and parallel executions?  Were the 
> serial executions using full scans?  Was more than 1 table joined and if so 
> were both large and had a parallel degree set?

Yes, I've compared the plans, and in the case I'm talking about both 
plans use the same access type (FTS).
Both the serial and the parallel plans show a hash join with full scans; 
I've tried serial, parallel 2 and parallel 4.
In all cases the serial execution was the fastest.
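
For reference, on 9.2 the plans with the TQ / IN-OUT / PQ Distrib columns you 
mentioned can be pulled with something like this (a sketch; table and column 
names are placeholders):

  EXPLAIN PLAN FOR
  SELECT /*+ PARALLEL(a, 2) PARALLEL(b, 2) */ a.id, b.val
  FROM   tab_a a, tab_b b
  WHERE  a.id = b.id;

  SELECT * FROM TABLE(dbms_xplan.display);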

>
> As for PDML, were you working with CTAS operations?  Or INSERT...SELECT?  For 
> the latter were you attempting to have the INSERT run in parallel or just the 
> SELECT?

INSERT ... SELECT, with both the insert and the select parallelized. In most 
cases even the SELECT alone was slower with parallel degree 2 than the 
serial run.
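
Schematically, the statements are along these lines (placeholder names; the 
APPEND hint is just illustrative of the direct-path variant):

  ALTER SESSION ENABLE PARALLEL DML;

  INSERT /*+ APPEND PARALLEL(t, 2) */ INTO target_tab t
  SELECT /*+ PARALLEL(s, 2) */ *
  FROM   source_tab s;

  COMMIT;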

>
> For both PQ and PDML, were the PX qref latch waits the highest of all waits?  
> If not, what were the top 5 waits before and after the change for both types 
> of operations?

Good questions.  I don't have access to the trace files, so extended SQL 
trace is not an option for now.
I've used Tanel Poder's snapper script to take a *quick look* at the 
wait events and got the following result
(just a small part; there is a single select statement running):

--------------------------------------------------
Active% | SQL_HASH_VAL | EVENT
--------------------------------------------------
    200% | 114550436    | PX qref latch

--  End of ASH snap 2, end=2011-11-18 11:24:45, seconds=20, samples_taken=19


--------------------------------------------------
Active% | SQL_HASH_VAL | EVENT
--------------------------------------------------
    200% | 114550436    | PX qref latch
      6% | 3463809334   | ON CPU

--  End of ASH snap 3, end=2011-11-18 11:25:05, seconds=20, samples_taken=18


--------------------------------------------------
Active% | SQL_HASH_VAL | EVENT
--------------------------------------------------
    200% | 114550436    | PX qref latch

--  End of ASH snap 4, end=2011-11-18 11:25:26, seconds=21, samples_taken=19


--------------------------------------------------
Active% | SQL_HASH_VAL | EVENT
--------------------------------------------------
    200% | 114550436    | PX qref latch

--  End of ASH snap 5, end=2011-11-18 11:25:46, seconds=20, samples_taken=19


I'll admit I thought this was a bug, so I hoped to find it documented 
on MOS quickly and not spend more time on it.
Now I believe that's not the case (i.e. I haven't found anything 
useful so far), so next week I'll try to set up a complete test case 
and get back here with the details.


Thanks again
Dimitre