Re: Parallel Query Performance Issues

From: Jonathan Lewis <jlewisoracle@xxxxxxxxx>
To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Date: Mon, 13 Jun 2022 19:46:27 +0100

If you look closely at the plan you will see that
a) the plan has changed completely - the optimizer is now using a star
transformation
b) the critical PQ distribution is still using hash/hash  distribution
c) the hash distribution is still massively skewed towards one PX serve
(p013)

Apart from changing a hidden parameter did you do anything else to get the
change in plan?  (Like testing with the Outline Information from the
original 2019 plan but with the relevant pq_distribute() hash/hash hint
changed to a broadcast/none; or testing with the Outline Information from
the original 2018 plan against the 2019 query.)

I mentioned the idea of rewriting the query to eliminate duplicates
earlier.  Reviewing the query I see that you have an inline "select
distinct" view, and an outer "select distinct" - but the plan has only one
"Hash  UNIQUE"  operation. Have you tried simply adding a no_merge hint to
the inline view?

In passing I see the inline view mixes "traditional" Oracle syntax with
ANSI syntax. This is a very bad idea and may be contributing to the
optimizer doing something very silly in with its join ordering and
arithmetic - whatever else I did to find a faster execution path I'd change
the code to be either traditional or ANSI, but not a mixture.

Regards
Jonathan Lewis

On Mon, 13 Jun 2022 at 11:02, Goti <aryan.goti@xxxxxxxxx> wrote:

Thanks Andy and Jonathan.

I did change _parallel_broadcast_enabled to TRUE to have "PX BROADCAST in
the plan. But still it doesn't improve the response time of the SQL. Can
you please help me to identify why the step 38 actual rows shows 495M
whereas Oracle estimates it to be 1 row. Below are the gist details.

https://gist.github.com/aryangoti/ec2804a7b832a7fe606ec0bf6a0681b7

Thanks,

Goti

On Thu, Jun 9, 2022 at 8:15 PM Andy Sayer <andysayer@xxxxxxxxx> wrote:

Just quick thoughts - replace the distincts with group by, this might
allow group by placement to happen for you.

The inner distinct doesn’t seem to be executed as a distinct, there might
be clues in the outline if it’s decided that it only need wants to do a
sort.

I’ll have a closer look when I can

Thanks,
Andy

On Thu, 9 Jun 2022 at 15:39, Goti <aryan.goti@xxxxxxxxx> wrote:

Thanks Jonathan for the quick response!

I tried for the first 2 workarounds and that didn't work as expected. I
will work on the 3rd and 4th action plan and update here.

Thanks,

Goti

On Thu, Jun 9, 2022 at 5:41 PM Jonathan Lewis <jlewisoracle@xxxxxxxxx>
wrote:

The two queries may return the same size result, but the 2019 report
generates and aggregates roughly 12 times as much data as the 2018 report.
Check the "Actual Rows" figures - the 2018 report hits 3M rows (and 3M
execs of the subsequent table probes) while the 2919 report hits 39M
rows/execs - and that's where a lot of time goes on CPU.

Strangely (almost) all the data is passed to one PX server (at
operation 13/14, I think) that blows it up through segement NL joins to get
most of the 39M rows that have to be "buffer sorted" (i.e. buffered, but
not actually sorted) which is where the temp space and I/O time goes.

Possible workarounds
- MAYBE if you tried parallel 7 rather than 8 the hash disrtibution at
operation MIGHT be better balanced;
- MAYBE if you set "_gby_hash_aggregation_enabled" to false and got a
SORT UNIQUE instead of a hash unique the distribution would work better.
- if you get the outline information for the plan you should be able to
find the pq_distribute hint controls the distribution at operation 14 and
change it from a hash distribution to a round-robin - this will probably
introduce a 2nd layer of aggregation/uniqueness, but two small, shared
stages may well do better than one very large operation.
- can you rewrite the query to eliminate duplication earlier. This may
require you to include inline non-mergeable views: ideally you want to
avoid generating 39M rows at any point and then executing 39M join steps as
that will still account for a lot of your time.

Regards
Jonathan Lewis

On Thu, 9 Jun 2022 at 12:15, Goti <aryan.goti@xxxxxxxxx> wrote:

Environment : 11.2.0.4 database running on Linux.

Need help to understand parallel query performance issues. Below are
the query details and its associated plans. The 2018_query does execute in
24 seconds and returns about 2.5K rows. The 2019_query is also expected to
process almost the same number of rows however it consumes a lot of TEMP
space and finally fails. The 2019_query without parallel completes in 45
minutes (Just by removing the parallel hint). The only difference between
both the queries is related to the predicate "opclf.year_number =
to_number('YYYY')". The stats are up to date for the tables are
partitions.

2019_query:
https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d

2018_query:
https://gist.github.com/aryangoti/a7704a8075f118f7d942e49acee1900d

Stats and other details:
https://gist.github.com/aryangoti/a3797424ce0cb4fd87e194c05ad099b6

Thanks,

Goti

References:
- Parallel Query Performance Issues
  - From: Goti
- Re: Parallel Query Performance Issues
  - From: Jonathan Lewis
- Re: Parallel Query Performance Issues
  - From: Goti
- Re: Parallel Query Performance Issues
  - From: Andy Sayer
- Re: Parallel Query Performance Issues
  - From: Goti

Re: Parallel Query Performance Issues

Other related posts: