RE: Increased PGA requirements for SORTs in 19c?

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: <mwf@xxxxxxxx>, "'Jonathan Lewis'" <jlewisoracle@xxxxxxxxx>
Date: Sat, 5 Dec 2020 13:43:40 -0500

and with no concurrency set for the union all, the union all with max runs in
about the same time for my baby dataset:

13:36:45 SQL> r

  1  select max(a), max(b)

  2     from

  3  (select count(distinct access_order) A, 0 B from test42

  4  union all

  5*  select 0 A, count(distinct action_x2) B from test42)

    MAX(A)     MAX(B)

---------- ----------

   4000000          6

Elapsed: 00:00:03.29

That’s it. I can stop anytime, really. I’m shutting my test database down right
now.

Anyway, usually you don’t get a hangover from my particular addiction.

Back to you, JL.

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Mark W. Farnham
Sent: Saturday, December 05, 2020 1:27 PM
To: mwf@xxxxxxxx; 'Jonathan Lewis'
Cc: 'Tefft, Michael J'; 'ORACLE-L'
Subject: RE: Increased PGA requirements for SORTs in 19c?

so this seems to work syntactically:

13:08:30 SQL> r

  1  with a as (select count(distinct access_order) "d_access_order" from
test42)

  2  ,

  3  b      as (select count(distinct action_x2) "d_action_x2" from test42)

  4* select a.*, b.* from a,b

d_access_order d_action_x2

-------------- -----------

       4000000           6

Elapsed: 00:00:03.35

13:08:35 SQL>

and it seems to be about the sum of the two separate count distincts and about
twice as fast as the simplest writing of count(distinct):

13:13:55 SQL> select count(distinct access_order) from test42;

COUNT(DISTINCTACCESS_ORDER)

---------------------------

                    4000000

Elapsed: 00:00:01.05

13:16:06 SQL> select count(distinct action_x2) from test42;

COUNT(DISTINCTACTION_X2)

------------------------

                       6

Elapsed: 00:00:01.38

13:16:23 SQL> select count(distinct access_order), count(distinct action_x2)
from test42;

COUNT(DISTINCTACCESS_ORDER) COUNT(DISTINCTACTION_X2)

--------------------------- ------------------------

                    4000000                        6

Elapsed: 00:00:06.87

13:16:57 SQL>

So in this very simple case it seems to be a useful thing to do when required.
I leave site specific details about pga requirements to the reader, because I
really have grown lazy.

I can still probably be bribed out of the woodwork if someone has something
interesting to be done 100% remote.

All the best, I hope the original poster can give this a try in situ. I am
supposing it scales linearly with number of aggregate columns in elapsed time
whereas the union all version might allow each component of the union all with
max version to run in parallel if on a version where you can run union all
components concurrently, but that would not be the lowest memory high water
mark for the query, right?

Anyway, good luck.

mwf

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Mark W. Farnham
Sent: Saturday, December 05, 2020 9:24 AM
To: 'Jonathan Lewis'
Cc: 'Tefft, Michael J'; 'ORACLE-L'
Subject: RE: Increased PGA requirements for SORTs in 19c?

(ergo my preemptive alibi. I make mistakes all the time, so it is less
embarrassing for me than you. (wink. and make sure you work that out as a
compliment since I realized as typing it someone could possibly interpret it
differently.)

I guess a series of “with” definitions (one for each column) might eliminate
the need for the union all and max function, but that might require a no_merge
hint. That’s also probably more friendly if there are additional columns for
aggregate group-by distinct counts. It gets complicated fast.

If you know the index structures any columns grouped together in an index might
be grouped together in a particular “with” but that would be a one off
operation for each query as a method.

I suspect timing each individual query with a single distinct column is
probably a decent “is this worth testing overall” litmus for the concept.

From: Jonathan Lewis [mailto:jlewisoracle@xxxxxxxxx] ;
Sent: Saturday, December 05, 2020 9:06 AM
To: Mark W. Farnham
Cc: Tefft, Michael J; ORACLE-L
Subject: Re: Increased PGA requirements for SORTs in 19c?

Mark,

You're braver than me - that's exactly the sort of thing but I didn't want to
try typing it in without having a database in front of me to check syntax,
spelling, and punctuation.

Regards

Jonathan Lewis

On Sat, 5 Dec 2020 at 13:59, Mark W. Farnham <mwf@xxxxxxxx> wrote:

something like

select max(a),max(b)

   from

   (select count(distinct a),0 “b” from tab

    union all

    select 0 “a”, count(distinct b) from tab)

so each select can use the fastest method to get the distinct count of its
column and the union all only has the number of columns required as its number
of rows to unionize. Then each independent distinct would need only whatever
temporary memory its plan might require (possibly nearly nothing) saving its
single row result set also in nearly nothing for which the union all to operate
on. You might need to cast the literal to match the type thrown by count.

direct from fingers, didn’t test syntax, just couldn’t work out your missing
word or words after “the” so checking if I understood your meaning.

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Jonathan Lewis
Sent: Thursday, December 03, 2020 7:49 AM
To: Tefft, Michael J
Cc: ORACLE-L
Subject: Re: Increased PGA requirements for SORTs in 19c?

Michael,

Thanks for the comments, possibly the non-appearance of the "hash group by"
relates to something in

https://jonathanlewis.wordpress.com/2018/07/14/quiz-night-33/

A thought I've had about multiple count(distinct) in a select query is that
there may be cases where you could benefit by rewriting the query first as a
union all of queries that each handle one of the count(distincts) - since each
would then be subject to the  and then aggregate across the union all.

Regards

Jonathan Lewis

On Thu, 3 Dec 2020 at 12:37, Tefft, Michael J <Michael.J.Tefft@xxxxxxxxxx>
wrote:

As a separate workaround, I was trying to force the queries to use hash
aggregation instead of sort aggregation. I was able to get one query to use the
HASH GROUP BY (which was also restored good performance) but was unable to get
another to bend to my will. This is why I was searching for hash aggregation
articles.

I did get results from the sort traces, in our non-production 19c environment.
I had not yet attempted to get them from our 12.2 environment, which is
production and is subject to change-control procedures. In the meantime I found
this solution.

I spent years as a data modeler; to see such columns defined as VARCHAR2(4000)
when they would never need to be such a size is…disappointing.

If I have time, I will try to construct a simple test case.

Mike

From: Jonathan Lewis [mailto:jlewisoracle@xxxxxxxxx] ;
Sent: Thursday, December 3, 2020 7:23 AM
To: Tefft, Michael J <Michael.J.Tefft@xxxxxxxxxx>
Cc: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Subject: Re: Increased PGA requirements for SORTs in 19c?

CAUTION: This email originated from outside of Snap-on. Do not click on links
or open attachments unless you have validated the sender, even if it is a known
contact. Contact the sender by phone to validate the contents.

That's very interesting,

Thanks for posting your conclusions.

Do you have a small model / test case that you could post for tother people to
play around with ?

Since you mention hash aggregation does that mean you didn't actually get any
results for the 10032 / 10033 traces which are for sorting only, not for
hashing ?

Regards

Jonathan Lewis

On Thu, 3 Dec 2020 at 11:50, Tefft, Michael J <Michael.J.Tefft@xxxxxxxxxx>
wrote:

Thanks to all for your advice.

We have found a workaround for this issue.

The query involves a view that uses a GROUP BY to 'pivot' a table's
VARCHAR2(4000) column into over 20 copies of this column. The data in the
column never has more than 33 characters. But the optimizer seems to be using
the maximum (declared) column size to determine its memory needs - which has
spilled over to TEMP demands as well.

This seems to be a change in behavior from 12.2 to 19c. The disparity in memory
allocation suggests that the previous version probably used the column
statistics to plan the memory/temp requirement: we observed roughly a 100x
increase in PGA+TEMP and the difference between 'declared' data size and
statistics of the column data size is also roughly 100x.

References:
- RE: Increased PGA requirements for SORTs in 19c?
  - From: Tefft, Michael J
- Re: Increased PGA requirements for SORTs in 19c?
  - From: Jonathan Lewis
- RE: Increased PGA requirements for SORTs in 19c?
  - From: Tefft, Michael J
- Re: Increased PGA requirements for SORTs in 19c?
  - From: Jonathan Lewis
- RE: Increased PGA requirements for SORTs in 19c?
  - From: Mark W. Farnham
- Re: Increased PGA requirements for SORTs in 19c?
  - From: Jonathan Lewis
- RE: Increased PGA requirements for SORTs in 19c?
  - From: Mark W. Farnham
- RE: Increased PGA requirements for SORTs in 19c?
  - From: Mark W. Farnham

RE: Increased PGA requirements for SORTs in 19c?

Other related posts: