Re: Estimations are as good as it can be, what else can be done here?

From: yudhi s <learnerdatabase99@xxxxxxxxx>
To: Jonathan Lewis <jlewisoracle@xxxxxxxxx>
Date: Thu, 9 Nov 2023 16:20:53 +0530

I ran the query manually with parallel(6) and below is the sql monitor. It
does finish in ~40 seconds overall. But yes need to execute it from the UI
by changing the view definition with parallel hint , to see howmuch extent
its helping the real production run which runs with a first_rows kind of
setup.

https://gist.github.com/databasetech0073/62b51587b3f590914e9a92218609672a

But again, if volume increase a bit the response time may again exceed and
goes beyond ~10 second reponse which is not good experience for any UI
user. So my doubt is, if we should really not expect these type of queries
(e.g. with 10 table joins +~2million matching rows +sorting on top) to
finish in <10 seconds and thus some different design approach should be
followed to cater such requirement? Like persisting the
transformed/aggregated results before hand in a materialized view or
anything else?

On Thu, 9 Nov, 2023, 2:21 am yudhi s, <learnerdatabase99@xxxxxxxxx> wrote:

Thank You Jonathan.

If I see the total "read bytes" , then out of ~4GB, the majority of the
"read bytes"(~2GB) were contributed by the bottom 6 lines in the execution
path. So should we look to tune this part of the query plan? Though the
"activity %" column in the sql monitor is not showing a major overall
contribution from this section. So I'm a bit confused here, when trying to
match the stats. And there too one of the index storage full access
paths(IDX_PBO) for table TAB_PBO three times as it's used three times in
the view definition as self outer joins, followed by the full scan on table
TAB_PP_TD which results in 2million satisfied rows. All the joins on
table TAB_PBO happen on column B_ID, but I think we also fetch the column
B_NAME from this table in the view definition, thus it's using the
composite index which is created on (B_name,B_ID ) and goes for "fast full
scan". Basically I am trying to see if creating/modifying some index will
help us, even if we read these 2million matching rows across all the
tables/joins.

I understand the points you shared in the last post are based on the "time
active" and "start active" column and that shows the sum of the "Time
Active" and "Start Active" coming as ~34 seconds and ~37 seconds  in
production run and manual run respectively. Still wondering, how did you
interpret all the "HASH JOIN RIGHT OUTER" to be the bottleneck here ?
But if we take the production run, the start active column for "sort
unique" shows as ~22, so doesn't it mean that the sorting started at 22nd
second and continued till next ~12 seconds(i.e. the "Times active"). That
way the sorting lasted for ~12 seconds. Is my understanding wrong here?

I am yet to try the parallel(6) option on the query though, but I am
hoping adding the parallel(6) either within the view definition or in the
main query will give a similar result. Correct me if I'm wrong.

======================================================================================================================================================================================================================
| Id |                 Operation                 |        Name        |
Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |
Mem  | Activity |                 Activity Detail                  |
|    |                                           |                    |
(Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |
(Max) |   (%)    |                   (# samples)                    |

======================================================================================================================================================================================================================
| 23 |              TABLE ACCESS STORAGE FULL    | TAB_PR_COMP        |
   1M | 19146 |         3 |     +4 |     1 |       1M |  892 | 858MB |
14MB |     3.03 | cell smart table scan (1)                        |
| 24 |              INDEX STORAGE FAST FULL SCAN | IDX_PBO            |
   8M | 11692 |         7 |     +6 |     1 |       8M | 2542 | 539MB |
. |    12.12 | Cpu (3)                                          |
|    |                                           |                    |
      |       |           |        |       |          |      |       |
  |          | cell multiblock physical read (1)                |
| 25 |             TABLE ACCESS STORAGE FULL     | TAB_PR_SITE        |
   1M |  9771 |         3 |    +12 |     1 |       1M |  457 | 439MB |
14MB |          |                                                  |
| 26 |           INDEX STORAGE FAST FULL SCAN    | IDX_PBO            |
   8M | 11692 |         5 |    +14 |     1 |       8M | 2192 | 147MB |
. |     9.09 | cell multiblock physical read (2)                |
|    |                                           |                    |
      |       |           |        |       |          |      |       |
  |          | cell single block physical read: flash cache (1) |
| 27 |          INDEX STORAGE FAST FULL SCAN     | IDX_PBO            |
   8M | 11692 |         5 |    +18 |     1 |       8M |      |       |
. |          |                                                  |
| 28 |         TABLE ACCESS STORAGE FULL         | TAB_PP_TD          |
   2M | 29829 |        12 |    +22 |     1 |       2M | 1664 |   1GB |
14MB |          |                                                  |

======================================================================================================================================================================================================================

On Thu, Nov 9, 2023 at 1:21 AM Jonathan Lewis <jlewisoracle@xxxxxxxxx>
wrote:

I've learned something new today - I hadn't realised that the SQL Monitor
would report "DONE (FIRST N ROWS)" if you interrupted a query after the
first fetch, I would have guessed that a ctrl-C from SQL*Plus would have
resulted in "DONE (ERROR)".  So the 'first N rows' really could be the
first N rows fetched out of 2M. Given that it's a sort unique (due to the
distinct combined with the "order by" and the optimizer getting clever) the
preceding hash join and entire 2M row sort must have completed before the
first row was sent to the user.

Looking at your manual run: It's a bit of an approximation, but if you
check the start time and active time on the three hash join right outers
(operations 2, 9 and 16) and the start and active time for the last "probe"
table of the hash join at operation 16 (i.e. operation 32) you can see they
all sum to 37 seconds. Your Elapsed time according to the Global Stats is
42 seconds, but the clock time is 58 seconds. So it's taking roughly 5
seconds to do the sorting, and 16 seconds to move the data,

In the production run it's taking virtually no time to transfer the 329
rows to the front end; so the target database time to beat is the 33
seconds (or 34 seconds summing picking up the equivalent 4 operations), of
which about 5 seconds is in the sorting.  So somewhere in the joins you
have to eliminate another 18 seconds if you want to come in under 10
seconds.

Apart from the suggestion to use parallel query (around degree 6 to 8,
probably) the thing you need to do is examine the logic of the query and
see if you can find some way to eliminate data early, aggregate early, or
join late (after aggregating).

Regards
Jonathan Lewis

P.S  My comment about the scalar subquery transformation was wrong - I
misread the SQL and thought you had two inline views in the Select list,
but they're in the From clause.

P.P.S.  The "hint" to use 11.2.0.4 optimizer features is a comment, not a
hint.  Which is it in the view definition.

On Wed, 8 Nov 2023 at 18:43, yudhi s <learnerdatabase99@xxxxxxxxx> wrote:

Thank you Jonathan.

The first sql monitor is from the actual query which triggers from the
applications UI screen which is why FIRST_N_ROWS optimization mode shows in
the query, i believe. And that query is on a VIEW(VW_PPTG) with one filter
on the predicate and "order by" on ~12 columns.

The second sql monitor which i posted , is by executing the query
manually. Also exposing the view-VW_PPTG definition outside and using
directly in the query (that is why if you see the second query, it looks
bigger with Joining ~10 tables). I did this way, as I was expecting some
difference , But i don't see much in overall response time.

So you are correct , the 329 rows from the first query sql monitor,
because it stops early because of first_row optimization as it is executed
from UI, in the first case.

The "optimizer_features_enable('11.2.0.4')" hint is in the view
definition and I have put it exactly the same way, when I ran it manually.
but i also saw if i remove it , then also i don't see much of the
difference in the execution time. This hint is there in the view
definition, so i believe the first query execution also must have executed
with the same hints inside the view definition.

Thus data duplicate issue is not there i believe.

Considering it really has to read those 2million and sort those , is it
possible to get the response time for those first_n_rows within <10 seconds
anyway?

I am yet to fully understand the blog below which you highlighted. I
will see if we can utilize something similar for tweaking the query in my
case and get some benefit.

https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/

On Wed, Nov 8, 2023 at 5:34 PM Jonathan Lewis <jlewisoracle@xxxxxxxxx>
wrote:

Looking through the SQL Monitor output, it seems that you really do
want 2M rows generated and sorted, and the optimizer has done a couple of
clever things to make that happen as efficiently as possible.

A couple of discrepancies, though -
1 The first plan reports (FIRST N ROWS), the second reports (ALL ROWS)
in the Status.
2 The first plan reports 329 rows returned, the second reports 2M. Is
this because of "stopping early", or because of the choice of data.
3 If the latter why would there be so many duplicates when the query
was originally run  ***
4 The second query shows a comment (was it once a hint)
optimizer_features_enable('11.2.0.4'), but the plan shows scalar subquery
unnesting, which I think is a version 12 feature. Do you have some other
optimizer_features_enable set for the production query that might be
blocking a helpful transformation?

GIven point 3 it's worth asking if there's any scope for finding out
which table introduces the duplicates and replacing a reference to the
table-name appear with an inline view of the form (select distinct
column_list from tableX where ...)  The optimizer calls this "distinct
placement" and can do it automatically, but possibly it doesn't see the
opportunity here, or maybe it's just not possible.

The return of 329 rows (after reduction from 2M) is normal, while the
return of 2M rows is a very special (possibly unique) case. then you may be
able to find a way of using a "two-pass" approach to the problem. You call
for a distinct on 56 columns, and then order by 12 of them.  If most of
those 44 "extra" columns come from a single table you may be able to find a
way creating an inline view that does a select distinct on a much smaller
number of columns (including the rowid from the one "large column list"
table) and then does a second join to that table with a much reduced set of
rows.

There's a trivial example of the type of thing at: double trouble |
Oracle Scratchpad (wordpress.com)
<https://jonathanlewis.wordpress.com/2010/05/18/double-trouble/>
Your distinct, though, means it might not be possible to use this
pattern effectively.

Regards
Jonathan Lewis

On Wed, 8 Nov 2023 at 04:15, yudhi s <learnerdatabase99@xxxxxxxxx>
wrote:

Hello Listers,

It's version 19C of oracle.
We have a search query which is getting submitted from the UI and the
expected response time is <10 seconds to display the first ~200 odd rows
on
the screen, but it takes ~50 seconds for that. The sql monitor shows the
cardinality estimation is as good as it can be. The query has ~10 tables
used in the join conditions. I wanted to understand what else can be
done(like query modification, design change etc) to make this query finish
in <10 seconds response time.

The time seems equally distributed in the plan, just that the sorting
of the final result set before publishing is consuming ~20% of the
execution time, but we need those data sets to be sorted in that order
before showing up on the screen. So not sure what else can be done here,
to
improve this?

The main query is based on the top of a view, but I even executed it
manually by exposing the internal query which is used for creating the
view. Not seeing much of a difference.

Below I have published the query and its sql monitor.

https://gist.github.com/databasetech0073/046b198e977d200c70778a973c223ed9

Regards
Yudhi

Follow-Ups:
- Re: Estimations are as good as it can be, what else can be done here?
  - From: Andy Sayer
- Re: Estimations are as good as it can be, what else can be done here?
  - From: Jonathan Lewis

References:
- Estimations are as good as it can be, what else can be done here?
  - From: yudhi s
- Re: Estimations are as good as it can be, what else can be done here?
  - From: Jonathan Lewis
- Re: Estimations are as good as it can be, what else can be done here?
  - From: yudhi s
- Re: Estimations are as good as it can be, what else can be done here?
  - From: Jonathan Lewis
- Re: Estimations are as good as it can be, what else can be done here?
  - From: yudhi s

Re: Estimations are as good as it can be, what else can be done here?

Other related posts: