RE: Exadata Tuning Question+

  • From: "Matthew Parker" <dimensional.dba@xxxxxxxxxxx>
  • To: <iggy_fernandez@xxxxxxxxxxx>, "'Jonathan Lewis'" <jonathan@xxxxxxxxxxxxxxxxxx>, "'ORACLE-L'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 7 Nov 2014 21:05:18 -0800

The lag in the screen can be network, firewall, vpn, etc that is between you
and the database when transferring large amounts of data and the long run
times can be directly related when returning data to the screen. 

 

Real world example

I was on a project assisting a team porting data from one system to another.

There was a large consulting firm doing the project and had come up with
audit rules that required proof the data had been moved completely and that
certain rules had a been applied to the data. Their verification rules
basically required a specific designated set of queries to be run which
returned millions of rows or data per query and they had to take a screen
shot at the end of the query run which showed some complete rows of data in
case someone wanted to look them up in the source system and had the row
count  at the bottom.

 

I was involved when their query had already ran for over 8 hours but they
were trying to meet a schedule that as many schedules didn't take reality
into consideration and was still returning data to the screen.

After a little analysis and a modification to their query I could run a
basic count(*) version of their query in about 3 minutes.

Since they had an audit requirement and they just wanted me to run it for
them, I launched the query. About 15 minutes later, I decided this wasn't
going to work. Based on the order by in the query I was able to determine
how many rows had actually returned to my screen at that point and a simple
calculation showed that the query would complete in about 23 hours, even
though I knew from running it on the database server that it only took 3
minutes.

I put the SQL on the database server and ran the output to a file without
displaying it on the screen that took about 21 minutes to write the rows to
disk and had the output requirement they needed.

 

Also to understand one other component in the mix, since the developers in
some cases did their work from the Corporate office and/or India  I drove in
to the office to run the test there too. A 15 minute run and calculation
showed that removing the vpn into Corporate from my home, then the query was
projected to run for 13 hours versus the 23 hours or 21 minutes locally on
the database server.

 

Management was significantly unhappy to learn how much they were being
billed for people sitting in front of their screens watching data return to
meet an artificial audit requirement, especially since we found out the
offshore India development team was running the same queries and they would
run for days.

 

Audit requirement was modified.

 

Moral to the story, is you need to understand the components that sit
between you and the database server and something as simple as returning
data to the screen can cost you a significant amount of time. (The database
server or the query execution plan are not the problem in many cases.)

 

 

 

 

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Iggy Fernandez
Sent: Friday, November 7, 2014 6:20 PM
To: Jonathan Lewis; ORACLE-L
Subject: RE: Exadata Tuning Question+

 

Jonathan,

 

Actually, I didn't read the comment that "we could expect the server to take
a couple of seconds to populate the next array fetchsize of rows before
returning them" before I got onto my ornery mule to deliver a sermon :-) But
let me try and paint my way out of my corner.

 

I fell into a trap because the original posting did not include a query or
query plan. But perhaps I can still argue that the cause of the pauses is
equally likely to be screen buffering as anything else. The query plan is a
hash join of service_lookup (a large table with 21 million rows out of which
7,331,000 rows satisfy the query filters) to service_location (a large
lookup table with 3,175,000 rows satisfying the query filters) resulting in
7,331,000 rows. Each row of this result requires five scalar subqueries on
service_relationship. We don't know the size of the service_relationship
table but EXPLAIN PLAN thinks that the queries return 25 rows each so, in
the absence of more data, I could reasonably assume that
service_relationship is a small table. If I make such an assumption, I could
reasonably argue that Oracle should be able to perform fetches at a
sustained clip.

 

But now that we have the query in hand, we know that is  grossly inefficient
because it requires 35 million little subqueries--five for each of the 7
million rows in the result. It could be rewritten as three inner joins and a
pivot.

 

SELECT * FROM

(

  SELECT

    slv.sid,

    slv.service_location_id,

    sr.relationship_level,

    RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e",
sr.RELATED_SID || ',') ORDER BY sr.RELATED_SID), '//text()' ) AS
VARCHAR2(4000)) , ',' ) AS related_sid,

  FROM

    service_lookup slv

    LEFT JOIN service_location sl ON sl.service_location_id =
slv.service_location_id

    LEFT JOIN service_relationship sr ON sr.sid = slv.sid AND
sr.relationship_level IN ('1', '2', '3', '4', '5')

)

PIVOT (MAX(related_sid) FOR relationship_level IN (

  '1' AS RELATEDSERVICEINSTANCEIDLEVEL1,

  '2' AS RELATEDSERVICEINSTANCEIDLEVEL2,

  '3' AS RELATEDSERVICEINSTANCEIDLEVEL3,

  '4' AS RELATEDSERVICEINSTANCEIDLEVEL4,

  '5' AS RELATEDSERVICEINSTANCEIDLEVEL5

);

 

Instead of the hard-to-read series of XML functions, all that is needed is
LISTAGG(sr.related_sid, ',') WITHIN GROUP (ORDER BY sr.related_sid) AS
related_sid

 

Iggy

 

  _____  

From: jonathan@xxxxxxxxxxxxxxxxxx
To: iggy_fernandez@xxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: RE: Exadata Tuning Question+
Date: Fri, 7 Nov 2014 23:49:07 +0000

 

I take it that this is a response to the other email I sent a few minutes
ago, saying: 

 

"While it doesn't make sense to return 21M rows to the screen, the output
(in general) shouldn't pause every couple of seconds, as this would
(usually) be a measure of much work the database had to do to acquire the
next batch of rows."

 

and then explaining why (probably) the OP is seeing those pauses, viz:

 

"each row from the sample code requires 5 parallel tablescans of a
reasonably large table, so we could expect the server to take a couple of
seconds to populate the next array fetchsize of rows before returning them."

 

 

 

   
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle 

  _____  

From: Iggy Fernandez [iggy_fernandez@xxxxxxxxxxx]
Sent: 07 November 2014 23:24
To: Jonathan Lewis; ORACLE-L
Subject: RE: Exadata Tuning Question+

Perhaps I'm stubborn as a mule that needs coffee but I don't see a problem.
The question was "However, when we display the output to the screen it takes
hours for it to finish and we see pauses in the display every second or two
while it is running. Trying to figure out what is causing these pauses?" The
OP is trying to bring the 7 million rows over the network to his laptop and
scroll them in a Putty screen or something similar, and should expect
serious buffering issues and time to completion. 

 

Iggy



  _____  

From: jonathan@xxxxxxxxxxxxxxxxxx
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Exadata Tuning Question+
Date: Fri, 7 Nov 2014 23:06:54 +0000

 

The 4 seconds you quoted is the time required to do the driving join, which
is (roughly) lines 26 - 36 of your plan.

The rest of the time would be the 35M parallel tablescans of
SERVICE_RELATIONSHIP which appear as the 5 groups of 5 lines in the earlier
part of the plan.

 

Was the suggested index on SERVICE_RELATIONSHIP(relationship_level, sid) ?
This would seem to be the only choice - perhaps with hinting - to change the
efficiency of the query as it is currently written - since the size of the
service_relationship table is (probably) in the range of tens - or perhaps
low hundreds - of thousands of blocks I would hope you could cache the
entire data set to avoid millions of single block random reads.  The XML
stuff could, of course, make all of my guesstimates meaningless, so it would
be nice to get some ideas of the typical sizes of the XML component, and the
number of rows of service_relationship you expect to get (compared to the
optimizer's estimate of 25) for a typical (relationship_level, sid)
combination.

 

I have to admit that I don't understand how your inline scalar subqueries in
the select list guarantee to return a single value for every driving row -
in fact I don't understand why they don't raise a syntax error relating to
columns not in group by - but that's probably because I don't do much with
XML.

 

 

If you can't optimise the work done by the inline subqueries, you'll have to
re-engineer them into a join (probably outer) of an extra tables.

 

 

 

   
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle 

  _____  

From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on
behalf of Ebadi, Abdul [Abdul.Ebadi@xxxxxxxxxx]
Sent: 07 November 2014 22:02
To: tim@xxxxxxxxx; ORACLE-L
Subject: RE: Exadata Tuning Question+

Tim/others. thanks for your replies!  Below is a slightly generalized
version of the query to give you an idea of the joins and an actual explain
plan:

 

 

SELECT  bunch of stuff....., 

        

        (

                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT(
XMLAGG(XMLELEMENT("e", sr1.RELATED_SID

                        ||

                        ',')

                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )

                FROM    service_relationship sr1

                WHERE   sr1.SID                    = slv.SID

                        AND sr1.RELATIONSHIP_LEVEL = '1'

                GROUP BY sr1.SID

        ) AS RELATEDSERVICEINSTANCEIDLEVEL1,

        (

                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT(
XMLAGG(XMLELEMENT("e", sr2.RELATED_SID

                        ||

                        ',')

                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )

                FROM    service_relationship sr2

                WHERE   sr2.SID                    = slv.SID

                        AND sr2.RELATIONSHIP_LEVEL = '2'

                GROUP BY sr2.SID

        ) AS RELATEDSERVICEINSTANCEIDLEVEL2,

        (

               SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT(
XMLAGG(XMLELEMENT("e", sr3.RELATED_SID

                        ||

                        ',')

                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )

                FROM    service_relationship sr3

                WHERE   sr3.SID                    = slv.SID

                        AND sr3.RELATIONSHIP_LEVEL = '3'

                GROUP BY sr3.SID

        ) AS RELATEDSERVICEINSTANCEIDLEVEL3,

        (

                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT(
XMLAGG(XMLELEMENT("e", sr4.RELATED_SID

                        ||

                        ',')

                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )

                FROM    service_relationship sr4

                WHERE   sr4.SID                    = slv.SID

                        AND sr4.RELATIONSHIP_LEVEL = '4'

                GROUP BY sr4.SID

        ) AS RELATEDSERVICEINSTANCEIDLEVEL4,

        (

                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT(
XMLAGG(XMLELEMENT("e", sr5.RELATED_SID

                        ||

                        ',')

                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )

                FROM    service_relationship sr5

                WHERE   sr5.SID                    = slv.SID

                        AND sr5.RELATIONSHIP_LEVEL = '5'

                GROUP BY sr5.SID

        ) AS RELATEDSERVICEINSTANCEIDLEVEL5

FROM    service_lookup slv

        LEFT JOIN service_location sl

        ON      sl.service_location_id = slv.service_location_id;

 

 

 

 

 

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------------------------------------------

Plan hash value: 1570133209

 

----------------------------------------------------------------------------
----------------------------------------------------------

| Id  | Operation                        | Name                 | Rows  |
Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

----------------------------------------------------------------------------
----------------------------------------------------------

|   0 | SELECT STATEMENT                 |                      |  7331K|
5593M|  1877   (5)| 00:00:01 |        |      |            |

|   1 |  SORT GROUP BY                   |                      |     1 |
22 |   368   (6)| 00:00:01 |        |      |            |

|   2 |   PX COORDINATOR                 |                      |       |
|            |          |        |      |            |

|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |
550 |   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |

|   4 |     PX BLOCK ITERATOR            |                      |    25 |
550 |   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |

|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |
550 |   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |

 

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------------------------------------------

|   6 |  SORT GROUP BY                   |                      |     1 |
22 |   368   (6)| 00:00:01 |        |      |            |

|   7 |   PX COORDINATOR                 |                      |       |
|            |          |        |      |            |

|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |
550 |   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |

|   9 |     PX BLOCK ITERATOR            |                      |    25 |
550 |   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |

|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |
550 |   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |

|  11 |  SORT GROUP BY                   |                      |     1 |
22 |   368   (6)| 00:00:01 |        |      |            |

|  12 |   PX COORDINATOR                 |                      |       |
|            |          |        |      |            |

|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |
550 |   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |

|  14 |     PX BLOCK ITERATOR            |                      |    25 |
550 |   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |

|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |
550 |   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |

|  16 |  SORT GROUP BY                   |                      |     1 |
22 |   368   (6)| 00:00:01 |        |      |            |

 

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------------------------------------------

|  17 |   PX COORDINATOR                 |                      |       |
|            |          |        |      |            |

|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |
550 |   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |

|  19 |     PX BLOCK ITERATOR            |                      |    25 |
550 |   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |

|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |
550 |   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |

|  21 |  SORT GROUP BY                   |                      |     1 |
22 |   368   (6)| 00:00:01 |        |      |            |

|  22 |   PX COORDINATOR                 |                      |       |
|            |          |        |      |            |

|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |
550 |   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |

|  24 |     PX BLOCK ITERATOR            |                      |    25 |
550 |   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |

|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |
550 |   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |

|  26 |  PX COORDINATOR                  |                      |       |
|            |          |        |      |            |

|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|
5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |

 

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------------------------------------------

|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|
5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |

|  29 |     PX RECEIVE                   |                      |  3175K|
920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |

|  30 |      PX SEND HASH                | :TQ60000             |  3175K|
920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |

|  31 |       PX BLOCK ITERATOR          |                      |  3175K|
920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |

|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|
920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |

|  33 |     PX RECEIVE                   |                      |  7331K|
3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |

|  34 |      PX SEND HASH                | :TQ60001             |  7331K|
3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |

|  35 |       PX BLOCK ITERATOR          |                      |  7331K|
3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |

|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|
3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |

----------------------------------------------------------------------------
----------------------------------------------------------

 

 

From: Tim Gorman [mailto:tim@xxxxxxxxx] 
Sent: Friday, November 07, 2014 2:36 PM
To: ORACLE-L; Ebadi, Abdul
Subject: Re: Exadata Tuning Question+

 

Abdul,

Although it might be a "best practice" to remove indexes to encourage full
scans in parallel, its important to remember that in some circumstances this
might cause a lot of SORT-MERGE or HASH joins, especially if the query
involves joins to lots of big tables.  Those SM or HA joins will spend a lot
of time reading and writing to the temporary tablespace, and that I/O on
temporary tablespaces do not have any special off-loading or smart-scan
optimizations.  Exadata only has off-loading and smart-scan optimizations
for I/O operations taking place in datafiles.

For tables this size, you can't increase the PGA size enough to cache the
entire SM or HA join in memory, so that's probably why you haven't seen any
benefit.

Chances are good that SQL Plan Monitor or
DBMS_XPLAN.DISPLAY_CURSOR(option=>'ALLSTATS ALL') would show you that the
time spent scanning the row-sources is negligible, but the time spent on
joins is taking the majority of elapsed time.

How many and what type of join operations?  Are they straight-forward INNER
joins, or OUTER joins?  Are there any FULL OUTER JOINs or sub-queries?

Hope this helps...

-Tim



On 11/7/14 13:59, Ebadi, Abdul wrote:

We have a half rack 4-node Exadata (X2 high capacity) running several DW
databases for us.  We have a query going against a 21 million row table with
several self-joins in it.  This query returns 7 million rows takes way too
long too run (hrs).

 

We have made sure it is running in parallel using cell offloading (full
storage scans) and when we put a count(*) around the query it returns in
only 4 seconds for 7 million rows returned. 

However, when we display the output to the screen it takes hours for it to
finish and we see pauses in the display every second or two while it is
running.

Trying to figure out what is causing these pauses?  The wait is PX Deq: type
waits when it runs with these pauses.  SQL Monitor doesn't tell us much
either except cell efficiency is negative 85%!

 

We have increased PGA size and didn't make much difference.  We are
considering putting TEMP tablespace on flash cache possibly.  Another DBA
added an index to it just to see (bad idea on Exadata) and did't improve it.
Before making any more change we would like to see some evidence for root
cause.

 

We were told for best practices on Exadata it is better to remove indexes
and hints (if possible) and let the machine full scan in parallel using
storage offloading.  Do you guys agree and are there other best practices on
Exadata also?

 

Any other suggestions on tuning this query and also general Exadata best
practices?

 

Thanks,

Abdul

 

Other related posts: