RE: Exadata Tuning Question+

From: Iggy Fernandez <iggy_fernandez@xxxxxxxxxxx>
To: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>, ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Date: Sat, 8 Nov 2014 20:09:55 -0800
re:  the tablescan cost for SERVICE_RELATIONSHIP to be virtually the same as 
that for SERVICE_LOCATION  which was predicted to return 3.1M rows.

I missed that clue.
Iggy
From: jonathan@xxxxxxxxxxxxxxxxxx
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: Exadata Tuning Question+
Date: Sat, 8 Nov 2014 13:42:03 +0000











Given the information we have it is inevitable that we make some basic 
assumptions - usually along on the principle of Ockham's (or Occam's) razor.  
In this case we could assume that the table was (fairly) large because the 
execution plan shows the tablescan
 cost for SERVICE_RELATIONSHIP to be virtually the same as that for 
SERVICE_LOCATION  which was predicted to return 3.1M rows.



Of course we are assuming that the tables are being considered at the same 
degree of parallelism and that their rows are of roughly similar length - and 
neither assumption need be true.



Thanks for the interpretation of what all those xml functions were about.




   
Regards 

Jonathan Lewis

http://jonathanlewis.wordpress.com

@jloracle 






From: Iggy Fernandez [iggy_fernandez@xxxxxxxxxxx]

Sent: 08 November 2014 04:59

To: Jonathan Lewis; ORACLE-L

Subject: RE: Exadata Tuning Question+






I take it back. With incomplete information to draw upon, it's a good guess 
that SERVICE_RELATIONSHIP is a big table. Perhaps SID is the primary key of 
SERVICE_LOOKUP and, since the query wants a comma-delimited list of RELATED_SID 
values, perhaps
 SERVICE_RELATIONSHIP has 25X more rows than SERVICE_LOOKUP. 
Hence we have 35 million full table scans of a monster table compared to the 
single scan that is actually needed.



Iggy






From: iggy_fernandez@xxxxxxxxxxx

To: jonathan@xxxxxxxxxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx

Subject: RE: Exadata Tuning Question+

Date: Fri, 7 Nov 2014 18:20:10 -0800





Jonathan,



Actually, I didn't read the comment that "we could expect the server to take a 
couple of seconds to populate the next array fetchsize of rows before returning 
them" before I got onto my ornery mule to deliver a sermon :-) But let me try
 and paint my way out of my corner.



I fell into a trap because the original posting did not include a query or 
query plan. But perhaps I can still argue that the cause of the pauses is 
equally likely to be screen buffering as anything else. The
 query plan is a hash join of service_lookup (a large table with 21 million 
rows out of which 7,331,000 rows satisfy the query filters) to service_location 
(a large lookup table with 3,175,000 rows satisfying the query filters) 
resulting in 7,331,000 rows.
 Each row of this result requires five scalar subqueries on 
service_relationship. We don't know the size of the service_relationship table 
but EXPLAIN PLAN thinks that the queries return 25 rows each so, in the absence 
of
 more data, I could reasonably assume that service_relationship is a small 
table. If I make such an assumption,
I could reasonably argue that Oracle should be able to perform fetches at a 
sustained clip.



But now that we have the query in hand, we know that is  grossly inefficient 
because it requires 35 million little subqueries--five for each of the 7 
million rows in the result. It could be rewritten as three inner joins and a 
pivot.




SELECT * FROM
(
  SELECT
    slv.sid,
    slv.service_location_id,
    sr.relationship_level,
    RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr.RELATED_SID 
|| ',') ORDER BY sr.RELATED_SID), '//text()' ) AS VARCHAR2(4000)) , ',' ) AS 
related_sid,
  FROM
    service_lookup slv
    LEFT JOIN service_location sl ON sl.service_location_id = 
slv.service_location_id
    LEFT JOIN service_relationship sr ON sr.sid = slv.sid AND 
sr.relationship_level IN ('1', '2', '3', '4', '5')
)
PIVOT (MAX(related_sid) FOR relationship_level IN (
  '1' AS RELATEDSERVICEINSTANCEIDLEVEL1,
  '2' AS RELATEDSERVICEINSTANCEIDLEVEL2,
  '3' AS RELATEDSERVICEINSTANCEIDLEVEL3,
  '4' AS RELATEDSERVICEINSTANCEIDLEVEL4,
  '5' AS RELATEDSERVICEINSTANCEIDLEVEL5
);




Instead of the hard-to-read series of XML functions, all that is needed is 
LISTAGG(sr.related_sid, ',') WITHIN GROUP (ORDER BY sr.related_sid) AS 
related_sid



Iggy




From: jonathan@xxxxxxxxxxxxxxxxxx

To: iggy_fernandez@xxxxxxxxxxx; oracle-l@xxxxxxxxxxxxx

Subject: RE: Exadata Tuning Question+

Date: Fri, 7 Nov 2014 23:49:07 +0000








I take it that this is a response to the other email I sent a few minutes ago, 
saying:



"While it doesn't make sense to return 21M rows to the screen, the output (in 
general) shouldn't pause every couple of seconds, as this would (usually) be a 
measure of much work the database
 had to do to acquire the next batch of rows."




and then explaining why (probably) the OP is seeing those pauses, viz:



"each row from the sample code requires 5 parallel tablescans of a reasonably 
large table, so we could expect the server to take a couple of seconds to 
populate the next array fetchsize of
 rows before returning them."









   

Regards

Jonathan Lewis

http://jonathanlewis.wordpress.com

@jloracle 






From: Iggy Fernandez [iggy_fernandez@xxxxxxxxxxx]

Sent: 07 November 2014 23:24

To: Jonathan Lewis; ORACLE-L

Subject: RE: Exadata Tuning Question+






Perhaps I'm stubborn as a mule that needs coffee but I don't see a problem. The 
question was "However, when we display the output
 to the screen it takes hours for it to finish and we see pauses in the display 
every second or two while it is running. Trying to figure
 out what is causing these pauses?" The OP is trying to bring the 7 million 
rows over the network to his laptop and scroll them in a Putty screen or 
something similar, and should expect serious buffering issues and time to 
completion.




Iggy







From: jonathan@xxxxxxxxxxxxxxxxxx

To: oracle-l@xxxxxxxxxxxxx

Subject: RE: Exadata Tuning Question+

Date: Fri, 7 Nov 2014 23:06:54 +0000








The 4 seconds you quoted is the time required to do the driving join, which is 
(roughly) lines 26 - 36 of your plan.
The rest of the time would be the 35M parallel tablescans of 
SERVICE_RELATIONSHIP which appear as the 5 groups of 5 lines in the earlier 
part of the plan.



Was the suggested index on SERVICE_RELATIONSHIP(relationship_level,
 sid) ? This would seem to be the only choice - perhaps with hinting - to 
change the efficiency of the query as it is currently written - since the size 
of the service_relationship table is (probably) in the range of tens - or 
perhaps low hundreds - of thousands
 of blocks I would hope you could cache the entire data set to avoid millions 
of single block random reads.  The XML stuff could, of course, make all of my 
guesstimates meaningless, so it would be nice to get some ideas of the typical 
sizes of the XML component,
 and the number of rows of service_relationship you expect to get (compared to 
the optimizer's estimate of 25) for a typical (relationship_level, sid) 
combination.



I have to admit that I don't understand how your inline scalar subqueries in 
the select list guarantee to return a single value for every driving row - in
 fact I don't understand why they don't raise a syntax error relating to 
columns not in group by - but that's probably because I don't do much with XML.






If you can't optimise the work done by the inline subqueries, you'll have to 
re-engineer them into a join (probably outer) of an extra tables.









   

Regards

Jonathan Lewis

http://jonathanlewis.wordpress.com

@jloracle 






From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf 
of Ebadi, Abdul [Abdul.Ebadi@xxxxxxxxxx]

Sent: 07 November 2014 22:02

To: tim@xxxxxxxxx; ORACLE-L

Subject: RE: Exadata Tuning Question+







Tim/others… thanks for your replies!  Below is a slightly generalized version 
of the query to give you an idea of the joins and an actual explain plan:
 
 
SELECT  bunch of stuff.....,

        
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( 
XMLAGG(XMLELEMENT("e", sr1.RELATED_SID
                        ||
                        ',')
                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr1
                WHERE   sr1.SID                    = slv.SID
                        AND sr1.RELATIONSHIP_LEVEL = '1'
                GROUP BY sr1.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL1,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( 
XMLAGG(XMLELEMENT("e", sr2.RELATED_SID
                        ||
                        ',')
                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr2
                WHERE   sr2.SID                    = slv.SID
                        AND sr2.RELATIONSHIP_LEVEL = '2'
                GROUP BY sr2.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL2,
        (
               SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( 
XMLAGG(XMLELEMENT("e", sr3.RELATED_SID
                        ||
                        ',')
                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr3
                WHERE   sr3.SID                    = slv.SID
                        AND sr3.RELATIONSHIP_LEVEL = '3'
                GROUP BY sr3.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL3,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( 
XMLAGG(XMLELEMENT("e", sr4.RELATED_SID
                        ||
                        ',')
                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr4
                WHERE   sr4.SID                    = slv.SID
                        AND sr4.RELATIONSHIP_LEVEL = '4'
                GROUP BY sr4.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL4,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( 
XMLAGG(XMLELEMENT("e", sr5.RELATED_SID
                        ||
                        ',')
                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr5
                WHERE   sr5.SID                    = slv.SID
                        AND sr5.RELATIONSHIP_LEVEL = '5'
                GROUP BY sr5.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL5
FROM    service_lookup slv
        LEFT JOIN service_location sl
        ON      sl.service_location_id = slv.service_location_id;
 
 
 
 
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1570133209
 
--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes 
| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |  7331K|  
5593M|  1877   (5)| 00:00:01 |        |      |            |
|   1 |  SORT GROUP BY                   |                      |     1 |    22 
|   368   (6)| 00:00:01 |        |      |            |
|   2 |   PX COORDINATOR                 |                      |       |       
|            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |   550 
|   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR            |                      |    25 |   550 
|   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 
|   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   6 |  SORT GROUP BY                   |                      |     1 |    22 
|   368   (6)| 00:00:01 |        |      |            |
|   7 |   PX COORDINATOR                 |                      |       |       
|            |          |        |      |            |
|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |   550 
|   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   9 |     PX BLOCK ITERATOR            |                      |    25 |   550 
|   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |
|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 
|   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |
|  11 |  SORT GROUP BY                   |                      |     1 |    22 
|   368   (6)| 00:00:01 |        |      |            |
|  12 |   PX COORDINATOR                 |                      |       |       
|            |          |        |      |            |
|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |   550 
|   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |
|  14 |     PX BLOCK ITERATOR            |                      |    25 |   550 
|   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |
|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 
|   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |
|  16 |  SORT GROUP BY                   |                      |     1 |    22 
|   368   (6)| 00:00:01 |        |      |            |
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  17 |   PX COORDINATOR                 |                      |       |       
|            |          |        |      |            |
|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |   550 
|   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |
|  19 |     PX BLOCK ITERATOR            |                      |    25 |   550 
|   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |
|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 
|   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |
|  21 |  SORT GROUP BY                   |                      |     1 |    22 
|   368   (6)| 00:00:01 |        |      |            |
|  22 |   PX COORDINATOR                 |                      |       |       
|            |          |        |      |            |
|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |   550 
|   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |
|  24 |     PX BLOCK ITERATOR            |                      |    25 |   550 
|   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |
|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 
|   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |
|  26 |  PX COORDINATOR                  |                      |       |       
|            |          |        |      |            |
|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|  
5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|  
5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  29 |     PX RECEIVE                   |                      |  3175K|   
920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |
|  30 |      PX SEND HASH                | :TQ60000             |  3175K|   
920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |
|  31 |       PX BLOCK ITERATOR          |                      |  3175K|   
920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |
|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|   
920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |
|  33 |     PX RECEIVE                   |                      |  7331K|  
3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  34 |      PX SEND HASH                | :TQ60001             |  7331K|  
3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |
|  35 |       PX BLOCK ITERATOR          |                      |  7331K|  
3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |
|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|  
3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------
 
 


From: Tim Gorman [mailto:tim@xxxxxxxxx]


Sent: Friday, November 07, 2014 2:36 PM

To: ORACLE-L; Ebadi, Abdul

Subject: Re: Exadata Tuning Question+


 

Abdul,



Although it might be a "best practice" to remove indexes to encourage full 
scans in parallel, its important to remember that in some circumstances this 
might cause a lot of SORT-MERGE or HASH joins, especially if the query involves 
joins to lots of big tables. 
 Those SM or HA joins will spend a lot of time reading and writing to the 
temporary tablespace, and that I/O on temporary tablespaces do not have any 
special off-loading or smart-scan optimizations.  Exadata only has off-loading 
and smart-scan optimizations
 for I/O operations taking place in datafiles.



For tables this size, you can't increase the PGA size enough to cache the 
entire SM or HA join in memory, so that's probably why you haven't seen any 
benefit.



Chances are good that SQL Plan Monitor or 
DBMS_XPLAN.DISPLAY_CURSOR(option=>'ALLSTATS ALL') would show you that the time 
spent scanning the row-sources is negligible, but the time spent on joins is 
taking the majority of elapsed time.



How many and what type of join operations?  Are they straight-forward INNER 
joins, or OUTER joins?  Are there any FULL OUTER JOINs or sub-queries?



Hope this helps...



-Tim







On 11/7/14 13:59, Ebadi, Abdul wrote:









We have a half rack 4-node Exadata (X2 high capacity) running several DW 
databases for us.  We have a query going against a 21 million row table
 with several self-joins in it.  This query returns 7 million rows takes way 
too long too run (hrs).
 
We have made sure it is running in parallel using cell offloading (full storage 
scans) and when we put a count(*) around the query it returns in
 only 4 seconds for 7 million rows returned. 
However, when we display the output to the screen it takes hours for it to 
finish and we see pauses in the display every second or two while it
 is running.
Trying to figure out what is causing these pauses?  The wait is PX Deq: type 
waits when it runs with these pauses.  SQL Monitor doesn’t tell us
 much either except cell efficiency is negative 85%!
 
We have increased PGA size and didn’t make much difference.  We are considering 
putting TEMP tablespace on flash cache possibly.  Another DBA added
 an index to it just to see (bad idea on Exadata) and did’t improve it.  Before 
making any more change we would like to see some evidence for root cause.
 
We were told for best practices on Exadata it is better to remove indexes and 
hints (if possible) and let the machine full scan in parallel using
 storage offloading.  Do you guys agree and are there other best practices on 
Exadata also?
 
Any other suggestions on tuning this query and also general Exadata best 
practices?
 
Thanks,
Abdul
Follow-Ups:
- RE: Exadata Tuning Question+
  - From: Iggy Fernandez
References:
- Re: Exadata Tuning Question+
  - From: Tim Gorman
- RE: Exadata Tuning Question+
  - From: Ebadi, Abdul
- RE: Exadata Tuning Question+
  - From: Jonathan Lewis
- RE: Exadata Tuning Question+
  - From: Iggy Fernandez
- RE: Exadata Tuning Question+
  - From: Jonathan Lewis
- RE: Exadata Tuning Question+
  - From: Iggy Fernandez
- RE: Exadata Tuning Question+
  - From: Iggy Fernandez
- RE: Exadata Tuning Question+
  - From: Jonathan Lewis
RE: Exadata Tuning Question+

Other related posts: