Fw: Random Slowdowns on Hash Joins

From: Charlotte Hammond <charlottejanehammond@xxxxxxxxx>
To: oracle-l@xxxxxxxxxxxxx
Date: Thu, 6 Nov 2008 14:24:28 -0800 (PST)
Hi Stéphane  & All,

Sorry - should have mentioned the waits - nothing really significant on I/O.   
I've put the tkprof output for one of the main queries during a normal run and 
a slow run below 
(I hope the formatting isn't too bad).

As you can see the CPU time on the slow run FETCH is much larger.  

In the explain plan this time appears to be concentrated on the hash joins.  
The raw trace files show lines like
FETCH 
#10:c=80000,e=87368,p=1,cr=11,cu=0,mis=0,r=1,dep=0,og=1,tim=18141452357101 so 
it does seem to be genuine CPU time.

As mentioned the server is lightly loaded (but not completely idle) at the 
time. (The application also seems to do a lot of unnecessary parsing but that 
doesn't appear to affect the time much)


*** NORMAL RUN ***

SELECT query:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    11498      0.67       0.55          0          0          0           0
Execute  11498     15.40      15.25          0          0          0           0
Fetch    11497    134.96     133.06          2     174679          0       11497
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    34493    151.03     148.87          2     174679          0       11497

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 353

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=11 pr=2 pw=0 time=23437 us)
      4   HASH JOIN  (cr=11 pr=2 pw=0 time=23087 us)
      7    TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=161 us)
      7     INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=103 us)(object id 8072)
     13    VIEW  index$_join$_001 (cr=5 pr=2 pw=0 time=22373 us)
     13     HASH JOIN  (cr=5 pr=2 pw=0 time=22335 us)
     13      INLIST ITERATOR  (cr=2 pr=0 pw=0 time=103 us)
     13       INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=89 us)(object id 
10243)
     80      INDEX FAST FULL SCAN BTB_PK (cr=3 pr=2 pw=0 time=17326 us)(object 
id 10244)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   11498        0.00          0.02
  db file sequential read                         2        0.01          0.01
  SQL*Net message from client                 11498        0.00          7.05
  latch: shared pool                              1        0.00          0.00

*** SLOW RUN ***
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    11497      0.76       0.61          0          0          0           0
Execute  11497     15.87      15.37          0          0          0           0
Fetch    11496    726.43     709.89          1     174669          0       11496
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    34490    743.06     725.89          1     174669          0       11496
 
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 287
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=11 pr=1 pw=0 time=87366 us)
      4   HASH JOIN  (cr=11 pr=1 pw=0 time=86957 us)
      7    TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=130 us)
      7     INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=83 us)(object id 8072)
     13    VIEW  index$_join$_001 (cr=5 pr=1 pw=0 time=86298 us)
     13     HASH JOIN  (cr=5 pr=1 pw=0 time=86244 us)
     13      INLIST ITERATOR  (cr=2 pr=0 pw=0 time=140 us)
     13       INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=111 us)(objectid 
10243)
     80      INDEX FAST FULL SCAN BTB_PK (cr=3 pr=1 pw=0 time=15315 us)(object 
id 10244)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   11497        0.00          0.02
  db file sequential read                         1        0.01          0.01
  SQL*Net message from client                 11497        0.00          9.39
  latch: shared pool                              2        0.00          0.00
 



Thanks again!   
Charlotte

 


----- Original Message ----
From: Stephane Faroult <sfaroult@xxxxxxxxxxxx>
To: charlottejanehammond@xxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Sent: Wednesday, November 5, 2008 10:32:41 PM
Subject: Re: Random Slowdowns on Hash Joins

Charlotte,

Since you insist one the fact that the elapsed time is much higher, I presume 
that the CPU time is about the same. What do the trace file says about waits? 
That's probably where you'll find the difference.
What kind of storage have you at the back? If it's a shared SAN, it's quite 
possible that some other application flushes the cache - same nominal number of 
physical I/Os for Oracle, but very different time since instead of hitting the 
SAN cache you really have to get the data from disk. It should show somewhere. 
If someone is randomly computing statistics in your back in another database on 
an other machine that shares the SAN storage, it can happen ...

HTH

Stéphane Faroult

Charlotte Hammond wrote:
> Hi All!  We're running a batch job on test. After each run everything is 
> reset using an RMAN restore. So each time it runs with exactly the same data, 
> execution plans etc.
>  However sometimes (occassionally) its about 2-3 times as slow as normal. 
>Comparing a slow trace file with a normal trace file I find that execution 
>plans are the same (as expected) and the biggest difference is that HASH JOINs 
>all have (say) e=80000 instead of (say) e=20000.  Hash Joins are small but 
>very frequent in this job, so this does add up to the time difference 
>observed. But why is this slow down happening?
>  The "server" is a Sun Solaris LDOM on a T2 processor with 32 CMTs 
>masquerading as 32 virtual processors. This may not be relevant but that's 
>where my suspicion is at the moment. The load is fairly consistent during each 
>test - a few (maybe 6 or 7) application connections doing minor bits of work 
>concurrently but the overall load is low. The batch job is single threaded (no 
>parallel queries etc.)  Any ideas or suggestions for further diagnostics? 
>(This is Oracle 10.2.0.3 EE on Solaris 10)
> (BTW - it's frustrating trying to get it to be slow when you want it, it just 
> sort of happens from time to time unpredictably)
>  Many thanks!
> Charlotte 
> 
>      --
> //www.freelists.org/webpage/oracle-l
> 
> 
> 
>  


-- Stephane Faroult
RoughSea Ltd <http://www.roughsea.com>
Coming speaking engagements:
Beijing, Shanghai 
<http://www.oracle.com/global/cn/education/promotions/stephanefaroult_cele_seminar_cn.htm>,
 Hong-Kong 
<http://www.oracle.com/education/images/apac_newsletter/seminar/hk_stephanefaroult.html>
 and Singapore 
<http://www.oracle.com/education/images/apac_newsletter/seminar/sg_stephanefaroult.html>.
 


      
--
//www.freelists.org/webpage/oracle-l
Fw: Random Slowdowns on Hash Joins

Other related posts: