Hi Stéphane & All, Sorry - should have mentioned the waits - nothing really significant on I/O. I've put the tkprof output for one of the main queries during a normal run and a slow run below (I hope the formatting isn't too bad). As you can see the CPU time on the slow run FETCH is much larger. In the explain plan this time appears to be concentrated on the hash joins. The raw trace files show lines like FETCH #10:c=80000,e=87368,p=1,cr=11,cu=0,mis=0,r=1,dep=0,og=1,tim=18141452357101 so it does seem to be genuine CPU time. As mentioned the server is lightly loaded (but not completely idle) at the time. (The application also seems to do a lot of unnecessary parsing but that doesn't appear to affect the time much) *** NORMAL RUN *** SELECT query: call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 11498 0.67 0.55 0 0 0 0 Execute 11498 15.40 15.25 0 0 0 0 Fetch 11497 134.96 133.06 2 174679 0 11497 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 34493 151.03 148.87 2 174679 0 11497 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 353 Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=11 pr=2 pw=0 time=23437 us) 4 HASH JOIN (cr=11 pr=2 pw=0 time=23087 us) 7 TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=161 us) 7 INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=103 us)(object id 8072) 13 VIEW index$_join$_001 (cr=5 pr=2 pw=0 time=22373 us) 13 HASH JOIN (cr=5 pr=2 pw=0 time=22335 us) 13 INLIST ITERATOR (cr=2 pr=0 pw=0 time=103 us) 13 INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=89 us)(object id 10243) 80 INDEX FAST FULL SCAN BTB_PK (cr=3 pr=2 pw=0 time=17326 us)(object id 10244) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 11498 0.00 0.02 db file sequential read 2 0.01 0.01 SQL*Net message from client 11498 0.00 7.05 latch: shared pool 1 0.00 0.00 *** SLOW RUN *** call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 11497 0.76 0.61 0 0 0 0 Execute 11497 15.87 15.37 0 0 0 0 Fetch 11496 726.43 709.89 1 174669 0 11496 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 34490 743.06 725.89 1 174669 0 11496 Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 287 Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=11 pr=1 pw=0 time=87366 us) 4 HASH JOIN (cr=11 pr=1 pw=0 time=86957 us) 7 TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=130 us) 7 INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=83 us)(object id 8072) 13 VIEW index$_join$_001 (cr=5 pr=1 pw=0 time=86298 us) 13 HASH JOIN (cr=5 pr=1 pw=0 time=86244 us) 13 INLIST ITERATOR (cr=2 pr=0 pw=0 time=140 us) 13 INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=111 us)(objectid 10243) 80 INDEX FAST FULL SCAN BTB_PK (cr=3 pr=1 pw=0 time=15315 us)(object id 10244) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 11497 0.00 0.02 db file sequential read 1 0.01 0.01 SQL*Net message from client 11497 0.00 9.39 latch: shared pool 2 0.00 0.00 Thanks again! Charlotte ----- Original Message ---- From: Stephane Faroult <sfaroult@xxxxxxxxxxxx> To: charlottejanehammond@xxxxxxxxx Cc: oracle-l@xxxxxxxxxxxxx Sent: Wednesday, November 5, 2008 10:32:41 PM Subject: Re: Random Slowdowns on Hash Joins Charlotte, Since you insist one the fact that the elapsed time is much higher, I presume that the CPU time is about the same. What do the trace file says about waits? That's probably where you'll find the difference. What kind of storage have you at the back? If it's a shared SAN, it's quite possible that some other application flushes the cache - same nominal number of physical I/Os for Oracle, but very different time since instead of hitting the SAN cache you really have to get the data from disk. It should show somewhere. If someone is randomly computing statistics in your back in another database on an other machine that shares the SAN storage, it can happen ... HTH Stéphane Faroult Charlotte Hammond wrote: > Hi All! We're running a batch job on test. After each run everything is > reset using an RMAN restore. So each time it runs with exactly the same data, > execution plans etc. > However sometimes (occassionally) its about 2-3 times as slow as normal. >Comparing a slow trace file with a normal trace file I find that execution >plans are the same (as expected) and the biggest difference is that HASH JOINs >all have (say) e=80000 instead of (say) e=20000. Hash Joins are small but >very frequent in this job, so this does add up to the time difference >observed. But why is this slow down happening? > The "server" is a Sun Solaris LDOM on a T2 processor with 32 CMTs >masquerading as 32 virtual processors. This may not be relevant but that's >where my suspicion is at the moment. The load is fairly consistent during each >test - a few (maybe 6 or 7) application connections doing minor bits of work >concurrently but the overall load is low. The batch job is single threaded (no >parallel queries etc.) Any ideas or suggestions for further diagnostics? >(This is Oracle 10.2.0.3 EE on Solaris 10) > (BTW - it's frustrating trying to get it to be slow when you want it, it just > sort of happens from time to time unpredictably) > Many thanks! > Charlotte > > -- > //www.freelists.org/webpage/oracle-l > > > > -- Stephane Faroult RoughSea Ltd <http://www.roughsea.com> Coming speaking engagements: Beijing, Shanghai <http://www.oracle.com/global/cn/education/promotions/stephanefaroult_cele_seminar_cn.htm>, Hong-Kong <http://www.oracle.com/education/images/apac_newsletter/seminar/hk_stephanefaroult.html> and Singapore <http://www.oracle.com/education/images/apac_newsletter/seminar/sg_stephanefaroult.html>. -- //www.freelists.org/webpage/oracle-l