Matt, From my friend Andrew Zitelli, who has investigated this thoroughly: "These are bugs. I believe on some Linux platforms only a single patch is required to fix this. The two related patches are 7522002 and 8342329. I see that both are available for Linux x86 11.1.0.7.0. This may be incompatible with the PSU's and require one-off patches if the user is already running 11.1.0.7.1 or later. These are generic bugs but 8342329 afflicts primarily Solaris, HP/UX, AIX and some Linux." I hope this helps, Cary Millsap Method R Corporation http://method-r.com http://carymillsap.blogspot.com On Mon, Jul 19, 2010 at 6:05 PM, Matt McClernon <mccmx@xxxxxxxxxxx> wrote: > > RHEL 4 Update 8 x86_64 (Kernel 2.6.9-89)Oracle 11.1.0.7 EE > > I'm seeing some strange behaviour in a 10046 trace file which appears to be > a bug. The CPU time is significantly greater than the Elapsed time for one > of our key database transactions: > ********************************************************************************begin > epp_domain.domain_check(connection_id => :conid,session_id => :sesid, cltrid > => :cltri,domains => :doms,response => :res,chkdata => :chkd); end; > call count cpu elapsed disk query current > rows------- ------ -------- ---------- ---------- ---------- ---------- > ----------Parse 0 0.00 0.00 0 0 > 0 0Execute 1919 11.66 7.30 0 107099 > 213114 1919Fetch 0 0.00 0.00 0 0 > 0 0------- ------ -------- ---------- ---------- > ---------- ---------- ----------total 1919 11.66 7.30 > 0 107099 213114 1919 > > Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user > id: 63 > > Elapsed times include waiting on following events: Event waited on > Times Max. Wait Total > Waited ---------------------------------------- Waited ---------- > ------------ SQL*Net message to client 1919 0.00 > 0.00 log file sync 1815 0.32 > 10.91 SQL*Net message from client 1919 0.41 > 21.25 library cache: mutex X 1282 0.01 > 0.24 latch free 39 0.00 > 0.01 latch: row cache objects 11 0.01 > 0.04 latch: cache buffers chains 5 0.00 > 0.00 > > ******************************************************************************** > > > I've had a read through Chapter 7 of Optimizing Oracle Performance to try > to identify the cause of the missing time but none of the causes seem to fit > this anomaly (at least not on this scale). > > So I started digging into the raw 10046 and found some of the EXEC database > calls for that transaction had elapsed times of zero despite a CPU time of > about 10,000. Here is a snippet of the raw trace file showing this > behaviour: > EXEC > #9:c=10998,e=44097,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545724361EXEC > #9:c=11999,e=30131,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545961372EXEC > #9:c=10999,e=17654,p=3,cr=95,cu=184,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546021160EXEC > #9:c=15997,e=22003,p=1,cr=99,cu=108,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546054541EXEC > #9:c=9998,e=0,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546056298EXEC > #9:c=11999,e=15062,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546103366EXEC > #9:c=10998,e=15593,p=0,cr=89,cu=155,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546136343EXEC > #9:c=10998,e=19331,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546209544EXEC > #9:c=12998,e=22715,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546232848 > > This 'e=0' pattern occurs approximately every 20 executions of that > transaction.If I check the 10046 trace I can see many hundreds of lines of > recursive SQL (associated with an EXEC call that has e=0) all with the same > "tim=" value. It's as if time stops briefly periodically. > > So I searched for lines with e=0 where c<>0 and found EXEC calls for > recursive SQL which also seemed to be missing time: > > EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3207244213,tim=1278652525564956EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2502456972,tim=1278652525564956EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1940757184,tim=1278652525586956EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1906887687,tim=1278652525586956EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3034500768,tim=1278652525586956EXEC > #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=328706489,tim=1278652525586956EXEC > #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1515982702,tim=1278652525586956EXEC > #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1278652525586956EXEC > #10:c=1000,e=0,p=0,cr=5,cu=12,mis=0,r=1,dep=1,og=1,plh=1021020385,tim=1278652525586956 > > To give you an idea of the extent of this 'pause', there are 94 lines with > the same tim= value in the 10046 file: > > $ grep 1278652525586956 epp4MATT.trc |wc -l94 > > Is this a bug in gettimeofday..? or am I missing something here..? any > help appreciated. > > > > _________________________________________________________________ > http://clk.atdmt.com/UKM/go/197222280/direct/01/ > Do you have a story that started on Hotmail? Tell us now-- > //www.freelists.org/webpage/oracle-l > > >