Re: 10046 or gettimeofday bug..?

From: Cary Millsap <cary.millsap@xxxxxxxxxxxx>
To: mccmx@xxxxxxxxxxx
Date: Mon, 19 Jul 2010 21:47:56 -0500
Matt,

From my friend Andrew Zitelli, who has investigated this thoroughly: "These
are bugs. I believe on some Linux platforms only a single patch is required
to fix this. The two related patches are 7522002 and 8342329. I see that
both are available for Linux x86 11.1.0.7.0. This may be incompatible with
the PSU's and require one-off patches if the user is already running
11.1.0.7.1 or later. These are generic bugs but 8342329 afflicts primarily
Solaris, HP/UX, AIX and some Linux."

I hope this helps,

Cary Millsap
Method R Corporation
http://method-r.com
http://carymillsap.blogspot.com


On Mon, Jul 19, 2010 at 6:05 PM, Matt McClernon <mccmx@xxxxxxxxxxx> wrote:

>
> RHEL 4 Update 8 x86_64 (Kernel 2.6.9-89)Oracle 11.1.0.7 EE
>
> I'm seeing some strange behaviour in a 10046 trace file which appears to be
> a bug.  The CPU time is significantly greater than the Elapsed time for one
> of our key database transactions:
> ********************************************************************************begin
> epp_domain.domain_check(connection_id => :conid,session_id => :sesid, cltrid
> => :cltri,domains => :doms,response => :res,chkdata => :chkd); end;
> call     count       cpu    elapsed       disk      query    current
>  rows------- ------  -------- ---------- ---------- ---------- ----------
>  ----------Parse        0      0.00       0.00          0          0
>  0           0Execute   1919     11.66       7.30          0     107099
> 213114        1919Fetch        0      0.00       0.00          0          0
>          0           0------- ------  -------- ---------- ----------
> ---------- ----------  ----------total     1919     11.66       7.30
>  0     107099     213114        1919
>
> Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user
> id: 63
>
> Elapsed times include waiting on following events:  Event waited on
>                     Times   Max. Wait  Total
> Waited  ----------------------------------------   Waited  ----------
>  ------------  SQL*Net message to client                    1919        0.00
>          0.00  log file sync                                1815        0.32
>         10.91  SQL*Net message from client                  1919        0.41
>         21.25  library cache: mutex X                       1282        0.01
>          0.24  latch free                                     39        0.00
>          0.01  latch: row cache objects                       11        0.01
>          0.04  latch: cache buffers chains                     5        0.00
>          0.00
>
> ********************************************************************************
>
>
> I've had a read through Chapter 7 of Optimizing Oracle Performance to try
> to identify the cause of the missing time but none of the causes seem to fit
> this anomaly (at least not on this scale).
>
> So I started digging into the raw 10046 and found some of the EXEC database
> calls for that transaction had elapsed times of zero despite a CPU time of
> about 10,000.  Here is a snippet of the raw trace file showing this
> behaviour:
> EXEC
> #9:c=10998,e=44097,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545724361EXEC
> #9:c=11999,e=30131,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545961372EXEC
> #9:c=10999,e=17654,p=3,cr=95,cu=184,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546021160EXEC
> #9:c=15997,e=22003,p=1,cr=99,cu=108,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546054541EXEC
> #9:c=9998,e=0,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546056298EXEC
> #9:c=11999,e=15062,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546103366EXEC
> #9:c=10998,e=15593,p=0,cr=89,cu=155,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546136343EXEC
> #9:c=10998,e=19331,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546209544EXEC
> #9:c=12998,e=22715,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546232848
>
> This 'e=0' pattern occurs approximately every 20 executions of that
> transaction.If I check the 10046 trace I can see many hundreds of lines of
> recursive SQL (associated with an EXEC call that has e=0) all with the same
> "tim=" value. It's as if time stops briefly periodically.
>
> So I searched for lines with e=0 where c<>0 and found EXEC calls for
> recursive SQL which also seemed to be missing time:
>
> EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3207244213,tim=1278652525564956EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2502456972,tim=1278652525564956EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1940757184,tim=1278652525586956EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1906887687,tim=1278652525586956EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3034500768,tim=1278652525586956EXEC
> #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=328706489,tim=1278652525586956EXEC
> #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1515982702,tim=1278652525586956EXEC
> #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1278652525586956EXEC
> #10:c=1000,e=0,p=0,cr=5,cu=12,mis=0,r=1,dep=1,og=1,plh=1021020385,tim=1278652525586956
>
> To give you an idea of the extent of this 'pause', there are 94 lines with
> the same tim= value in the 10046 file:
>
> $ grep 1278652525586956 epp4MATT.trc |wc -l94
>
> Is this a bug in gettimeofday..?  or am I missing something here..?  any
> help appreciated.
>
>
>
> _________________________________________________________________
> http://clk.atdmt.com/UKM/go/197222280/direct/01/
> Do you have a story that started on Hotmail? Tell us now--
> //www.freelists.org/webpage/oracle-l
>
>
>
Follow-Ups:
- RE: 10046 or gettimeofday bug..?
  - From: Matt McClernon
References:
- 10046 or gettimeofday bug..?
  - From: Matt McClernon
Re: 10046 or gettimeofday bug..?

Other related posts: