Re: seconds_in_wait

  • From: Jay Hostetter <hostetter.jay@xxxxxxxxx>
  • To: Stefan Koehler <contact@xxxxxxxx>
  • Date: Wed, 23 Sep 2015 11:39:45 -0400

Stefan,

I first saw the issue in this database around 5:41 pm on 9/17 (give or
take a few minutes based on the monitoring script). Here is a glimpse of
the trace file at that time:


*** 2015-09-13 21:20:56.785
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(2486888)usec at (1442193656785390) whereas (1000000) is allowed

*** 2015-09-13 21:20:59.796
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(2302344)usec at (1442193659796615) whereas (1000000) is allowed

*** 2015-09-13 21:21:02.152
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(1707628)usec at (1442193662152364) whereas (1000000) is allowed
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time stalled at
1442529387154649

*** 2015-09-17 17:36:10.826
kstmchkdrift (kstmhighrestimecntkeeper:highres): Stall, backward drift
ended at 1442529387194058 drift: 39409
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3615)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3609)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3603)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3597)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3591)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3585)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3579)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3573)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(3567)microsecs at 1442529386


The "Time moved backward" messages continue until 9/20. At which point I
see this:

kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(7)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Time moved backward by
(1)microsecs at 1442529386
kstmchkdrift (kstmhighrestimecntkeeper:lowres): Stall, backward drift ended
at 1442529387 drift: 1
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(7254467)usec at (1442798339931054) whereas (1000000) is allowed

*** 2015-09-20 20:28:34.040
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(1116348)usec at (1442798930408237) whereas (1000000) is allowed

*** 2015-09-20 20:28:35.358
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(1153249)usec at (1442798931725990) whereas (1000000) is allowed

*** 2015-09-20 20:28:42.326
kstmchkdrift (kstmhighrestimecntkeeper:highres): Time jumped forward by
(1365489)usec at (1442798938694604) whereas (1000000) is allowed


No other messages appear after 9/20.

I see note 1185093.1 talks about a bug for time drift, but the fix appears
to simply disable the message.

Thank you,
Jay


On Wed, Sep 23, 2015 at 8:43 AM, Stefan Koehler <contact@xxxxxxxx> wrote:

Hi Jay,

I mean that I had waited around 122 seconds. If there was an issue with
the SGA time or system time, I would expect the astronomical number I see
for my log file parallel write wait event.

Ah ok, i just missed the negation in your sentence. However you can not
reproduce it at will and it is a sporadic issue. So it is even harder to
trace
and capture.


Yes, they are running on VMWare, on different guests.

Can you please check the VTKM trace file (in DIAG directory) for large
time drifts then - especially at the time window when you hit this
seconds_in_wait issue?

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher
Homepage: http://www.soocs.de
Twitter: @OracleSK

Jay Hostetter <hostetter.jay@xxxxxxxxx> hat am 23. September 2015 um
13:23 geschrieben:

Stefan,

>Do you mean that it shows 122 seconds, but you have waited much less?

I mean that I had waited around 122 seconds. If there was an issue
with the SGA time or system time, I would expect the astronomical number I
see
for my log file parallel write wait event.

>If this is the current (correct) time and the query still returns the
wrong seconds_in_wait at this point in time then it may likely be caused by
>the stored start time of the wait event (which is also based on epoch
time).

I agree. I believe it is this particular wait event.

>Are these two databases virtualized in some kind of way (e.g. VMware,
etc.)?

Yes, they are running on VMWare, on different guests.

Thank you,
Jay

Other related posts: