Re: intermittent long "log file sync" waits

From: Chris Stephens <cstephens16@xxxxxxxxx>
To: oracle-l <Oracle-L@xxxxxxxxxxxxx>
Date: Fri, 7 Feb 2020 09:57:02 -0600

while i work on a way to enable 10005:4 for new sessions associate w/
problem workload (doesn't look like i can use logon trigger/execute
immediate for this), we have shut down 2 of 3 instances and are again
testing to see if that changes anything.

@Priit i think that would eliminate any remastering/RAC related sync
actions.

On Fri, Feb 7, 2020 at 7:53 AM Chris Stephens <cstephens16@xxxxxxxxx> wrote:

Crystal clear explanation for everything Tanel. I owe you a beer(s)
if/when we meet in real life.

I'm going to see if we can't catch the x$trace post/wait data as well as
psnapper on FG process and lgwr today. Will post any updates.

It is odd that a reboot of the nodes seems to fix the issue for a limited
time. I also keep going back to the slow writes to /tmp knowing GI makes
use of /tmp/.oracle.

you are correct: not exadata and not production.

thanks again! hope you have the time to get that big announcement out on
twitter today! love all those hacking sessions. i've learned to just block
out an extra hour or two after scheduled end time. :)

On Thu, Feb 6, 2020 at 10:43 PM Tanel Poder <tanel@xxxxxxxxxxxxxx> wrote:

Yeah, the "FG wakeup overhead per log file sync" derived metric
definitely shows that your foregrounds wait for log file syncs way longer
than the corresponding LGWR write duration. That metric divides the "redo
synch time overhead (usec)" with "redo synch writes" to get an average
extra latency per foreground log file sync. Normally it's just
microseconds, but in the output you sent it was close to 900
*milliseconds* per sync in some cases.

I've slightly edited one of the snapper output sections and removed some
less interesting lines:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID @INST, USERNAME  , TYPE, STATISTIC                               ,
      DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,
AVERAGES

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1717  @3, GEN3_MGOWE, STAT, *user commits                            ,
            6*,        .49,         ,             ,          ,
,        .08 per execution
  1717  @3, GEN3_MGOWE, STAT, *messages sent                           ,
            6*,        .49,         ,             ,          ,
,        .13 per execution
  1717  @3, GEN3_MGOWE, STAT, calls to kcmgas                         ,
          6,        .49,         ,             ,          ,           ,
   .27 per execution
  1717  @3, GEN3_MGOWE, STAT, redo entries                            ,
         64,       5.21,         ,             ,          ,           ,
  1.89 per execution
  1717  @3, GEN3_MGOWE, STAT, redo size                               ,
      22532,      1.83k,         ,             ,          ,           ,
5.63k bytes per user commit
  1717  @3, GEN3_MGOWE, STAT, redo synch time                         ,
       1190,      96.85,         ,             ,          ,           ,
  3.35 per execution
  1717  @3, GEN3_MGOWE, STAT, redo synch time (usec)                  ,
   11902840,    968.69k,         ,             ,          ,           ,
33.53k per execution
*  1717  @3, GEN3_MGOWE, STAT, redo synch time overhead (usec)         ,
      7933229,    645.63k,         ,             ,          ,           ,
274.72ms FG wakeup overhead per log file sync *
  1717  @3, GEN3_MGOWE, STAT, redo synch time overhead count (  2ms)  ,
          2,        .16,         ,             ,          ,           ,
   .06 per execution
  1717  @3, GEN3_MGOWE, STAT, redo synch time overhead count (inf)    ,
          4,        .33,         ,             ,          ,           ,
   .01 per execution
  1717  @3, GEN3_MGOWE, STAT, redo synch writes                       ,
          6,        .49,         ,             ,          ,           ,
    .1 per execution
  1717  @3, GEN3_MGOWE, STAT, redo synch long waits                   ,
          4,        .33,         ,             ,          ,           ,
   .02 per execution
  1717  @3, GEN3_MGOWE, STAT, redo write info find                    ,
          6,        .49,         ,             ,          ,           ,
   .08 per execution
  1717  @3, GEN3_MGOWE, STAT, undo change vector size                 ,
       5664,     460.95,         ,             ,          ,           ,
101.03 per execution
  1717  @3, GEN3_MGOWE, STAT, table fetch by rowid                    ,
          6,        .49,         ,             ,          ,           ,
  3.33 per execution
  1717  @3, GEN3_MGOWE, STAT, rows fetched via callback               ,
          6,        .49,         ,             ,          ,           ,
   .35 per execution
*  1717  @3, GEN3_MGOWE, WAIT, log file sync                           ,
     12202261,   993.06ms,    99.3%, [WWWWWWWWWW],         6,        .49,
   2.03s average wait*
  1717  @3, GEN3_MGOWE, WAIT, SQL*Net message to client               ,
         98,     7.98us,      .0%, [          ],        60,       4.88,
1.63us average wait
  1717  @3, GEN3_MGOWE, WAIT, SQL*Net message from client             ,
      94111,     7.66ms,      .8%, [W         ],        60,       4.88,
1.57ms average wait

--  End of Stats snap 247, end=2020-02-05 17:41:57, seconds=12.3

LGWR (as seen in another trace file) was almost completely idle during
the problem time.

Also, from the wait events and stats we can see that LGWR did all the
work by itself, LG* slaves weren't active (other than polling the current
SCN very briefly in one snapshot).

  1954  @3, (LG00)    , WAIT, LGWR worker group idle ,      12000000,
962.4ms,    96.2%, [WWWWWWWWWW],         0,          0,        12s average
wait
  2198  @3, (LG01)    , WAIT, LGWR worker group idle ,      12000000,
962.4ms,    96.2%, [WWWWWWWWWW],         0,          0,        12s average
wait

Since we don't see any FG metrics like *redo synch polls* and *redo
synch poll writes* it looks like you're using the plain old post/wait
mechanism with LGWR. This looks like a case of missed post to me.

As I understand this is not production, so you could use X$TRACE to dig
deeper into POST/WAIT messages. X$TRACE is actually always enabled anyway
(just not all events) so it should be safe for prod too, at least I haven't
crashed anything with it yet.

You'd need to enable KST event 10005 at level 4 for just one test session
and LGWR. I've used the Oracle PID below - 64 is the Oracle PID (not SPID)
that my session is using. Would need to do the same for LGWR.

SQL> ALTER TRACING ENABLE "10005:4:64";

Tracing altered.

And since X$TRACE is a circular buffer, you'd need to quickly dump the
X$TRACE buckets to disk or just CTAS it somewhere whenever you see a long
log file sync. You could use some PL/SQL loop that polls ASH or V$SESSION
and does such an INSERT-SELECT from X$TRACE whenever it sees anyone waiting
for "log file sync" for longer than 1 sec. You could also make the X$TRACE
buffers bigger to have a bit more time to react.

The output would give you a timeline of session wait events *and* POSTS
sent and received. You can use the "relative_time_sec" column to see when
the POSTs were actually sent by your FG and received by LGWR ... and vice
versa.

SQL> @xt sid=509

  relative_time_sec   event file.c         FUNCTION
sid data
------------------- ------- -------------- ------------------------
-------
------------------------------------------------------------------------------------------------------------------------------------
   634363269.049747   10005 ksl2.c@3349    kslwtbctx
509 KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1
0/0x0 wait_id=2764 seq_num=2766 snap_id=1
   634363279.784638   10005 ksl2.c@3492    kslwtectx
509 KSL WAIT END [SQL*Net message from client] 1952673792/0x74637000 1/0x1
0/0x0 wait_id=2764 seq_num=2766 snap_id=1
   634363279.784641   10005 ksl2.c@3496    kslwtectx
509 KSL WAIT END wait times (usecs) - snap=10734887, exc=10734887,
tot=10734887
   634363279.784850   10005 ksl2.c@19892   ksl_update_post_stats
509 *KSL POST SENT postee=20 num=3 loc='ksa2.h LINE:292 ID:ksasnd'
flags=0x0 id1=0 id2=0*
   634363279.784983   10005 ksl2.c@3349    kslwtbctx
509 KSL WAIT BEG [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765
seq_num=2767 snap_id=1
   634363279.785058   10005 ksl2.c@3492    kslwtectx
509 KSL WAIT END [Disk file operations I/O] 8/0x8 0/0x0 8/0x8 wait_id=2765
seq_num=2767 snap_id=1
   634363279.785060   10005 ksl2.c@3496    kslwtectx
509 KSL WAIT END wait times (usecs) - snap=77, exc=77, tot=77
   634363279.785207   10005 ksl2.c@3349    kslwtbctx
509 *KSL WAIT BEG [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0
wait_id=2766 seq_num=2768 snap_id=1*
   634363279.785888   10005 ksl2.c@18157   ksliwat
509 KSL FACILITY WAIT fac#=3 time_waited_usecs=678
   634363279.785895   10005 ksl2.c@18276   ksliwat
509 *KSL POST RCVD poster=22 num=8332 loc='ksl2.h LINE:4384 ID:kslpsr'
id1=162 id2=0 name=EV type=0 fac#=3 posted=0x3 may_be_posted=1*
   634363279.785904   10005 ksl2.c@3492    kslwtectx
509 *KSL WAIT END [log file sync] 4172/0x104c 143746492/0x89165bc 0/0x0
wait_id=2766 seq_num=2768 snap_id=1*
   634363279.785905   10005 ksl2.c@3496    kslwtectx
509 KSL WAIT END wait times (usecs) - snap=685, exc=685, tot=685
   634363279.785948   10005 ksl2.c@3349    kslwtbctx
509 KSL WAIT BEG [SQL*Net message to client] 1952673792/0x74637000 1/0x1
0/0x0 wait_id=2767 seq_num=2769 snap_id=1
   634363279.785952   10005 ksl2.c@3492    kslwtectx
509 KSL WAIT END [SQL*Net message to client] 1952673792/0x74637000 1/0x1
0/0x0 wait_id=2767 seq_num=2769 snap_id=1
   634363279.785953   10005 ksl2.c@3496    kslwtectx
509 KSL WAIT END wait times (usecs) - snap=6, exc=6, tot=6
   634363279.785964   10005 ksl2.c@3349    kslwtbctx
509 KSL WAIT BEG [SQL*Net message from client] 1952673792/0x74637000 1/0x1
0/0x0 wait_id=2768 seq_num=2770 snap_id=1

The *KSL POST SENT postee=20* shows you the Oracle PID of LGWR in the
above example. My FG session sent a wakeup message to LGWR.

The *KSL POST RCVD postee=22* shows that the "OK" reply came back from
another process (Oracle PID 22). This was the LG00 log writer slave in my
case. The LGnn slaves can do post-back to the foregrounds for scalability
(so that LGWR itself wouldn't spend so much time in the semaphore
operation system calls).

At the end of the day, this would still just confirm that LGWR misses a
post for some reason, you'd still need to find which bug causes it.

This doesn't seem to be Exadata, right? LGWR and commit processing with
Exafusion & Smart Fusion Block Transfer has gotten more complex (and
faster), the GC layer doesn't have to wait for log file sync completion
before shipping a modified block to remote instance, but commits on that
remote instance still have to confirm that the log write on the original
instance had completed. So there are additional potential
synchronization/wait points in that configuration. This feature should be
exadata-only, but maybe some artifacts from that implementation affect the
non-exa config too.

One thing of interest, FG snapper output shows a short "IPC group service
call" wait event right after a long log file sync ended and that's related
to the Exafusion KSIPC layer.

P.S. For people wondering, I talk about X$TRACE and advanced wait event &
post-wait topics in this hacking session:
https://www.youtube.com/watch?v=mkmvZv58W6w

--
Tanel Poder
https://tanelpoder.com/seminar

On Thu, Feb 6, 2020 at 1:55 PM Chris Stephens <cstephens16@xxxxxxxxx>
wrote:

At Sayan's excellent suggestion here is the full snapper output from
foreground sessions and lg* processes:

https://gist.github.com/cs2018ncsa/e9b0d442247f1434ceef21124bef18c3

On Thu, Feb 6, 2020 at 10:17 AM Chris Stephens <cstephens16@xxxxxxxxx>
wrote:

resending w/ inline snapper foreground output of max "FG wakeup
overhead per log file sync" time while i wait for moderator to approve
previous email w/ attachments of lgwr and foreground snapper output from
problem interval occurring later in the evening last night. like i said in
previous email, i'm not exactly sure how to zero in on issue but "FG
wakeup.." times look extremely suspicious to me.

FG:

*** 2020-02-05T17:44:15.593425-06:00
  1474  @3, GEN3_MGOWE, WAIT, SQL*Net message from client
                ,      12000000,   953.02ms,    95.3%, [WWWWWWWWWW],
  0,          0,        12s average wait
  1717  @3, GEN3_MGOWE, STAT, Requests to/from client
                ,            54,       4.29,         ,             ,
   ,           ,        .96 per execution
  1717  @3, GEN3_MGOWE, STAT, opened cursors cumulative
                ,            22,       1.75,         ,             ,
   ,           ,         .7 per execution
  1717  @3, GEN3_MGOWE, STAT, user commits
                 ,             5,         .4,         ,             ,
   ,           ,        .11 per execution
  1717  @3, GEN3_MGOWE, STAT, user calls
                 ,            74,       5.88,         ,             ,
   ,           ,        1.3 per execution
  1717  @3, GEN3_MGOWE, STAT, recursive calls
                ,            21,       1.67,         ,             ,
   ,           ,   251.62us recursive CPU per recursive call
  1717  @3, GEN3_MGOWE, STAT, session logical reads
                ,           167,      13.26,         ,             ,
   ,           ,     20.64k total buffer visits
  1717  @3, GEN3_MGOWE, STAT, CPU used when call started
                 ,             2,        .16,         ,             ,
   ,           ,      760ms total CPU used when call started
  1717  @3, GEN3_MGOWE, STAT, CPU used by this session
                 ,             2,        .16,         ,             ,
   ,           ,      760ms total CPU used by this session
  1717  @3, GEN3_MGOWE, STAT, DB time
                ,          1016,      80.69,         ,             ,
   ,           ,       12.8 per execution
  1717  @3, GEN3_MGOWE, STAT, non-idle wait time
                 ,          1234,         98,         ,             ,
   ,           ,      12.93 per execution
  1717  @3, GEN3_MGOWE, STAT, non-idle wait count
                ,           194,      15.41,         ,             ,
   ,           ,       3.72 per execution
  1717  @3, GEN3_MGOWE, STAT, process last non-idle time
                 ,            10,        .79,         ,             ,
   ,           ,      1.33M per execution
  1717  @3, GEN3_MGOWE, STAT, messages sent
                ,             6,        .48,         ,             ,
   ,           ,        .14 per execution

References:
- Re: intermittent long "log file sync" waits
  - From: Chris Stephens
- Re: intermittent long "log file sync" waits
  - From: Tanel Poder
- Re: intermittent long "log file sync" waits
  - From: Tanel Poder
- Re: intermittent long "log file sync" waits
  - From: Tanel Poder
- Re: intermittent long "log file sync" waits
  - From: Chris Stephens

Re: intermittent long "log file sync" waits

Other related posts: