Re: intermittent long "log file sync" waits

From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
To: cstephens16@xxxxxxxxx
Date: Tue, 28 Jan 2020 10:19:19 -0600

Hmm. I would immediately question what else is going on when the log file
sync waits are longest. Anything in the alert log? It seems weird that the
IO waits arent very high, so it might be waiting on something else, though
I am not sure what. Perhaps the destination for the archive logs is very
busy, so your redo logs cant switch until the log switch is complete?

On Tue, Jan 28, 2020 at 10:11 AM Chris Stephens <cstephens16@xxxxxxxxx>
wrote:

3-node Oracle 19.3 RAC
Centos 7

We have a SQLAlchemy/Python based application workload that is running the
exact same steps with widely varying response times which appear to be
related to varying "log file sync" wait times.

Here is a profile of a "fast" run:

CALL-NAME                        DURATION       %   CALLS      MEAN
MIN        MAX
------------------------------  ---------  ------  ------  --------
--------  ---------
SQL*Net message from client     53.197782   91.8%  10,092  0.005271
0.000177  28.568493
EXEC                             3.759177    6.5%   9,816  0.000383
0.000000   0.239592
row cache lock                   0.233153    0.4%     541  0.000431
0.000113   0.000941
PARSE                            0.140399    0.2%   4,867  0.000029
0.000000   0.006620
DLM cross inst call completion   0.137330    0.2%     956  0.000144
0.000004   0.000505
library cache lock               0.100171    0.2%     215  0.000466
0.000151   0.002133
library cache pin                0.079729    0.1%     216  0.000369
0.000056   0.000710
FETCH                            0.058253    0.1%   1,062  0.000055
0.000000   0.004148
log file sync                    0.048217    0.1%     149  0.000324
0.000259   0.000505
CLOSE                            0.045416    0.1%   4,929  0.000009
0.000000   0.000073
20 others                        0.135624    0.2%  11,854  0.000011
0.000000   0.001700
------------------------------  ---------  ------  ------  --------
--------  ---------
TOTAL (30)                      57.935251  100.0%  44,697  0.001296
0.000000  28.568493

Here is a profile of a "slow" run:

CALL-NAME                         DURATION       %   CALLS      MEAN
MIN         MAX
------------------------------  ----------  ------  ------  --------
--------  ----------
SQL*Net message from client     131.186118   61.0%  10,092  0.012999
0.000212  106.789360
log file sync                    79.291166   36.8%     150  0.528608
0.000264    2.986575
EXEC                              3.728402    1.7%   9,816  0.000380
0.000000    0.221403
row cache lock                    0.248868    0.1%     542  0.000459
0.000111    0.001036
PARSE                             0.164267    0.1%   4,867  0.000034
0.000000    0.004652
DLM cross inst call completion    0.146981    0.1%     957  0.000154
0.000005    0.001188
library cache lock                0.104354    0.0%     218  0.000479
0.000160    0.000728
library cache pin                 0.082504    0.0%     202  0.000408
0.000157    0.000672
FETCH                             0.056687    0.0%   1,062  0.000053
0.000000    0.003969
CLOSE                             0.043590    0.0%   4,929  0.000009
0.000000    0.000180
20 others                         0.142044    0.1%  11,866  0.000012
0.000000    0.001792
------------------------------  ----------  ------  ------  --------
--------  ----------
TOTAL (30)                      215.194981  100.0%  44,701  0.004814
0.000000  106.789360

looking at even histogram for that event:

SQL> @evh "log file sync"
       EVH_EVENT    EVH_WAIT_TIME_MILLI    WAIT_COUNT    EVH_EST_TIME
                  LAST_UPDATE_TIME
________________ ______________________ _____________ _______________
______________________________________
log file sync                < 1               200051         100.026
27-JAN-20 11.39.57.344734 PM -06:00
log file sync                < 2                  165           0.248
28-JAN-20 12.18.10.429089 AM -06:00
log file sync                < 4                  150            0.45
27-JAN-20 11.18.31.158102 PM -06:00
log file sync                < 8                  199           1.194
27-JAN-20 11.19.14.209947 PM -06:00
log file sync               < 16                  253           3.036
28-JAN-20 08.03.17.851328 AM -06:00
log file sync               < 32                  472          11.328
27-JAN-20 11.20.22.746033 PM -06:00
log file sync               < 64                  728          34.944
28-JAN-20 01.13.37.364541 AM -06:00
log file sync              < 128                  691          66.336
27-JAN-20 11.31.37.400504 PM -06:00
log file sync              < 256                  414          79.488
28-JAN-20 12.18.10.423987 AM -06:00
log file sync              < 512                  405          155.52
28-JAN-20 03.27.50.540383 AM -06:00
log file sync             < 1024                  459         352.512
27-JAN-20 11.35.14.378363 PM -06:00
log file sync             < 2048                  482         740.352
28-JAN-20 01.18.20.556248 AM -06:00
log file sync             < 4096                  576        1769.472
27-JAN-20 11.21.05.084998 PM -06:00
log file sync             < 8192                   89         546.816
27-JAN-20 11.57.36.436460 AM -06:00
log file sync            < 16384                   60          737.28
25-JAN-20 07.48.31.460408 AM -06:00
log file sync            < 32768                   39         958.464
27-JAN-20 11.59.09.869286 AM -06:00
log file sync            < 65536                   27        1327.104
25-JAN-20 09.49.13.856563 AM -06:00

The weird thing is that I don't see corresponding log I/O waits (awaits)
in iostat output.

I have a ticket open w/ oracle but does anyone have any suggestions to
discover root cause and/or solution?

--
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'

References:
- intermittent long "log file sync" waits
  - From: Chris Stephens

Re: intermittent long "log file sync" waits

Other related posts: