Hmm. I would immediately question what else is going on when the log file
sync waits are longest. Anything in the alert log? It seems weird that the
I/O waits aren't very high, so it might be waiting on something else, though
I'm not sure what. Perhaps the archive log destination is very busy, so
LGWR can't switch into the next online redo log until that log has been
archived?
On Tue, Jan 28, 2020 at 10:11 AM Chris Stephens <cstephens16@xxxxxxxxx> wrote:
3-node Oracle 19.3 RAC
Centos 7
We have a SQLAlchemy/Python based application workload that is running the
exact same steps with widely varying response times which appear to be
related to varying "log file sync" wait times.
Here is a profile of a "fast" run:
CALL-NAME                       DURATION      %  CALLS     MEAN      MIN       MAX
------------------------------ --------- ------ ------ -------- -------- ---------
SQL*Net message from client    53.197782  91.8% 10,092 0.005271 0.000177 28.568493
EXEC                            3.759177   6.5%  9,816 0.000383 0.000000  0.239592
row cache lock                  0.233153   0.4%    541 0.000431 0.000113  0.000941
PARSE                           0.140399   0.2%  4,867 0.000029 0.000000  0.006620
DLM cross inst call completion  0.137330   0.2%    956 0.000144 0.000004  0.000505
library cache lock              0.100171   0.2%    215 0.000466 0.000151  0.002133
library cache pin               0.079729   0.1%    216 0.000369 0.000056  0.000710
FETCH                           0.058253   0.1%  1,062 0.000055 0.000000  0.004148
log file sync                   0.048217   0.1%    149 0.000324 0.000259  0.000505
CLOSE                           0.045416   0.1%  4,929 0.000009 0.000000  0.000073
20 others                       0.135624   0.2% 11,854 0.000011 0.000000  0.001700
------------------------------ --------- ------ ------ -------- -------- ---------
TOTAL (30)                     57.935251 100.0% 44,697 0.001296 0.000000 28.568493
Here is a profile of a "slow" run:
CALL-NAME                        DURATION      %  CALLS     MEAN      MIN        MAX
------------------------------ ---------- ------ ------ -------- -------- ----------
SQL*Net message from client    131.186118  61.0% 10,092 0.012999 0.000212 106.789360
log file sync                   79.291166  36.8%    150 0.528608 0.000264   2.986575
EXEC                             3.728402   1.7%  9,816 0.000380 0.000000   0.221403
row cache lock                   0.248868   0.1%    542 0.000459 0.000111   0.001036
PARSE                            0.164267   0.1%  4,867 0.000034 0.000000   0.004652
DLM cross inst call completion   0.146981   0.1%    957 0.000154 0.000005   0.001188
library cache lock               0.104354   0.0%    218 0.000479 0.000160   0.000728
library cache pin                0.082504   0.0%    202 0.000408 0.000157   0.000672
FETCH                            0.056687   0.0%  1,062 0.000053 0.000000   0.003969
CLOSE                            0.043590   0.0%  4,929 0.000009 0.000000   0.000180
20 others                        0.142044   0.1% 11,866 0.000012 0.000000   0.001792
------------------------------ ---------- ------ ------ -------- -------- ----------
TOTAL (30)                     215.194981 100.0% 44,701 0.004814 0.000000 106.789360
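Diffing the two profiles line by line makes the split explicit: of the roughly 157 extra seconds in the slow run, about 79 s is log file sync and about 78 s is SQL*Net message from client, while every other line is nearly unchanged. A minimal Python sketch follows; the numbers are copied from the profiles above, and the `FAST`/`SLOW` dictionaries and the `profile_delta` helper are hypothetical names used only for illustration.

```python
# (duration in seconds, calls) per line, copied from the two profiles above.
FAST = {
    "SQL*Net message from client": (53.197782, 10092),
    "EXEC": (3.759177, 9816),
    "row cache lock": (0.233153, 541),
    "PARSE": (0.140399, 4867),
    "DLM cross inst call completion": (0.137330, 956),
    "library cache lock": (0.100171, 215),
    "library cache pin": (0.079729, 216),
    "FETCH": (0.058253, 1062),
    "log file sync": (0.048217, 149),
    "CLOSE": (0.045416, 4929),
    "20 others": (0.135624, 11854),
}

SLOW = {
    "SQL*Net message from client": (131.186118, 10092),
    "log file sync": (79.291166, 150),
    "EXEC": (3.728402, 9816),
    "row cache lock": (0.248868, 542),
    "PARSE": (0.164267, 4867),
    "DLM cross inst call completion": (0.146981, 957),
    "library cache lock": (0.104354, 218),
    "library cache pin": (0.082504, 202),
    "FETCH": (0.056687, 1062),
    "CLOSE": (0.043590, 4929),
    "20 others": (0.142044, 11866),
}


def profile_delta(fast, slow):
    """Return (event, extra_seconds, share_of_total_extra), largest first."""
    total_extra = sum(d for d, _ in slow.values()) - sum(d for d, _ in fast.values())
    rows = []
    for event in slow.keys() | fast.keys():
        # An event missing from one profile counts as zero time there.
        extra = slow.get(event, (0.0, 0))[0] - fast.get(event, (0.0, 0))[0]
        rows.append((event, extra, extra / total_extra))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for event, extra, share in profile_delta(FAST, SLOW):
        print(f"{event:<32}{extra:>11.6f}{share:>8.1%}")
    duration, calls = SLOW["log file sync"]
    print(f"slow-run mean log file sync: {duration / calls:.6f} s")
```

Because the call counts are nearly identical in both runs (10,092 round trips each), the extra time comes from longer waits, not from more calls.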
Looking at the wait event histogram for that event:
SQL> @evh "log file sync"
EVH_EVENT     EVH_WAIT_TIME_MILLI WAIT_COUNT EVH_EST_TIME LAST_UPDATE_TIME
_____________ ___________________ __________ ____________ ___________________________________
log file sync < 1                     200051      100.026 27-JAN-20 11.39.57.344734 PM -06:00
log file sync < 2                        165        0.248 28-JAN-20 12.18.10.429089 AM -06:00
log file sync < 4                        150         0.45 27-JAN-20 11.18.31.158102 PM -06:00
log file sync < 8                        199        1.194 27-JAN-20 11.19.14.209947 PM -06:00
log file sync < 16                       253        3.036 28-JAN-20 08.03.17.851328 AM -06:00
log file sync < 32                       472       11.328 27-JAN-20 11.20.22.746033 PM -06:00
log file sync < 64                       728       34.944 28-JAN-20 01.13.37.364541 AM -06:00
log file sync < 128                      691       66.336 27-JAN-20 11.31.37.400504 PM -06:00
log file sync < 256                      414       79.488 28-JAN-20 12.18.10.423987 AM -06:00
log file sync < 512                      405       155.52 28-JAN-20 03.27.50.540383 AM -06:00
log file sync < 1024                     459      352.512 27-JAN-20 11.35.14.378363 PM -06:00
log file sync < 2048                     482      740.352 28-JAN-20 01.18.20.556248 AM -06:00
log file sync < 4096                     576     1769.472 27-JAN-20 11.21.05.084998 PM -06:00
log file sync < 8192                      89      546.816 27-JAN-20 11.57.36.436460 AM -06:00
log file sync < 16384                     60       737.28 25-JAN-20 07.48.31.460408 AM -06:00
log file sync < 32768                     39      958.464 27-JAN-20 11.59.09.869286 AM -06:00
log file sync < 65536                     27     1327.104 25-JAN-20 09.49.13.856563 AM -06:00
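The @evh script itself isn't shown, but every EVH_EST_TIME value above matches WAIT_COUNT times the bucket midpoint, in seconds, assuming the "< B" bucket covers B/2 to B ms (and "< 1" covers 0 to 1 ms). That reading is an inference from the numbers, not from the script. Under that assumption, the Python sketch below checks the formula against every row and shows how skewed the distribution is: waits under 1 ms are about 97.5% of all waits but only around 1.5% of the estimated time, while waits of 512 ms or more account for roughly 93% of it. `HISTOGRAM`, `est_seconds` and `summarize` are hypothetical names.

```python
# ASSUMPTION: EVH_EST_TIME = WAIT_COUNT * bucket midpoint (seconds), where
# the "< B" bucket covers [B/2, B) ms and the "< 1" bucket covers [0, 1) ms.
# This is inferred from the output above; the @evh script is not shown.

# (bucket upper bound in ms, WAIT_COUNT, EVH_EST_TIME as printed above)
HISTOGRAM = [
    (1, 200051, 100.026),
    (2, 165, 0.248),
    (4, 150, 0.45),
    (8, 199, 1.194),
    (16, 253, 3.036),
    (32, 472, 11.328),
    (64, 728, 34.944),
    (128, 691, 66.336),
    (256, 414, 79.488),
    (512, 405, 155.52),
    (1024, 459, 352.512),
    (2048, 482, 740.352),
    (4096, 576, 1769.472),
    (8192, 89, 546.816),
    (16384, 60, 737.28),
    (32768, 39, 958.464),
    (65536, 27, 1327.104),
]


def est_seconds(upper_ms, count):
    """Estimated total wait for one bucket, using the bucket midpoint."""
    lower_ms = 0 if upper_ms == 1 else upper_ms / 2
    return count * (lower_ms + upper_ms) / 2 / 1000


def summarize(histogram, threshold_ms=512):
    """Return (share of waits under 1 ms, share of est. time in waits >= threshold_ms)."""
    total_waits = sum(count for _, count, _ in histogram)
    total_secs = sum(est_seconds(upper, count) for upper, count, _ in histogram)
    sub_ms_waits = sum(count for upper, count, _ in histogram if upper == 1)
    # A bucket holds only waits >= threshold_ms when its lower bound (upper/2) reaches it.
    long_secs = sum(
        est_seconds(upper, count)
        for upper, count, _ in histogram
        if upper / 2 >= threshold_ms
    )
    return sub_ms_waits / total_waits, long_secs / total_secs


if __name__ == "__main__":
    fast_share, long_share = summarize(HISTOGRAM)
    print(f"waits under 1 ms: {fast_share:.1%} of all waits")
    print(f"waits of 512 ms or more: {long_share:.1%} of estimated wait time")
```

Note that the histogram is cumulative since instance startup, not limited to the slow run.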
The weird thing is that I don't see corresponding log I/O waits (await)
in the iostat output.
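One possible reason iostat looks clean: its await columns are averages over the whole sampling interval, derived from the cumulative counters in /proc/diskstats (change in milliseconds spent on writes divided by change in completed writes, for w_await). A few very slow redo writes can vanish inside thousands of fast ones. The Python sketch below assumes the standard /proc/diskstats layout (writes completed in field 8, milliseconds spent writing in field 11); the function names and sample lines are hypothetical, not taken from this system.

```python
# ASSUMPTION: standard /proc/diskstats layout; after split(), index 7 is
# writes completed and index 10 is milliseconds spent writing.


def parse_diskstats(line):
    """Return (writes_completed, ms_spent_writing) from one /proc/diskstats line."""
    fields = line.split()
    return int(fields[7]), int(fields[10])


def w_await(before_line, after_line):
    """Average ms per completed write over the interval (0.0 if no writes)."""
    writes_0, ms_0 = parse_diskstats(before_line)
    writes_1, ms_1 = parse_diskstats(after_line)
    writes = writes_1 - writes_0
    return (ms_1 - ms_0) / writes if writes else 0.0


if __name__ == "__main__":
    # Hypothetical interval on a redo device: 1,000 writes at 0.5 ms each plus
    # 10 writes at 300 ms each, i.e. 500 + 3,000 = 3,500 ms over 1,010 writes.
    before = "8 16 sdb 0 0 0 0 50000 0 400000 25000 0 0 0"
    after = "8 16 sdb 0 0 0 0 51010 0 408080 28500 0 0 0"
    print(f"w_await = {w_await(before, after):.2f} ms")
```

In this made-up interval, ten 300 ms writes show up as an average of about 3.5 ms, so a short sampling interval (e.g. `iostat -x 1`) gives stalls like these a better chance of standing out.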
I have a ticket open with Oracle, but does anyone have suggestions for
discovering the root cause and/or a solution?