Re: High Buffer busy waits

From: Stefan Koehler <contact@xxxxxxxx>
To: learnerdatabase99@xxxxxxxxx, Oracle L <oracle-l@xxxxxxxxxxxxx>
Date: Wed, 2 Aug 2023 08:40:06 +0200 (CEST)

Hello Yudhi,
Tanel, Priit and all the others have already given you good hints on how to
troubleshoot this, so let me just focus on one of your open questions.

> But is there any way possible through which we can be more certain, it's
> really the same bug which is impacting us?

Bug #33973908 refers to an issue with "DBWR write coalescing" - so you have
several options to dive deeper here:

1) Disable "DBWR write coalescing" (as mentioned in the MOS note) and check if
the issue occurs again (e.g. by setting "_db_writer_coalesce_write_limit" /
"_db_writer_coalesce_area_size")
2) Profile the C-stack of the DBWR(s) with low overhead, e.g. by using
perf/0xtools (xCPU), and check if you can spot something in the stack that is
related to "DBWR write coalescing"
3) Profile the off-CPU C-stack of the DBWR(s) with low overhead, e.g. by using
eBPF/BCC (offcputime), and check if you can spot something in the stack that is
related to "DBWR write coalescing"
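
For 2) and 3), a minimal sketch (Python) of how captured stacks could be
filtered afterwards. It assumes the stacks were saved in "folded" format, i.e.
one line per stack with frames separated by ";" and a trailing count or time
value (e.g. what offcputime -f prints). The frame names in the sample and the
search pattern "coalesce" are hypothetical placeholders, not real Oracle kernel
function names, so this is an illustration only.

```python
from collections import Counter


def parse_folded(lines):
    """Parse folded stacks: 'frame1;frame2;...;frameN <value>' per line."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The value is the last space-separated token on the line.
        stack, _, value = line.rpartition(" ")
        if not stack or not value.isdigit():
            continue  # skip anything that is not a folded stack line
        stacks[stack] += int(value)
    return stacks


def share_matching(stacks, pattern):
    """Return (matching value, total value) for stacks with a frame containing pattern."""
    pattern = pattern.lower()
    total = sum(stacks.values())
    hit = sum(
        value
        for stack, value in stacks.items()
        if any(pattern in frame.lower() for frame in stack.split(";"))
    )
    return hit, total


# Hypothetical sample data; all frame names are invented for illustration.
SAMPLE = """\
ora_dbw0;main;dbwr_main_loop;hypothetical_coalesce_writes;io_submit 700
ora_dbw0;main;dbwr_main_loop;semtimedop 250
ora_dbw0;main;dbwr_main_loop;hypothetical_coalesce_writes;io_submit 50
"""

if __name__ == "__main__":
    stacks = parse_folded(SAMPLE.splitlines())
    hit, total = share_matching(stacks, "coalesce")
    print(f"{hit} of {total} ({100.0 * hit / total:.1f}%) in matching stacks")
```

A large share of samples or off-CPU time in matching stacks during the issue
period would be a hint (not a proof) that the code path is involved; comparing
against a capture from the unaffected database makes the result easier to judge.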

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: www.soocs.de
Twitter: @OracleSK

On 01.08.2023 22:56 CEST, yudhi s <learnerdatabase99@xxxxxxxxx> wrote:

Thank you so much.

This database is part of an active-active HA setup (bi-directional sync using
GoldenGate replication). When the application runs the same workload on the
other/secondary database, we don't see such high buffer busy waits during
the peak period.

We have fast_start_mttr_target set to zero in both databases.

Checking the latest entry in dba_registry_sqlpatch, the database on which we
are getting the buffer busy waits shows "MERGE ON DATABASE RU 19.15.0.0.0 OF
29213893 30664385", whereas the other/secondary database, which takes that
workload without any issue, shows "MERGE ON DATABASE RU 19.19.0.0.0 OF
35233847 33526630". So it's likely to be related to the buffer cache bug that
Priit mentioned: DBWR not picking up writes for some time (Doc ID 33973908.8).

But is there any way possible through which we can be more certain, it's
really the same bug which is impacting us?

We are now live on the secondary database, where the issue does not occur, but
it will most likely come back when we move back to the primary database.
However, we have AWR data, and the event trend from dba_hist_system_event
shows a spike in "log file switch (checkpoint incomplete)" during exactly the
same period as the issue. So I am wondering whether there are any other waits
whose trend could be checked for that period to confirm the mentioned bug.
And since the bug description says "the DB writer not picking up write for
some time", does that mean the wait chain we are seeing is abnormal? It shows
the ultimate blocker as "idle blocker 1,2395,34375 (oracle@XXXXXXXXXXX
(DBW0))", so is the "idle" keyword itself pointing to the bug / an idle DBW0
here?

--
//www.freelists.org/webpage/oracle-l
