Yesterday I was observing a foreground process which was executing a series of
batched updates from Java in a single thread and was running very slowly.
Each element in the batch was updating a single row via a unique scan.
The execution time of this feed was reported as having tripled since moving to
Performance was atrocious. For example, from AWR over a period of 15-16 hours,
an average-size batch of a couple of hundred elements was averaging anywhere
between 8 and over 100 seconds per execution per hour, with the vast majority of
time in cluster-related waits. Averages hide a whole bunch of detail, of course,
but they're a useful indicator.
I was observing from GV$SESSION and GV$ASH, and the source of the cluster waits
seemed to be related to KTSJ slave activity. There was a strong correlation of
the “two” (the Java update plus multiple active KTSJ slaves) working on the same
datafile/blocks – a series of the two doing gc buffer busy release, gc buffer
busy acquire and gc current block busy, with the occasional cell single block
physical read. Blocking session information on some of the gc waits
occasionally pointed at the other (update blocked by KTSJ, or vice versa).
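For reference, this is roughly the kind of ASH query I was using to line the two up – a sketch only, with the :fg_sid bind and the '%(W0%' program pattern for the KTSJ (Wnnn) slaves as placeholder assumptions:

```sql
-- Correlate the foreground session's gc waits with Wnnn slaves
-- touching the same file/block (for gc buffer busy / gc current
-- block busy waits, p1 = file# and p2 = block#).
select sample_time, inst_id, session_id, program, event,
       p1 file#, p2 block#, blocking_session
from   gv$active_session_history
where  event in ('gc buffer busy acquire',
                 'gc buffer busy release',
                 'gc current block busy')
and    (session_id = :fg_sid or program like '%(W0%')
order  by p1, p2, sample_time;
```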
Reading all the responses, the oracle-l thread from May 2020 on KTSJ was the
best source of information I could find:
And a couple of bug references leading on from there, not all of which are
relevant to my version (19.6), but giving indications of what might be going on:
* blocks are not marked as free in assm after delete - 12.2 and later (Doc
* performance degradation by w00 processes after applying july 2020 dbru
(Doc ID 32075777.8) superseded by
* force full repair enabled by fix control and populate repair list even if
_assm_segment_repair_bg=false (Doc ID 32234161.8)
These mention the parameter _assm_segment_repair_bg.
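To check where that parameter currently stands, something like the following should work – a sketch only, assuming SYS access to the x$ fixed tables:

```sql
-- Current value and default flag of the hidden parameter
-- _assm_segment_repair_bg, via x$ksppi (names) joined to
-- x$ksppcv (current values).
select p.ksppinm  name,
       v.ksppstvl value,
       v.ksppstdf is_default
from   x$ksppi  p,
       x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm = '_assm_segment_repair_bg';
```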
Per the explanations in the oracle-l thread, it seems the foreground session
does something which then prompts a background session to check/fix the ASSM
information. But in my case, this fixing is causing significant contention back
on the foreground session.
I ran Snapper on some of the KTSJ slaves and, of the ASSM fix related stats,
"ASSM bg: slave fix state" was consistently around 5000 in a 5 second period.
That is not a statistic I have any context to judge the value of.
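For anyone wanting to look at the same counters without Snapper, the raw deltas can be pulled straight from GV$SESSTAT – a sketch only, with the :slave_sid bind for one of the Wnnn slaves as a placeholder:

```sql
-- All the "ASSM bg:" session statistics for one KTSJ slave;
-- sample twice and diff the values to get a rate.
select n.name, s.value
from   gv$sesstat  s,
       gv$statname n
where  s.statistic# = n.statistic#
and    s.inst_id    = n.inst_id
and    s.sid        = :slave_sid
and    n.name like 'ASSM bg%'
order  by n.name;
```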
This is a monthly feed so it doesn't happen every day, but when it does it sits
on a critical path. It's finished now, so there's not a lot I can look at if
it's not in ASH. Obviously a next step is to try to reproduce this in a test
environment.
I just wondered whether anyone had done any further investigation into this?