Re: High response time with top 'Other' and 'application' wait class

From: Lok P <loknath.73@xxxxxxxxx>
To: Tanel Poder <tanel@xxxxxxxxxxxxxx>
Date: Tue, 29 Nov 2022 13:52:57 +0530

Thank You Tanel.

_serial_direct_read is set as default 'AUTO'. The blocks for table CB in
dba_tables is showing as - 124. So yes, its really a tiny reference data
table.

Also i checked dba_hist_active_sess_history there were no "cell multiblock
physical read" noted against that sql in past too. All i can see is "cell
smart table scan" ,"enq: KO - fast object checkpoint","reliable message".
But during the issue period it just flooded with more "enq: KO - fast
object checkpoint" and "reliable message" wait events. As you said it may
be that the bug still persists.

On Tue, 29 Nov, 2022, 10:57 am Tanel Poder, <tanel@xxxxxxxxxxxxxx> wrote:

Oh I just noticed that the CB table is just 41 rows. How big is it in
segment size (blocks) and does DBA_TABLES.BLOCKS also show a tiny number?

Do you have *_serial_direct_read* set to *always* by any chance? (that
would explain why direct path read/smart scans are attempted even on tiny
tables).

Could still be some bug too, sometimes bugs that get fixed in a version,
aren't completely fixed to cater for all scenarios and complexity levels...

But if you created a nice index to avoid the FTS completely and got index
range scan/index unique scan instead, the problem would likely be gone.
Even without bugs, doing smart scan on a 41 row table is gonna have much
higher overhead than just doing a single buffer get into a single-block
index in the buffer cache.

--
Tanel Poder
https://learn.tanelpoder.com

On Mon, Nov 28, 2022 at 9:55 PM Lok P <loknath.73@xxxxxxxxx> wrote:

Hello ,
It's version 19C of the oracle database. We got complaints from the
application team of sudden slowness and thus app traffic automatically
redirected to another active DB replica.While looking into the DB side, we
do observe high wait events from wait class 'Other' and 'Application' in
OEM. It lasted for exactly 2-3 minutes as the application team is
complaining about. And then fetching the "ash wait chain" and "ash top" ,
it's pointing to significantly high wait event like "reliable message"
(~75%), "enq: KO - fast object checkpoint","gcs drm freeze in enter server
mode" from same wait class 'other' and 'application'.

Below is the ASH top and ASH wait chain from the issue duration of
~5minutes interval. And the select query which is coming on top. And table
CB is holding just ~41 rows in it.
https://gist.github.com/oraclelearner/44394ab8206fc7bd51041eb3d45bdf9f

And also the top query which is showing in "ash top" is a SELECT query
and it does have a FTS in it (which is suggesting why the checkpoint
waits), however, this select query doesn't have any plan change observed in
it or not any high execution. And also we are not seeing any specific
blocking session for this. The "wait chain" showing LMON/lock monitor
process in one of the lines also does point to some event like "Wait for
msg sends to complete ''. Not yet able to figure out how these are related
to this issue. Still trying to understand what could have caused such a
scenario?

This SELECT query executes 100's of thousands of times in ~15minutes
window. So is it possible that this query(or say its underlying table CB)
might have thrashed out from buffer cache to disk and fetched data from
disk rather than cache for that time and this causing all these sort of
issues? And considering this table is very small (~1MB in size and holding
just ~41 rows), should we consider putting it as inmemory by changing
"inmemory_clause_default" to 'BASE_LEVEL' and "INMEMORY_SIZE" to 16GB ?

References:
- High response time with top 'Other' and 'application' wait class
  - From: Lok P
- Re: High response time with top 'Other' and 'application' wait class
  - From: Tanel Poder

Re: High response time with top 'Other' and 'application' wait class

Other related posts: