Sudden occurrence of SHARED_POOL LATCH waits - DB up since 4/3/2018

  • From: Chris Taylor <christopherdtaylor1994@xxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 31 May 2018 11:29:02 -0500

version 12.1
Linux x86-64

Event               Waits     Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
latch: shared pool  402,269   103.4K                  256.95         29.5        Concurrency
DB CPU                        100.3K                                 28.6
latch free          362,927   42.2K                   116.26         12.0        Other

Background:
We've got a database with 4TB of RAM and a 260GB shared pool.  Everything
was running normally until 2 nights ago when a regularly scheduled job that
does TONS of DDL suddenly blew up the shared pool.

Trying to query x$ksmsp for shared pool sizes is also causing huge amounts
of latch waits.

When these jobs aren't running, the db is working normally.  Starting just
one of the jobs brings back the shared pool latching problem immediately.

I'm thinking the shared pool is heavily fragmented, but I'm looking for
additional queries or views I can use to diagnose shared pool problems.
We're not running into any shared pool errors as far as memory allocations
go.
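
In case it's useful, this is the kind of lighter-weight check I was planning
to run first instead of x$ksmsp - an untested sketch against v$sgastat and
v$latch_children (I believe each 'shared pool' child latch roughly
corresponds to a subpool, but verify on your version):

-- shared pool memory by area, without the x$ksmsp latch storm
select pool, name, round(bytes/1024/1024) as mb
from v$sgastat
where pool = 'shared pool'
order by bytes desc;

-- which 'shared pool' child latches are taking the misses/sleeps
select child#, gets, misses, sleeps, wait_time
from v$latch_children
where name = 'shared pool'
order by sleeps desc;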

This query hangs and causes huge contention:

select ksmchcls "ChnkClass",
    sum(ksmchsiz) "SumChunkTypeMem",
    Max(ksmchsiz) "LargstChkofThisTyp",
    count(1) "NumOfChksThisType",
    round((sum(ksmchsiz)/tot_sp_mem.totspmem),2)*100||'%' "PctTotSPMem"
from x$ksmsp,
    (select sum(ksmchsiz) TotSPMem from x$ksmsp) tot_sp_mem
group by ksmchcls, tot_sp_mem.TotSPMem
order by 5;
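
Instead of hammering x$ksmsp, I was also going to try mapping the ASH
samples for 'latch: shared pool' back to the child latches via P1 (the
latch address). Rough, untested sketch (and it assumes Diagnostics Pack
licensing for ASH):

-- P1 for 'latch: shared pool' is the latch address; tie it back to v$latch_children
select l.child#, count(*) as ash_samples
from v$active_session_history a
join v$latch_children l
  on l.name = 'shared pool'
 and a.p1 = to_number(rawtohex(l.addr), 'XXXXXXXXXXXXXXXX')
where a.event = 'latch: shared pool'
  and a.sample_time > sysdate - 1/24
group by l.child#
order by ash_samples desc;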


This query runs fine and seems to indicate there is plenty of reserve space:

select free_space, round(avg_free_size, 2) as avg_free_size, free_count,
       max_free_size, used_space, round(avg_used_size, 2) as avg_used_size,
       used_count, max_used_size, requests, request_misses, last_miss_size,
       request_failures, last_failure_size, aborted_request_threshold,
       aborted_requests
from v$shared_pool_reserved
/

FREE_SPACE                  13435398168
AVG_FREE_SIZE               236538.7
FREE_COUNT                  16023
MAX_FREE_SIZE               27316040
USED_SPACE                  319665968
AVG_USED_SIZE               5627.92
USED_COUNT                  40777
MAX_USED_SIZE               1048552
REQUESTS                    10903867
REQUEST_MISSES              0
LAST_MISS_SIZE              0
REQUEST_FAILURES            0
LAST_FAILURE_SIZE           0
ABORTED_REQUEST_THRESHOLD   2147483647
ABORTED_REQUESTS            0

The jobs in question do PARTITION EXCHANGES for many, many sites, using
temp tables that are built as part of the job.
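
Since those jobs are basically all DDL, I was also going to watch library
cache reloads/invalidations while one of them runs - something simple like
this (sketch):

select namespace, gets, pins, reloads, invalidations
from v$librarycache
order by invalidations desc;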

The DB has been up since 04/03/2018 07:12:23, and the last 2 days are the
first time this has been seen. It is repeatable by restarting any one of
the jobs (there are 8 total).

Thinking about flushing the shared pool (or restarting the db off hours) as
a test to see if it resolves the issue (like throwing a grenade into an
anthill resolves a problem with ants... not the best approach but might be
effective).
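
For the record, the flush I'd test with is just the standard:

alter system flush shared_pool;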

Any thoughts/comments/insults?

Thanks,
Chris
