Short spikes in latch contention? Do you use SGA_TARGET / ASMM? A shrinking shared pool can flush out some of your library cache contents and cause this trouble. I understand you have AWR licenses, so you can query DBA_HIST_SGASTAT to see whether you had SGA resize operations happening at the same time as the latch contention spikes. You can easily visualize memory usage from DBA_HIST_SGASTAT with PerfSheet if you like, as I've done here:
http://blog.tanelpoder.com/2009/06/04/ora-04031-errors-and-monitoring-shared-pool-subpool-memory-utilization-with-sgastatxsql/

Another reason for spikes in library cache latch contention is crappy monitoring tools which poll V$SQL or V$SQLAREA too often. Queries against these views walk through the entire library cache (unless you look up a specific hash value), and to walk through the entire library cache you need to get and hold library cache latches a lot. Depending on which Enterprise Manager version you use, OEM itself may be the problem. Since 10.2, EM should be able to use V$SQLSTATS for real-time top SQL activity monitoring instead of V$SQL. V$SQLSTATS is a separate array in the shared pool and reading it does not require library cache latches (each bucket in the stats array is protected by a separate mutex instead).

You can use LatchProfX for systematic troubleshooting of latching issues... or just write a PL/SQL loop which dumps V$LATCHHOLDER contents to a table every couple of seconds, and you will see the library cache latch holding session IDs from there once the problem happens again.
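For the DBA_HIST_SGASTAT check, here is a rough sketch of the kind of query I mean (the 7-day window is just an example; add an INSTANCE_NUMBER filter if you are on RAC). If the summed shared pool size drops between snapshots, ASMM resize operations took memory away from it:

  -- total shared pool size per AWR snapshot
  SELECT sn.end_interval_time,
         ROUND(SUM(st.bytes)/1024/1024) AS shared_pool_mb
  FROM   dba_hist_sgastat  st,
         dba_hist_snapshot sn
  WHERE  st.snap_id         = sn.snap_id
  AND    st.dbid            = sn.dbid
  AND    st.instance_number = sn.instance_number
  AND    st.pool            = 'shared pool'
  AND    sn.end_interval_time > SYSDATE - 7   -- example window: last 7 days
  GROUP  BY sn.end_interval_time
  ORDER  BY sn.end_interval_time;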
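And if your own monitoring scripts look for top SQL, they can read V$SQLSTATS instead of V$SQL / V$SQLAREA. Just a sketch; the 15-minute window and top-10 cutoff are arbitrary:

  -- top SQL by elapsed time, without taking library cache latches
  SELECT *
  FROM  (SELECT sql_id, executions, buffer_gets, cpu_time, elapsed_time
         FROM   v$sqlstats
         WHERE  last_active_time > SYSDATE - 15/1440   -- active in last 15 min
         ORDER  BY elapsed_time DESC)
  WHERE  rownum <= 10;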
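If you go the do-it-yourself route instead of LatchProfX, the loop I mean would look roughly like this (the table name LATCH_HOLDER_LOG, the 3-second interval and the 100 iterations are just made-up examples; DBMS_LOCK.SLEEP needs EXECUTE on DBMS_LOCK):

  -- one-off: a log table shaped like V$LATCHHOLDER plus a timestamp
  CREATE TABLE latch_holder_log AS
    SELECT SYSTIMESTAMP AS sample_time, h.*
    FROM   v$latchholder h
    WHERE  1 = 0;

  -- sample latch holders every 3 seconds; 100 iterations = about 5 minutes
  BEGIN
    FOR i IN 1 .. 100 LOOP
      INSERT INTO latch_holder_log
        SELECT SYSTIMESTAMP, h.* FROM v$latchholder h;
      COMMIT;
      DBMS_LOCK.SLEEP(3);
    END LOOP;
  END;
  /

Once the problem happens again, query the table for NAME LIKE 'library cache%' and you have the holding SIDs (and latch addresses) right there.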
Tanel.

On Thu, Oct 15, 2009 at 2:26 AM, Martin Berger <martin.a.berger@xxxxxxxxx> wrote:

> Charles,
>
> have you read Tanel Poder's
> http://blog.tanelpoder.com/2009/09/13/kgl-simulator-shared-pool-simulator-and-buffer-cache-simulator-what-are-these/ ?
> Especially these quotes might match in your env:
>
>   "As these allocations are dynamic, then when your application suddenly
>   starts making lots of hard parses due to newly released code, causing
>   memory shortage, then the simulator can make things even worse as it can
>   make relatively big allocations for itself too!"
>
> and
>
>   "Note that I'm talking about big systems here, ..."
>
> It's just out of a kind of reflex: every time I hear "library latch" I check
> Tanel's blog; LatchProfX and the other tools there are just too helpful. :)
>
> Might it help,
> Martin
>
>
> On 14.10.2009 at 15:42, Charles Schultz wrote:
>
> Good day, list,
>
> *The Symptoms*
> We have observed a gradual rise in library cache latch waits in our
> Production database. Oracle Enterprise Manager shows these as small,
> short-lived "spikes" in the Top Activity Monitor. Our experience is that
> these spikes generally grow in intensity and duration (impacting more users
> for a longer period of time). We have observed that turning off OEM and the
> agent helps to alleviate these latch waits.
>
> *The System*
> Oracle EE 10.2.0.2 on Solaris 10. Upgrading to 10.2.0.4 this weekend.
> Hardware is a Sun F15K with 44 CPUs and 96 GB of RAM.
>
> *The Support*
> In our first SR with Oracle, the "resolution" was to stop using OEM. This
> is a very poor response in my opinion, for it does not address the root
> problem at all; it offers no insight into what is causing the latch waits
> in the first place, nor is it a very effective way to treat a problem, as it
> merely assuages the situation rather than eliminating it.
>
> In our second SR, I have an analyst who wants me to turn on event 10046,
> level 12. Good and dandy; I would have done that myself if not for the fact
> that we often get hit with mutex hangs triggered by rowsource execution
> statistics. Besides, the analyst is simply fishing for something; he has
> given no indication that he really understands the situation. He has
> identified several large, CPU-intensive queries, but how does CPU-intensive
> relate to latch waits, especially considering that our box is only about
> 12% CPU utilized? I admit, perhaps I am especially dense today and he is
> spot on. Perhaps.
>
> *The Synopsis*
> Since it is library cache latches that we are talking about, I looked at
> the shared pool fragmentation. Since fragmentation generally gets worse as
> time progresses, my theory is that perhaps the library cache latch requests
> are taking longer and longer to fulfill as there are more (and smaller)
> chunks of memory to search. I am currently running a couple of diagnostic
> scripts (I forget which Metalink note I got them from, but it is a memory
> diagnostic suite). We know from past experience that flushing the shared
> pool has a marginal effect, which makes sense, as x$ksmsp shows us that only
> a small fraction of the shared pool is flushable. We also know from past
> experience that if we let the latch waits continue to get worse, the system
> approaches a critical mass and we have to bounce it. And nobody likes to
> bounce the Production system. =)
>
> So, one big (obvious) caveat: we are upgrading to 10.2.0.4 this weekend,
> which means we will inherit a ton of bug fixes, as well as those wonderful
> new bugs that come along with them. Unfortunately, the PSU (10.2.0.4.1
> bundle) came out too late for our regression tests, but I have scheduled it
> for future regression tests. If memory fragmentation actually is the root
> problem, I have my doubts as to whether 10.2.0.4 is actually going to help
> us or not. If anyone has any insights as to whether or not this might be
> true, I am all ears.


--
Tanel Poder
http://blog.tanelpoder.com