RE: Spotting the real cause of a Production slowdown (10g)

Thanks to all for the many replies on this - it gave us some really good
ideas. Unfortunately, we are in the middle of such a problem right now,
and I highly doubt that literal sql statements are causing our latch
waits and high CPU usage. Additionally, the shared pool is not changing
that much relative to past events (anywhere from 2.5GB to 3GB, with
sga_target = 10GB). CPU is being consumed by many active processes, all
doing "normal" things that usually do not consume this much CPU. At
least, that is what we believe at this point in time.
 
Paul, thanks for the plan_hash_value query - it definitely helps
disprove the idea that we have a truckload of literal statements (in
other words, I do not think literal statements are our problem, at least
not in large numbers). Niall, I am still seeing a number of statements
that are not sampled into v$active_session_history, but those that are
sampled show a lot of library cache waits.
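
For anyone following along, this is the flavor of ASH query I mean - a
sketch only, and it assumes the 10g event names ("latch: library cache"
etc.); adjust the filter and time window for your release:

```sql
-- Sketch: which library-cache-related events dominate recent ASH samples.
-- ASH is sampled (1 row/sec per active session), so counts are estimates.
SELECT event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 15/1440          -- last 15 minutes
AND    (event LIKE 'latch%library cache%'
        OR event IN ('library cache pin', 'library cache lock'))
GROUP  BY event
ORDER  BY samples DESC;
```

Because ASH only samples, short-lived statements can be missed entirely,
which would be consistent with what I am seeing.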

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Schultz, Charles
Sent: Thursday, April 20, 2006 8:15 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Spotting the real cause of a Production slowdown (10g)



We have observed several critical slowdowns of our Production system
since upgrading to 10g. The symptoms include a rapid rise in library
cache latch waits (with associated, but less notable, rises in shared
pool and library cache pin waits), CPU pegged at 100% with a load
average in the triple digits, and massive shared pool resizing evidenced
by v$sga_resize_ops (1.7GB added in 15 seconds). Obviously, running
queries against the usual v$ views during the event is slow (and only
adds to the latching problem). We sent hanganalyze and systemstate dumps
to Oracle Support, but after a couple of months of back and forth, we
have word that the initial Support Engineer may have misdiagnosed our
problem.
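
For reference, this is roughly how we are watching the resizes - a
sketch against v$sga_resize_ops as documented in 10g:

```sql
-- Sketch: recent automatic shared pool resize operations (10g ASMM).
-- Shows grow/shrink direction, sizes in MB, and how long each op took.
SELECT component, oper_type, oper_mode,
       initial_size/1024/1024 AS initial_mb,
       final_size/1024/1024   AS final_mb,
       start_time, end_time, status
FROM   v$sga_resize_ops
WHERE  component = 'shared pool'
ORDER  BY start_time DESC;
```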

Based on what little I know and have researched (reading a number of
notes here, Metalink, Google, OraFAQ, etc.), a large number of literal
sql statements could cause such symptoms. It is just hard for me to
imagine the volume of literal statements it would take to consume 1.7GB
of shared pool in 15 seconds. Perhaps it is not just quantity but size
as well, with each statement consuming a large chunk of real estate. The
only problem with this conjecture is that I do not find evidence of it
in v$sql (looking at number of executions, number of literals, and
sharable_mem).
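
The check I have been running is along these lines - a sketch that
assumes your 10g release exposes FORCE_MATCHING_SIGNATURE in v$sql
(statements that differ only in literals share a signature); the
COUNT(*) > 20 threshold is arbitrary:

```sql
-- Sketch: find families of literal-only statements bloating the shared
-- pool. Each literal variant is a separate v$sql row, but all variants
-- share one force_matching_signature.
SELECT force_matching_signature,
       COUNT(*)          AS cursor_count,
       SUM(sharable_mem) AS total_sharable_mem,
       MIN(sql_text)     AS sample_text
FROM   v$sql
WHERE  force_matching_signature <> 0
GROUP  BY force_matching_signature
HAVING COUNT(*) > 20
ORDER  BY cursor_count DESC;
```

So far nothing jumps out, which is why I doubt the literal-SQL theory.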

But the puzzle is even more elusive. EM reports an extremely
(abnormally) large number of executions for seemingly random sql
statements. In one case, we had 3 statements that were each executed 35
million times in one hour (2 of those statements had bind variables; the
other was a "simple" select). I can never catch these large numbers in
v$sql, so I have to assume the cursors age out by the time we start to
look. I have asked the application folks whether these queries are
related, and why the execution count is a couple of orders of magnitude
greater than normal; still waiting for resolution at that end. The
general consensus seems to be that these larger numbers are symptomatic
as well - that they only start to escalate when the system is severely
slow and the middle tier starts to time out. NOTE: that is the working
hypothesis; we are trying to prove it.

So, my question is two-fold. Do you experts out there have advice on how
we can reactively figure out what already happened? And on how to
proactively prevent it from happening again? Because automatic AWR
snapshots are taken only once an hour, it is hard for us to get more
granular in a historical view. Perhaps we should temporarily increase
the snapshot frequency?
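
If we do go that route, I believe it would look something like this
(DBMS_WORKLOAD_REPOSITORY takes both settings in minutes; using AWR
requires a Diagnostics Pack license - and the exact retention value here
is just an example):

```sql
-- Sketch: snapshot every 15 minutes instead of hourly, keep 7 days.
BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    interval  => 15,        -- minutes between automatic snapshots
    retention => 7*24*60    -- retention period, in minutes
  );
END;
/

-- Or force a snapshot on demand while a slowdown is in progress:
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
```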

Any and all advice is welcome. I am hoping to learn some of your tricks
and glean from your experience.

Thanks in advance, 

PS - after reading about Kyle Hailey's and Roger Sanders's work, does
anyone have access to a "DMA for dummies"? =)

charles schultz 
oracle dba 
aits - adsd 
university of illinois 
