Charles, I'm coming a little late to this discussion, but it does sound like this is being caused by lots of statements (or more properly, a single statement executed multiple times) using literal values where bind variables would be more appropriate. I recently came across this problem at a client site. They were sending out tens of thousands of emails using dynamically created dbms_job statements, each of which had a literal value within the dbms_job call. In 10g, even a simple select statement can take out 20-30 latches during a hard parse.

I used the following script (gleaned from asktom.oracle.com) to find the offending SQL:

create table t1 as select sql_text from v$sqlarea;

alter table t1 add sql_text_wo_constants varchar2(1000);

create or replace function remove_constants( p_query in varchar2 )
return varchar2
as
    l_query     long;
    l_char      varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query || '''#';
        end if;
        if ( NOT l_in_quotes )
        then
            l_query := l_query || l_char;
        end if;
    end loop;
    l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8
    loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/

update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

This will replace literals with '@'.

Ray

> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Schultz, Charles
> Sent: Friday, April 21, 2006 7:59 AM
> To: Alex Gorbachev
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: RE: Spotting the real cause of a Production slowdown (10g)
>
> That is insightful, thank you.
>
> In light of trying to prove it (so that we can officially point our
> finger and resolve this issue so it does not happen again), it sounds
> like we are going to have to focus on the AWR to see whether the workload
> is actually higher or not. Again, hypothetically, if the latch waits are
> merely a nuisance, they make the analysis that much harder because they
> in turn cause some of the CPU load. Is it possible to use AWR and
> attempt to exclude latch waits and their side effects? Perhaps we need
> to concentrate on periods of time before latch waits become an issue?
>
> Again, thanks for this thought. Another direction for us to pursue.
>
> -----Original Message-----
> From: Alex Gorbachev [mailto:gorbyx@xxxxxxxxx]
> Sent: Thursday, April 20, 2006 4:50 PM
> To: Schultz, Charles
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: Re: Spotting the real cause of a Production slowdown (10g)
>
> Imagine that you have a bit more workload at some point and it leads to
> higher CPU consumption. It also makes the latching issue (since concurrency
> increases) more pervasive. Processes spinning more on latches consume
> more CPU and keep other latches longer...
> And since CPU is becoming a
> "rare" resource (well, a bottleneck), taking a latch becomes more and
> more difficult and takes longer and longer... Catch-22.
>
> 2006/4/20, Schultz, Charles <sac@xxxxxxxxxxxxx>:
> >
> > Thanks to all for the many replies on this - it gave us some really
> > good ideas. Unfortunately, we are in the middle of such a problem now,
> > and I highly doubt that the literal sql statements are causing our
> > latch waits and high cpu usage right now. Additionally, the
> > shared_pool is not changing that much (relative to past events,
> > anywhere from 2.5gb to 3gb, sga_target = 10g). CPU is being consumed
> > by many active processes, all doing "normal" things that usually do
> > not consume this much CPU. At least, that is what we believe at this
> > point in time.
> >
> > Paul, thanks for the plan_hash_value query - it definitely helps
> > disprove that we have a truckload of literal statements (in other
> > words, I do not think literal statements are our problem, at least
> > not in large numbers).
> > Niall, I am still seeing a number of statements that are not sampled
> > into v$active_session_history, but those that are sampled are showing
> > a lot of library cache waits.
> >
> > ________________________________
> >
> > From: oracle-l-bounce@xxxxxxxxxxxxx
> > [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Schultz, Charles
> > Sent: Thursday, April 20, 2006 8:15 AM
> > To: oracle-l@xxxxxxxxxxxxx
> > Subject: Spotting the real cause of a Production slowdown (10g)
> >
> > We have observed several critical slowdowns of our Production system
> > since upgrading to 10g. The symptoms include a rapid rise in library
> > cache latch waits (with associated, but less notable, rises in shared
> > pool and library cache pin waits), CPU pegged at 100% and CPU load in
> > triple digits, and massive shared pool resizing evidenced by
> > v$sga_resize_ops (1.7gb added in 15 seconds).
> > Obviously, trying to run queries against the normal v$
> > views is rather slow (it only adds to the latching problem). We sent
> > hanganalyze and systemstate dumps to Oracle Support, but after a
> > couple of months of trying to figure things out, we have word that the
> > initial Support Engineer may have misdiagnosed our problem.
> >
> > Based on what little I know and have researched (reading a number of
> > notes here, Metalink, Google, OraFAQ, etc.), a large number of literal
> > sql statements might be able to cause such symptoms. It is just hard
> > for me to imagine the number of literal statements that would consume
> > 1.7gb of shared pool in 15 seconds. Perhaps it is not just quantity
> > but quality as well, each statement consuming a large chunk of real
> > estate. The only problem with this conjecture is that I do not find
> > evidence of this in v$sql (looking at number of executions, number of
> > literals, sharable_mem).
> >
> > But the puzzle is even more elusive. EM reports extremely (abnormally)
> > large numbers of executions for seemingly random sql statements. In
> > one case, we had 3 statements that were each executed 35 million times
> > in one hour (2 of those statements had bind variables, the other was a
> > "simple" select). I can never catch these large numbers in v$sql, so I
> > have to assume they age out by the time we start to look. I have asked
> > the application folks to see if these queries are related, and why the
> > execution count is a couple of orders of magnitude greater than
> > normal. Still waiting for resolution at that end. It seems to be the
> > general consensus that these larger numbers are symptomatic as well,
> > that they only start to escalate when the system is severely slow and
> > the middle tier starts to time out. NOTE: that is the working
> > hypothesis; we are trying to prove it.
> >
> > So, my question is two-fold.
> > Do you experts out there have advice as
> > to how we can reactively figure out what already happened? And how we
> > can proactively avoid this happening again? Because the automatic AWR
> > snapshots only happen each hour, it is hard for us to get more
> > granular when looking at a historical view. Perhaps we should
> > temporarily increase that frequency?
> >
> > Any and all advice is gladly welcome. I am hoping to learn some of
> > your tricks and glean from your experience.
> >
> > Thanks in advance,
> >
> > PS - after reading about Kyle Hailey's and Roger Sanders's work, does
> > anyone have access to a DMA for dummies? =)
> >
> > charles schultz
> > oracle dba
> > aits - adsd
> > university of illinois
>
> --
> Best regards,
> Alex Gorbachev
>
> --
> //www.freelists.org/webpage/oracle-l
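
[Editor's note] Ray's remove_constants() normalization is easy to prototype outside the database as well. The sketch below is a rough Python equivalent, not part of the original thread: it mirrors the PL/SQL logic (quoted string literals become '#', digit runs collapse to a single '@', whitespace runs collapse, and the result is upper-cased) so that differently-literaled copies of the same statement normalize to one bucket. The regex collapse of digit and space runs approximates the PL/SQL's iterative replace loop.

```python
# Rough Python sketch of the remove_constants() logic from Ray's script
# (an illustration, not the original PL/SQL). String literals become '#',
# digit runs become one '@', space runs collapse, result is upper-cased.
import re

def remove_constants(query: str) -> str:
    out = []
    in_quotes = False
    for ch in query:
        if ch == "'":
            if in_quotes:
                # Closing quote: leave quoted state; the quote char
                # itself is appended by the block below.
                in_quotes = False
            else:
                # Opening quote: emit the '# placeholder and start
                # skipping the literal's contents.
                in_quotes = True
                out.append("'#")
        if not in_quotes:
            out.append(ch)
    s = "".join(out)
    s = re.sub(r"[0-9]+", "@", s)  # digit runs -> one '@'
    s = re.sub(r" +", " ", s)      # space runs -> one space
    return s.upper()
```

With this, `select * from emp where ename = 'KING' and deptno = 10` and the same statement with different literals both normalize to `SELECT * FROM EMP WHERE ENAME = '#' AND DEPTNO = @`, which is exactly what makes the group-by/count query spot literal-SQL offenders.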