Charles, I'm coming a little late to this discussion, but it does sound like this is being caused by lots of statements (or more properly, a single statement executed multiple times) using literal values where bind variables would be more appropriate. I recently came across this problem at a client site. They were sending out tens of thousands of emails using dynamically created dbms_job statements, each of which had a literal value within the dbms_job call. In 10g, even a simple select statement can take out 20-30 latches during a hard parse.

I used the following script (gleaned from asktom.oracle.com) to find the offending SQL:

create table t1 as select sql_text from v$sqlarea;

alter table t1 add sql_text_wo_constants varchar2(1000);

create or replace function remove_constants( p_query in varchar2 )
return varchar2
as
    l_query     long;
    l_char      varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query || '''#';
        end if;
        if ( NOT l_in_quotes )
        then
            l_query := l_query || l_char;
        end if;
    end loop;
    l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8
    loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/

update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

This will replace literals with '@'.

Ray

> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Schultz, Charles
> Sent: Friday, April 21, 2006 7:59 AM
> To: Alex Gorbachev
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: RE: Spotting the real cause of a Production slowdown (10g)
>
> That is insightful, thank you.
>
> In light of trying to prove it (so that we can officially point our
> finger and resolve this issue so it does not happen again), it sounds
> like we are going to have to focus on the AWR to see whether the workload
> is actually higher or not. Again, hypothetically, if the latch waits are
> merely a nuisance, they make the analysis that much harder because they
> in turn cause some of the CPU load. Is it possible to use AWR and
> attempt to exclude latch waits and their side effects? Perhaps we need
> to concentrate on periods of time before latch waits become an issue?
>
> Again, thanks for this thought. Another direction for us to pursue.
>
> -----Original Message-----
> From: Alex Gorbachev [mailto:gorbyx@xxxxxxxxx]
> Sent: Thursday, April 20, 2006 4:50 PM
> To: Schultz, Charles
> Cc: oracle-l@xxxxxxxxxxxxx
> Subject: Re: Spotting the real cause of a Production slowdown (10g)
>
> Imagine that you have a bit more workload at some point and it leads to
> higher CPU consumption. It also makes the latching issue (since concurrency
> increases) more pervasive. Processes spinning more on latches consume
> more CPU and keep other latches longer...
> And since CPU is becoming a
> "rare" resource (well, a bottleneck), taking a latch becomes more and
> more difficult and takes longer and longer... Catch-22.
>
> 2006/4/20, Schultz, Charles <sac@xxxxxxxxxxxxx>:
> >
> > Thanks to all for the many replies on this - it gave us some really
> > good ideas. Unfortunately, we are in the middle of such a problem now,
> > and I highly doubt that the literal sql statements are causing our
> > latch waits and high cpu usage right now. Additionally, the
> > shared_pool is not changing that much (relative to past events,
> > anywhere from 2.5gb to 3gb, sga_target = 10g). CPU is being consumed
> > by many active processes, all doing "normal" things that usually do
> > not consume this much CPU. At least, that is what we believe at this
> > point in time.
> >
> > Paul, thanks for the plan_hash_value query - it definitely helps
> > disprove that we have a truckload of literal statements (in other
> > words, I do not think literal statements are our problem, at least
> > not in large numbers).
> > Niall, I am still seeing a number of statements that are not sampled
> > into v$active_session_history, but those that are sampled are showing
> > a lot of library cache waits.
> >
> > ________________________________
> >
> > From: oracle-l-bounce@xxxxxxxxxxxxx
> > [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> > On Behalf Of Schultz, Charles
> > Sent: Thursday, April 20, 2006 8:15 AM
> > To: oracle-l@xxxxxxxxxxxxx
> > Subject: Spotting the real cause of a Production slowdown (10g)
> >
> > We have observed several critical slowdowns of our Production system
> > since upgrading to 10g. The symptoms include a rapid rise in library
> > cache latch waits (with associated, but less notable, rises in shared
> > pool and library cache pin waits), CPU pegged at 100% and CPU load in
> > triple digits, and massive shared pool resizing evidenced by
> > v$sga_resize_ops (1.7gb added in 15 seconds).
> > Obviously, trying to run queries against the normal v$
> > views is rather slow (it only adds to the latching problem). We sent
> > hanganalyze and systemstate dumps to Oracle Support, but after a
> > couple of months of trying to figure things out, we have word that the
> > initial Support Engineer may have misdiagnosed our problem.
> >
> > Based on what little I know and have researched (reading a number of
> > notes here, Metalink, Google, OraFAQ, etc.), a large number of literal
> > sql statements might be able to cause such symptoms. It is just hard
> > for me to imagine the number of literal statements that would consume
> > 1.7gb of shared pool in 15 seconds. Perhaps it is not just quantity
> > but quality as well, each statement consuming a large chunk of real
> > estate. The only problem with this conjecture is that I do not find
> > evidence of this in v$sql (looking at number of executions, number of
> > literals, sharable_mem).
> >
> > But the puzzle is even more elusive. EM reports extremely (abnormally)
> > large numbers of executions for seemingly random sql statements. In
> > one case, we had 3 statements that were each executed 35 million times
> > in one hour (2 of those statements had bind variables, the other was a
> > "simple" select). I can never catch these large numbers in v$sql, so I
> > have to assume they age out by the time we start to look. I have asked
> > the application folks to see if these queries are related, and why the
> > execution count is a couple of orders of magnitude greater than
> > normal. Still waiting for resolution at that end. It seems to be the
> > general consensus that these larger numbers are symptomatic as well,
> > that they only start to escalate when the system is severely slow and
> > the middle tier starts to time out. NOTE: that is the working
> > hypothesis; we are trying to prove it.
> >
> > So, my question is two-fold.
> > Do you experts out there have advice as
> > to how we can reactively figure out what already happened? And how we
> > can proactively avoid this happening again? Because the automatic AWR
> > snapshots only happen each hour, it is hard for us to get more
> > granular when looking at a historical view. Perhaps we should
> > temporarily increase that frequency?
> >
> > Any and all advice is gladly welcome. I am hoping to learn some of
> > your tricks and glean from your experience.
> >
> > Thanks in advance,
> >
> > PS - after reading about Kyle Hailey's and Roger Sanders's work, does
> > anyone have access to a DMA for dummies? =)
> >
> > charles schultz
> > oracle dba
> > aits - adsd
> > university of illinois
>
> --
> Best regards,
> Alex Gorbachev
>
> --
> //www.freelists.org/webpage/oracle-l
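
[Editor's note] Ray's remove_constants() normalization is easy to prototype outside the database as well. The sketch below is a rough Python equivalent, not part of the original thread: it mirrors the PL/SQL logic (quoted string literals become '#', digit runs collapse to a single '@', whitespace runs collapse, and the result is upper-cased) so that differently-literaled copies of the same statement normalize to one bucket. The regex collapse of digit and space runs approximates the PL/SQL's iterative replace loop.

```python
# Rough Python sketch of the remove_constants() logic from Ray's script
# (an illustration, not the original PL/SQL). String literals become '#',
# digit runs become one '@', space runs collapse, result is upper-cased.
import re

def remove_constants(query: str) -> str:
    out = []
    in_quotes = False
    for ch in query:
        if ch == "'":
            if in_quotes:
                # Closing quote: leave quoted state; the quote char
                # itself is appended by the block below.
                in_quotes = False
            else:
                # Opening quote: emit the '# placeholder and start
                # skipping the literal's contents.
                in_quotes = True
                out.append("'#")
        if not in_quotes:
            out.append(ch)
    s = "".join(out)
    s = re.sub(r"[0-9]+", "@", s)  # digit runs -> one '@'
    s = re.sub(r" +", " ", s)      # space runs -> one space
    return s.upper()
```

With this, `select * from emp where ename = 'KING' and deptno = 10` and the same statement with different literals both normalize to `SELECT * FROM EMP WHERE ENAME = '#' AND DEPTNO = @`, which is exactly what makes the group-by/count query spot literal-SQL offenders.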