[oracle-L] [Long] How deal with load-average-spike ?

  • From: "Peter Smith" <goodjobfastcar@xxxxxxxxx>
  • To: Oracle-L@xxxxxxxxxxxxx
  • Date: Fri, 15 Dec 2006 14:44:11 -0800

People,

I'm trying to write a simple document on how a DBA should handle
a host which is experiencing a load-average-spike (and
negatively impacting end-user response time).

Do any of you have some common sense you would like to share?

Here are some ideas which come to my mind:

-Try to characterize the current load using:
 -the 'ps' command.
 -the Targets -- Hosts -- Performance Page in EM

Load-type-1: Is the load-average-spike caused by a single process or a small
number of processes which are 'hogging' the CPU?

Load-type-2: Is the load-average-spike caused by Oracle background
processes which are 'hogging' the CPU?

Load-type-3: Is the load-average-spike caused by Oracle foreground
processes which are 'hogging' the CPU?

Load-type-4: Is the load-average-spike caused by an abnormally large number
of
processes which have recently been added to the process list?

If we are dealing with Load-type-1, we need to trace the processes
to a user or business-process which spawned the process(es).
Then, we implement procedures to rein-in or disconnect this user.
This is a difficult situation because risks of doing the "wrong thing"
are high.  You don't want to disconnect a user who is conducting a
legitimate business-critical task.

If we are dealing with Load-type-2, we need to drill down into the Oracle
background processes to find out what is keeping it overly busy.
EM can help you with this:
Home -- Targets -- Databases -- TheBusyDatabase -- Active Sessions with High
CPU.

In this screen you can see some clues about what is driving each of the busy
sessions.
When you drill down into Oracle background processes, one of the first
layers you
encounter are sessions.

Your goal is to gather enough information to pinpoint the
application(s) which are driving these sessions.  Then, you interact
with an administrative interface on the application to search for
end-users who might be responsible.  Or you might then find
information about the application's configuration or state which is
abnormal and needs to be fixed.  For example, sometimes public facing
websites can be pushed hard by mal-ware Perl scripts operated by
delinquents to generate high volumes of traffic (to crash the site).
A poorly constructed application might filter (or magnify) some of
this traffic to a host for processing.

One level below each session are individual SQL statements.  EM offers
functionality to look at each SQL statement in a session.  If you are
confident that the load on the DB is legitimate, an obvious next place
to look for clues are the SQL statements.  Tuning SQL statements is a
vast subject but EM will quickly help you find SQL statements which
consume relatively large amounts of CPU.  Also EM contains a 'SQL
Tuning Advisor'.  It is sophisticated enough to help with a wide
variety of poorly constructed SQL statements or schemas.  For example,
it could tell you if a new index would speed up a query and if so
how much that index would slow down a corresponding insert.

If we are dealing with Load-type-3, we need to...[ complete later ].

If we are dealing with Load-type-4, we need to search for characteristics
of the process set which yield clues about the deluge.

Some simple questions might be a good start:

-Are most of the processes owned by one userid?
-Are most of the processes linked to the same executable?
-Do we have source code for the underlying executable?
-When were most of the processes created?
-Are they all consuming CPU cycles?
-Are they connected to anything:
 -reading/writing files?
 -connected to the DB?
 -network connected to anything?

The goal here is to find the application or person who started all these
processes.
Another goal is to find out if they are obstructed by anything and thus
prevented
from dying since they cannot finish their jobs.


So, that is my initial take on the high-load-average-spike scenario.
Do any of you have any comments which would be useful for the
DBA who needs a general set of guidelines for dealing with this situation?

...Peter
--
Peter Smith
GoodJobFastCar@xxxxxxxxx
http://GoodJobFastCar.com

Other related posts:

  • » [oracle-L] [Long] How deal with load-average-spike ?