RE: cpu average load

  • From: "Cary Millsap" <cary.millsap@xxxxxxxxxx>
  • To: <Oracle-L@xxxxxxxxxxxxx>
  • Date: Sat, 4 Dec 2004 11:47:54 -0600


One mechanism to consider in dealing with your issue is the [...]
This idea is intimately related to what I tried to say last night in [...]

To influence the behavior you desire, a good chargeback scheme will provide
your user community with a "bill" that itemizes BY PROGRAM which tasks are
consuming the most capacity.

One of the thousands of reasons I hate the system-wide statistics game is
that, by its very nature, it PREVENTS a company from placing performance
accountability upon the right shoulders. Basically, when you're looking at
system-wide statistics (utilization statistics, cache hit ratios, even [...]
rates, ...), it encourages--no, not strong enough--it FORCES the
accountability almost completely onto the DBA's shoulders.

Accountability for performance must be shared among developers and users,
too, or it becomes impossible to improve performance in (I would guess)
70% or more of cases in which there's a problem.

When you can show people the costs of their individual decisions, you can
influence their behavior. You can do this by analyzing the "invoice" of
capacity consumed per business task. You cannot do this by regarding metrics
like "system-wide utilization."
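
As a rough illustration of such an "invoice", here is a minimal sketch that
totals capacity consumed per program and lists the largest consumers first.
The usage records, the program names, and the cost rate are all hypothetical,
invented for this example; a real scheme would take them from whatever
per-task measurements a site actually collects.

```python
from collections import defaultdict

# Hypothetical usage records: (program, user, cpu_seconds). Illustrative only.
USAGE = [
    ("sales_report", "joe", 5400.0),
    ("order_entry", "ann", 600.0),
    ("sales_report", "joe", 3000.0),
    ("nightly_load", "batch", 1000.0),
]


def invoice_by_program(records, cost_per_cpu_second):
    """Return (program, cpu_seconds, cost, share) rows, largest consumer first."""
    totals = defaultdict(float)
    for program, _user, cpu_seconds in records:
        totals[program] += cpu_seconds

    grand_total = sum(totals.values())
    rows = []
    for program, cpu in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        # Share of all measured capacity attributable to this program.
        share = cpu / grand_total if grand_total else 0.0
        rows.append((program, cpu, cpu * cost_per_cpu_second, share))
    return rows


if __name__ == "__main__":
    # The rate of $0.01 per CPU second is an assumed figure, not a real price.
    for program, cpu, cost, share in invoice_by_program(USAGE, 0.01):
        print(f"{program:<14} {cpu:>8.0f} s  ${cost:>8.2f}  {share:6.1%}")
```

The point of itemizing by program rather than reporting one system-wide total
is that each row names something a specific person can go and fix.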

Think about Life for a minute... How differently do people behave when they
share a resource versus when they're charged individually with the care of a
given resource? Example: Have you ever heard the shout "Rental car!!" when
someone has scraped the muffler on the pavement after running over the crest
of a hill in downtown San Francisco too fast? How many people have you ever
heard celebrate knocking a hole in their own car? The problem with rental
cars is that accountability is in the wrong place. An individual doesn't
care that rental rates will go up a thousandth of a cent per day for
everybody because he did something reckless. The accountability is so
diluted by being so widely shared that there's really NO accountability
left. There's no motivation left to avoid reckless behavior. The
behavior you get is then recklessness because being reckless is easier (and,
to some people, more fun) than being careful. It's the path of least
resistance. Hey, choose your favorite analogy. If you don't like rental car
stories, try stories about split-bill dinners where water-drinking
salad-eaters pay more because their wine-drinking friends ate steak, or
hotel bath towels that you wipe your muddy shoes with, or federal income
taxes that pay for programs you don't agree with....

A company creates the same kind of environment when it bases its
management process upon system-wide metrics. When you can show Joe User that
HE is responsible for $17,000 of your company's resources yesterday because
he ran six sales reports without a date range, you can motivate Joe to fix
the problem. When you can show Susan Developer that SHE is responsible for
73% of your IT spend because she wrote code that parses inside a for() loop
in six programs, you can motivate Susan to fix the problem.

But when all you regard is system-wide metrics, you can call from the top of
your hill that things are bad and all you'll get from Joe and Susan is a pat
on the back and maybe a little friendly empathy when they ask you what
you're going to do about it.

Cary Millsap
Hotsos Enterprises, Ltd.
* Nullius in verba *

Upcoming events:
- Performance Diagnosis 101: 1/4 Calgary
- SQL Optimization 101: 12/13 Atlanta
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit for schedule details...

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx On Behalf Of Janine Sisk
Sent: Saturday, December 04, 2004 9:58 AM
To: oracle-l@xxxxxxxxxxxxx
Subject: Re: cpu average load

FWIW, I do think there is some value in measuring things like CPU load.
We host database-backed websites, and often our first and most
reliable indicator that something is wrong with a site is the load on
the system going way up.  By the time the site's users complain to the
site owner, and they complain to us, the problem has usually been going
on for hours.  Since these situations are usually caused by someone
trying to download all the content on the site, by the time we know
about it and can stop it a significant amount of content has already
been "slurped", lots of users have been annoyed, and sometimes our
client ends up with a big bandwidth bill.  Another common cause is that
some of our clients do their own programming and they can write some
real whopper queries at times;  again, by the time the complaints reach
us the problem has usually been ongoing for some time.  We are able to
deal with these situations as quickly as possible by keeping an eye on
the CPU load and investigating whenever it rises alarmingly.

Of course, the first step of this is to figure out what a normal CPU
load is for each server.  I think the only way you can really do that
is to keep an eye on it for a while when things are running normally,
and establish a baseline.  It does not really matter what the number
is, as long as things are humming along and everyone using the system
is happy with its performance.
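
A minimal sketch of that baseline idea, under stated assumptions: the load
samples are hypothetical numbers recorded while a server was behaving
normally, and the "alarming" cutoff of three standard deviations above the
mean is an arbitrary choice made for illustration, not something prescribed
above.

```python
import statistics


def build_baseline(samples):
    """Summarize load samples taken while the system was running normally."""
    return statistics.mean(samples), statistics.pstdev(samples)


def is_alarming(current_load, baseline, sigmas=3.0):
    """Flag a load reading that sits well above the recorded baseline.

    The `sigmas` cutoff is an assumed tuning knob; pick whatever margin
    separates ordinary variation from trouble on a given server.
    """
    mean, stdev = baseline
    return current_load > mean + sigmas * stdev


if __name__ == "__main__":
    # Hypothetical load averages observed during a quiet, healthy period.
    normal_samples = [1.0, 1.2, 0.8, 1.1, 0.9]
    baseline = build_baseline(normal_samples)
    for reading in (1.3, 4.0):
        print(reading, "investigate" if is_alarming(reading, baseline) else "ok")
```

Because the comparison is against each server's own history, the absolute
number does not matter, only how far a reading departs from what is normal
for that machine.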

I think my situation is a bit different from most of yours;  my users are
not in-house, and my Oracle instances are, indirectly through the web
sites, hanging out there for anyone to poke at.  So my experience may
not apply to the rest of you, and maybe not even to Paula's situation,
but I offer it anyway as one more perspective on the question.



