RE: cpu average load

  • From: "Cary Millsap" <cary.millsap@xxxxxxxxxx>
  • To: <Oracle-L@xxxxxxxxxxxxx>
  • Date: Sat, 4 Dec 2004 00:39:53 -0600

Paula,

My answer to the final question in your note is, with due respect: No. I
think you're looking at your system upside-down.

A system's value cannot be measured solely by how much capacity it consumes.
Value is a function of both cost and benefit. System resource consumption
statistics convey only cost, and system-wide resource consumption statistics
can convey only costs that you have no hope of ever properly allocating
back to some tangible benefit.

You can measure the value your system provides to you only by measuring the
performance (both accuracy and speed) of the tasks your business requires
from it. The more important tasks deserve more attention (and more system
capacity) than the less important tasks. If your business wants only for
your system-wide statistics to be happy, then I submit that your business is
working for your system instead of the other way around.

I don't suggest that you start tracing 100s of processes. I do suggest that
it wastes time if you attempt to measure a system's efficiency in any way
that doesn't begin with prioritizing the tasks that your business requires
of your system.


Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:
- Performance Diagnosis 101: 1/4 Calgary
- SQL Optimization 101: 12/13 Atlanta
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit www.hotsos.com for schedule details...


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Paula_Stankus@xxxxxxxxxxxxxxx
Sent: Friday, December 03, 2004 10:05 PM
To: niall.litchfield@xxxxxxxxx; cary.millsap@xxxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: RE: cpu average load

Cary,

I am sorry it took a while for me to answer this question.  I have been
in implementations lately.  I truly apologize.

Cary, what you find simple and implement frequently will take me a while
to catch on to - I am still reading your book as I can.

The question started when my management asked me to deploy certain big
brother monitors on the system - it appealed to them for various reasons
which is a completely different discussion.

Now we are getting warning messages regarding cpu average load.  I am
thinking of upping the thresholds on these warnings since no one has
complained about performance and frankly getting these messages
interferes with our ability to monitor "true" problems on the system -
but then again users sometimes live with bad performance and the
information never gets passed along - IMHO.

I don't mean to take anyone's time in answering an abstract question.  I
just wanted a general understanding of what this was measuring, what
impact it could have before I went ahead and changed it.  I was looking
for a good place to start to gain a better understanding of what this
measures.  For example, I commonly look at TOP to see how much CPU a
process is using.  It is very easy to tell from that which processes are
consuming the most CPU on the system, how much CPU (approximately) and
for how long.  When I get error messages from OEM regarding CPU I can
run TOP and trace it directly back to a particular process many times.
Then I can proceed with more in-depth tracing.  However, if I am getting
warnings, errors and e-mails about average CPU load then I am not
completely clear what that is measuring.

In my simple mind I think that looking at overall resource utilization
on a box is a good place to start if you are seeing things slowing down
(as a whole) then drilling down from there.  Also, proactively
monitoring system resource utilization on a regular basis if you are
supporting a number of databases operationally has proven useful to me.
That is what these overall monitoring processes are for - just to show
unusual activity.  That is why I was asking - where can I start finding
out what is usual or unusual average CPU load?
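
As a rough starting point: on Unix-like systems the load average is
generally the average number of processes running or waiting to run over
the last 1, 5, and 15 minutes, so it is usually read relative to the number
of CPUs.  A minimal sketch, assuming a Unix-like host where Python's
os.getloadavg() is available; the function names and the baseline and
factor parameters are hypothetical placeholders, not recommended thresholds:

```python
import os


def load_per_cpu(load, cpus):
    """Normalize a load average by the number of CPUs."""
    return load / cpus


def is_unusual(load, cpus, baseline, factor=2.0):
    """True if per-CPU load exceeds `factor` times a baseline.

    `baseline` is assumed to be a per-CPU load you measured yourself
    during a period when the business tasks performed acceptably.
    """
    return load_per_cpu(load, cpus) > factor * baseline


if __name__ == "__main__":
    try:
        one, five, fifteen = os.getloadavg()  # 1-, 5-, 15-minute averages
    except (OSError, AttributeError):
        print("load average not available on this platform")
    else:
        cpus = os.cpu_count() or 1
        print(f"1-min load per CPU: {load_per_cpu(one, cpus):.2f}")
        # 0.5 is an assumed baseline for illustration only.
        print("unusual vs. assumed baseline:", is_unusual(one, cpus, baseline=0.5))
```

What counts as "usual" still has to come from your own history on each box;
the number by itself says nothing about which business task is affected.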

Cary, when you say:

"The amount of response time that process preemptions are costing your
performance is measured as the amount of response time in an extended
SQL trace file that is not accounted for by the sum of your file's c
values at recursive depth zero, plus the sum of your file's ela values."

that does not seem to answer my question.  Certainly, I shouldn't have to
start by running extended SQL traces on everything running on my system
when these warnings occur.  For example, that might require extended
SQL traces of multiple OLTP systems with 100+ users.  Shouldn't I be able
to discern something from this information at a higher level?
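
For reference, the arithmetic in the quoted sentence can be sketched as
follows.  This is only an illustration under stated assumptions: the trace
lines are assumed to look like 9i-style extended SQL trace lines
(PARSE/EXEC/FETCH lines carrying c= and dep=, WAIT lines carrying ela=),
c, ela, and the response time are assumed to share one time unit, and the
function name and the response_time argument are hypothetical.

```python
import re

# Database call lines, e.g. "EXEC #1:c=10000,e=25000,...,dep=0,..."
CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH) #\d+:c=(\d+),.*?\bdep=(\d+)")
# Wait lines, e.g. "WAIT #1: nam='db file sequential read' ela= 1200 p1=..."
WAIT_RE = re.compile(r"^WAIT #\d+:.*?\bela= *(\d+)")


def unaccounted_time(trace_lines, response_time):
    """Response time minus (sum of c at dep=0 plus sum of ela)."""
    cpu = 0
    waits = 0
    for line in trace_lines:
        call = CALL_RE.match(line)
        if call:
            # Only recursive depth zero counts toward the c total.
            if int(call.group(3)) == 0:
                cpu += int(call.group(2))
            continue
        wait = WAIT_RE.match(line)
        if wait:
            waits += int(wait.group(1))
    return response_time - (cpu + waits)


sample = [
    "EXEC #1:c=10000,e=25000,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=100",
    "FETCH #2:c=5000,e=6000,p=0,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=200",
    "WAIT #1: nam='db file sequential read' ela= 1200 p1=1 p2=2 p3=1",
    "WAIT #1: nam='SQL*Net message to client' ela= 800 p1=1 p2=1 p3=0",
]
# 20000 - (10000 + 1200 + 800) = 8000
print(unaccounted_time(sample, response_time=20000))
```

The dep=1 FETCH line is deliberately left out of the c total, matching the
"recursive depth zero" wording in the quote.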



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Niall Litchfield
Sent: Wednesday, December 01, 2004 8:28 AM
To: cary.millsap@xxxxxxxxxx
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: cpu average load

On Tue, 30 Nov 2004 10:59:11 -0600, Cary Millsap
<cary.millsap@xxxxxxxxxx> wrote:
> I disagree that this advice is difficult to implement in practice,
> because I implement it in practice frequently.

I disagree with Mladen for a somewhat different reason (i.e., I don't care
about ease of use here). It seems to me this discussion springs from a
technical question that may or may not be worth answering.
Paula's question was along the lines of 'how can I tell if my server is
being utilized efficiently'. One possible answer to this is 'Who
cares?'. Now if the question is being asked because there is an ongoing
discussion about buying new hardware, or the transactional capacity of
the system is apparently not good enough for the business needs of the
state then there is a real business problem to investigate.

iff there is a real problem to be investigated, then it doesn't really
matter how easy or hard it is to get the correct answer (unless the cost
of obtaining the answer is higher than the cost of not answering), it is
the correct answer that you require.

So I'd be taking a step back and asking Paula to define *why* she is
investigating the amount of the CPU capacity of her machines that Oracle
is using. If you can express that in clear business terms then you can
go down the profiling route (or any other method you think appropriate).


BTW, in this particular case, my money would be on unaccounted-for time
being a better measurement of time spent being preempted than the kernel
mode time consumed by the whole system, but I'm willing to be proven
wrong.



--
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
--
//www.freelists.org/webpage/oracle-l

