RE: Statspack ratios help

  • From: "Khemmanivanh, Somckit" <somckit.khemmanivanh@xxxxxxxxxxxxxxxx>
  • To: <Mark.Bobak@xxxxxxxxxxxxxxx>, <bunjibry@xxxxxxxxx>, "Oracle-L" <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 6 Jun 2006 16:49:17 -0700

>>>The thing to remember is that a hit ratio does not reliably correlate
to performance, and therefore is NOT a valid performance metric.  If you
approach a database that is having performance problems, look at a hit
ratio, and determine it's "too low", you're likely to spin your wheels
trying to "fix" the hit ratio by improving it, but miss the root cause
of the performance problem.<<<
 

I'll have to disagree with that somewhat. Consider that case of index
scans that are serviced mainly from the DB Buffer Cache. 

Your hit ratio will be good because the query(ies) are continually
hitting the cache and not reading from disk.

Once you start getting cache misses then your performance drops as
well....

I know what most of the response will be -- go tune the code to be more
efficient...of course that is the ideal solution but in reality that is
not always a possilbility...
 
So you are left with things like trying to buffer as much of the data as
possible (i.e. big DB cache sizes, keep pools, etc..)
 
I'm not saying you don't have a valid point (most of it is) but there
are always exceptions...

________________________________

From: oracle-l-bounce@xxxxxxxxxxxxx
[mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Bobak, Mark
Sent: Tuesday, June 06, 2006 1:08 PM
To: bunjibry@xxxxxxxxx; Oracle-L
Subject: RE: Statspack ratios help


Bryan,
 
Never be afraid to ask a question...and there are no stupid
questions...only stupid answers.  I shall endeavor to ensure this is not
a stupid answer.
 
There is nothing inherently wrong with hit ratios.  The problem is when
people rely on hit ratios as a valid performance metric.  If you want to
monitor hit ratios and you happen to notice a precipitous drop in a
particular hit ratio, well, perhaps there's something there....perhaps.
 
The thing to remember is that a hit ratio does not reliably correlate to
performance, and therefore is NOT a valid performance metric.  If you
approach a database that is having performance problems, look at a hit
ratio, and determine it's "too low", you're likely to spin your wheels
trying to "fix" the hit ratio by improving it, but miss the root cause
of the performance problem.
 
A few words about 100% buffer cache hit ratio.  This is actually often a
sign of trouble, perhaps big trouble.  The theory goes like this "disk
I/O is X times slower than memory access (pick your favorite number for
X, 100, 1000, 10000, whatever) so therefore, you want all your I/O to be
logical (from the buffer cache) thereby eliminating all that slow disk
I/O."  Sounds good, right?  There are some things you have to
remember.....Oracle's buffer cache is part of the SGA, which is a shared
memory segment, which all the Oracle server processes share access to.
In order to ensure the integrity of that data in shared memory, Oracle
has some pretty elaborate locking mechanisms.  I won't go into details,
but, very breifly (and incompletely) a "latch" is a small chunk of
memory which serializes access to a specific part of the SGA.  The
buffer cache is protected by latches and buffer locks that make sure
that while one process is reading data from a particular block, another
process can't overwrite that buffer.
 
So, when you consider the serialization overhead of Oracle's SGA, the
difference in cost between a logical I/O and a physical I/O is nowhere
near 10000 or 1000 or even 100.  If I recall correctly, Cary Millsap
published a number, based on huge amounts of empirical data (trace
files) that calculated it to be something like a factor of 37.  Ok,
that's still 37x slower!  That's worth eliminating, right??
 
Ok, next, a little change in focus here.  There is no problem if a
particular well-tuned query has a hit ratio of 100%.  In fact, most
anyone would agree that this is a good thing.  So, what's the problem
with a 100% (or nearly 100%) buffer cache hit ratio instance wide??
Well, maybe nothing....maybe.  But it's also very possible to have an
instance with a 100% buffer cache hit ratio that is so bottlenecked that
noone can get any use out of it.  The problem lies in the serialization
I mentioned before.  Even if you have a 100% buffer cache hit ratio,
every buffer cache access is serialized through latches, buffer locks,
and buffer pins.  This is necessary for Oracle to maintain data
integrity.  So, what happens if you have a query that (for example) uses
an index very inefficiently?  You have a query that hits the same blocks
over and over again.  Because they are repeatedly hit, they remain in
the buffer cache, so, no disk I/O!  Yay!  But, the serialization is
killing you.  What happens when you run several instances of these
inefficient queries concurrently?  They will push your buffer cache hit
ratio well past 99.99%, but, the system will be suffering.  The
serialization invlolved in locking and latching will bring the system to
it's knees.  So, in this case, you have the "ideal" BCHR, but your
system will run like a dog.
 
The most effective way to determine the root cause of a performance
problem is to specifically analyze the process that's having the
problem.  Look at the performance profile.  Where is time actually being
spent?  Focus on that!  Tools like the 10046 trace at level 8, the
Oracle wait interface, and StatsPack, can be very useful in determining
the root cause.
 
-Mark
 
PS  If you go to http://www.hotsos.com/ and click on Library, Cary
Millsap has a paper there called something like "Why a 99+% buffer cache
hit ratio is bad", and he makes a much more eloquent argument than I do
above.  Lots of other good papers there too....take some time to read
through some of them.  Registration is required, but free.
 

-- 
Mark J. Bobak 
Senior Oracle Architect 
ProQuest Information & Learning 

For a successful technology, reality must take precedence over public
relations, for Nature cannot be fooled.  --Richard P. Feynman, 1918-1988

.com <mailto:bunjibry@xxxxxxxxx>  

Other related posts: