RE: high "latch: cache buffers chains" waits in 10.2.0.3 DB

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <bkaltofen@xxxxxx>, "'Oracle-L'" <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 27 Feb 2008 01:43:22 -0500

It seems that at some number of connections you exceed the ability of the
machine and database to keep up with the workload, and then the existing
system degrades.

If you cannot get the folks writing the system to improve it, you have two
paradoxical possibilities.

1) If the load comes in peaks and you could survive if you could just get
enough extra connections for a while, then either shared servers or simply a
higher processes setting might do the trick. This tends to be the case when
the workload per connection varies wildly, with most connections being quite
lightweight, and only once in a while do you get the unhappy coincidence of
too many heavy hitters at once. Of course, if you can identify the worst heavy
hitters, you might convince the application builders to count how many of the
heavy requests are already executing and hold off firing the next one until
the number of concurrent heavy hitters is low enough for the system to
support. This contemplates the notion that the heavy-hitter queries might be
necessarily so rather than simply badly written, wasteful queries. Oracle's
own "Concurrent Manager" from the eBusiness Suite is an example of handling
this.
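For that first option, the knobs involved look roughly like this (the numbers
are made up and have to fit within your memory and OS limits, so treat this as
a sketch only):

-- check the current limit and how close you are to it
show parameter processes
select count(*) from v$session;

-- raise the hard limit on dedicated connections (static, needs a restart)
alter system set processes=600 scope=spfile;

-- or try shared servers so many lightweight sessions share a pool of servers
alter system set shared_servers=20 scope=both;
alter system set dispatchers='(PROTOCOL=TCP)(DISPATCHERS=3)' scope=both;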


2) If the load reaches a sustained plateau at which you cannot keep up with
the workload, then lower the number of connections allowed to the number you
can sustain with the current application. The rest will just have to live with
"Try again later." You will thus at least serve reasonably those who get a
turn. This will also give you a chance to evaluate the queries generating the
work for waste. The most obvious opportunity for improvement is in the queries
that take the longest and consume the most resources, but some folks enter an
infinite loop trying to improve a costly query that is already optimal.
Another load that is often overlooked is the chronic background drag of
maintaining many unused indexes, and I'm feeling like I'm about to start
typing a laundry list, so I'll stop. If you (or the developers) cannot reduce
the work required per request, then the best you can do is serve the number
who can be served.
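If you go that second route, something like the following can help once the
load is capped. This is a rough sketch against the 10g dynamic views; the
profile name and index name are placeholders, not anything from your system:

-- cap what the application account may open (needs resource_limit=true)
alter system set resource_limit=true scope=both;
alter profile app_profile limit sessions_per_user 100;

-- the ten cached statements doing the most buffer gets since instance start
select *
  from (select sql_id, executions, buffer_gets,
               round(buffer_gets / greatest(executions, 1)) gets_per_exec
          from v$sql
         order by buffer_gets desc)
 where rownum <= 10;

-- check whether a suspect index is ever used (v$object_usage only shows
-- indexes owned by the user you are logged in as)
alter index some_owner.some_index monitoring usage;
select index_name, used, start_monitoring from v$object_usage;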

In the meantime you've started to identify some objects that are apparently
getting slammed. (Note Greg Rahn's advice about the query you used and some
better alternatives.) The next step is to look at the SQL that is doing the
slamming. There are various things you can do to reduce query expense even if
you have no chance of changing the code; what to try depends on the specifics
of the query and the data. Whether there is any improvement worth pursuing
without rewriting the SQL is an experimental question.
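One crude way to find the cached statements touching a given object, assuming
you already have the object names from your earlier digging (the owner and
table name below are placeholders), is to go through v$sql_plan:

select distinct p.sql_id, s.sql_text
  from v$sql_plan p, v$sql s
 where p.sql_id       = s.sql_id
   and p.child_number = s.child_number
   and p.object_owner = 'APP_OWNER'     -- placeholder owner
   and p.object_name  = 'HOT_TABLE';    -- placeholder object name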

As for the hot block analysis, the details of what is making an object's
blocks hot are a required input before you embark on trying to cure it
efficiently. It sounds like you think partitioning might be a silver bullet
for your particular case, but you don't have it licensed. One alternative is
some version of "poor man's partitioning": splitting the contents of the table
into multiple tables on what would otherwise be your partitioning column, with
a UNION ALL view stitching them back together for the code that needs to see
everything, while code that knows which piece it wants operates on that piece
alone. But before you either try to justify the license or build something
like that, you really need to examine what hits that table.
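Just to make the shape of that concrete, a bare-bones sketch with made-up
table and column names:

-- split the big table by the would-be partitioning column
create table orders_2007 as select * from orders where order_year = 2007;
create table orders_2008 as select * from orders where order_year = 2008;

-- a UNION ALL view for the code that really needs to see everything
create or replace view orders_all as
  select * from orders_2007
  union all
  select * from orders_2008;

The view keeps the code that needs everything working, but every piece of code
that should touch only one slice has to be taught which table to hit, which is
exactly why you want to know the access patterns first.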

Good luck!

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of bkaltofen@xxxxxx
Sent: Tuesday, February 26, 2008 12:25 PM
To: Oracle-L
Subject: high "latch: cache buffers chains" waits in 10.2.0.3 DB

Hello,

I'm experiencing a problem with a high number of "latch: cache buffers 
chains" waits in a customer database.

Environment:
Solaris 10 x86
Oracle EE 10.2.0.3
Application Type: Web-Application which opens a session for each request 
(no comment. We have no influence on the application)

Symptoms:
"latch: cache buffers chains" waits go up
number of sessions increases until "max processes" is reached, so no new 
connections can be established.
Web-application stops responding, as no more sessions are possible
I can not reproduce the issue by will and there is no test database at 
customer side.

Workaround:
- Wait till the latch contention resolves itself --> Customer is not happy 
about that. One time it took over 2 hours
or
- restart instance --> fast workaround, but buffer and library cache are 
lost.

I think it has something to do with hot blocks.

<snip>


