RE: checkpoint incomplete issue

  • From: "Cary Millsap" <cary.millsap@xxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Fri, 19 Nov 2004 16:19:51 -0600

One thing I'm surprised nobody's mentioned is that excessive =
checkpointing
can be caused by excessive writing. On occasion, we see the application =
that
does something like

        update t set col=3D'X'
        where {predicates that don't include "and col<>'X'"}

Get it? If there were a bunch of "col=3D'X'" rows to begin with, you're
telling Oracle to change the existing 'X' values to new 'X' values. Of
course, this can be brutal on DBWR, LGWR, ARCH, processes that are =
trying to
use CPU or acquire latches, undo management, etc.

I hate to see people trying to solve problems like this by randomly =
hacking
parameters, adding files, or buying hardware, when the best solution =
would
be to make the application not do stuff that it doesn't need to be doing =
in
the first place.


Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:
- Performance Diagnosis 101: 1/4 Calgary
- SQL Optimization 101: 11/8 Dallas, 12/13 Atlanta
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit www.hotsos.com for schedule details...


-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx =
[mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jeremiah Wilton
Sent: Friday, November 19, 2004 3:14 PM
To: oracle-l@xxxxxxxxxxxxx
Subject: RE: checkpoint incomplete issue

On Fri, 19 Nov 2004, Freeman Robert - IL wrote:

> Hmmmm... maybe I was wrong, or the behavor has changed at some point =
and
> time (7 vs 8??).... looking at my 9i and 10g alert logs, it appears =
that
the
> checkpoint not complete messages only appear if it can't switch =
logfiles.
>
> Maybe I'm just getting to old and senile.

Well that may be the case with me as well.  I wrote the article in
1932 based on Oracle 7.3.2 on a hand-cranked granite computer.  But I
don't think the fundamental rule has changed. "Cannot allocate new log
- checkpoint not complete".

You can have 10 logs, and switch through them all without completing a
checkpoint (if using LOG_CHECKOINT_INTERVAL it will happily start over
with a new checkpoint at each log switch even if the previous one is
not complete), until you get to the last log, and they are all status
ACTIVE, at which point it runs out of logs that are INACTIVE and has
nowhere to switch to.  Then you get CNC.

This problem, once rife, is a real rarity these days.

Often someone has decided to use an immature filesystem and it
serializes on file access (defeating multiple DB writers).

Another fave is the old software mirroring across arrays.  That
software mirroring software (like VxVM) goes into full resync mode if
the machine crashes or the mirror gets otherwise out of sync.  This
can suck up huge quantities of available IO on a running production
system.

Finally, most people who think they are using async IO are not, due to
incorrect filesystem, OS kernel or volume manager configuration.  The
only way to verify is to use a system call tracer to see what flags
the DB writer is using to open the file.  There are numerous platform
and filesystem-specific recipes on MetaLink for getting async to work
and making sure it is really on.

--
Jeremiah Wilton
Independent Oracle Professional
Oracle Certified Master
Disaster Recovery - Seminars - Technical Interviews
http://www.speakeasy.net/~jwilton
--
//www.freelists.org/webpage/oracle-l

--
//www.freelists.org/webpage/oracle-l

Other related posts: