Re: Monitoring the "Checkpoint not complete" event

  • From: Jared Still <jkstill@xxxxxxxxx>
  • To: cichomitiko@xxxxxxxxx
  • Date: Tue, 27 Oct 2009 10:04:23 -0700

Like Niall, I've not had much trouble with 'checkpoint not complete'.

While investigating this I found that manual logfile switches seemed
to be causing 'checkpoint not complete' messages in the alert.log
of a couple of databases, one 10.2.0.4 and the other 11.1.0.7.

The timing didn't explain it, as both archive processes were available
at the time of the error.

(see the attached script for querying v$archived_log)

ML note 435887.1 offers an explanation: private redo strands must be
flushed for the log switch.

The 'checkpoint not complete' is very easy to reproduce by leaving
an uncommitted transaction in flight during the log switch. There's an
example in the note.
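
For what it's worth, a rough sketch of that sort of test (this is not the
note's exact example; the table name and row source are just placeholders,
and it assumes small online redo logs so the groups wrap quickly):

-- leave a transaction uncommitted so its private redo strands stay active
create table cnc_test as select * from all_objects where 1=0;
insert into cnc_test select * from all_objects;
-- do NOT commit

-- force switches while the transaction is still open
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;

-- then watch the alert.log for 'checkpoint not complete'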

This applies to 10.2.0.4 and later, so it would seem the problem did
not occur in earlier versions.

The timing of the 'checkpoint not complete' in the 10.2.0.4 database
corresponded to the normal backup window, at approximately the time
the backups normally finished.  That would explain both the manual log
switches and the 'checkpoint not complete' errors.

Check your alert.log for the sequence# of the log that could not
be allocated, then query v$archived_log and v$log_history and check
the size of the archived log.  If the size is smaller than normal,
the 'checkpoint not complete' probably occurred as the result of a forced
log switch during a backup while an uncommitted transaction still had
active redo strands.
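
Something along these lines works for that check (the &seq substitution
variable is just a placeholder for the sequence# reported in the alert.log;
a window of a few logs either side gives the 'normal' size to compare
against):

select sequence#, first_time, next_time, completion_time,
       blocks * block_size bytes
from v$archived_log
where dest_id = 1
and sequence# between &seq - 5 and &seq + 5
order by sequence#;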

HTH

Jared Still
Certifiable Oracle DBA and Part Time Perl Evangelist
Oracle Blog: http://jkstill.blogspot.com
Home Page: http://jaredstill.com
-- loghistory_8.sql
-- show times between log switches

-- format for time_to_switch is DAYS:HOURS:MINUTES:SECONDS
-- create the conv2time function with conv2time.sql

-- When querying v$archived_log, I think a fair approximation of the
-- time required to write the archived log can be derived by subtracting
-- the time of the next change (next_time) from the time the current
-- archive log is completely written to disk (completion_time).
-- 
-- If there were a start_time indicating the time that archiving started
-- that would be better of course, but I cannot find anything that actually
-- shows the time the archive log starts to be written.
-- 
-- The next_time column should be a fair approximation of start_time
-- for the archive log under normal circumstances.
-- 
-- When the archiver is behind, however, that time will not be accurate,
-- but it is still useful.
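-- 
-- For example (times made up just to show the arithmetic): if next_time
-- is 10:00:00 and completion_time is 10:00:12, then
-- (completion_time - next_time) * 86400 = 12 seconds of archive write time.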


--@clears

set pause off
set echo off
set timing off
set trimspool on
set feed on term on echo off verify off
set line 80
set pages 24 head on

clear col
clear break
clear computes

btitle ''
ttitle ''

btitle off
ttitle off

set newpage 1

--@ndf 1

alter session set nls_date_format='mm/dd/yyyy hh24:mi:ss';

set pagesize 60
set line 120

col sequence# format 99999999 head 'SWITCH|SEQUENCE'
col time format a20 head 'SWITCH TIME'
col time_to_switch format a12 head 'TIME SINCE|PREV SWITCH'
col bytes format 9,999,999,999 head 'BYTES'
col arch_write_time format 9999 head 'ARCH|WRITE|SECONDS'

with lograw as (
        select
                lc.sequence#
                ,lc.thread#
                ,lc.first_change#
                ,lc.first_time
                ,lc.first_time - lag(lc.first_time,1) over(order by sequence#) time_since_switch
        from v$log_history lc
),
loghist as (
select
        sequence#
        , first_change#
        , thread#
        , first_time
        -- get days portion of time since last log switch
        , to_char(trunc(time_since_switch))
                || ':'
                -- get hours portion of time since last log switch
                || to_char(trunc(sysdate)+time_since_switch,'hh24:mi:ss') time_to_switch
        from lograw
)
select
        a.sequence#
        , h.first_time
        , a.next_time
        , h.time_to_switch
        , a.completion_time
        , (a.completion_time - a.next_time) * 86400 arch_write_time
        , a.blocks * block_size bytes
        --, h.sequence#
        --, h.first_change#
        --, h.thread#
from loghist h
        , v$archived_log a
where a.thread# = h.thread#
        and a.sequence# = h.sequence#
        and a.FIRST_CHANGE# = h.FIRST_CHANGE#
        and a.dest_id = 1
        -- less than 1 minute
        --and substr(h.time_to_switch,1,8)  = '0:00:00:'
order by 1,2
/

