Re: Monitoring the "Checkpoint not complete" event

  • From: "Radoulov, Dimitre" <cichomitiko@xxxxxxxxx>
  • To: Niall Litchfield <niall.litchfield@xxxxxxxxx>
  • Date: Tue, 27 Oct 2009 09:42:13 +0100


Exactly.
Thank you, Niall! A few days ago this caused an automatic Sun Cluster fault monitor switch. I don't believe that waking up the DBA at 3 o'clock in the morning is appropriate if the "Checkpoint not complete" message is immediately followed by a log file switch completion. Hence I wanted to know whether and how other sites/shops implement such monitoring.
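
The rule described above (alert only when the switch does not complete promptly) could be sketched roughly as follows. This is a minimal sketch, not a tested monitor: it assumes the classic text alert.log layout in which a timestamp line such as "Tue Oct 27 03:01:12 2009" precedes the messages it applies to, a single redo thread, and an English locale for the day and month names. The function names and the 60-second threshold are hypothetical.

```python
import re
from datetime import datetime

# Assumed timestamp layout of a text alert.log, e.g. "Tue Oct 27 03:01:12 2009".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
ADVANCE_RE = re.compile(r"Thread \d+ advanced to log sequence \d+")


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def checkpoint_stalls(lines):
    """Yield (start, end, seconds) for each "Checkpoint not complete" message.

    end and seconds are None when no thread advance has followed yet.
    Messages seen before the first timestamp line are ignored.
    """
    current = None   # most recent timestamp line seen
    pending = None   # time of a stall that has not been resolved yet
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
            continue
        if "Checkpoint not complete" in line and pending is None:
            pending = current
        elif pending is not None and ADVANCE_RE.search(line):
            yield pending, current, (current - pending).total_seconds()
            pending = None
    if pending is not None:
        yield pending, None, None


def should_page(lines, now, threshold_seconds=60):
    """True if any stall lasted (or has lasted so far) at least the threshold."""
    for start, _end, seconds in checkpoint_stalls(lines):
        if seconds is None:
            seconds = (now - start).total_seconds()
        if seconds >= threshold_seconds:
            return True
    return False
```

With this shape, a "Checkpoint not complete" that is followed by a thread advance a few seconds later stays below the threshold, while a message with no advance after it is measured against the current time.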


Regards
Dimitre


On 27/10/2009 9.32, Niall Litchfield wrote:
surachart
Dimitre's point is rather more subtle than that. It isn't the detection of "checkpoint not complete" that is the challenge, but the detection of the "hang" following it. I'd call it a wait, not a hang, but still. I don't see how you can detect that from the alert.log until after the event, and certainly not by monitoring the "checkpoint not complete" message alone. It might show up in the following wait events (my best guess would be the third):
"log file switch (archiving needed)"
"log file switch (checkpoint incomplete)"
"log file switch completion"
regards
Niall (who hasn't ever been bothered enough by checkpoint incomplete to check)
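
A minimal sketch of checking for those waits while they are happening, rather than after the fact. It assumes rows of (sid, event, seconds_in_wait) are fetched from v$session_wait by whatever database driver is in use; the query text, the function name and the 30-second threshold are illustrative assumptions, not something taken from this thread.

```python
# Wait events named in the message above.
LOG_SWITCH_EVENTS = {
    "log file switch (archiving needed)",
    "log file switch (checkpoint incomplete)",
    "log file switch completion",
}

# Query that could supply the rows; it is not executed in this sketch.
WAIT_QUERY = """
    SELECT sid, event, seconds_in_wait
      FROM v$session_wait
     WHERE event LIKE 'log file switch%'
       AND state = 'WAITING'
"""


def stuck_sessions(rows, min_seconds=30):
    """Return the rows that have waited on a log switch for at least min_seconds.

    Each row is a (sid, event, seconds_in_wait) tuple.
    """
    return [
        row for row in rows
        if row[1] in LOG_SWITCH_EVENTS and row[2] >= min_seconds
    ]
```

Polling this every few seconds and alerting only on a non-empty result would ignore log switches that complete quickly.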

On Mon, Oct 26, 2009 at 10:23 PM, Surachart Opun <surachart@xxxxxxxxx> wrote:

    To Dimitre,

    About the monitoring script: you need to check for "Checkpoint not
    complete" in the alert log file.

    If you use Enterprise Manager, you can set this under

    "Metric and Policy Settings" ->
    the "Generic Alert Log Error" metric:
     modify the value to monitor "Checkpoint not complete"

    http://download.oracle.com/docs/cd/B19306_01/em.102/b25986/oracle_database.htm

    If you don't have EM, you can set up alert log error notification as described at
    http://www.dba-oracle.com/t_alert_log_monitoring_errors.htm

    You can check how often log switches occur with:
    SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
    SQL> select * from v$log_history order by FIRST_TIME;
    -- compare FIRST_TIME between consecutive rows.
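
To turn the FIRST_TIME values from that query into a switch rate, a rough sketch such as the following could be used. It assumes the values have already been fetched and parsed into datetime objects; NLS_FORMAT mirrors the nls_date_format set above, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Python equivalent of the 'YYYY/MM/DD HH24:MI:SS' format used above.
NLS_FORMAT = "%Y/%m/%d %H:%M:%S"


def switches_per_hour(first_times):
    """Count log switches per clock hour from v$log_history FIRST_TIME values."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in first_times
    )
    return dict(sorted(counts.items()))
```

Hours with an unusually high count are the ones worth comparing against the "Checkpoint not complete" messages in the alert log.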

    If your database switches log files often even under normal load, you
    have to tune it:
    - Make DBWR faster: enable asynchronous I/O, or use DBWR I/O
    slaves (dbwr_io_slaves) or multiple DBWR
    processes (db_writer_processes).
    - Add more redo log files (groups).
    - Re-create the log files with a larger size.


    Surachart Opun
    http://surachartopun.com


    On Tue, Oct 27, 2009 at 2:19 AM, Radoulov, Dimitre
    <cichomitiko@xxxxxxxxx> wrote:


        >>> On Mon, Oct 26, 2009 at 6:29 PM, Radoulov, Dimitre wrote:
        [...]

        >>> I'm trying to figure out how to implement automated
        >>> monitoring of the above-mentioned "event".
        >>> When it happens, the instance hang may become a problem, and
        >>> *I believe* that monitoring a single occurrence
        >>> of the "Checkpoint not complete" message in the alert log
        >>> is not sufficient (the time between that message
        >>> and the following thread advance is quite important as well).
        >>>
        >>> So what's the logic, and how exactly do you monitor the
        >>> "Checkpoint not complete" event?
        [...]

        >>  On 26/10/2009 15.16, Surachart Opun wrote:
        >>  "Checkpoint not complete" message in the alert log
        >>  The database attempts to reuse an online redo log file and
        >>  cannot.
        [...]


        Hi Surachart Opun,
        thank you for your answer!
        I'm aware of the possible solutions. However, I want to
        trigger a critical alert when an instance hangs
        because of this event. I'm not sure that monitoring
        that message alone is sufficient,
        and I would like to know how you have implemented it (if
        you have implemented it at all).


        Regards
        Dimitre





--
Niall Litchfield
Oracle DBA
http://www.orawin.info
