Exactly. Thank you, Niall!

We had an automatic Sun Cluster fault monitor switch because of this a few days ago. I don't believe that waking up the DBA at 3 o'clock in the morning is appropriate if the "Checkpoint not complete" message is immediately followed by a log file switch completion. Hence I wanted to know whether and how other sites/shops implement such monitoring.

Regards
Dimitre

On 27/10/2009 9.32, Niall Litchfield wrote:
surachart,

Dimitre's point is rather more subtle than that. It isn't the detection of "checkpoint not complete" that is the challenge, but the detection of the "hang" following it. I'd call it a wait, not a hang, but still. I don't see how you can detect that from the alert.log until after the event, and certainly not by monitoring "checkpoint not complete" itself. This might show up in the following wait events (my best guess would be the third):

"log file switch (archiving needed)"
"log file switch (checkpoint incomplete)"
"log file switch completion"

regards
Niall (who hasn't ever been bothered enough by checkpoint incomplete to check)

On Mon, Oct 26, 2009 at 10:23 PM, Surachart Opun <surachart@xxxxxxxxx> wrote:

To Dimitre,

About a monitoring script: you need to check for "Checkpoint not complete" in the alert log file.

If you use Enterprise Manager, you can go to "Metric and Policy Settings" and, at the "Generic Alert Log Error" metric, modify the value to monitor "Checkpoint not complete":
http://download.oracle.com/docs/cd/B19306_01/em.102/b25986/oracle_database.htm

If you don't have EM, you can set up alert log error notification like this:
http://www.dba-oracle.com/t_alert_log_monitoring_errors.htm

You can check how often the logs switch with:

SQL> alter session set nls_date_format='YYYY/MM/DD HH24:MI:SS';
SQL> select * from v$log_history order by FIRST_TIME;
-- check the first_time interval between two consecutive switches

If your database switches log files often under normal load, you have to tune it:
- Make DBWR faster: enable async I/O, use DBWR I/O slaves (dbwr_io_slaves), or use multiple DBWR processes (db_writer_processes).
- Add more redo log files.
- Re-create the log files with a larger size.

Surachart Opun
http://surachartopun.com

On Tue, Oct 27, 2009 at 2:19 AM, Radoulov, Dimitre <cichomitiko@xxxxxxxxx> wrote:

>>> On Mon, Oct 26, 2009 at 6:29 PM, Radoulov, Dimitre wrote:
[...]
>>> I'm trying to figure out how to implement automated monitoring of the above-mentioned "event".
>>> When it happens, the instance hang may become a problem, and *I believe* that monitoring the single occurrence
>>> of the "Checkpoint not complete" message in the alert log is not sufficient (the time between that message
>>> and the following thread advance is quite important as well).
>>>
>>> So what's the logic/how exactly do you monitor the "Checkpoint not complete" event?
[...]

>> On 26/10/2009 15.16, Surachart Opun wrote:
>> "Checkpoint not complete" message in the alert log
>> The database attempts to reuse an online redo log file and it cannot.
[...]

Hi Surachart Opun,
thank you for your answer!
I'm aware of the possible solutions. Moreover, I want to trigger a critical alert when an instance hangs because of this event. I'm not sure whether monitoring that message alone is sufficient, and I would like to know how you have implemented it (if you have implemented it at all).

Regards
Dimitre

--
Niall Litchfield
Oracle DBA
http://www.orawin.info
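
The logic discussed in this thread (alert on the delay between "Checkpoint not complete" and the following thread advance, not on the message alone) could be sketched roughly as below. This is only an illustration, not anyone's production monitor: the function name checkpoint_stalls, the 30-second threshold, and the assumed alert log layout (a timestamp line such as "Tue Oct 27 03:00:01 2009" preceding each entry, and a "Thread N advanced to log sequence M" line marking the switch) are all assumptions that should be checked against the alert log of the Oracle version in use.

```python
import re
from datetime import datetime

# Assumed alert log timestamp line, e.g. "Tue Oct 27 03:00:01 2009".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}$")
# Assumed wording of the message written once the log switch goes through.
ADVANCE_RE = re.compile(r"^Thread \d+ advanced to log sequence \d+")


def checkpoint_stalls(lines, threshold_seconds=30):
    """Return (start, end, seconds) for every "Checkpoint not complete"
    that waited at least threshold_seconds for the next thread advance."""
    stalls = []
    current_ts = None   # timestamp of the most recent timestamp line
    pending = None      # when the unresolved "Checkpoint not complete" was logged
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif line == "Checkpoint not complete":
            # Remember only the first message of an unresolved stall.
            if pending is None and current_ts is not None:
                pending = current_ts
        elif ADVANCE_RE.match(line) and pending is not None:
            waited = (current_ts - pending).total_seconds()
            if waited >= threshold_seconds:
                stalls.append((pending, current_ts, waited))
            pending = None
    return stalls
```

Called with the lines of an alert log, for example checkpoint_stalls(open(path), threshold_seconds=30) where path is a hypothetical alert log location, it reports only the occurrences where the switch took longer than the threshold, so a message that is immediately followed by a log switch raises nothing. As Niall points out, this finds the delay only after the event; catching a wait that is still in progress would mean watching the "log file switch" wait events instead.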