Re: [foxboro] CAD Failure

  • From: "Boulay, Russ" <Russ.Boulay@xxxxxxxxxxxx>
  • To: "foxboro@xxxxxxxxxxxxx" <foxboro@xxxxxxxxxxxxx>
  • Date: Wed, 18 Jan 2012 16:11:25 -0500

Jason, Device Monitor issues such as you have described will cause the CADs to
stop receiving alarms.

I know of no monitoring script to help the cause.

What I can tell you is that on large systems where every station is capable of
becoming the DevMon master, the most successful preventive measure is to reduce
the number of DevMon-capable workstations to a manageable number, such as 4-6,
by disabling DevMon on the rest. With more candidates than that, a window can
open in which stations thrash as they attempt DevMon takeover.

I've included a Knowledge article on the subject, which also references two
other solutions at the end of the text.

Solution
Troubleshooting Device Monitor Problems on V8.x Systems
March 09, 2011
  

Problem Description

On V8.x systems the Device Monitor can get into a state where the Master
appears to be thrashing and other workstations are attempting to become the
Device Monitor Master without success. The following message, or similar
messages indicating failed slave takeover, will be repeatedly recorded in
smon_log.

2008-08-17 13:02:41 BBWP74 Process = DEV_MONITOR -39 0 Attempted to be Master 
Device Monitor (Missing heartbeats) 
2008-08-17 13:02:37 BBWP74 Process = DEV_MONITOR -39 0 DEVICE MONITOR SLAVE 
STATION 
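
As a rough illustration of how a site might watch for this condition, here is a
minimal Python sketch that counts recent takeover attempts per station in
smon_log text. It is not a vendor tool. The entry layout is inferred only from
the two sample messages above, it assumes each entry occupies a single line in
the real file, and the 10-minute window and threshold of 3 are arbitrary
placeholder values.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed entry layout, inferred only from the two sample messages above:
#   <date> <time> <station> Process = DEV_MONITOR <code> <code> <message>
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"Process = DEV_MONITOR\s+\S+\s+\S+\s+(.*)$"
)


def takeover_attempts(lines):
    """Yield (timestamp, station) for each 'Attempted to be Master' entry."""
    for line in lines:
        m = ENTRY.match(line.strip())
        if m and "Attempted to be Master" in m.group(3):
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield stamp, m.group(2)


def thrashing_stations(lines, window=timedelta(minutes=10), threshold=3):
    """Return {station: count} for stations with at least `threshold`
    attempts inside the window that ends at the newest attempt found."""
    attempts = list(takeover_attempts(lines))
    if not attempts:
        return {}
    newest = max(stamp for stamp, _ in attempts)
    counts = Counter(st for stamp, st in attempts if newest - stamp <= window)
    return {st: n for st, n in counts.items() if n >= threshold}
```

A non-empty result from thrashing_stations() would be the cue to raise
whatever operator notification the site already uses. How to surface that to
OPS is site-specific and is not covered here.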


Fix/Resolution
There are several options that can be used to correct this problem.

Option 1

One of the first steps should be to identify which workstations should be
Device Monitor candidates and disable Device Monitor on all other stations by
commenting it out of the go_sysmgm script located in d:/usr/fox/exten (see
SOL1281). Reducing the number of stations that can become the Device Monitor
(DM) reduces the opportunity for thrashing and simplifies troubleshooting,
because fewer stations must be investigated. On a small system, 1-2 stations
should be designated as DM stations. On a large system, 4-6 stations at the
most should be designated as DM stations.

On some later versions of V8.x, Device Monitor is started through go_ADM.ksh,
located in d:/usr/fox/bin. In this case, comment out the ADM line in the
fox_apps.dat file, also located in d:/usr/fox/bin.

 

Option 2

If the "Missed Heartbeat" messages or "Failed Slave Takeover" messages continue 
then do the following:

Create a "cs_errs" file on all workstations which can be Device Monitors in 
d:/usr/fox/cs. 
This file should be deleted or renamed once the problem is resolved. 
See if the problem can be determined by the information in the cs_errs file. 
 

Option 3

If the "Missed Heartbeat" messages or "Failed Slave Takeover" messages continue 
then do the following:

Try to determine which station is the current Device Monitor:
glof -p DEV_MONITOR (see SOL965).
Make sure the cs_devmon process is running on the current Device Monitor.
 

If cs_devmon is not running then in a cmd shell 
type d: <RET> 
sh <RET> 
cd /usr/fox/cs 
startp /b cs_devmon.exe 
Verify that cs_devmon.exe is running 
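
For the final verification step on a Windows station, the check could be
scripted along these lines. This is a minimal sketch, not part of the article:
it assumes Python is installed on the workstation (a stock I/A station may not
have it) and relies on the standard Windows tasklist command with CSV output.
The function names are made up for illustration.

```python
import csv
import subprocess


def image_running(tasklist_csv, image_name="cs_devmon.exe"):
    """Return True if image_name appears in `tasklist /FO CSV /NH` output."""
    for row in csv.reader(tasklist_csv.splitlines()):
        # The first CSV column is the image name; compare case-insensitively.
        if row and row[0].lower() == image_name.lower():
            return True
    return False


def cs_devmon_running():
    """Query the local process list. Windows only."""
    out = subprocess.check_output(
        ["tasklist", "/FO", "CSV", "/NH"], universal_newlines=True
    )
    return image_running(out)
```

Keeping the parsing in image_running() separate from the tasklist call means
the logic can be checked against saved output without touching a live station.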
 

Option 4


If the "Missed Heartbeat" messages or "Failed Slave Takeover" messages continue 
then do the following: 

On the Device Monitor Master station and the station(s) reporting "Missed
Heartbeats" or "Failed Slave Takeover",
check how much CPU time the cs_devmon process is using.
If cs_devmon is using a lot of CPU time, kill the process and restart it
using the instructions in Option 3.
Check for other processes which may be using a lot of CPU time and preventing
Device Monitor from running in a timely fashion.
 

Option 5

If the "Missed Heartbeat" messages or "Failed Slave Takeover" messages continue 
then do the following: 

Verify that all Device Monitors have the same configuration by checking 
d:/usr/fox/cs/cs_devmon.cfg 
If the configurations are different, generate a new committal
Make sure that PCHANG is set for the ADM7 package on all Device Monitor
workstations
Re-commit all Device Monitor workstations
Run the dm_recon command on the current Device Monitor (See Option 3) 
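
The configuration comparison in the first step can be sketched in Python as
follows. This assumes a copy of each station's cs_devmon.cfg has already been
collected into one place; the station labels and function names are
hypothetical. It compares the files byte for byte, so it would also flag
differences that may be harmless, such as line endings. Whether the file
legitimately differs between stations is not something this sketch can know.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(cfg_paths):
    """Map SHA-256 digest -> sorted list of station labels with that content.

    cfg_paths: {station_label: path to that station's copy of cs_devmon.cfg}
    """
    groups = defaultdict(list)
    for station, path in cfg_paths.items():
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(station)
    return {digest: sorted(stations) for digest, stations in groups.items()}


def configs_match(cfg_paths):
    """True when every collected copy is byte-for-byte identical."""
    return len(group_by_content(cfg_paths)) <= 1
```

If configs_match() returns False, the groups returned by group_by_content()
show which stations share a configuration and which ones are the odd ones out.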



Option 6

If the "Missed Heartbeat" messages or "Failed Slave Takeover" messages continue 
then do the following: 

Verify that there isn't an intermittent Network problem such as stations 
reporting ping test messages or a cable going bad then good 
Check for topology changes taking place in MESH switches 

Supporting Information
See also SOL765 and SOL287

 
Operating system
Windows XP
Windows Server 2003
Solaris 10.0

Copyright © 2012 Invensys Systems Inc. All Rights Reserved

-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx [mailto:foxboro-bounce@xxxxxxxxxxxxx] On 
Behalf Of SLADE Jason -NANTICOKE
Sent: Wednesday, January 18, 2012 12:56 PM
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] CAD Failure

Thanks for getting back to me Terry.  I recall reading about that CAD problem 
vs the new workstation a few years ago - nasty!

Last year we added a couple of Windoz I/A V8.5 boxes with some mesh and a CP270
and ATS into our predominantly Unix I/A V6.5 & 7.1 environment and everything
seemed fine for several months.  Since getting blindsided by the CAD crash,
we've learned that the highest version I/A box is supposed to be the
DEV_MONITOR master at all times; in our case the Windoz AW.  All indications
say it was, but apparently somebody took exception to that and they fought
about it at length.  We have 9 AIM* packages running on 9 separate AWs and all
CADs seized up solid about the same time.

I'm sure all users could breathe easy if they knew their I/A system could/would 
somehow alarm or draw attention to the fact that CAD is broke.

Here's hoping someone has and will share that silver bullet ...

Jason Slade 
  
Control Technologist 
Protection & Control Systems Support 
Nanticoke G.S. 


-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx [mailto:foxboro-bounce@xxxxxxxxxxxxx] On 
Behalf Of Terry Doucet
Sent: January 18, 2012 2:11 PM
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] CAD Failure

Jason,

The closest problem to what you describe that I have seen occurred when someone
added a new WP to the system without telling the rest of the system about this
WP. At some later time, this WP took over as the Device Monitor MASTER for the
system. Since this Device Monitor Master could not detect AWs, WPs and
printers, it instructed the CPs to stop sending alarms to those devices.  I
cannot remember whether the Historian stopped receiving alarms, but no CADs
received any alarms. Of course, you have to realize that no alarms are coming
before you know that the CAD is dead.  Device Monitor was working correctly; it
was the installer who goofed.

Is there one AW or WP (capable of being Device Monitor Master) that is off by
itself on a section of your network? Since the Historian kept receiving alarms,
perhaps the problem is with the AW running the historian.

Terry


> Subject: [foxboro] CAD Failure
> Date: Wed, 18 Jan 2012 12:46:45 -0500
> From: jason.slade@xxxxxxx
> To: foxboro@xxxxxxxxxxxxx
> 
> We recently had a plant-wide CAD failure with all 46 workstations 
> being affected.  Due to the plant's running mode at the time, it took 
> OPS a while to notice that there were no process alarms coming from 
> dozens of CPs to CAD.  The historian however was still receiving the 
> messages!?
> No SYS_MON alarm - nothing to bring this failure to the attention of 
> OPS.  Even with all the log files, TAC could not ascertain what was 
> actually happening or what the cause was.  One particular log file 
> indicated that there was a fight going on for control of DEV_MONITOR.
> 
> We got CAD back up and running, but I was wondering:
> - has anyone else been hit by this problem?
> - does anyone have a site implemented work-around to bring to OPS 
> attention that CAD/DEV_MONITOR has failed?
> 
> Thanks,
> 
> Jason Slade
> 
> Control Technologist
> Protection & Control Systems Support
> Nanticoke GS
> 
> ______________________________________________________________________
> _ This mailing list is neither sponsored nor endorsed by Invensys 
> Process Systems (formerly The Foxboro Company). Use the info you 
> obtain here at your own risks. Read 
> http://www.thecassandraproject.org/disclaimer.html
>  
> foxboro mailing list:             //www.freelists.org/list/foxboro
> to subscribe:         mailto:foxboro-request@xxxxxxxxxxxxx?subject=join
> to unsubscribe:      mailto:foxboro-request@xxxxxxxxxxxxx?subject=leave
>  
                                          
 
 
 




 
 
 
