Re: [foxboro] OM ISSUES - HELP NEEDED!!

  • From: "Ellis, Jeff" <Jeff.Ellis@xxxxxxxxxxxxxxxxxx>
  • To: <foxboro@xxxxxxxxxxxxx>
  • Date: Thu, 8 Nov 2007 09:16:06 -0600

I truly appreciate all the feedback on my thread!  It's scary how much
brain power we have out there!!

I have begun monitoring WPIDLE<letterbug> (per the response below);
however, the command below does not seem to work on the WP70s (NT).

pref -RWP701 dmcmd "enable monitor"

Nor does modifying the /usr/fox/wp/data/fv_cmds file enable the
variable (changing the "disable monitor" line to "enable monitor").
When I perform

omget WPIDLERWP701

the OM returns a value of "-1", so the variable exists but the monitor
is not being enabled.  Any suggestions?
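For checking several stations at once, a sweep along these lines may help.  This is a minimal sketch only: the letterbugs are made-up examples, and the `omget` function below is a stub standing in for the real I/A omget utility so the output shape can be seen off-line; delete the stub when running this on an actual AW.

```shell
# Sketch: sweep the WPIDLE<letterbug> shared variables and flag stations
# whose monitor never came up.  Letterbugs here are hypothetical.

STATIONS="RWP701 AW5101"          # hypothetical letterbugs

omget() { echo "-1"; }            # stub for illustration -- remove on a real AW

report=""
for lbug in $STATIONS; do
    val=$(omget "WPIDLE${lbug}")
    case "$val" in
        -1) report="$report ${lbug}:disabled" ;;   # variable exists, monitor off
        "") report="$report ${lbug}:noanswer" ;;   # variable not found
        *)  report="$report ${lbug}:idle=${val}" ;;
    esac
done
echo "$report"
```

With the stub in place, every station reports `:disabled`, which matches the "-1" symptom described above.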

Thanks in Advance,
Jeff
-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx [mailto:foxboro-bounce@xxxxxxxxxxxxx]
On Behalf Of Doucet, Terrence
Sent: Wednesday, November 07, 2007 10:33 AM
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] OM ISSUES - HELP NEEDED!!

Jeff,

As noted by Ed, a weak or defective DNBX can cause this type of problem.
As a means of checking this and trying to locate the "bad" station:
        A) enable the WPIDLE<letterbug> variable for each AW and WP
(edit fv_cmds)

        B) build a display with all your WPIDLE<letterbug> (and
descriptive text) and display this graphic on AW's or WP's on both sides
of your CLAN.

You may see the station with the weak DNBX as Out Of Service on this
display even though the station may appear to be running OK.  You might
even see that station as OOS on a display running on a WP in its own
node but in service (OK) on the same display running on a WP on the
other side of the CLAN.  This is because AW's and WP's do not normally
act as OM source stations requiring the consistent performance of a
connection-oriented data transfer.
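If building a graphic is inconvenient, the same comparison can be scripted: run a timestamped poller on a station on each side of the CLAN and line the two logs up against the smurf episodes afterwards.  A sketch only -- the letterbugs are invented and `omget` is stubbed here so the log format can be seen without an I/A host; drop the stub on a real station.

```shell
# Sketch: one timestamped line per poll, one column per station.
# Run a copy on a WP each side of the CLAN and compare the logs.

STATIONS="RWP701 RWP702"          # hypothetical letterbugs

omget() { echo "37"; }            # stub: pretend every station reports 37

line="$(date '+%Y-%m-%d %H:%M:%S')"
for lbug in $STATIONS; do
    line="$line ${lbug}=$(omget "WPIDLE${lbug}")"
done
echo "$line"                      # append to a log file via >> when run from cron
```

A station that shows a value in one log but no value (or -1) in the other, at the same timestamp, would point at the sort of one-sided OOS behaviour described above.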

Be cautious about pushing the elevator connections, as this is pretty
much a haphazard method of testing and you have just as much chance of
making the situation worse.

Terry



-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx
[mailto:foxboro-bounce@xxxxxxxxxxxxx] On Behalf Of
Ed.Zychowski@xxxxxxxxxxxxxx
Sent: November 7, 2007 11:10
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] OM ISSUES - HELP NEEDED!!

Jeff,
Don't know if this will help, but we had a very similar problem (and
somewhat similar architecture).  Our cause was a DNBX (used to connect a
WP51B to the nodebus with an Ethernet transceiver over a long distance).
Fortunately for us, the problem would last long enough for us to
troubleshoot.  We called TAC and they were able to dial in to our system
and identify that the DNBX was chattering on the nodebus.  We lowered
the elevator and the problem went away.  We replaced the DNBX and things
were fine.

Some time later (a year or two) we experienced the same problem, same
diagnosis, and replacing the DNBX fixed the problem.  We also had
problems with this WP losing the RS-232 connection during lightning
strikes.  We finally installed lightning arresters on both ends of the
semi-rigid coax, and the RS-232 (422?) connections have had no problems
since.




"Ellis, Jeff" <Jeff.Ellis@xxxxxxxxxxxxxxxxxx>
Sent by: foxboro-bounce@xxxxxxxxxxxxx
11/07/2007 09:59 AM
Please respond to
foxboro@xxxxxxxxxxxxx


To
<foxboro@xxxxxxxxxxxxx>
cc

Subject
[foxboro] OM ISSUES - HELP NEEDED!!






Last night we spent 6 hours troubleshooting the second episode of what
appears to be the OM(s) being overwhelmed.  We have a lot of theories
but do not have enough tools to prove or disprove any of them.  We would
like to hear from anyone who has seen similar issues.  We called
TechSupport but we did not get any resolution.  We will be getting the
local Foxboro technician to assist in troubleshooting hardware, but we
want to supply as much information as possible.  Any help is
appreciated!!!  Here's what happened...

            Their System:

A.      I/A version 6.5.1.

B.      Two nodes w/ one Sun E box on each node as the AW.

C.      NT workstations in the central control room.

D.      Three Exceed workstations hosted by the AWs in remote control
rooms.

E.      An old Fiber LAN interface.

F.      On the affected node:  4 CPs (3 CP40s & 1 CP60), 3 Modicon
Gateways.

G.     FoxAim* API version 5.4.0

(1)     On October 27th, operators at this plant called reporting that
displays were "smurfing out" in bizarre sequences.  They saw some WPs
smurfing while other WPs were not having any issues with the same
display called up.

a.       The problem came and went roughly 8 times over 45 minutes.
Each episode lasted 30 seconds to 10 minutes.  When I got there to
troubleshoot the issue, it mysteriously went away.

b.       The problem was manifested on only one of the two nodes at this
plant.

c.       No SYSMON alarms were present.

d.       Nothing was scheduled in crontab that would have kicked off at
that time either.

e.       No non-routine operator actions were taken.

f.         We rebooted the AW with a couple of fsck runs to check for
issues...NOTHING.

g.       Performed a find / -name "*core*" and found nothing unusual.

h.       Both PI and Aim* could not gather data during these periods.

i.         Before-and-after PI trends showed an increase in OMOVERRUNS
on all stations within the affected node.  (We've set the major STATION
parameters up in PI for troubleshooting such problems.)

j.         The problem did not manifest itself again until yesterday (10
days later).


(2)     Yesterday (Nov 6th).

a.       Upon the first episode of smurf-outs, the plant DCS guy
rebooted the AW and the problem appeared to go away.  Twenty minutes
later it was back again.

b.       During the second episode, the DCS guy reported that a grey
screen came up reading "OBJECT MANAGER ERRORS - TERMINATION IN
PROGRESS".  When the AW was rebooted again, the reboot was observed as
being "ABNORMAL".  A core dump file was created (vmcore.4) with little
information.  The problem seemed to get better while rebooting but did
not go away, and it persisted for several minutes afterwards.

c.       During these episodes the OMOVERRUNS skyrocketed (over 1,000
overruns on each station).

d.       During these episodes FoxSelect was not available for any
Station on the affected node.

e.       LAN interface was rebooted...no improvement.

f.         All of the WPs were rebooted within the previous week.  (We
assumed no open OM lists were hanging around...)  HOW CAN WE CHECK
THIS???

g.       NODEBUS CABLE TEST was performed...no improvement.

h.       PI data collection was turned off... no improvement.

i.         TechSupport thought the issue was a CP out of resources, but
the issue was not limited to one CP or station.  HOW CAN WE DETERMINE IF
A STATION IS OUT OF RESOURCES?

j.         Snoop yielded many multicast messages during these episodes.
Fewer multicast messages were observed after the episodes went away.
HOW CAN WE DETERMINE WHERE THESE MULTICAST MESSAGES ORIGINATE, AND ARE
THEY AN ISSUE?
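One way to see where the multicast traffic is coming from is to rank source addresses in a snoop capture.  A sketch only: the capture commands in the comments are hedged (Solaris snoop syntax from memory), and the sample text plus MAC addresses below are invented so the counting step can be demonstrated without a capture file.

```shell
# Sketch: rank nodebus stations by multicast frame count.
# On the AW a capture might look something like:
#   snoop -r -o /tmp/nb.cap multicast      # capture only multicast frames
#   snoop -r -i /tmp/nb.cap > /tmp/nb.txt  # replay the capture as text
# The sample below stands in for /tmp/nb.txt; addresses are made up.

sample='0:0:1e:aa:bb:1 -> (multicast)  ETHER Type=0600
0:0:1e:aa:bb:1 -> (multicast)  ETHER Type=0600
0:0:1e:aa:bb:2 -> (multicast)  ETHER Type=0600'

# Count frames per source MAC and report the busiest one.
top=$(printf '%s\n' "$sample" | awk '/multicast/ { n[$1]++ }
    END { for (s in n) if (n[s] > best) { best = n[s]; who = s }; print who }')
echo "chattiest multicast source: $top"
```

The source MAC can then be matched back to a station (or to a suspect DNBX) to decide whether the traffic is normal nodebus chatter or a misbehaving interface.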

k.       A second Ethernet network was added within the previous 3 weeks
to get unneeded traffic off of the nodebus (i.e. copying graphics
around).  All second Ethernet ports were disconnected and all mapped
drives were unmapped...no apparent improvement.  COULD THIS
REALISTICALLY BE A PROBLEM ON THE NODEBUS?

l.         A similar issue was observed 1.5 years ago, and it appeared
to be related to copying too much across the LAN.  However, both nodes
were affected then.  Wayne Flippo investigated and thought it was a Cell
Bus going bad...never resolved.

m.     The final episode stopped relatively mysteriously...David pushed
on all the elevator connections on all stations in the Refinery area,
and shortly afterwards the final episode went away.


THANKS,

Jeff Ellis

PREMIER System Integrators

_______________________________________________________________________
This mailing list is neither sponsored nor endorsed by Invensys Process
Systems (formerly The Foxboro Company). Use the info you obtain here at
your own risks. Read http://www.thecassandraproject.org/disclaimer.html

foxboro mailing list:             //www.freelists.org/list/foxboro
to subscribe:         mailto:foxboro-request@xxxxxxxxxxxxx?subject=join
to unsubscribe:       mailto:foxboro-request@xxxxxxxxxxxxx?subject=leave

Other related posts: