Re: [foxboro] P2P links locked

  • From: "Tom VandeWater" <tjvandew@xxxxxxxxx>
  • To: <foxboro@xxxxxxxxxxxxx>
  • Date: Wed, 8 Sep 2010 01:05:22 -0400

Dirk,
        This is in response to your recent note about corruption of Foxboro
CP OM lists.  This is a rare but long-lived problem that apparently still
exists.  We experienced this problem on multiple occasions and communicated
it to Foxboro each time.  Every occurrence was discovered shortly after
ICC configuration activity in which blocks had been copied and deleted
in the ICC.  I submitted documentation on the first
occurrence of this problem to Sue Papineau of the TAC group at the Foxboro
Customer Satisfaction Center on June 24th 1998.   At the time we were at
Ver. 4.1.1.

        I am convinced that the problem is brought on by DELETING a block
that is involved in a CP-CP (peer to peer) communication and then adding or
changing other blocks before exiting the ICC or doing a checkpoint of that
CP.  It is common knowledge that peer to peer OM connections are not updated
in blocks that are modified with external connections until a checkpoint
occurs.  If you delete a block, which frees up an OM list variable, and then
add another connection before a checkpoint has occurred, I believe that the
freed entry is not correctly managed, which can cause cross linking or
corruption of OM lists between CPs.

        This is a very insidious bug because it can actually cause
corruption of OM variables that were in no way associated with the
modification made during the current session of ICC configuration.  That is
what happened to us in the scenario described below.  Just because you fix
the problem that you see doesn't mean you haven't created another one that
will rear its ugly head in the future.

        As a possible precautionary measure, I would suggest doing a
checkpoint immediately following any deletion of a block or parameter that
is involved in CP to CP communication.  Do this before making any other
changes in the ICC.  Exiting the ICC will also cause a checkpoint to occur
if any changes to the control database have occurred during that session.
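
The ordering rule above can be sketched in a few lines of Python.  This is
only an illustration of the rule as stated in this note, not a Foxboro tool:
the event kinds ("delete_p2p", "change", "checkpoint") and the check_session
helper are hypothetical names made up for the example.

```python
from typing import List, Tuple

# Hypothetical event kinds for one ICC session, in the order they happened:
#   "delete_p2p" - deletion of a block or parameter used in CP-to-CP communication
#   "change"     - any other add or modify
#   "checkpoint" - explicit checkpoint, or exiting the ICC after changes
Event = Tuple[str, str]  # (kind, description)


def check_session(events: List[Event]) -> List[str]:
    """Return one warning per change made while a peer-to-peer deletion
    is still waiting for a checkpoint."""
    warnings: List[str] = []
    pending_delete = None  # description of the deletion not yet checkpointed
    for kind, description in events:
        if kind == "checkpoint":
            pending_delete = None
        elif kind in ("delete_p2p", "change"):
            if pending_delete is not None:
                warnings.append(
                    f"'{description}' was made before a checkpoint "
                    f"followed '{pending_delete}'"
                )
            if kind == "delete_p2p":
                pending_delete = description
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return warnings


# Risky order: delete, then another change, then checkpoint.
risky = [
    ("delete_p2p", "delete CMPD_01:BLK_01"),
    ("change", "add a connection in another block"),
    ("checkpoint", "exit ICC"),
]
# Suggested order: delete, checkpoint, then the other change.
safe = [
    ("delete_p2p", "delete CMPD_01:BLK_01"),
    ("checkpoint", "checkpoint 01CP01"),
    ("change", "add a connection in another block"),
    ("checkpoint", "exit ICC"),
]
risky_warnings = check_session(risky)
safe_warnings = check_session(safe)
```

With these two sessions, the risky order produces one warning and the
suggested order produces none.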

Sue,
        I never got any feedback on this problem.  Have other Fox IA users
experienced this problem?

The 2 CP-30's involved were on the same node and were hosted by the same
AP-20 within that node.  We are running Ver 4.1.1 base software with no CP
quick fixes loaded.

        On Tuesday morning, June 23, an employee entered the Control
Configurator for 01CP01, hosted by 01AP01, copied block CMPD_01:BLK_01 to
block CMPD_01:BLK_02, and then deleted block CMPD_01:BLK_01.  Sink
connections to CMPD_01:BLK_01.OUTINC still existed in CMPD_02:SEQ_01.BI0008
and CMPD_03:SEQ_02.BI0008 in 01CP02.  The employee then exited 01CP01
(causing a checkpoint) and entered 01CP02 to change the
CMPD_01:BLK_01.OUTINC sink connections to CMPD_01:BLK_02.OUTINC.  The
employee exited 01CP02 (causing a checkpoint) and checked that BLK_02
operated properly.  It was okay, and peer to peer connections were
functioning for CMPD_01:BLK_02.OUTINC to the sequences in CMPD_02 (see the
SOM source and sink lists attached below).
Source variable list:
 ID # 05 Ent. # 12. in 01CP01

Also note
 ID # 05 Ent. # 18, which shows the same connection but marked Deleted
instead of Scanning.  Are two source connections, one marked Scanning and
the other marked Deleted, supposed to be in the same source variable list?

Sink variable list:
 ID # 00 Ent. # 13. in 01CP02

Based on everything that we experienced, we are pretty sure that during the
execution of the changes mentioned above, a previous source connection from
CMPD_01:SEQ_03.BO0004 in 01CP01 to CMPD_02:SEQ_04.BI0012 and
CMPD_03:SEQ_05.BI0012 in 01CP02 was overwritten in the 01CP01 source list to
01CP02 but remained in the 01CP02 sink list!  Note that
CMPD_01:SEQ_03.BO0004 is Ent. # 39 in the 01CP02 sink list but is nowhere to
be found in the 01CP01 source list.  We were contacted on Tuesday pm, 23
June, when the sequences dependent on CMPD_01:SEQ_03.BO0004 quit working.
We didn't understand the problem on Tuesday night and began working on it
again early Wednesday morning.
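
The mismatch described above is the kind of thing a simple cross-check of the
two lists would expose.  A minimal Python sketch follows.  The orphaned_sinks
helper is a hypothetical name, and the variable names were hand-copied from
the "Scanning" entries in the source and sink lists at the bottom of this
note; nothing here reads SOM directly.

```python
from typing import Iterable, List


def orphaned_sinks(source_names: Iterable[str], sink_names: Iterable[str]) -> List[str]:
    """Variables the sink CP still lists that the source CP no longer lists."""
    return sorted(set(sink_names) - set(source_names))


# "Scanning" entries in the 01CP01 source list to 01CP02 (Ent. 005-015).
source_01cp01 = [
    "CMPD_01:SEQ_03.BO0002",
    "CMPD_01:BLK_02.OUTINC",
    "CMPD_01:SEQ_03.II0008",
    "CMPD_01:SEQ_03.RI0005",
    "CMPD_01:SEQ_03.RI0006",
]

# "Scanning" entries in the 01CP02 sink list from 01CP01.
# Sink Ent. 052 (CMPD_01:SEQ_03.BI0011) is left out on purpose: the source
# entry that appears to correspond to it (Ent. 022) has no legible name in
# the dump, so it cannot be compared by name.
sink_01cp02 = [
    "CMPD_01:BLK_02.OUTINC",
    "CMPD_01:SEQ_03.BO0002",
    "CMPD_01:SEQ_03.BO0004",
    "CMPD_01:SEQ_03.II0008",
    "CMPD_01:SEQ_03.RI0005",
    "CMPD_01:SEQ_03.RI0006",
]

orphans = orphaned_sinks(source_01cp01, sink_01cp02)
print(orphans)  # ['CMPD_01:SEQ_03.BO0004']
```

Comparing by name only is a simplification.  It would not catch a case where
both lists carry the same name but the source entry points at the wrong
block.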
 
We deleted the CMPD_01:SEQ_03.BO0004 connection from CMPD_02:SEQ_04.BI0012
and CMPD_03:SEQ_05.BI0012 in 01CP02 and checkpointed 01CP02.  It cleared
from the 01CP02 sink list, and the 01CP01 source list did not change.  When
we added those connections again and checkpointed, they were correctly
established in the 01CP01 and 01CP02 source and sink lists, but in the
source variable list,
 ID # 05 Ent. # 12 in 01CP01 changed from CMPD_01:BLK_02.OUTINC - Scanning
to CMPD_01:SEQ_03.BO0004 - Scanning (see the variable list at the bottom of
this note), and the CMPD_01:BLK_02.OUTINC connections in 01CP02 were no
longer following what that block's parameter was doing in 01CP01, yet they
were not shown as disconnected on the graphics or in SOM.  We decided to add
a new block in 01CP01 and make a connection from it to an existing block in
01CP02 to see where the connection would show up in the 01CP01 source list.
That connection used Ent. # 21 in the source list, which was previously
blank and listed as No Response, and it functioned properly.

 At that point we called TAC and began to speak with Sue Papineau, who
dialed in with Foxwatch.  We then deleted, checkpointed, and remade the
connections to CMPD_01:BLK_02.OUTINC in 01CP02 and everything appeared to
work correctly in the 01CP01 and 01CP02 lists.  We agreed to write up the
sequence of events and would like Foxboro to respond to the questions below.

We know these CPs experienced some kind of memory pointer mismatch that
caused an existing, functional, and unrelated sequence connection to lose
sight of its correct source, yet not show as disconnected, when changes were
made to a supposedly unrelated block in the source CP.

1. In Foxboro's knowledge, has this happened before?
2. If so, was a CP fix developed? 
3. Is it likely that there is still a problem in the source or sink CP
memory?
4. Should we try to bring the CP/CP's to a state where we can safely reboot
and then reboot the CP/CP's to fix a possible memory corruption problem?
5. Can Foxboro recreate the situation in their lab?

Source list from 01CP01 to 01CP02 before deleting connection
CMPD_01:SEQ_03.BO0004 from CMPD_02:SEQ_04.BI0012 and CMPD_03:SEQ_05.BI0012

   OM OPEN POINTS DATABASE INFORMATION      001437E6

    ID #  Next Hdr Adr  Status   Lbug  Opt?  PID #  Size

     04    0490:A256    Local   01CP01  N    00060   065
     05    0460:A836    Remote  01CP02  N    00060   024

 Ent Open Var Name/                 NTF/ C NET VrI  Delta/  Ptr or Dsc
  #   Variable Status                LOC G I/L /Ch   Value   /Pack/Off
005 CMPD_01:SEQ_03.BO0002        N    -01 014 3F800000  0440:8303
     Scanning    0625                 L  N -01 -1            00A5 04AD
012 CMPD_01:BLK_02.OUTINC           N    -01 013 3F800000  0458:095E
     Scanning    0425                 L  N -01 -1            00CE 00B8
013 CMPD_01:SEQ_03.II0008        N    -01 040 3F800000  0440:81CD
     Scanning    0426                 L  N -01 -1 0000000007 00A5 0377
014 CMPD_01:SEQ_03.RI0005        N    -01 041 3C23D70A  0440:8218
     Scanning    0423                 L  N -01 -1  428C0000  00A5 03C2
015 CMPD_01:SEQ_03.RI0006        N    -01 042 3C23D70A  0440:8227
     Scanning    0423                 L  N -01 -1  40A00000  00A5 03D1
018 CMPD_01:BLK_02.OUTINC           N    -01 046 3F800000  04B0:2F4E
     Deleted     0865                 L  N -01 -1            0062 00B8
020 CMPD_01:BLK_01.OUTINC            N    -01 012 3F800000  0460:989E
     Deleted     0865                 L  N -01 -1            00CC 00B8
021                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
022                                  
     Scanning    0425                 L  N -01 -1            00A5 023C

Sink list in 01CP02 from 01CP01 before deleting connection
CMPD_01:SEQ_03.BO0004 from CMPD_02:SEQ_04.BI0012 and CMPD_03:SEQ_05.BI0012

    OM OPEN POINTS DATABASE INFORMATION      0013B4E6

    ID #  Next Hdr Adr  Status   Lbug  Opt?  PID #  Size

     00    0470:2E06    Local   01CP02  N    00060   057
     05    0498:A066    Remote  01CP01  N    00060   022

 OM OPEN POINTS DATABASE OPEN VAR INFORMATION 00139086  ID #000

 Ent Open Var Name/                 NTF/ C NET VrI  Delta/  Ptr or Dsc
  #   Variable Status                LOC G I/L /Ch   Value   /Pack/Off
012                                  N    -03 000 00000000  0000:0000
     Deleted     0865                 L  N -01 02            0000 0000
013 CMPD_01:BLK_02.OUTINC           N    009 012 3F800000  0490:A256
     Scanning    0425                 R  N -01 02            00CE 00B8
014 CMPD_01:SEQ_03.BO0002        N    009 005 3F800000  0490:A256
     Scanning    0625                 R  N -01 02            00A5 04AD
017                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
018                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
019                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
039 CMPD_01:SEQ_03.BO0004        N    009 012 3F800000  0490:A256
     Scanning    0625                 R  N -01 02            00A5 04B9
040 CMPD_01:SEQ_03.II0008        N    009 013 3F800000  0490:A256
     Scanning    0426                 R  N -01 02 0000000007 00A5 0377
041 CMPD_01:SEQ_03.RI0005        N    009 014 3C23D70A  0490:A256
     Scanning    0423                 R  N -01 02  428C0000  00A5 03C2
042 CMPD_01:SEQ_03.RI0006        N    009 015 3C23D70A  0490:A256
     Scanning    0423                 R  N -01 02  40A00000  00A5 03D1
046                                  N    -03 000 00000000  0000:0000
     Deleted     0865                 L  N -01 02            0000 0000
048                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
049                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
050                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
051                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
052 CMPD_01:SEQ_03.BI0011        N    009 023 3F800000  0490:A256
     Scanning    0425                 R  N -01 02            00A5 023C
053                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
054                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000


Source list from 01CP01 to 01CP02 after deleting and re-adding
CMPD_02:SEQ_04.BI0012 and CMPD_03:SEQ_05.BI0012 sink connections to
CMPD_01:SEQ_03.BO0004 in 01CP01.

   OM OPEN POINTS DATABASE INFORMATION      001437E6
 Ent Open Var Name/                 NTF/ C NET VrI  Delta/  Ptr or Dsc
  #   Variable Status                LOC G I/L /Ch   Value   /Pack/Off
005 CMPD_01:SEQ_03.BO0002        N    -01 014 3F800000  0440:8303
     Scanning    0625                 L  N -01 -1            00A5 04AD
012 CMPD_01:SEQ_03.BO0004        N    -01 012 3F800000  0440:830F
     Scanning    0625                 L  N -01 -1            00A5 04B9
013 CMPD_01:SEQ_03.II0008        N    -01 040 3F800000  0440:81CD
     Scanning    0426                 L  N -01 -1 0000000007 00A5 0377
014 CMPD_01:SEQ_03.RI0005        N    -01 041 3C23D70A  0440:8218
     Scanning    0423                 L  N -01 -1  428C0000  00A5 03C2
015 CMPD_01:SEQ_03.RI0006        N    -01 042 3C23D70A  0440:8227
     Scanning    0423                 L  N -01 -1  40A00000  00A5 03D1
018 CMPD_01:BLK_02.OUTINC           N    -01 046 3F800000  04B0:2F4E
     Deleted     0865                 L  N -01 -1            0062 00B8
020 CMPD_01:BLK_01.OUTINC            N    -01 012 3F800000  0460:989E
     Deleted     0865                 L  N -01 -1            00CC 00B8
021                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
022                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
023 CMPD_01:SEQ_03.BI0011        N    -01 052 3F800000  0440:8092
     Scanning    0425                 L  N -01 -1            00A5 023C
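
Comparing the "before" and "after" source lists entry by entry is tedious by
eye.  The Python sketch below shows one way to do it, assuming the two-line
entry layout seen in the listings above (entry number and name on the first
line, status on the second).  The parse_entries and changed_entries helpers
are hypothetical, and the sample text is a short excerpt of the 01CP01
listings in this note, not a complete dump.

```python
import re
from typing import Dict, Tuple

# First line of an entry: 3-digit entry number, then an optional
# COMPOUND:BLOCK.PARAMETER name (blank entries have no name).
ENTRY_RE = re.compile(r"^(\d{3})(?:\s+([A-Z0-9_]+:[A-Z0-9_]+\.[A-Z0-9_]+))?")
# Second line of an entry: indented status text.
STATUS_RE = re.compile(r"^\s+(Scanning|Deleted|No Response)\b")


def parse_entries(dump: str) -> Dict[int, Tuple[str, str]]:
    """Map entry number -> (variable name, status) for one list."""
    entries: Dict[int, Tuple[str, str]] = {}
    current = None
    name = ""
    for line in dump.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            current = int(entry.group(1))
            name = entry.group(2) or ""
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            entries[current] = (name, status.group(1))
            current = None
    return entries


def changed_entries(before, after):
    """Entries present in both lists whose name or status differs."""
    return {
        ent: (before[ent], after[ent])
        for ent in sorted(before.keys() & after.keys())
        if before[ent] != after[ent]
    }


BEFORE = """\
005 CMPD_01:SEQ_03.BO0002        N    -01 014 3F800000  0440:8303
     Scanning    0625                 L  N -01 -1            00A5 04AD
012 CMPD_01:BLK_02.OUTINC           N    -01 013 3F800000  0458:095E
     Scanning    0425                 L  N -01 -1            00CE 00B8
021                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
"""

AFTER = """\
005 CMPD_01:SEQ_03.BO0002        N    -01 014 3F800000  0440:8303
     Scanning    0625                 L  N -01 -1            00A5 04AD
012 CMPD_01:SEQ_03.BO0004        N    -01 012 3F800000  0440:830F
     Scanning    0625                 L  N -01 -1            00A5 04B9
021                                  N    -03 000 00000000  0000:0000
     No Response 0000                 L  N -01 -1            0000 0000
"""

changes = changed_entries(parse_entries(BEFORE), parse_entries(AFTER))
for ent, (old, new) in changes.items():
    print(f"Ent. {ent:03d}: {old} -> {new}")
```

On this excerpt only Ent. 012 is reported, changing from
CMPD_01:BLK_02.OUTINC to CMPD_01:SEQ_03.BO0004 with both marked Scanning,
which matches the overwrite described earlier in this note.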

