Re: [foxboro] CP60 will not boot

  • From: "Corbera, Angel" <angel.corbera@xxxxxxxxxxxxxxxx>
  • To: "'foxboro@xxxxxxxxxxxxx'" <foxboro@xxxxxxxxxxxxx>
  • Date: Wed, 14 Sep 2005 11:22:34 -0400

 David,
 
Here is my troubleshooting guide.
So far, it has never failed anyone.
 
Regards
 
TROUBLESHOOTING CP or FBM BOOTING PROBLEMS

(Last update: Nov 2003)


Find out why your CP does not boot (LEDs stay RED-GREEN)

- In the examples below, replace "CPLBUG" or "4CP3B1" (a CP30B module) with
your actual CP letterbug.
- Tests were done at v6.2.1 and 6.1.2
- See "FBM won't boot" at the end.

NOTES:
- A CP will go GREEN after its Operating System has been loaded, but before
its control database gets loaded.
- The Shadow module of a FT CP won't boot if the software on the host is
different from the software running in the Primary module.
- RED LED only: Bad Module, Noisy Nodebus, Bad PIO Bus, Bad X-clip, or Bad
Z-clip.
- LEDs off: replace the module.



This brief time-action summary might help understand the troubleshooting
steps. Times are for reference only.

m:ss

0:00  CP is rebooted or reset

0:08  CP sends boot request to all APs (verify with: snoop -t a -x 0
01:00:6c:0f:0f:0f)

...   CP host verifies CP Comex file and SICT tables

0:09  Host creates lock file in /usr/fox/sp/locks: fCPLBUG and fCPLBUG-

...   checks if image overlay file exists

...   verifies that all 3 files exist in /usr/fox/sp/files: OS, GDT, map

...   reads OS file

...   writes to log file /usr/fox/sysmgm/softmgr/file/sm_errs

...   loads CP Operating System ...

0:55  Lock file fCPLBUG- is removed

0:56  CP goes GREEN



TROUBLESHOOTING SUMMARY:

Note: If the lock file from step 8 is created, you can skip steps 1 through 6.

1) Clip or no clip?
2) Try other slot
3) Reseat/check/replace letterbug. Try upside down.
4) Try other module. Bent pins?
5) SysMgmt: "Enable Download" grayed out? DISABLE ALL REPORTS active?
6) Is the Host from sldb ok? Does it boot a Comm processor?
7) Will this AP/AW boot the CP? Use ds_stasict. Check boot files (date,
size, sum) from result.
8) Is the file "/usr/fox/sp/locks/fCPLBUG-" created?
9) Use snoop to monitor boot packets (or: 9b or 9c)
10) Module PN ok?
11) Check/Kill/Restart: "lsap_dsp", "romload_srvr", and "mles" (downld CIO_DB ?)
12) Multiple hosts? Run ds_stasict on other hosts
13) Get CP's NSAP and Pseudo Mac address from IIF.prm
14) Check CP Comex file for NSAP and Pseudo Mac address
15) Check NSAP of boot host. Use fist
16) SysMgmt's Real and pseudo Mac addresses
17) Insert CP and run: truss -p {romload_srvr pid}. Log file: sm_errs

Optional:
9b) SysMgmt Host Application layer: LLC FRAMES TRANSMITTED/RECEIVED
9c) Foxwatch: Monitor Nodebus for XLLC packets

  _____  



DETAILED PROCEDURES:

1) If the CP is single, be sure no clip is on the back of the slot
(especially if the next slot has another station).

2) Try a different slot.

3) Check Letterbug:

- Reseat letterbug
- Check for "0" vs. "O" and "I" vs. "1" mix-ups.
- Replace letterbug
- Install letterbug upside down.

4) Check the module for bent pins. If available, try a different
module.

5) Find this CP on SysMgmt and verify that "ENABLE DOWNLOAD", under EQUIP
CHG, is grayed out (enabled).
If not, correct it by selecting "ENABLE DOWNLOAD" once.

If you "DISABLE ALL REPORTS" for this CP on System Management, this is what
will happen:
- DISABLE ALL REPORTS will NOT be grayed out. It will remain white.
- EQUIP INFO will show "SM REPORT STATE: No Reporting".
- The CP will show the "FBM 0" box with "NONE" as letterbug.
- System Alarm printer (or smon_log) will show: 

SYSMON -00074 Enrolling station with Report State : None



6) Find the CP host from /usr/fox/sp/sldb. Is the host (AP/AW) alive and
online?

        
           4APB01# grep 4CP3B1 /usr/fox/sp/sldb

   4CP3B1  4APB01  4APB01  SYMN4A

    ^^^      ^^

    CP      Host


        If you are not sure whether the host (AW51 or AW70) is connected to
the nodebus, verify that it boots a Comm processor with the "CSBOOT" letterbug.
Reminder: If you have several hosts in the system, any one of them could boot
the CSBOOT module. If possible, isolate desired AW51/AW70 and CSBOOT module.
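
The sldb lookup at the start of this step can be wrapped in a small helper. This is only a sketch: the function name sldb_host is made up for illustration, and it assumes that column 1 of sldb is the station letterbug and column 2 is its host, as the sample line suggests. On older Solaris hosts you may need nawk in place of awk.

```shell
# sldb_host FILE LETTERBUG
# Print the host recorded for LETTERBUG in an sldb-style file.
# Assumption: field 1 = station letterbug, field 2 = boot host.
sldb_host() {
    awk -v lbug="$2" '$1 == lbug { print $2; exit }' "$1"
}
```

For example, "sldb_host /usr/fox/sp/sldb 4CP3B1" would print 4APB01 for the sample line shown above. An empty result means the letterbug is not listed in that file.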


        


7) Use "ds_stasict" to verify that the host thinks it is supposed to boot
the CP and to show which files are required.
Confirm that those files really exist.



        If ds_stasict returns nothing, this host does NOT think it is
supposed to boot the CP. 

        4APB01# /usr/fox/swi/ds_stasict 4CP3B1



4CP3B1   Sw_version -> 0  Station Type -> NFT

IMAGE records:

4CP3B1, 1, 00002000, 00000600, 00000000, OS1C3B, no note

4CP3B1, 2, 00000600, 00000600, 00000000, OS1C3B.GDT, no note

4CP3B1, 5, 0000C840, 00000600, 00000000, CMX4CP3B1.BIN, no note

CHECKPOINT records:

4CP3B1, 4, 00000000, 00000600, 00000000, DB4CP3B1.UC, no note

EEPROM records:

4CP3B1, 6, 00000600, 00000600, 00000000, eu_c3b.bin, no note



   NOTE: If using a CP30B or CP40B at a pre-6.2 release, ds_stasict will
         report boot files for a regular CP30 or CP40. See example below:



3AWD01# /usr/fox/swi/ds_stasict 3CP401


3CP401   Sw_version -> 0  Station Type -> NFT


IMAGE records:                                                 

3CP401, 1, 00002000, 00000600, 00000000, OS1C40, no note       

3CP401, 2, 00000600, 00000600, 00000000, OS1C40.GDT, no note   

3CP401, 5, 0000C840, 00000600, 00000000, CMX3CP401.BIN, no note


CHECKPOINT records:                                            

3CP401, 4, 00000000, 00000600, 00000000, DB3CP401.UC, no note


EEPROM records:                                                

3CP401, 6, 00000600, 00000600, 00000000, eu_c40.bin, no note   



(The OpSys, GDT and bin files belong to a CP40A, not to a CP40B)


        Verify also size, date, and checksum of the CP Image file:
/usr/fox/sp/files/OS1C30, OS1C40, OS1C3B, OS1C4B, OS1C60
Check it against a known good one.

NOTE: The image file might be corrupted (power failure, RAID drives, no file
sync, etc).
Just because the CP booted yesterday, don't assume the image file is ok
today. 
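
The file check in this step can be partly automated. The sketch below is not a Foxboro tool: the function names image_files and missing_files are hypothetical, and it assumes the ds_stasict output looks like the samples above, with the file name in the 6th comma-separated field of each IMAGE record.

```shell
# image_files: read ds_stasict output on stdin and print the file name
# (6th comma-separated field) of every record in the IMAGE section.
image_files() {
    awk -F', *' '
        /^ *IMAGE records:/ { in_image = 1; next }
        /records:/          { in_image = 0 }
        in_image && NF >= 7 { print $6 }
    '
}

# missing_files DIR: read file names on stdin, print those not found in DIR.
missing_files() {
    while read -r name; do
        [ -f "$1/$name" ] || echo "$name"
    done
}
```

On the host this could be used as: /usr/fox/swi/ds_stasict 4CP3B1 | image_files | missing_files /usr/fox/sp/files. No output means all three image files are present. It does not check size, date, or checksum, so still compare those against a known good copy with ls -l and sum.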


8) Check for lock files.



        To see how far the boot process goes, do this:

Remove all files in /usr/fox/sp/locks (with CP letterbug in their filenames)
Insert the CP (or reset it)
Check if file "fCPLBUG" is created, and "fCPLBUG-" links to it.
When boot is done, "fCPLBUG-" is removed but "fCPLBUG" remains.
Note the date.

If file "fCPLBUG-" is not created use next step to see if letterbug is
correct.

If file "fCPLBUG-" is created you know the letterbug is correct and the SICT
tables have this CP as a valid station to be booted.

If the file "fCPLBUG-" is created but not removed then something is wrong
with the OpSys, GDT or map file for that CP. See more details in step 17.
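
The three outcomes described in this step can be checked with a small helper while the CP is booting. This is a sketch only: the function name lock_state and the state names it prints are invented for illustration, and the result is meaningful only if the old lock files for this CP were removed before the reset, as described above.

```shell
# lock_state LETTERBUG [LOCKDIR]
# Report how far the boot got, based on the two lock files for this CP.
lock_state() {
    dir=${2:-/usr/fox/sp/locks}
    if [ -f "$dir/f$1-" ]; then
        echo "loading"      # fCPLBUG- exists: load in progress, or stuck
    elif [ -f "$dir/f$1" ]; then
        echo "loaded"       # only fCPLBUG remains: the load finished
    else
        echo "no-request"   # neither file: host has not started booting it
    fi
}
```

Running "lock_state 4CP3B1" every few seconds after the reset should show no-request, then loading, then loaded on a normal boot. If it stays at loading well past a minute, suspect the OpSys, GDT or map file.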

        


9) Use the (Solaris) snoop command to look at boot request packets.



        Follow this procedure on ANY AP/AW51 connected to the nodebus, to
observe the (XLLC) boot request packets from the CP to all APs.

Type: 
        snoop -x 0 01:00:6c:0f:0f:0f                                  


        Insert the CP (or depress its reset button).

After about 8-10 seconds the screen will show the first packet.
If CP letterbug is correct, and its host starts booting it, you will see
maybe 1-2 more messages.
If CP letterbug is NOT correct, or its host is not available, the same
message will repeat every 8 seconds.
Type CTL-C to stop snoop.

The ascii (right) section shows the actual LETTERBUG of the CP requesting
boot packets.
See sample below:

        4APB01#  snoop -x 0 01:00:6c:0f:0f:0f




Using device /dev/le (promiscuous mode)                                

           ? -> (multicast)  ETHER Type=8501 (Unknown), size = 63 bytes

                                                                       

     0: 0100 6c0f 0f0f 0000 6c0d a061 0031 4040    ..l.....l..a.1@@    

    16: c0a1 80a0 8082 0204 0083 0101 8501 01a6    ................    

    32: 8030 80a1 8081 0202 0b00 0082 0834 4350    .0...........4CP    

    48: 3342 3200 0000 0000 0000 0000 0060 42   3B2..........`B        



        If desired, the snoop command can capture packets into a log file.
Use snoop again to examine it. See samples below.

Sample 1: CP letterbug correct, host available, files ok.

        4APB01# snoop -o snoop.log 01:00:6c:0f:0f:0f

      Using device /dev/le (promiscuous mode)

      0               (cursor starts blinking)

              (Insert CP now)

   CTL-C          (1 minute later)



4APB01# snoop -i snoop.log -t r -x 0 01:00:6c:0f:0f:0f

  1   0.00000            ? -> (multicast)  ETHER Type=8501 (Unknown), size =
63 bytes



     0: 0100 6c0f 0f0f 0000 6c0d a061 0031 4040    ..l.....l..a.1@@

    16: c0a1 80a0 8082 0204 0083 0101 8501 01a6    ................

    32: 8030 80a1 8081 0202 0b00 0082 0834 4350    .0...........4CP

    48: 3342 3200 0000 0000 0000 0000 0060 42   3B2..........`B


        Sample 2: CP letterbug is wrong. Messages from the "wrong station"
keep coming every 8 seconds.

        4APB01# snoop -i snoop.log -t r -x 0 01:00:6c:0f:0f:0f


  1   0.00000            ? -> (multicast)  ETHER Type=8501 (Unknown), size =
63 bytes

 


     0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040    ..l.....l..G.1@@


    16: c0a1 80a0 8082 0204 0083 0101 8501 01a6    ................


    32: 8030 80a1 8081 0202 0c00 0082 0834 4350    .0...........4CP


    48: 3442 4300 0000 0000 0000 0000 006f 20   4BC..........o


 


  2   8.00789            ? -> (multicast)  ETHER Type=8501 (Unknown), size =
63 bytes

 


     0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040    ..l.....l..G.1@@


    16: c0a1 80a0 8082 0204 0083 0101 8501 01a6    ................


    32: 8030 80a1 8081 0202 0c00 0082 0834 4350    .0...........4CP


    48: 3442 4300 0000 0000 0000 0000 006f 20   4BC..........o


 


  3  16.78556            ? -> (multicast)  ETHER Type=8501 (Unknown), size =
63 bytes

 


     0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040    ..l.....l..G.1@@


    16: c0a1 80a0 8082 0204 0083 0101 8501 04a6    ................


    32: 8030 80a1 8081 0202 0c00 0082 0834 4350    .0...........4CP


    48: 3442 4300 0000 0000 0000 0000 000f 7d   4BC...........}
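
If the ASCII column is hard to read (the letterbug is split across two dump lines), the letterbug can be decoded from the hex column instead. The sketch below is an illustration, not a Foxboro utility: the function name letterbug_from_dump is hypothetical, and the position of the letterbug (6 bytes starting at frame offset 45) is an assumption read off the sample packets above. It expects lowercase hex as printed by snoop -x and decodes only the first packet in its input.

```shell
# letterbug_from_dump: read "snoop -x 0" output on stdin and print the
# letterbug carried in the first boot request packet.
# Assumption: the letterbug occupies frame bytes 45..50 (see samples).
letterbug_from_dump() {
    awk '
        $1 == "0:" { if (seen) exit; seen = 1 }   # stop at the second packet
        /^ *[0-9]+: / {
            # fields 2..9 hold up to eight hex groups; stop at the ASCII column
            for (i = 2; i <= 9 && i <= NF; i++) {
                if (length($i) > 4 || $i !~ /^[0-9a-f]+$/) break
                hex = hex $i
            }
        }
        END {
            digits = "0123456789abcdef"
            out = ""
            for (p = 91; p <= 101; p += 2) {      # hex digits of bytes 45..50
                hi = index(digits, substr(hex, p, 1)) - 1
                lo = index(digits, substr(hex, p + 1, 1)) - 1
                n = hi * 16 + lo
                if (hi >= 0 && lo >= 0 && n > 0) out = out sprintf("%c", n)
            }
            print out
        }
    '
}
```

Typical use would be: snoop -i snoop.log -x 0 01:00:6c:0f:0f:0f | letterbug_from_dump. Compare the result with the letterbug you expect for that slot.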





10) Verify that the module Part Number corresponds to the CP type
configured:

        
          4APB01# grep 4CP3B1 /usr/fox/sp/hldb  

  4CP3B1  20B                           

    ^^     ^^

  CPlbug  CPtype



  PartNumber   hldb(6.2)  hldb(4.3/6.1)     CP

   P0400VR       201         201           CP10

   P0960AW       203         203           CP30

   P0961EF       20B         203           CP30B

   P0960JA       205         205           CP40

   P0961BC       20C         205           CP40B

   P0961FR       C101        N/A           CP60



 NOTES: 

 - On pre-6.2 releases, a CP30B or CP40B is configured as plain CP30 or
CP40.

 - A CP30B will boot with the letterbug of a configured CP40B and vice versa.




11) Check if the processes "lsap_dsp", "romload_srvr", and "mles" are
running on the host station. 


        Even if they are running they might not be working properly.
First try to KILL and RESTART those processes.
The last resort is to REBOOT the Host
(especially if the system alarm printer or smon_log does not show
"downld CIO_DB", even when REPORTS HAVE BEEN ENABLED).

            4APB01# ps -ef | egrep 'lsap|romload|mles'

    root  1065  1020  0   Mar 06 ?        0:00 /usr/fox/exten/lsap_dsp

    root  1066  1020  0   Mar 06 ?        4:39 /usr/fox/exten/romload_srvr

    root  1383     1  0   Mar 06 ?        0:01 /usr/fox/bin/mles
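
The same check can be scripted so that only the missing processes are reported. This is a minimal sketch with a hypothetical function name (missing_procs); it matches on "/name" anywhere in the ps line, which is a loose match and assumes the processes are started by full path as in the listing above.

```shell
# missing_procs: read "ps -ef" output on stdin and print the name of each
# expected boot-related process that does not appear in it.
missing_procs() {
    ps_out=$(cat)
    for p in lsap_dsp romload_srvr mles; do
        printf '%s\n' "$ps_out" | grep "/$p" >/dev/null || echo "$p"
    done
}
```

Use it as: ps -ef | missing_procs. No output means all three are present, which still does not prove they are working properly.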




12) Check if more than one host is trying to boot the same CP.



        Run "ds_stasict" on each Host (AP/AW) in the system. Only the CP's
boot host should return results.
Just the presence of the CMXCPLBUG.BIN file on other hosts does not
necessarily mean they will try to boot the CP.
(rm_station removes the CP from the SICT tables but it does NOT remove the
CMX file).


13) Get the configured NSAP & Pseudo MAC addresses for this CP from
/usr/fox/sp/IIF.prm:

        
        4APB01# grep 4CP3B1 IIF.prm

4APB01 4APB01 ASMON6 SMSTM  003 4CP3B1                   000000

4APB01 4CP3B1 OS1C3B ADRMAC 001 C000E2  <-- pseudo       000000

4APB01 4CP3B1 OS1C3B ADRNSP 001 000104  <== NSAP         000000

4APB01 4CP3B1 OS1C3B CPBPC  001 5                        000000


        Use command below to verify if this CP has the same NSAP address as
the other stations in the SAME node.
If the CP was configured for Node 4, it won't boot on nodes 1, 2, or 3...

        3AWE01# grep ADRNSP /usr/fox/sp/IIF.prm | sort +5


2AP201 2AP201 OS3FS3 ADRNSP 001 000102                   000000

2AP201 2COM01 OS1CS  ADRNSP 001 000102                   000000

...

3AWD01 3AB101 OS1ADH ADRNSP 001 000103                   000000

3AWD01 3AB201 OS1ADR ADRNSP 001 000103                   000000

...

4APB01 4APB01 OS6FS1 ADRNSP 001 000104                   000000

4APB01 4CP301 OS1C30 ADRNSP 001 000104                   000000

4APB01 4CP302 OS1C30 ADRNSP 001 000104                   000000

4APB01 4CP3B1 OS1C3B ADRNSP 001 000104                   000000

4APB01 4CP3B2 OS1C3B ADRNSP 001 000104                   000000

...
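
Instead of reading the sorted list by eye, the CP's NSAP can be compared directly with that of another station on the same node, for example its host. The helpers below are a sketch with made-up names (nsap_of, same_nsap); they assume the IIF.prm column layout seen in the samples above, where field 2 is the station, field 4 is the keyword, and field 6 is the value. On older Solaris hosts you may need nawk in place of awk.

```shell
# nsap_of FILE STATION: print the ADRNSP value configured for STATION.
nsap_of() {
    awk -v sta="$2" '$2 == sta && $4 == "ADRNSP" { print $6; exit }' "$1"
}

# same_nsap FILE STATION_A STATION_B: succeed only if both stations have an
# ADRNSP entry and the two values are equal.
same_nsap() {
    a=$(nsap_of "$1" "$2")
    b=$(nsap_of "$1" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: same_nsap /usr/fox/sp/IIF.prm 4CP3B1 4APB01 || echo "NSAP mismatch".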




14) Verify the CP COMEX file has the right NSAP & pseudo MAC Addresses



        Note: Any station that can be configured as Fault Tolerant WILL use
a pseudo MAC address even if it was configured and installed as SINGLE. 

        

        
           4APB01# /usr/foxbin/bpatch /usr/fox/sp/files/CMX4CP3B1.BIN



     FILE: CMX4CP3B1.BIN (106) - ASCII


   PAGE: 0 (0 - 0)                                                          

       x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf     0123456789abcdef 

                                                                         

   00: 65 00 66 00 67 00 64 00 64 00 64 00 32 00 32 00      e.f.g.d.d.d.2.2.

   01: 32 00 e8 03 e8 03 e8 03 c8 00 c8 00 c8 00 5e 01      2.............^.

   02: 5e 01 5e 01 28 00 0a 00 74 0e 3c 00 3c 00 33 00      ^.^.(...t.<.<.3.

   03: 00 00 03 00 00 00 01 00 4b 00 28 00 00 05 50 01      ........K.(...P.

   04: 3d 00 3d 00 49 30 30 30 31 30 34 00 50 43 41 54      =.=.I000104.PCAT

                      ^^ ^^ ^^ ^^ ^^ ^^

                       0  0  0  1  0  4

                         NSAP address

   05: 30 30 00 00 14 00 14 00 00 00 00 00 00 00 00 00      00..............

   06: 00 00 00 00 00 00 6c c0 00 e2                        ......l...


                   ^^ ^^ ^^ ^^ ^^ ^^

                        00006CC000E2

                   pseudo MAC address
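
If bpatch is not handy, the same two fields can be pulled out with dd and od. This is a rough sketch under a strong assumption: the byte offsets (0x45 for the NSAP text, 0x64 for the MAC bytes) are taken from the single 106-byte CMX4CP3B1.BIN dump shown above and may differ for other station types or releases. The function names cmx_nsap and cmx_mac are hypothetical.

```shell
# cmx_nsap FILE: print the 6 ASCII characters at byte offset 0x45 (69).
# Offset is an assumption based on the one dump shown in this guide.
cmx_nsap() {
    dd if="$1" bs=1 skip=69 count=6 2>/dev/null
}

# cmx_mac FILE: print the 6 bytes at byte offset 0x64 (100) as hex digits.
# Offset is an assumption based on the one dump shown in this guide.
cmx_mac() {
    dd if="$1" bs=1 skip=100 count=6 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

For the sample file, cmx_nsap would give 000104 and cmx_mac would give 00006cc000e2. Always confirm against the bpatch display before trusting the offsets on your system.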




15) Verify the NSAP address of the Boot host for boot problems on the
Local Node.



        There are times when a boot host won't boot the station because it
has a wrong NSAP. This can occur after a Day 0 installation, before the
AW/AP gets rebooted.

        Use the "fist" command on the boot host to verify what NSAP it is
using.
Looking just at its CMXCPLBUG.BIN will not tell you the truth since Software
Install will patch the kernel with the proper NSAP.

On a remote (Hostless) Node, the NSAP of the CP must match the NSAP of the
LAN module on that node.


16) Verify Real & pseudo MAC addresses of the CP with System
Management's EQUIP INFO: 


        
                                 SINGLE CP

PRIMARY MODE:      No Information        SHADOW MODE:       No Information

PRIM ROM ADDRESS:  00006C00DA062 (Real)  SHAD ROM ADDRESS:  000000000000

STATION ADDRESS:   00006CC000E2 (pseudo)



                          FT CP

PRIMARY MODE:      Married Prim          SHADOW MODE:       Married Shad

PRIM ROM ADDRESS:  00006C00DA062 (Real)  SHAD ROM ADDRESS:  00006C0C0950
(Real)

STATION ADDRESS:   00006CC000E2 (pseudo)




17) Use the (Solaris) truss command on the host AP/AW to see the
actions of romload_srvr, the main process that boots the CP. This will allow
you to see which files and directories are being accessed, opened, linked,
etc. This might tell you why this CP is not booting.




        First find the process number of romload_srvr:

            ps -ef | grep romload_srvr



        The process number (PID) is the second field of the romload_srvr line.
Once you have identified the process number for romload_srvr, run the truss
command: 
            truss -p {romload_srvr pid}



        Now, insert the CP or depress its reset button.
After 60-80 seconds use CTL-C to stop the truss process.
It is better to save the output of truss to a file. Normal boot produces
about 4000 lines.
Examples:

        4APB01# ps -ef |grep romload


    root  1086  1039  0   Oct 23 ?        0:04 /usr/fox/exten/romload_srvr

    root 25733 15779  0 09:19:16 pts/2    0:00 grep romload               



4APB01# truss -p 1086                                                     

ioctl(20, 0x20003000, 0xEFFFF92C)               = 0                       

ioctl(20, 0x20003000, 0xEFFFF92C) (sleeping...)
...

   ... 4000 lines more will follow if boot is ok



To capture all lines into a file (/opt/z), use the command below:



     truss -o /opt/z -p {romload_srvr pid}



To see which files have been opened/accessed/linked to boot the CP:

(I highlighted the main names).



4APB01# egrep 'open|stat|link|access|chdir' /opt/z

   

   open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 8

   open("/usr/fox/sp/sict1.idx", O_RDWR)           = 8

   open("/usr/fox/sp/sict1.dat", O_RDWR)           = 9

   open("/usr/fox/sp/sict2.idx", O_RDWR)           = 8

   open("/usr/fox/sp/sict2.dat", O_RDWR)           = 9

   stat("/usr/fox/sp/locks/f4CP3B1", 0xEFFFEF54)   = 0

   link("/usr/fox/sp/locks/f4CP3B1", "/usr/fox/sp/locks/f4CP3B1-") = 0

   access("/usr/fox/sp/imag_over.cfg", 4)          = 0

   stat("/usr/fox/sp/imag_over.cfg", 0xEFFFF7E0)   = 0

   chdir("/usr/fox/sp/files/")             = 0

   stat("OS1C3B", 0xEFFFF7D0)              = 0

   stat("OS1C3B.GDT", 0xEFFFF7D0)                  = 0

   stat("CMX4CP3B1.BIN", 0xEFFFF7D0)               = 0

   chdir("/usr/fox/sysmgm/")               = 0

   chdir("softmgr/file/")                  = 0

   open("/usr/fox/sp/files/OS1C3B", O_RDONLY)      = 8

   open("/usr/fox/sysmgm/softmgr/file/sm_errs", O_WRONLY|O_APPEND|O_CREAT,
0666) = 9

   open("/usr/fox/sp/files/OS1C3B", O_RDONLY)      = 9

   open("/usr/fox/sp/files/OS1C3B.GDT", O_RDONLY)  = 8

   open("/usr/fox/sp/files/OS1C3B.GDT", O_RDONLY)  = 9

   open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 8

   open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 9

   fstat(5, 0xEFFFF670)                    = 0

   open("/usr/fox/sp/sldb", O_RDWR)                = 5

   unlink("/usr/fox/sp/locks/f4CP3B1-")            = 0

   fstat(5, 0xEFFFF670)                    = 0

   open("/usr/fox/sp/sldb", O_RDWR)                = 5

   4APB01#



  If the host was already trying to boot other (nonexistent) stations,

    remove error lines with: grep -v Err /opt/z

        

  imag_over.cfg = Multi image support

  sm_errs = log file for romload_srvr, checkpt_srvr, ldthru_srvr,
reload_srvr, etc
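
When the boot fails, the opposite filter is often more useful: show only the file-related calls that returned an error. The sketch below uses a hypothetical function name (failed_file_calls) and relies on truss marking failed calls with "Err#" followed by the errno. On Solaris, use egrep or /usr/xpg4/bin/grep if the stock grep does not accept -E.

```shell
# failed_file_calls: read truss output on stdin and print the file-related
# system calls (open, stat, access, link, unlink, chdir) that failed.
failed_file_calls() {
    grep -E '^(open|stat|access|link|unlink|chdir)\(' | grep 'Err#'
}
```

For example: failed_file_calls < /opt/z. A failed open or stat on one of the image files for your CP points straight at the missing or unreadable file.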




        ADDENDUM:
Create log files to help you diagnose the problem: 

        A) smon_log (to get System Alarm messages. v6.2.x only)
Create directory "/opt/fox/sysmgm/sysmon" on host of the System Monitor that
looks at this CP, then create "smon_log" file by using: touch smon_log.
Remember to remove this file after solving your problem, otherwise it will
grow indefinitely.

        Typical messages after a successful CP boot:


        2002-05-22 09:57:38 6CP601 Software Manager SYSMON -00003 Powerup
reboot OK. ROM Addr = 00006C0FAD46

2002-05-22 09:57:47 6CP601 Station  SYSMON -00041 Equipment on-line

2002-05-22 09:57:59 6CP601 Software Manager SYSMON -00019 Database Load
Successful

2002-05-22 09:57:59 6CP601 Process = downld CIO_DB 000001 DATABASE DOWNLOAD:
RESOLVE LINKAGES SUCCESSFUL 

2002-05-22 09:58:31 6CP601 Process = downld CIO_DB 000003 DATABASE DOWNLOAD
COMPLETE  UNDEFINED BLOCK(S)

2002-05-22 09:58:34 6CP601 Equip = 6CP601 SYSMON -00051 Equipment has been
added on-line



        

        Messages if the CP was booted while option "DISABLE ALL REPORTS" was
active on SysMgmt:


        2002-05-22 10:03:33 6CP601 Software Manager SYSMON -00003 Powerup
reboot OK. ROM Addr = 00006C0FAD46

2002-05-22 10:03:43 6CP601 Station  SYSMON -00074 Enrolling station with
Report State : None

2002-05-22 10:03:43 6CP601 Station  SYSMON -00041 Equipment on-line

2002-05-22 10:03:52 6CP601 Software Manager SYSMON -00019 Database Load
Successful



        

        B) sm_errs
If you create file "/usr/fox/sysmgm/softmgr/file/sm_errs", on the AP/AW51
hosting the CP, the ERROR (or success) messages of the romload_srvr process
will be logged to that file.
Additionally, "checkpoint_srvr" and "reload_srvr" are logged.

Remember to remove this file after solving your problem, otherwise it will
grow indefinitely.
By the way this file will show if the CP was booted as style A or B (type:
205, 20b, 20c, etc). Useful on pre v6.2 systems.
See samples below.

        {6.2.1 system. 4CP3B1,4CP3B2 are CP30Bs, while 4CP4B2 is a CP40B}



4APB01# cd /usr/fox/sysmgm/softmgr/file

4APB01# touch sm_errs

... boot CPs, etc



4APB01# more /usr/fox/sysmgm/softmgr/file/sm_errs

2001-10-25 17:47:10 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 4CP3B2 -
sta_type=0x20b -- 819 blocks))

2001-10-25 17:49:53 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 4CP3B2 -
sta_type=0x20b -- 819 blocks))

2001-10-25 17:51:52 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 4CP3B1 -
sta_type=0x20b -- 819 blocks))

2001-10-25 17:51:58 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 4CP4B2 -
sta_type=0x20c -- 819 blocks))

2001-10-25 17:52:46 (romload_srvr) -39 0 (   ) (l_fn_03 - Load Failure -
retry or TMO)

2001-10-25 18:00:34 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 4CP4B2 -
sta_type=0x20c -- 819 blocks))



-----------------------------------------------------

{pre-6.2 system. 3CP401 & 3CP402 were both configured as CP40.

                 3CP401 module is a CP40A, while 3CP402 is a CP40B}

 

3AWD01# more sm_errs  

2001-10-29 11:00:01 (romload_srvr) -39 0 (   ) (read_override_file - access
error)

2001-10-29 11:00:01 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 3CP402 -
sta_type=0x205 -- 818 blocks))

2001-10-29 11:03:56 (romload_srvr) -39 0 (   ) (RLS - Overriding SICT
Filename (OS1C4B), (LID: 3CP401 - sta_type=20c))

2001-10-29 11:03:56 (romload_srvr) -39 0 (   ) (read_override_file - access
error)

2001-10-29 11:03:56 (romload_srvr) -39 0 (   ) (RLS - Booting (LID: 3CP401 -
sta_type=0x20c -- 819 blocks))
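
To get a quick list of boot attempts out of a long sm_errs file, the "RLS - Booting" entries can be reduced to one short line each. This is a sketch with a hypothetical function name (boot_attempts); it assumes each entry is a single line in the real file (the samples above are wrapped by the email) and that the message text matches the samples.

```shell
# boot_attempts: read sm_errs on stdin; for each romload_srvr
# "RLS - Booting" entry print "date time letterbug sta_type".
boot_attempts() {
    sed -n 's/^\([0-9-]* [0-9:]*\) (romload_srvr).*RLS - Booting (LID: \([A-Z0-9]*\) - sta_type=\(0x[0-9a-f]*\).*/\1 \2 \3/p'
}
```

For example: boot_attempts < /usr/fox/sysmgm/softmgr/file/sm_errs. Repeated entries for the same letterbug a few minutes apart suggest the load keeps failing and restarting.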




UPDATE: After installing QF 1004163, the station upload packet for
CP30A/CP40A is reduced.

/usr/fox/sysmgm/softmgr/file/sm_errs can be used to monitor the upload
operation. Below is a sample of this output:



2003-12-09 15:07:13 (upload_srvr) -39 0 (   ) (Valid Upload request from
A21C4A (type: 205) size: 400000 rea: 8 RLR vers: 44007c decode_rtn: 0)

2003-12-09 15:07:13 (upload_srvr) -39 0 (   ) (opened lock file
A21C4A.dmp+ ... before msgrcv ....)

2003-12-09 15:07:13 (upload_srvr) -39 0 (   ) (After msgrcv())

2003-12-09 15:07:13 (upload_srvr) -39 0 (   ) (Reply with - dump_start: 0
s_mem_adr: 0  dump_size: 400000  memsize: 925 Dump Size: 700   vers: 44007c)

2003-12-09 15:12:01 (upload_srvr) -39 0 (   ) (UPLOAD_SERVER - # blocks
were missed is 0)

2003-12-09 15:16:47 (upload_srvr) -39 0 (   ) (UPLOAD_SERVER - # blocks
were missed is 0)

2003-12-09 15:21:33 (upload_srvr) -39 0 (   ) (UPLOAD_SERVER - # blocks
were missed is 0)

2003-12-09 15:26:18 (upload_srvr) -39 0 (   ) (UPLOAD_SERVER - # blocks
were missed is 0)

 

Validate that the dump directory contains the newly-created dump files.  

1. These files will be 1049522 bytes in length. 

2. There could be missed blocks due to File Server activity, nodebus
activity, etc. 





  _____  

9b) Using SysMgmt:



        Open System Management. Select the AP/AW that hosts the CP.
Select PERF, SysMgmt Counters, Application Layer, and RESET ALL (to reset
all counters).
Have someone insert the CP (or depress its reset button) while you keep
clicking on the READ ALL button. Observe these two counters:
"LLC FRAMES TRANSMITTED" (AP/AW --> CP)
"LLC FRAMES RECEIVED" (AP/AW <-- CP)

        If the CP is putting out boot request packets you will see the "LLC
FRAMES RECEIVED" counter increment by 1-2.
If this AP/AW is the CP host you should also see hundreds of "LLC FRAMES
TRANSMITTED".

        You can usually tell if the problem is that someone has 'disabled
download': you will see 1-2 boot request packets come in and the
AP/AW boot host sends only one packet back. (Fix this by selecting in
SysMgmt the CP, CONFIG, EQUIP CHG, click on ENABLE DOWNLOAD).

        You can also use this method to find multiple boot hosts because
only the station responsible for answering a boot request should be sending
"LLC FRAMES TRANSMITTED".

9c) Use a Foxwatch Comm processor to monitor Nodebus for XLLC Boot
packets. 



  _____  

Find out why your FBM does not boot (Red LED stays ON, Green stays Off) 

HARDWARE:

- Check terminators (RESISTORS). Are they ONLY at end rack of last cabinet?

- Is twinax shield connected to ground only on last FB isolator?

- Swap/replace FB isolators if none of the FBMs boot.

- Check letterbugs. Are they upside down? Are they fully inserted?

    Swap with known good ones. 

- Move FBM to another slot/rack.

- Swap module with a known good one from a rack that you know works.

- Swap nosecones.

- Check pins of nosecone.

- Bent pin on backplane?

- Remove I/O wires from nosecone.

- Are IPMs ok?



SOFTWARE:

- EEPROM update the FBM and check for activity on LEDs

- Blinking green LED: EEPROM updating OR in Failsafe condition (lost comm to
CP)

- Are the ECBs properly configured on ICC?  Software/Hardware type?

- Is MPOLL parameter ok?

- Corrupted/missing iom file on host?

- Has the CP host been changed (new box)? 

    Reboot CP or use HH845 (/opt/fox/bin/tools/cp_utl)

- Cycle power to baseplate with the FBM

- Cycle power to FCM (DCM)

- Sometimes you need to reboot the CP. See QF8707B



- If FBM is rebooted/power cycled, while the CP boot host is off-line, 

   the FBM will never boot up, even after restoration of the boot host. 

   It can be fixed with a new version of the fbmload task. See CAR 2000175



At 6.2.1: The maint task does not retry after a failed attempt to get 

          a new server address. This can be kick-started again (see below).

At 6.1.1: As above, but it cannot be restarted because the soft_mgr does not

          recognise the request for server address.



Workarounds: (From Development)



At 6.2.1:



1. Kill the fbmload task

2. Turn the FBMs that need to be downloaded off line 

    (this will clear the pending actions and reset the delay timer)

3. Wait for them to go offline (this may take a while, with 'maint'

    running only every 20 seconds - be patient)

4. Hit the Download key for one of the FBMs

5. Wait for the SMDH info to report that the FBM is "NOT READY" and for 

    the diag status 4 to be FF.

    This can take up to several maint cycles.  

        If the CP is not able to talk to the FBM, you won't reach this
state.



    At this point, after one more cycle the 'maint' task should return 

        to its normal cycle.



6. Restart the fbmload task

   Things should now be back to normal.



  The problem will return whenever the following event sequence occurs:



1. The CP attempts to contact the fbmload server and fails

2. The attempt to obtain a new server address from system management fails
(1 try)



At 6.1.1:



You require a new soft_mgr task (attached)



To install it (on the CP's host):



    cd /usr/fox/exten

    cp soft_mgr soft_mgr.61

    mkdir temp

    cd temp

    tar xvf /dev/fd0

    cd ..

    cp temp/* soft_mgr.new

    ps -eaf|grep soft

    kill (the soft_mgr process)



This works because when fox_monitor sees the .new process, it will 

replace the old soft_mgr with the .new, and respawn the process.


  _____  




Angel Corbera

Invensys - The Foxboro Company
Technical Assistance Center (Systems)
33 Commercial Street (B52-AA), Foxboro, MA 02035
Ph: 866-PHONIPS (866-746-6477); Fax: 508-549-4492

Please mark your calendar for the 2005 Invensys Process Systems Customer
Conference, October 3rd - 6th 2005 - Houston, TX. For more information,
please visit http://www.invensys.com/usergroup2005.



-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx On Behalf Of David Johnson
Sent: Wednesday, September 14, 2005 10:46 AM
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] CP60 will not boot


Thanks to everyone, but no luck yet.

Terry,
        CP download is enabled.
Russ,
        copied BBXXXXXX.UC to DBXXXXXX.UC, no luck
        copied DBXXXXXX.init to DBXXXXXX.UC no luck.
        put original DBXXXXXX.UC back.

        All of the files pointed to look OK.

Angel,
        looking for troubleshooting guide on TAC website.
        Haven't found it yet.


Here's a better description of the problem.
        AW51F is hosting this one CP only.
        DNBT connects AW51F to rack with only DNBT and CP60 in the rack.
        DNBT link light is green.
        DNBT NODEBUS light will flash one time when the CP is reset (or push
pulled) then no more on that side.
        DNBT TP Activity light flashes periodically (every 5 to 10 seconds)

        Everything was communicating and working yesterday when I rebooted
the AW,
(waited for it to come back up, and online), then rebooted the CP60.  At
that point I got stuck.  Letterbug is correct and appears to be correctly
inserted, no bent pins.  The CP has successfully rebooted in this
configuration about a week ago on a power outage.  I have not made any
sysdef or image changes.

Any help would be appreciated.

Thanks,
David

       



_______________________________________________________________________
This mailing list is neither sponsored nor endorsed by Invensys Process
Systems (formerly The Foxboro Company). Use the info you obtain here at
your own risks. Read http://www.thecassandraproject.org/disclaimer.html

foxboro mailing list:             //www.freelists.org/list/foxboro
to subscribe:         mailto:foxboro-request@xxxxxxxxxxxxx?subject=join
to unsubscribe:      mailto:foxboro-request@xxxxxxxxxxxxx?subject=leave





 
 