David,

Here is my troubleshooting guide. So far, it has never failed anyone.

Regards

TROUBLESHOOTING CP or FBM BOOTING PROBLEMS (Last update: Nov 2003)

Find out why your CP does not boot (LEDs stay RED-GREEN)

- In the examples below, replace "CPLBUG" or "4CP3B1" (a CP30B module) with your actual CP letterbug.
- Tests were done at v6.2.1 and 6.1.2.
- See "FBM won't boot" at the end.

NOTES:
- A CP will go GREEN after its Operating System has been loaded, but before its control database gets loaded.
- The Shadow module of an FT CP won't boot if the software on the host is different from the software running in the Primary module.
- RED LED only: Bad Module, Noisy Nodebus, Bad PIO Bus, Bad X-clip, or Bad Z-clip.
- LEDs off: replace the module.

This brief time-action summary might help you understand the troubleshooting steps. Times are for reference only.

m:ss
0:00  CP is rebooted or reset
0:08  CP sends boot request to all APs
      (verify with: snoop -t a -x 0 01:00:6c:0f:0f:0f)
...   CP host verifies CP Comex file and SICT tables
0:09  Host creates lock file in /usr/fox/sp/locks: fCPLBUG and fCPLBUG-
...   checks if image overlay file exists
...   verifies that all 3 files exist in /usr/fox/sp/files: OS, GDT, map
...   reads OS file
...   writes to log file /usr/fox/sysmgm/softmgr/file/sm_errs
...   loads CP Operating System
...
0:55  Lock file fCPLBUG- is removed
0:56  CP goes GREEN

TROUBLESHOOTING SUMMARY:
Note: If the file from step 8 is created you can skip steps 1-2-3-4-5-6.

1) Clip or no clip?
2) Try other slot
3) Reseat/check/replace letterbug. Try upside down.
4) Try other module. Bent pins?
5) SysMgmt: "Enable Download" grayed out?; DISABLE ALL REPORTS active?
6) Is the Host from sldb ok? Does it boot a Comm processor?
7) Will this AP/AW boot the CP? Use ds_stasict. Check boot files (date, size, sum) from result.
8) Is the file "/usr/fox/sp/locks/fCPLBUG-" created?
9) Use snoop to monitor boot packets (or: 9b or 9c)
10) Module PN ok?
11) Check/Kill/Restart: "lsap_dsp", "romload_srvr", and "mles" (downld CIO_DB ?)
12) Multiple hosts? Run ds_stasict on other hosts
13) Get CP's NSAP and Pseudo Mac address from IIF.prm
14) Check CP Comex file for NSAP and Pseudo Mac address
15) Check NSAP of boot host. Use fist
16) SysMgmt's Real and pseudo Mac addresses
17) Insert CP and run: truss -p {romload_srvr pid}. Log file: sm_errs

Optional:
9b) SysMgmt Host Application layer: LLC FRAMES TRANSMITTED/RECEIVED
9c) Foxwatch: Monitor Nodebus for XLLC packets
_____

DETAILED PROCEDURES:

1) If the CP is single, be sure no clip is on the back of the slot (especially if the next slot has another station).

2) Try to use a different slot.

3) Check Letterbug:
- Reseat letterbug
- Check for "0" mistaken for "O", and "I" mistaken for "1".
- Replace letterbug
- Install letterbug upside down.

4) Check the module for bent pins. If available, try using a different module.

5) Find this CP on SysMgmt and verify that "ENABLE DOWNLOAD", under EQUIP CHG, is grayed out (enabled). If not, correct it by selecting "ENABLE DOWNLOAD" once.
If you "DISABLE ALL REPORTS" for this CP on System Management, this is what will happen:
- DISABLE ALL REPORTS will NOT be grayed out. It will remain white.
- EQUIP INFO will show "SM REPORT STATE: No Reporting".
- The CP will show the "FBM 0" box with "NONE" as letterbug.
- System Alarm printer (or smon_log) will show:
  SYSMON -00074 Enrolling station with Report State : None

6) Find the CP host from /usr/fox/sp/sldb. Is the host (AP/AW) alive and online?

4APB01# grep 4CP3B1 /usr/fox/sp/sldb
4CP3B1 4APB01 4APB01 SYMN4A
^^^^^^ ^^^^^^
CP     Host

If you are not sure whether the host (AW51 or AW70) is connected to the nodebus, verify that it boots a Comm processor with the "CSBOOT" letterbug.
Reminder: If you have several hosts in the system, any one of them could boot the CSBOOT module. If possible, isolate the desired AW51/AW70 and the CSBOOT module.

7) Use "ds_stasict" to verify that the host thinks it is supposed to boot the CP and to show which files are required. You should confirm those files really exist. If ds_stasict returns nothing, this host does NOT think it is supposed to boot the CP.

4APB01# /usr/fox/swi/ds_stasict 4CP3B1
4CP3B1 Sw_version -> 0 Station Type -> NFT
IMAGE records:
4CP3B1, 1, 00002000, 00000600, 00000000, OS1C3B, no note
4CP3B1, 2, 00000600, 00000600, 00000000, OS1C3B.GDT, no note
4CP3B1, 5, 0000C840, 00000600, 00000000, CMX4CP3B1.BIN, no note
CHECKPOINT records:
4CP3B1, 4, 00000000, 00000600, 00000000, DB4CP3B1.UC, no note
EEPROM records:
4CP3B1, 6, 00000600, 00000600, 00000000, eu_c3b.bin, no note

NOTE: If using a CP30B or CP40B at a pre-6.2 release, ds_stasict will report boot files for a regular CP30 or CP40.
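The "confirm those files really exist" part of step 7 can be scripted. What follows is only a sketch, not a Foxboro tool: it assumes ds_stasict prints one comma-separated record per line with the file name in the sixth field (as in the 4CP3B1 sample), that the IMAGE files live in /usr/fox/sp/files, and that a POSIX shell and an awk accepting a regular-expression field separator (nawk on Solaris) are available. The names image_files and check_boot_files are hypothetical.

```shell
# Sketch only: image_files and check_boot_files are hypothetical helper names.

# Read ds_stasict output on stdin and print the file names
# (6th comma-separated field) listed under "IMAGE records:".
image_files() {
    awk -F', *' '
        /records:/ { in_image = ($0 ~ /^IMAGE/); next }
        in_image && NF >= 6 { print $6 }
    '
}

# Read file names on stdin and report whether each one exists in
# directory $1. Returns non-zero if any file is missing.
check_boot_files() {
    dir=$1
    missing=0
    while read -r name; do
        if [ -f "$dir/$name" ]; then
            echo "ok      $name"
        else
            echo "MISSING $name"
            missing=1
        fi
    done
    return "$missing"
}
```

On the host it could be used as: /usr/fox/swi/ds_stasict 4CP3B1 | image_files | check_boot_files /usr/fox/sp/files. It only checks that the files exist; size, date and checksum still have to be compared against a known good copy.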
The example below, from a pre-6.2 system, illustrates the note above:

3AWD01# /usr/fox/swi/ds_stasict 3CP401
3CP401 Sw_version -> 0 Station Type -> NFT
IMAGE records:
3CP401, 1, 00002000, 00000600, 00000000, OS1C40, no note
3CP401, 2, 00000600, 00000600, 00000000, OS1C40.GDT, no note
3CP401, 5, 0000C840, 00000600, 00000000, CMX3CP401.BIN, no note
CHECKPOINT records:
3CP401, 4, 00000000, 00000600, 00000000, DB3CP401.UC, no note
EEPROM records:
3CP401, 6, 00000600, 00000600, 00000000, eu_c40.bin, no note

(The OpSys, GDT and bin files belong to a CP40A, not to a CP40B)

Also verify the size, date, and checksum of the CP image file:
/usr/fox/sp/files/OS1C30, OS1C40, OS1C3B, OS1C4B, OS1C60
Check it against a known good one.
NOTE: The image file might be corrupted (power failure, RAID drives, no file sync, etc). Just because the CP booted yesterday, don't assume the image file is ok today.

8) Check for lock files. To see how far the boot process goes, do this:
- Remove all files in /usr/fox/sp/locks (with CP letterbug in their filenames)
- Insert the CP (or reset it)
- Check if file "fCPLBUG" is created, and "fCPLBUG-" links to it.
- When boot is done, "fCPLBUG-" is removed but "fCPLBUG" remains. Note the date.

If file "fCPLBUG-" is not created, use the next step to see if the letterbug is correct.
If file "fCPLBUG-" is created, you know the letterbug is correct and the SICT tables have this CP as a valid station to be booted.
If the file "fCPLBUG-" is created but not removed, then something is wrong with the OpSys, GDT or map file for that CP. See more details in step 17.

9) Use the (Solaris) snoop command to look at boot request packets.
Follow this procedure on ANY AP/AW51 connected to the nodebus, to observe the (XLLC) boot request packets from the CP to all APs.

Type: snoop -x 0 01:00:6c:0f:0f:0f
Insert the CP (or depress its reset button).
After about 8-10 seconds the screen will show the first packet.
If CP letterbug is correct, and its host starts booting it, you will see maybe 1-2 more messages.
If CP letterbug is NOT correct, or its host is not available, the same message will repeat every 8 seconds.
Type CTL-C to stop snoop.
The ASCII (right) section shows the actual LETTERBUG of the CP requesting boot packets. See sample below:

4APB01# snoop -x 0 01:00:6c:0f:0f:0f
Using device /dev/le (promiscuous mode)
? -> (multicast) ETHER Type=8501 (Unknown), size = 63 bytes

 0: 0100 6c0f 0f0f 0000 6c0d a061 0031 4040 ..l.....l..a.1@@
16: c0a1 80a0 8082 0204 0083 0101 8501 01a6 ................
32: 8030 80a1 8081 0202 0b00 0082 0834 4350 .0...........4CP
48: 3342 3200 0000 0000 0000 0000 0060 42 3B2..........`B

If desired, the snoop command can capture packets into a log file. Use snoop again to examine it. See samples below.

Sample 1: CP letterbug correct, host available, files ok.

4APB01# snoop -o snoop.log 01:00:6c:0f:0f:0f
Using device /dev/le (promiscuous mode)
0 (cursor starts blinking) (Insert CP now)
CTL-C (1 minute later)
4APB01# snoop -i snoop.log -t r -x 0 01:00:6c:0f:0f:0f
1 0.00000 ? -> (multicast) ETHER Type=8501 (Unknown), size = 63 bytes

 0: 0100 6c0f 0f0f 0000 6c0d a061 0031 4040 ..l.....l..a.1@@
16: c0a1 80a0 8082 0204 0083 0101 8501 01a6 ................
32: 8030 80a1 8081 0202 0b00 0082 0834 4350 .0...........4CP
48: 3342 3200 0000 0000 0000 0000 0060 42 3B2..........`B

Sample 2: CP letterbug is wrong. Messages from the "wrong station" keep coming every 8 seconds.

4APB01# snoop -i snoop.log -t r -x 0 01:00:6c:0f:0f:0f
1 0.00000 ? -> (multicast) ETHER Type=8501 (Unknown), size = 63 bytes

 0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040 ..l.....l..G.1@@
16: c0a1 80a0 8082 0204 0083 0101 8501 01a6 ................
32: 8030 80a1 8081 0202 0c00 0082 0834 4350 .0...........4CP
48: 3442 4300 0000 0000 0000 0000 006f 20 4BC..........o

2 8.00789 ?
-> (multicast) ETHER Type=8501 (Unknown), size = 63 bytes

 0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040 ..l.....l..G.1@@
16: c0a1 80a0 8082 0204 0083 0101 8501 01a6 ................
32: 8030 80a1 8081 0202 0c00 0082 0834 4350 .0...........4CP
48: 3442 4300 0000 0000 0000 0000 006f 20 4BC..........o

3 16.78556 ? -> (multicast) ETHER Type=8501 (Unknown), size = 63 bytes

 0: 0100 6c0f 0f0f 0000 6c0c 0047 0031 4040 ..l.....l..G.1@@
16: c0a1 80a0 8082 0204 0083 0101 8501 04a6 ................
32: 8030 80a1 8081 0202 0c00 0082 0834 4350 .0...........4CP
48: 3442 4300 0000 0000 0000 0000 000f 7d 4BC...........}

10) Verify that the module Part Number corresponds to the CP type configured:

4APB01# grep 4CP3B1 /usr/fox/sp/hldb
4CP3B1 20B
^^^^^^ ^^^
CPlbug CPtype

PartNumber  hldb(6.2)  hldb(4.3/6.1)  CP
P0400VR     201        201            CP10
P0960AW     203        203            CP30
P0961EF     20B        203            CP30B
P0960JA     205        205            CP40
P0961BC     20C        205            CP40B
P0961FR     C101       N/A            CP60

NOTES:
- On pre-6.2 releases, a CP30B or CP40B is configured as plain CP30 or CP40.
- A CP30B will boot with the letterbug of a configured CP40B and vice versa.

11) Check if the processes "lsap_dsp", "romload_srvr", and "mles" are running on the host station. Even if they are running they might not be working properly. First try to KILL and RESTART those processes. The last resort is to REBOOT the Host. (Especially if the system alarm printer or smon_log does not show "downld CIO_DB", even when REPORTS HAVE BEEN ENABLED.)

4APB01# ps -ef | egrep 'lsap|romload|mles'
root 1065 1020 0 Mar 06 ? 0:00 /usr/fox/exten/lsap_dsp
root 1066 1020 0 Mar 06 ? 4:39 /usr/fox/exten/romload_srvr
root 1383 1 0 Mar 06 ? 0:01 /usr/fox/bin/mles

12) Check if more than one host is trying to boot the same CP. Run "ds_stasict" on each Host (AP/AW) in the system. Only the actual boot host should return results. Just the presence of the CMXCPLBUG.BIN file on other hosts does not necessarily mean they will try to boot the CP.
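One way to organise the step 12 check is to save each host's ds_stasict output to a file and then list the hosts whose output is not empty. This is a minimal sketch under stated assumptions: the outputs were saved as <HOST>.out in one directory (a naming convention made up for this example), and claiming_hosts is a hypothetical helper name, not part of the Foxboro software.

```shell
# Sketch only: claiming_hosts is a hypothetical helper, and the <HOST>.out
# naming is an assumption. On each host you would first run something like:
#   /usr/fox/swi/ds_stasict 4CP3B1 > /tmp/4APB01.out
# and then collect the files in one directory.

# Print the name of every host whose saved ds_stasict output is non-empty,
# i.e. every host that thinks it is supposed to boot the CP.
claiming_hosts() {
    dir=$1
    for f in "$dir"/*.out; do
        [ -s "$f" ] || continue    # empty (or no) output: host does not claim the CP
        basename "$f" .out
    done
    return 0
}
```

If claiming_hosts prints more than one name, more than one host is set up to boot the CP; if it prints none, no host thinks it should boot it.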
(rm_station removes the CP from the SICT tables but it does NOT remove the CMX file).

13) Get configured NSAP & Pseudo MAC addresses for this CP from /usr/fox/sp/IIF.prm:

4APB01# grep 4CP3B1 IIF.prm
4APB01 4APB01 ASMON6 SMSTM 003 4CP3B1 000000
4APB01 4CP3B1 OS1C3B ADRMAC 001 C000E2 <-- pseudo 000000
4APB01 4CP3B1 OS1C3B ADRNSP 001 000104 <== NSAP 000000
4APB01 4CP3B1 OS1C3B CPBPC 001 5 000000

Use the command below to verify that this CP has the same NSAP address as the other stations in the SAME node. If the CP was configured for Node 4, it won't boot on nodes 1, 2, or 3...

3AWE01# grep ADRNSP /usr/fox/sp/IIF.prm | sort +5
2AP201 2AP201 OS3FS3 ADRNSP 001 000102 000000
2AP201 2COM01 OS1CS ADRNSP 001 000102 000000
...
3AWD01 3AB101 OS1ADH ADRNSP 001 000103 000000
3AWD01 3AB201 OS1ADR ADRNSP 001 000103 000000
...
4APB01 4APB01 OS6FS1 ADRNSP 001 000104 000000
4APB01 4CP301 OS1C30 ADRNSP 001 000104 000000
4APB01 4CP302 OS1C30 ADRNSP 001 000104 000000
4APB01 4CP3B1 OS1C3B ADRNSP 001 000104 000000
4APB01 4CP3B2 OS1C3B ADRNSP 001 000104 000000
...

14) Verify the CP COMEX file has the right NSAP & pseudo MAC Addresses
Note: Any station that can be configured as Fault Tolerant WILL use a pseudo MAC address even if it was configured and installed as SINGLE.

4APB01# /usr/foxbin/bpatch /usr/fox/sp/files/CMX4CP3B1.BIN
FILE: CMX4CP3B1.BIN (106) - ASCII PAGE: 0 (0 - 0)
    x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xa xb xc xd xe xf 0123456789abcdef
00: 65 00 66 00 67 00 64 00 64 00 64 00 32 00 32 00 e.f.g.d.d.d.2.2.
01: 32 00 e8 03 e8 03 e8 03 c8 00 c8 00 c8 00 5e 01 2.............^.
02: 5e 01 5e 01 28 00 0a 00 74 0e 3c 00 3c 00 33 00 ^.^.(...t.<.<.3.
03: 00 00 03 00 00 00 01 00 4b 00 28 00 00 05 50 01 ........K.(...P.
04: 3d 00 3d 00 49 30 30 30 31 30 34 00 50 43 41 54 =.=.I000104.PCAT
                   ^^ ^^ ^^ ^^ ^^ ^^
                    0  0  0  1  0  4  NSAP address
05: 30 30 00 00 14 00 14 00 00 00 00 00 00 00 00 00 00..............
06: 00 00 00 00 00 00 6c c0 00 e2 ......l...
                ^^ ^^ ^^ ^^ ^^ ^^
                6CC000E2 pseudo MAC address

15) Verify the NSAP address of the Boot host for boot problems on the Local Node.
There are times when a boot host won't boot the station because it has a wrong NSAP. This can occur after a Day 0 installation, before the AW/AP gets rebooted. Use the "fist" command on the boot host to verify what NSAP it is using. Looking just at its CMXCPLBUG.BIN will not tell you the truth since Software Install will patch the kernel with the proper NSAP.
On a remote (Hostless) Node, the NSAP of the CP must match the NSAP of the LAN module on that node.

16) Verify Real & pseudo MAC addresses of the CP with System Management's EQUIP INFO:

SINGLE CP
PRIMARY MODE: No Information    SHADOW MODE: No Information
PRIM ROM ADDRESS: 00006C00DA062 (Real)
SHAD ROM ADDRESS: 000000000000
STATION ADDRESS: 00006CC000E2 (pseudo)

FT CP
PRIMARY MODE: Married Prim    SHADOW MODE: Married Shad
PRIM ROM ADDRESS: 00006C00DA062 (Real)
SHAD ROM ADDRESS: 00006C0C0950 (Real)
STATION ADDRESS: 00006CC000E2 (pseudo)

17) Use the (Solaris) truss command on the host AP/AW to see the actions of romload_srvr, the main process that boots the CP. This will allow you to see which files and directories are being accessed, opened, linked, etc. This might tell you why this CP is not booting.

First find the process ID of romload_srvr:
ps -ef | grep romload_srvr
The process ID is the second field of the first result line.

Once you have identified the process ID for romload_srvr, run the truss command:
truss -p {romload_srvr pid}
Now, insert the CP or depress its reset button.
After 60-80 seconds use CTL-C to stop the truss process.
It is better to save the output of truss to a file. A normal boot produces about 4000 lines.

Examples:
4APB01# ps -ef |grep romload
root 1086 1039 0 Oct 23 ?
0:04 /usr/fox/exten/romload_srvr
root 25733 15779 0 09:19:16 pts/2 0:00 grep romload
4APB01# truss -p 1086
ioctl(20, 0x20003000, 0xEFFFF92C) = 0
ioctl(20, 0x20003000, 0xEFFFF92C) (sleeping...)
...
... 4000 lines more will follow if boot is ok

To capture all lines into a file (/opt/z), use the command below:
truss -o /opt/z -p {romload_srvr pid}

To see which files have been opened/accessed/linked to boot the CP:
(I highlighted the main names).

4APB01# egrep 'open|stat|link|access|chdir' /opt/z
open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 8
open("/usr/fox/sp/sict1.idx", O_RDWR) = 8
open("/usr/fox/sp/sict1.dat", O_RDWR) = 9
open("/usr/fox/sp/sict2.idx", O_RDWR) = 8
open("/usr/fox/sp/sict2.dat", O_RDWR) = 9
stat("/usr/fox/sp/locks/f4CP3B1", 0xEFFFEF54) = 0
link("/usr/fox/sp/locks/f4CP3B1", "/usr/fox/sp/locks/f4CP3B1-") = 0
access("/usr/fox/sp/imag_over.cfg", 4) = 0
stat("/usr/fox/sp/imag_over.cfg", 0xEFFFF7E0) = 0
chdir("/usr/fox/sp/files/") = 0
stat("OS1C3B", 0xEFFFF7D0) = 0
stat("OS1C3B.GDT", 0xEFFFF7D0) = 0
stat("CMX4CP3B1.BIN", 0xEFFFF7D0) = 0
chdir("/usr/fox/sysmgm/") = 0
chdir("softmgr/file/") = 0
open("/usr/fox/sp/files/OS1C3B", O_RDONLY) = 8
open("/usr/fox/sysmgm/softmgr/file/sm_errs", O_WRONLY|O_APPEND|O_CREAT, 0666) = 9
open("/usr/fox/sp/files/OS1C3B", O_RDONLY) = 9
open("/usr/fox/sp/files/OS1C3B.GDT", O_RDONLY) = 8
open("/usr/fox/sp/files/OS1C3B.GDT", O_RDONLY) = 9
open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 8
open("/usr/fox/sp/files/CMX4CP3B1.BIN", O_RDONLY) = 9
fstat(5, 0xEFFFF670) = 0
open("/usr/fox/sp/sldb", O_RDWR) = 5
unlink("/usr/fox/sp/locks/f4CP3B1-") = 0
fstat(5, 0xEFFFF670) = 0
open("/usr/fox/sp/sldb", O_RDWR) = 5
4APB01#

If the host was already trying to boot other (nonexistent) stations, remove the error lines with: grep -v Err /opt/z

imag_over.cfg = Multi image support
sm_errs = log file for romload_srvr, checkpt_srvr, ldthru_srvr, reload_srvr, etc

ADDENDUM: Create log files to help you diagnose the problem:

A) smon_log (to get System Alarm
messages. v6.2.x only)
Create directory "/opt/fox/sysmgm/sysmon" on host of the System Monitor that looks at this CP, then create "smon_log" file by using: touch smon_log.
Remember to remove this file after solving your problem, otherwise it will grow indefinitely.

Typical messages after a successful CP boot:
2002-05-22 09:57:38 6CP601 Software Manager SYSMON -00003 Powerup reboot OK. ROM Addr = 00006C0FAD46
2002-05-22 09:57:47 6CP601 Station SYSMON -00041 Equipment on-line
2002-05-22 09:57:59 6CP601 Software Manager SYSMON -00019 Database Load Successful
2002-05-22 09:57:59 6CP601 Process = downld CIO_DB 000001 DATABASE DOWNLOAD: RESOLVE LINKAGES SUCCESSFUL
2002-05-22 09:58:31 6CP601 Process = downld CIO_DB 000003 DATABASE DOWNLOAD COMPLETE UNDEFINED BLOCK(S)
2002-05-22 09:58:34 6CP601 Equip = 6CP601 SYSMON -00051 Equipment has been added on-line

Messages if the CP was booted while option "DISABLE ALL REPORTS" was active on SysMgmt:
2002-05-22 10:03:33 6CP601 Software Manager SYSMON -00003 Powerup reboot OK. ROM Addr = 00006C0FAD46
2002-05-22 10:03:43 6CP601 Station SYSMON -00074 Enrolling station with Report State : None
2002-05-22 10:03:43 6CP601 Station SYSMON -00041 Equipment on-line
2002-05-22 10:03:52 6CP601 Software Manager SYSMON -00019 Database Load Successful

B) sm_errs
If you create file "/usr/fox/sysmgm/softmgr/file/sm_errs", on the AP/AW51 hosting the CP, the ERROR (or success) messages of the romload_srvr process will be logged to that file. Additionally, "checkpoint_srvr" and "reload_srvr" are logged.
Remember to remove this file after solving your problem, otherwise it will grow indefinitely.
By the way this file will show if the CP was booted as style A or B (type: 205, 20b, 20c, etc). Useful on pre v6.2 systems. See samples below.

{6.2.1 system. 4CP3B1,4CP3B2 are CP30Bs, while 4CP4B2 is a CP40B}
4APB01# cd /usr/fox/sysmgm/softmgr/file
4APB01# touch sm_errs
...
boot CPs, etc
4APB01# more /usr/fox/sysmgm/softmgr/file/sm_errs
2001-10-25 17:47:10 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 4CP3B2 - sta_type=0x20b -- 819 blocks))
2001-10-25 17:49:53 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 4CP3B2 - sta_type=0x20b -- 819 blocks))
2001-10-25 17:51:52 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 4CP3B1 - sta_type=0x20b -- 819 blocks))
2001-10-25 17:51:58 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 4CP4B2 - sta_type=0x20c -- 819 blocks))
2001-10-25 17:52:46 (romload_srvr) -39 0 ( ) (l_fn_03 - Load Failure - retry or TMO)
2001-10-25 18:00:34 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 4CP4B2 - sta_type=0x20c -- 819 blocks))
-----------------------------------------------------
{pre-6.2 system. 3CP401 & 3CP402 were both configured as CP40. 3CP401 module is a CP40A, while 3CP402 is a CP40B}
3AWD01# more sm_errs
2001-10-29 11:00:01 (romload_srvr) -39 0 ( ) (read_override_file - access error)
2001-10-29 11:00:01 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 3CP402 - sta_type=0x205 -- 818 blocks))
2001-10-29 11:03:56 (romload_srvr) -39 0 ( ) (RLS - Overriding SICT Filename (OS1C4B), (LID: 3CP401 - sta_type=20c))
2001-10-29 11:03:56 (romload_srvr) -39 0 ( ) (read_override_file - access error)
2001-10-29 11:03:56 (romload_srvr) -39 0 ( ) (RLS - Booting (LID: 3CP401 - sta_type=0x20c -- 819 blocks))

UPDATE: After installing QF 1004163, the station upload packet for CP30A/CP40A is reduced. /usr/fox/sysmgm/softmgr/file/sm_errs can be used to monitor the upload operation. Below is a sample of this output:
2003-12-09 15:07:13 (upload_srvr) -39 0 ( ) (Valid Upload request from A21C4A (type: 205) size: 400000 rea: 8 RLR vers: 44007c decode_rtn: 0)
2003-12-09 15:07:13 (upload_srvr) -39 0 ( ) (opened lock file A21C4A.dmp+ ... before msgrcv ....)
2003-12-09 15:07:13 (upload_srvr) -39 0 ( ) (After msgrcv())
2003-12-09 15:07:13 (upload_srvr) -39 0 ( ) (Reply with - dump_start: 0 s_mem_adr: 0 dump_size: 400000 memsize: 925 Dump Size: 700 vers: 44007c)
2003-12-09 15:12:01 (upload_srvr) -39 0 ( ) (UPLOAD_SERVER - # blocks were missed is 0)
2003-12-09 15:16:47 (upload_srvr) -39 0 ( ) (UPLOAD_SERVER - # blocks were missed is 0)
2003-12-09 15:21:33 (upload_srvr) -39 0 ( ) (UPLOAD_SERVER - # blocks were missed is 0)
2003-12-09 15:26:18 (upload_srvr) -39 0 ( ) (UPLOAD_SERVER - # blocks were missed is 0)

Validate that the dump directory contains the newly-created dump files.
1. These files will be 1049522 bytes in length.
2. There could be missed blocks due to File Server activity, nodebus activity, etc.
_____

9b- Using SysMgmt:
Open System Management. Select the AP/AW that hosts the CP. Select PERF, SysMgmt Counters, Application Layer, and RESET ALL (to reset all counters).
Have someone insert the CP (or depress its reset button) while you keep clicking on the READ ALL button. Observe these two counters:
"LLC FRAMES TRANSMITTED" (AP/AW --> CP)
"LLC FRAMES RECEIVED" (AP/AW <-- CP)
If the CP is putting out boot request packets you will see the "LLC FRAMES RECEIVED" counter increment by 1-2. If this AP/AW is the CP host you should also see hundreds of "LLC FRAMES TRANSMITTED".
You can usually tell if the problem is that someone has 'disabled download' because you will see 1-2 boot request packets come in and the AP/AW boot host sends only one packet back. (Fix this by selecting in SysMgmt the CP, CONFIG, EQUIP CHG, click on ENABLE DOWNLOAD).
You can also use this method to find multiple boot hosts because only the station responsible for answering a boot request should be sending "LLC FRAMES TRANSMITTED".

9c- Use a Foxwatch Comm processor to monitor Nodebus for XLLC Boot packets.
_____

Find out why your FBM does not boot (Red LED stays ON, Green stays Off)

HARDWARE:
- Check terminators (RESISTORS).
  Are they ONLY at the end rack of the last cabinet?
- Is the twinax shield connected to ground only on the last FB isolator?
- Swap/replace FB isolators if none of the FBMs boot.
- Check letterbugs. Are they upside down? Are they fully inserted? Swap with known good ones.
- Move FBM to another slot/rack.
- Swap module with one that is ok, from a rack you know works.
- Swap nosecones.
- Check pins of nosecone.
- Bent pin on backplane?
- Remove I/O wires from nosecone.
- Are IPMs ok?

SOFTWARE:
- EEPROM update the FBM and check for activity on LEDs
- Blinking Green LED: EEPROM updating OR in Failsafe condition (lost comm to CP)
- Are the ECBs properly configured on ICC? Software/Hardware type?
- Is MPOLL parameter ok?
- Corrupted/missing iom file on host?
- Has the CP host been changed (new box)? Reboot CP or use HH845 (/opt/fox/bin/tools/cp_utl)
- Cycle power to baseplate with the FBM
- Cycle power to FCM (DCM)
- Sometimes you need to reboot the CP. See QF8707B
- If the FBM is rebooted/power cycled while the CP boot host is off-line, the FBM will never boot up, even after restoration of the boot host. It can be fixed with a new version of the fbmload task. See CAR 2000175

At 6.2.1: The maint task does not retry after a failed attempt to get a new server address. This can be kick started again (see below).
At 6.1.1: As above, but it cannot be restarted as the soft_mgr does not recognise the request for server address.

Workarounds: (From Development)

At 6.2.1:
1. Kill the fbmload task
2. Turn the FBMs that need to be downloaded off line (this will clear the pending actions and reset the delay timer)
3. Wait for them to go offline (this may take a while, with 'maint' running only every 20 seconds - be patient)
4. Hit the Download key for one of the FBMs
5. Wait for the SMDH info to report that the FBM is "NOT READY" and for the diag status 4 to be FF. This can take up to several maint cycles. If the CP is not able to talk to the FBM, you won't reach this state.
   At this point, after one more cycle the 'maint' task should return to its normal cycle.
6. Restart the fbmload task

Things should now be back to normal. The problem will return whenever the following event sequence occurs:
1. The CP attempts to contact the fbmload server and fails
2. The attempt to obtain a new server address from system management fails (1 try)

At 6.1.1:
You require a new soft_mgr task (attached).
To install it (on the CP's host):
cd /usr/fox/exten
cp soft_mgr soft_mgr.61
mkdir temp
cd temp
tar xvf /dev/fd0
cd ..
cp temp/* soft_mgr.new
ps -eaf|grep soft
kill (the soft_mgr process)
This works because when fox_monitor sees the .new process, it will replace the old soft_mgr with the .new, and respawn the process.
_____

Angel Corbera
Invensys - The Foxboro Company
Technical Assistance Center (Systems)
33 Commercial Street (B52-AA), Foxboro, MA 02035
Ph: 866-PHONIPS (866-746-6477); Fax: 508-549-4492

Please mark your calendar for the 2005 Invensys Process Systems Customer Conference, October 3rd - 6th 2005 - Houston, TX. For more information, please visit www.invensys.com/usergroup2005.

-----Original Message-----
From: foxboro-bounce@xxxxxxxxxxxxx On Behalf Of David Johnson
Sent: Wednesday, September 14, 2005 10:46 AM
To: foxboro@xxxxxxxxxxxxx
Subject: Re: [foxboro] CP60 will not boot

Thanks to everyone, but no luck yet.

Terry, CP download is enabled.
Russ, copied BBXXXXXX.UC to DBXXXXXX.UC, no luck; copied DBXXXXXX.init to DBXXXXXX.UC, no luck. Put original DBXXXXXX.UC back. All of the files pointed to look OK.
Angel, looking for troubleshooting guide on TAC website. Haven't found it yet.

Here's a better description of the problem.
AW51F is hosting this one CP only.
DNBT connects AW51F to rack with only DNBT and CP60 in the rack.
DNBT link light is green.
DNBT NODEBUS light will flash one time when the CP is reset (or push pulled) then no more on that side.
DNBT TP Activity light flashes periodically (every 5 to 10 seconds)

Everything was communicating and working yesterday when I rebooted the AW, (waited for it to come back up, and online), then rebooted the CP60. At that point I got stuck. Letterbug is correct and appears to be correctly inserted, no bent pins. The CP has successfully rebooted in this configuration about a week ago on a power outage. I have not made any sysdef or image changes.

Any help would be appreciated.

Thanks,
David

_______________________________________________________________________
This mailing list is neither sponsored nor endorsed by Invensys Process Systems (formerly The Foxboro Company). Use the info you obtain here at your own risks. Read http://www.thecassandraproject.org/disclaimer.html

foxboro mailing list: //www.freelists.org/list/foxboro
to subscribe: mailto:foxboro-request@xxxxxxxxxxxxx?subject=join
to unsubscribe: mailto:foxboro-request@xxxxxxxxxxxxx?subject=leave