OK, I tried to send this a couple of times and it never made it. Maybe it was just too long, so I'll try breaking it into two parts and see what happens.

Continuing from last week's thread, I looked at some of the CRS boot scripts to see what they do. This is a summary of my first cut at the logic. (Wish me luck with the formatting.)

At boot time, the scripts that are run are:

  init.evmd run     (from inittab)
  init.cssd fatal   (from inittab)
  init.crsd run     (from inittab)
  init.crs          (from an rc directory)

The three scripts run from inittab are all run with 'respawn' (the process is restarted if it is terminated).

init.crs start (from rc)
  runs init.cssd autostart
    if the AUTOSTARTFILE (/etc/oracle/scls_scr/$HOST/root/crsstart) = disable then
      init.cssd norun   # this just sets the cssrun file in the above dir to 'norun'
    if AUTOSTARTFILE = enable then
      run init.cssd manualstart
        get the boot time of the server (init.cssd booted) and put it into cssrun
  # so far, cssrun is either 'norun' or the boot time of the server

init.evmd run (from inittab)
  run init.cssd startcheck   (I will digress in a moment to detail startcheck)
  check every 30 sec. until the exit status of init.cssd startcheck = 0
    # what if it errors out? Does this loop ever exit?
  once init.cssd startcheck succeeds, run $CRS_HOME/bin/evmd run (as oracle)
  # lockfiles, flagfiles and pidfiles are also cleaned up and apparently recreated

init.cssd startcheck
  # this is called by just about every other script. According to internal comments it:
  #   returns 0 if we should start
  #   returns 1 on a non-cluster boot (i.e. ASM, no RAC)
  #   returns 2 if disabled by admin
  #   returns 3 on error
  #
  # I am skipping the third-party vendor clustering logic and the non-cluster stuff
  if cssrun does not exist, or cssrun is not equal to the boot time (see init.crs start), then
    exit with status of 3
  wait for crsctl to be readable
  wait for the voting disk and OCR to come up
  run crsctl check boot (as oracle)   # what does this do?
    loop until exit status of crsctl is 0
  exit init.cssd startcheck with status of 0 (OK)

# Back to inittab stuff
# init.cssd fatal is next, but the logic here is by far the longest, so
# I will skip it and handle it last

init.crsd run
  run init.cssd startcheck
  check every 30 seconds until exit status = 0
  check if this is the first run of init.crsd run after server boot
  if crsdboot doesn't contain the boot time, this is the first run, so
    FIRST=true
    echo boottime > crsdboot
    # do some PIDFILE, LOCKFILE, FLAGFILE stuff
    run $CRS_HOME/bin/crsd -1 &   # not sure what this binary and flag do
  fi
  # for every run of init.crsd run (boot and respawn)
  run $CRS_HOME/bin/crsd run   # start the crs daemon. We can guess what that does

# Back to init.cssd fatal
# init.cssd fatal calls init.cssd daemon as a background
# process, and then continues to loop to make sure the
# daemon script is still there.
# init.cssd daemon calls ocssd. If ocssd fails, or a
# duplicate one is started, the server reboots (Metalink Note 265769.1).

init.cssd fatal
  run init.cssd startcheck
  check every 30 seconds until exit status = 0
  run init.cssd daemon &
    run $CRS_HOME/bin/ocssd (as oracle)
    # what happens when this css daemon (ocssd) fails?
    if cssrun is 'norun' or /etc/oracle/scls_scr/$HOST/oracle/cssfatal = disable then
      do nothing (exit out to the loop in init.cssd fatal)
    else   # css daemon dies, cssrun = boot time, cssfatal = enable
      reboot -n -f
      init.cssd norun   # disable respawn; init.cssd startcheck now returns 3
  # Return to init.cssd fatal
  # check every second if the daemon script still exists
  loop (infinite)
    run init.cssd startcheck
    check exit status. If non-zero, exit (e.g. if cssrun = norun)
      # respawn is now off. Node shutdown is handled outside of CRS
    look for the pid of the background daemon process (kill -0)
    if it exists then
      continue looping
    else
      start another one (init.cssd daemon &)   # this will lead to a reboot -n -f
  end loop

TBC.

Henry
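As a rough model of how init.cssd autostart and init.cssd startcheck fit together, here is a minimal Python sketch of the cluster-mode path only, assuming the logic is exactly as summarized above. The function names, the string `boot_time`, and the `crsctl_check_boot` callable (a stand-in for running `crsctl check boot` as oracle) are all hypothetical; return codes 1 and 2 and the waits for crsctl, the voting disk and the OCR are not modelled.

```python
import time
from typing import Callable, Optional


def autostart(crsstart: str, boot_time: str) -> str:
    """Value that 'init.cssd autostart' leaves in cssrun (hypothetical model)."""
    if crsstart == "disable":
        return "norun"            # init.cssd norun
    if crsstart == "enable":
        return boot_time          # init.cssd manualstart records the boot time
    raise ValueError(f"crsstart value not covered by the summary: {crsstart!r}")


def startcheck(cssrun: Optional[str], boot_time: str,
               crsctl_check_boot: Callable[[], int],
               poll_interval: float = 0.0) -> int:
    """Cluster-mode path of 'init.cssd startcheck': 0 = start, 3 = error.

    Return codes 1 and 2, and the waits for crsctl, the voting disk and
    the OCR, are not modelled in this sketch.
    """
    if cssrun is None or cssrun != boot_time:
        return 3                  # missing, 'norun', or left over from an earlier boot
    # "loop until exit status of crsctl is 0": unbounded, as summarized
    while crsctl_check_boot() != 0:
        time.sleep(poll_interval)
    return 0
```

If the summary is complete, the model makes one thing explicit: once autostart has written 'norun', startcheck can only return 3 until the file changes.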
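The "check every 30 sec. until the exit status ... = 0" step appears in init.evmd run, init.crsd run and init.cssd fatal. Read literally, it treats every non-zero code the same way, so a startcheck stuck at 3 keeps the loop going, which bears on the "does this loop ever exit?" question above. The following is a hypothetical Python version of that poll; the `max_attempts` cap is an illustrative addition, not something the scripts are known to have.

```python
import time
from typing import Callable, Optional


def wait_for_startcheck(startcheck: Callable[[], int],
                        interval: float = 30.0,
                        max_attempts: Optional[int] = None) -> int:
    """Poll until startcheck() returns 0, sleeping `interval` seconds between tries.

    Returns the number of attempts made. With max_attempts=None this mirrors
    the loop as summarized: it never gives up. The cap is hypothetical and
    raises TimeoutError when reached.
    """
    attempts = 0
    while True:
        attempts += 1
        if startcheck() == 0:
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"startcheck still non-zero after {attempts} attempts")
        time.sleep(interval)
```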
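The first-run test in init.crsd run ("if crsdboot doesn't contain the boot time ... echo boottime > crsdboot") is a per-boot flag file. A minimal Python sketch follows, assuming "contain" means the file holds exactly the boot time string. The function name and the file location in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def is_first_crsd_run(crsdboot: Path, boot_time: str) -> bool:
    """Hypothetical model of the first-run check in 'init.crsd run'.

    Returns True (FIRST=true) when crsdboot does not hold the current boot
    time, and records it, like 'echo boottime > crsdboot'. Respawns later in
    the same boot see the recorded value and get False.
    """
    try:
        recorded = crsdboot.read_text().strip()
    except FileNotFoundError:
        recorded = ""
    if recorded == boot_time:
        return False
    crsdboot.write_text(boot_time + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        flag = Path(d) / "crsdboot"                 # hypothetical location
        print(is_first_crsd_run(flag, "boot-A"))    # True: first run after boot
        print(is_first_crsd_run(flag, "boot-A"))    # False: respawn, same boot
        print(is_first_crsd_run(flag, "boot-B"))    # True: the node has rebooted
```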
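The init.cssd daemon / init.cssd fatal pair can be sketched the same way: one function for the decision made after ocssd exits, and one pass of the watchdog loop. `pid_alive` is the Python equivalent of `kill -0 <pid>` (signal 0 performs only the existence and permission check). It is POSIX-only, since `os.kill` behaves differently on Windows. All function names are hypothetical, the action strings only echo the summary, and nothing here reboots anything.

```python
import os
from typing import Callable, Optional, Tuple


def actions_after_ocssd_exit(cssrun: str, cssfatal: str) -> Tuple[str, ...]:
    """Hypothetical model of what 'init.cssd daemon' does once ocssd has exited."""
    if cssrun == "norun" or cssfatal == "disable":
        # Do nothing: fall back to the watchdog loop in init.cssd fatal.
        return ()
    # cssrun holds the boot time and cssfatal = enable: hard reboot, then
    # 'init.cssd norun' so that startcheck returns 3 and respawn stops.
    return ("reboot -n -f", "init.cssd norun")


def pid_alive(pid: int) -> bool:
    """Equivalent of 'kill -0 <pid>' on POSIX: probe the process, send no signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False              # no such process
    except PermissionError:
        return True               # process exists but belongs to another user
    return True


def fatal_loop_step(startcheck: Callable[[], int],
                    daemon_pid: int,
                    spawn_daemon: Callable[[], int],
                    is_alive: Callable[[int], bool] = pid_alive) -> Optional[int]:
    """One pass of the 'init.cssd fatal' watchdog loop (hypothetical model).

    Returns the pid to keep watching, or None when the loop should exit.
    """
    if startcheck() != 0:
        return None               # e.g. cssrun = norun: leave the loop
    if is_alive(daemon_pid):
        return daemon_pid         # daemon script still there: keep looping
    return spawn_daemon()         # start another 'init.cssd daemon &'
```

In the summarized script this pass would repeat about once a second. The injectable `is_alive` and `spawn_daemon` callables are a design choice for the sketch, so the step can be exercised without real processes.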