CRS stuff (part 1)

  • From: "Henry Poras" <henry@xxxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 23 Nov 2005 09:48:03 -0500

OK, I tried to send this a couple of times and it never made it. Maybe it
was just too long. I'll try breaking it into two parts and see what happens.

Continuing on from a thread of last week, I looked at some of the CRS
boot scripts to see what they did. This is a summary of my first cut at
the logic. (Wish me luck with the formatting.) At boot time, the scripts
that are run are:

init.evmd run (from inittab)
init.cssd fatal (from inittab)
init.crsd run (from inittab)
init.crs (from an rc directory)

The three scripts run from inittab all use 'respawn' (init restarts the
process whenever it terminates).

init.crs start (from rc)
  runs init.cssd autostart
    if the AUTOSTARTFILE (/etc/oracle/scls_scr/$HOST/root/crsstart)=disable
    then
         init.cssd norun (this just sets the cssrun file in the 
                          above dir to 'norun')
    if AUTOSTARTFILE = enable
    then
         run init.cssd manualstart
           get the boot time of the server (init.cssd booted) 
           and put this into cssrun

# so far cssrun is either norun, or the boottime of the server
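
For anyone who wants to play with this logic away from a real cluster, here is a minimal sketch of the autostart decision as I read it. It is not the Oracle script. The function names, the FAKE_BOOT_TIME variable, and the directory passed as an argument (standing in for /etc/oracle/scls_scr/$HOST/root) are all made up for illustration, and the handling of an unexpected crsstart value is my assumption.

```shell
#!/bin/sh
# Sketch only: mirrors the autostart decision described above.

# Hypothetical stand-in for "init.cssd booted": any value that stays
# the same for one boot and changes on the next would do.
boot_time() {
    echo "${FAKE_BOOT_TIME:-boot-unknown}"
}

# autostart SCR_DIR
# Writes 'norun' or the boot time into SCR_DIR/cssrun, depending on
# the contents of SCR_DIR/crsstart.
autostart() {
    scr_dir=$1
    state=$(cat "$scr_dir/crsstart" 2>/dev/null) || state=
    case "$state" in
        disable) echo norun > "$scr_dir/cssrun" ;;
        enable)  boot_time > "$scr_dir/cssrun" ;;
        *)       # assumption: treat anything else as an error
                 echo "unexpected crsstart value: '$state'" >&2
                 return 1 ;;
    esac
}
```

Pointing autostart at a scratch directory that holds a crsstart file should leave a cssrun file containing either 'norun' or the (fake) boot time.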

init.evmd run (from inittab)
  run init.cssd startcheck (I will digress in a moment to detail startcheck)
    check every 30 sec. until the exit status of init.cssd startcheck = 0
    # what if it errors out? Does this loop ever exit?
    once init.cssd startcheck succeeds, run $CRS_HOME/bin/evmd run (as oracle)
    # lockfiles, flagfiles and pidfiles are also cleaned up and
    # apparently recreated
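
The "check every 30 sec. until startcheck = 0" pattern can be sketched as below. This is my own illustration, not the Oracle code. The function name and the optional MAX_TRIES argument are hypothetical; I added the latter only to show one way the loop could be made to give up, since as far as I can tell the original retries forever.

```shell
#!/bin/sh
# Sketch only: retry a check command until it exits 0.

# wait_for_startcheck CHECK_CMD INTERVAL [MAX_TRIES]
# MAX_TRIES of 0 (the default) means retry forever.
wait_for_startcheck() {
    check_cmd=$1
    interval=$2
    max_tries=${3:-0}
    tries=0
    # $check_cmd is left unquoted on purpose so it may carry arguments
    until $check_cmd; do
        tries=$((tries + 1))
        if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
            return 1    # gave up
        fi
        sleep "$interval"
    done
    return 0
}
```

Something like `wait_for_startcheck "/etc/init.d/init.cssd startcheck" 30` would then block until the check succeeds (the script path here is an assumption about where the script lives on your platform).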

init.cssd startcheck
  # this is called by just about every other script. According to 
    internal comments it:
  # returns 0 if we should start
  # returns 1 on a non-cluster boot (i.e. ASM, no RAC)
  # returns 2 if disabled by admin
  # returns 3 on error
  #
  # I am skipping third party vendor clustering logic and non-cluster stuff
  if cssrun does not exist
  or
  if cssrun is not equal to the boottime (see init.crs start)
  then exit with status of 3
  wait for crsctl to be readable
  wait for Voting disk and OCR to come up
  run crsctl check boot (as oracle) #what does this do?
  loop until exit status of crsctl is 0
  exit init.cssd startcheck with status of 0 (OK)
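
Here is a cut-down sketch of the cssrun test at the top of startcheck, using the exit codes from the internal comments. The function and variable names are mine, and the waits for crsctl, the voting disk and the OCR are left out entirely, so treat it as an illustration of the exit-status convention only.

```shell
#!/bin/sh
# Sketch only: the cssrun comparison from startcheck.

# Exit codes as listed in the script's internal comments.
SC_START=0        # we should start
SC_NONCLUSTER=1   # non-cluster boot (not produced by this sketch)
SC_DISABLED=2     # disabled by admin (not produced by this sketch)
SC_ERROR=3        # error

# startcheck_sketch CSSRUN_FILE BOOT_TIME
startcheck_sketch() {
    cssrun_file=$1
    boot=$2
    [ -f "$cssrun_file" ] || return "$SC_ERROR"
    [ "$(cat "$cssrun_file")" = "$boot" ] || return "$SC_ERROR"
    # The real script now waits for crsctl, the voting disk and the
    # OCR, then loops on "crsctl check boot"; all omitted here.
    return "$SC_START"
}

# describe_status CODE: turn an exit code into the comment text.
describe_status() {
    case $1 in
        0) echo "start" ;;
        1) echo "non-cluster boot" ;;
        2) echo "disabled by admin" ;;
        3) echo "error" ;;
        *) echo "unknown" ;;
    esac
}
```

Note that with this reading a cssrun of 'norun' falls into the "not equal to the boottime" branch and so comes back as 3, not 2.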

# Back to inittab stuff

# init.cssd fatal is next, but the logic here is by far the longest, so 
# I will skip it and handle it last

init.crsd run
  run init.cssd startcheck
    check every 30 seconds until exit status = 0
  check if this is the first run of init.crsd run after server boot
     if crsdboot doesn't contain the boottime, this is the first run
     then
          FIRST=true
          echo boottime>crsdboot
          #do some PIDFILE, LOCKFILE, FLAGFILE stuff
          run $CRS_HOME/bin/crsd -1 &
          # not sure what this binary and flag do
     fi
     # for every run of init.crsd run (boot and respawn)
     run $CRS_HOME/bin/crsd run
     # start the crs daemon. We can guess what that does
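
The "first run after boot" test above boils down to comparing a marker file against the boot time. A minimal sketch follows, with a hypothetical function name and with the marker path and boot time passed in as arguments. I use a plain equality test where the original checks whether crsdboot "contains" the boot time, which is an assumption on my part.

```shell
#!/bin/sh
# Sketch only: detect the first invocation after a boot.

# is_first_run MARKER_FILE BOOT_TIME
# Returns 0 (and records BOOT_TIME in MARKER_FILE) the first time it
# is called for a given boot; returns 1 on later calls, e.g. after a
# respawn by init.
is_first_run() {
    marker=$1
    boot=$2
    if [ -f "$marker" ] && [ "$(cat "$marker")" = "$boot" ]; then
        return 1
    fi
    echo "$boot" > "$marker"
    return 0
}
```

A caller would wrap the boot-only work in `if is_first_run "$marker" "$boot"; then ... fi` and run the daemon unconditionally afterwards.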

# Back to init.cssd fatal
# init.cssd fatal calls init.cssd daemon as a background
# process, and then continues to loop to make sure the 
# daemon script is still there.
# init.cssd daemon calls ocssd. If ocssd fails, or a
# duplicate one is started, the server reboots (Metalink Note 265769.1).
#
init.cssd fatal
  run init.cssd startcheck
    check every 30 seconds until exit status = 0
  run init.cssd daemon &
    run $CRS_HOME/bin/ocssd (as oracle)
    # what happens when this css daemon (ocssd) fails?
    if cssrun is 'norun'
    or
    if /etc/oracle/scls_scr/$HOST/oracle/cssfatal = disable
    then
       do nothing (exit out to loop in init.cssd fatal)
    else # css daemon dies, cssrun = 'boottime', cssfatal='enable'
       reboot -n -f
       init.cssd norun 
       #disable respawn. init.cssd startcheck returns 3
  # Return to init.cssd fatal
  # check every second if the daemon script still exists
  loop (infinite)
    run init.cssd startcheck
      check exit status. If non-zero, exit (e.g. if cssrun = norun)
      # respawn is now off. Node shutdown is handled outside of CRS
      look for pid of background daemon process (kill -0)
      if it exists
      then
          continue looping
      else
          start another one (init.cssd daemon &)
          # this will lead to a reboot -n -f
  end loop
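
The "look for the pid with kill -0, restart if it is gone" part of the loop can be sketched like this. The function and variable names are hypothetical, the sketch deliberately leaves out the startcheck call and anything to do with rebooting, and it says nothing about how the real script tracks the pid.

```shell
#!/bin/sh
# Sketch only: a kill -0 liveness test with restart.

DAEMON_PID=     # pid of the background process being watched
RESTARTS=0      # how many times check_daemon had to restart it

# is_alive PID: true if the process exists and we may signal it.
# kill -0 sends no signal; it only performs the existence check.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# start_daemon CMD [ARGS...]: run CMD in the background, remember pid.
start_daemon() {
    "$@" &
    DAEMON_PID=$!
}

# check_daemon CMD [ARGS...]: restart CMD if DAEMON_PID is gone.
check_daemon() {
    if is_alive "$DAEMON_PID"; then
        return 0
    fi
    start_daemon "$@"
    RESTARTS=$((RESTARTS + 1))
}
```

The outer loop would then be roughly: call start_daemon once, and after that repeat "run startcheck, exit if it is non-zero, call check_daemon, sleep 1".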

TBC.

Henry
