Re: Question about resource's start dependency in Clusterware

  • From: Jure Bratina <jure.bratina@xxxxxxxxx>
  • To: "D'Hooge Freek" <Freek.DHooge@xxxxxxxxx>
  • Date: Thu, 30 Jan 2014 11:13:10 +0100

Hi,

> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*

Thanks, I think that answers my question. Though the note in the
documentation is actually under the section dealing with startup of
Clusterware, so maybe the dependencies in the initial startup sequence are
treated differently than when Clusterware is already operational.

> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
> *Oracle recommends that resources with **hard** start dependencies also
> have **pullup** start dependencies.*
> I'm not sure why that is.....

I think the reason for pullup dependencies can be found in the out-of-order
startup sequence described here:
http://docs.oracle.com/cd/E11882_01/rac.112/e16794/crschp.htm#CWADD92086 :
"When two or more resources depend on each other, a failure of one of them
may end up causing the other to fail, as well. In most cases, it is
difficult to control or even predict the order in which these failures are
detected. For example, even if resource A depends on resource B, Oracle
Clusterware may detect the failure of resource B after the failure of
resource A.

This lack of failure order predictability can cause Oracle Clusterware to
attempt to restart dependent resources in parallel, which, ultimately,
leads to the failure to restart some resources, because the resources upon
which they depend are being restarted out of order." And this sentence
explains (in my opinion) why pullup start dependencies are needed: "If the
attempt to restart resource A fails, then as soon as resource B
successfully restarts, Oracle Clusterware reattempts to restart resource
A."
So if we take the explanation above and the example from my first post:

[oracle@london1 ~]$ crsctl status resource ora.prod.db -p
NAME=ora.prod.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
[...]
SPFILE=+DATA/prod/spfileprod.ora
START_DEPENDENCIES=hard(ora.DATA.dg) [...] pullup(ora.DATA.dg)
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg)

My understanding is as follows:
The hard start dependency of ora.prod.db on ora.DATA.dg means that when
ora.prod.db is started, ora.DATA.dg should already be running; if it is
not, it should be started automatically (even without the pullup
dependency). The pullup start dependency works in the other direction: when
ora.DATA.dg is started, ora.prod.db should be started as well, provided its
TARGET is not OFFLINE (since we don't have the "always" modifier).
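A minimal sketch of this reading, as a toy Python model. It is not how
Clusterware is implemented: the Resource class and the start() and
make_cluster() functions are hypothetical names made up for illustration,
and only the two dependency kinds discussed here are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for a Clusterware resource; not a real API.
    name: str
    target: str = "ONLINE"   # desired state (the TARGET attribute)
    state: str = "OFFLINE"   # current state
    hard: list = field(default_factory=list)     # hard start dependencies
    pullup: list = field(default_factory=list)   # pullup start dependencies


def start(name, resources, log):
    """Start `name`, appending each resource to `log` as it comes online."""
    res = resources[name]
    if res.state != "OFFLINE":
        return  # already online, or already being started further up the chain
    res.state = "STARTING"
    # hard: everything this resource depends on is started first.
    for dep in res.hard:
        start(dep, resources, log)
    res.state = "ONLINE"
    log.append(name)
    # pullup without "always": resources that name this one in their pullup
    # list are started too, but only if their TARGET is ONLINE.
    for other in resources.values():
        if name in other.pullup and other.target == "ONLINE":
            start(other.name, resources, log)


def make_cluster(db_target="ONLINE"):
    """Build the two resources from the crsctl output above (simplified)."""
    dg = Resource("ora.DATA.dg")
    db = Resource("ora.prod.db", target=db_target,
                  hard=["ora.DATA.dg"], pullup=["ora.DATA.dg"])
    return {r.name: r for r in (dg, db)}
```

In this model, start("ora.prod.db", ...) brings ora.DATA.dg online first
(the hard dependency), while start("ora.DATA.dg", ...) brings ora.prod.db
online afterwards only when its TARGET is ONLINE (the pullup dependency).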

Now, if a failure occurs and Clusterware tries to start those two resources
out of order, as described in the documentation above, the pullup dependency
is the mechanism that handles the problem automatically. For example,
suppose the ora.DATA.dg resource fails because the ASM instance crashes.
Because of the hard stop dependency on ora.asm (which is now neither online
nor in the intermediate state), and ultimately because the database can't
run without ASM (assuming, of course, that the database files are in ASM),
the ora.prod.db resource fails as well. If Clusterware then tries, for
whatever reason, to start ora.prod.db before ora.DATA.dg, the start of
ora.prod.db fails because ora.DATA.dg can't be started yet. However, once
the ASM instance starts and ora.DATA.dg is brought online (by the ASM
instance dependency mechanism), the pullup(ora.DATA.dg) dependency triggers
another attempt to start ora.prod.db, which now succeeds (although I'm not
sure what happens without the "always" modifier in this case). Without the
pullup dependency, this second attempt would not happen and ora.prod.db
would remain offline.
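The scenario can be replayed with a small Python simulation. This is only a
sketch of the reasoning above, not Clusterware behaviour: the function
restart_out_of_order() is a hypothetical name, and it assumes that the
TARGET of ora.prod.db stays ONLINE after the failed start attempt, which is
exactly the point I'm unsure about without the "always" modifier.

```python
def restart_out_of_order(with_pullup):
    """Replay the out-of-order restart; return (final states, event list)."""
    state = {"ora.asm": "OFFLINE",
             "ora.DATA.dg": "OFFLINE",
             "ora.prod.db": "OFFLINE"}
    # Assumption: the failed start does not flip the database TARGET to OFFLINE.
    db_target = "ONLINE"
    events = []

    def try_start_db():
        # hard(ora.DATA.dg): the database needs the disk group, and in this
        # toy model the disk group cannot be started while ASM is down.
        if state["ora.DATA.dg"] != "ONLINE":
            events.append("ora.prod.db start failed")
            return
        state["ora.prod.db"] = "ONLINE"
        events.append("ora.prod.db started")

    # Step 1: Clusterware attempts the database first (out of order).
    try_start_db()

    # Step 2: ASM comes back and the disk group is brought online.
    state["ora.asm"] = "ONLINE"
    state["ora.DATA.dg"] = "ONLINE"
    events.append("ora.DATA.dg started")

    # Step 3: pullup(ora.DATA.dg) without "always" retries the database,
    # but only if its TARGET is ONLINE and it is still offline.
    if with_pullup and db_target == "ONLINE" and state["ora.prod.db"] == "OFFLINE":
        try_start_db()

    return state, events
```

Calling restart_out_of_order(True) ends with ora.prod.db ONLINE after a
second start attempt; restart_out_of_order(False) leaves it OFFLINE because
nothing triggers a retry.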

Maybe my example wasn't the most appropriate, since stopping ASM in 11.2
has other implications if OCR is stored in it (see
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/srvctladmin.htm#RACAD5043 :
"You cannot use this command when OCR is stored in Oracle ASM because it
will not stop Oracle ASM. To stop Oracle ASM you must shut down Oracle
Clusterware."), but a similar scenario would probably apply to any two
other dependent resources, neither of which depends on ASM.

Regards,
Jure

On Wed, Jan 29, 2014 at 9:33 PM, D'Hooge Freek <Freek.DHooge@xxxxxxxxx> wrote:

>
> Hi,
>
> In the documentation I found following note
>
> *Regardless of the value of the **AUTO_START** resource attribute for a
> resource, the resource can start if another resource has a hard or weak
> start dependency on it or if the resource has a pullup start dependency on
> another resource.*
>
> Which seems to indicate that indeed a hard dependency is enough to start
> the other resource.
> But in the same document, Oracle states also:
>
> *Oracle recommends that resources with **hard** start dependencies also
> have **pullup** start dependencies.*
>
> I'm not sure why that is.....
>
>
