Re: Anyone tried kill ASM in 11gR2 RAC?

  • From: LS Cheng <exriscer@xxxxxxxxx>
  • To: "Bobak, Mark" <Mark.Bobak@xxxxxxxxxxxx>
  • Date: Sat, 23 Jan 2010 09:43:02 +0100

Hi

I did further tests. The results I posted previously were from AIX 6.1
with 11gR2; I have now repeated the test on Linux x86-64:


   1. kill asm pmon
   2. chown root.root asmdisk


The crsd process dies because it cannot access the OCR (disk group not mounted):

2010-01-23 09:38:59.944: [  OCRASM][801350224]proprasmo: The ASM disk group
OCR is not found or not mounted
2010-01-23 09:38:59.944: [  OCRRAW][801350224]proprioo: Failed to open
[+OCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-01-23 09:38:59.944: [  OCRRAW][801350224]proprioo: No OCR/OLR devices
are usable
2010-01-23 09:38:59.944: [  OCRASM][801350224]proprasmcl: asmhandle is NULL
2010-01-23 09:38:59.944: [  OCRRAW][801350224]proprinit: Could not open raw
device
2010-01-23 09:38:59.944: [  OCRASM][801350224]proprasmcl: asmhandle is NULL
2010-01-23 09:38:59.945: [  OCRAPI][801350224]a_init:16!: Backend init
unsuccessful : [26]
2010-01-23 09:38:59.945: [  CRSOCR][801350224] OCR context init failure.
Error: PROC-26: Error while accessing the physical storage ASM error [SLOS:
cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCR.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCR.255.4294967295
ORA-15001: diskgroup "OCR"
] [8]
2010-01-23 09:38:59.945: [    CRSD][801350224][PANIC] CRSD exiting: Could
not init OCR, code: 26
2010-01-23 09:38:59.945: [    CRSD][801350224] Done.
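
For anyone who wants to pull the fatal entry out of a longer crsd log, here is a minimal Python sketch. It assumes only the record layout visible in the excerpt above (timestamp, component in brackets, thread id in brackets, message, with wrapped continuation lines); the names RECORD_START, parse_records and find_panic are made up for illustration and are not part of any Oracle tool.

```python
import re

# Start of a record as it appears in the excerpt, e.g.
# "2010-01-23 09:38:59.944: [  OCRASM][801350224]message"
RECORD_START = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[(?P<tid>\d+)\]\s*(?P<msg>.*)$"
)

def parse_records(text):
    """Split log text into records, joining wrapped continuation lines."""
    records = []
    for line in text.splitlines():
        m = RECORD_START.match(line)
        if m:
            records.append(m.groupdict())
        elif records and line.strip():
            # No timestamp: treat the line as a continuation of the previous record.
            records[-1]["msg"] += " " + line.strip()
    return records

def find_panic(records):
    """Return the records whose message carries the [PANIC] tag."""
    return [r for r in records if "[PANIC]" in r["msg"]]

# Two records copied from the excerpt above (hard-wrapped as in the mail).
SAMPLE = """2010-01-23 09:38:59.944: [  OCRASM][801350224]proprasmo: The ASM disk group
OCR is not found or not mounted
2010-01-23 09:38:59.945: [    CRSD][801350224][PANIC] CRSD exiting: Could
not init OCR, code: 26"""

for rec in find_panic(parse_records(SAMPLE)):
    print(rec["ts"], rec["component"], rec["msg"])
```

Run against the full crsd log, this should surface the "CRSD exiting" record without having to scroll through the OCRASM/OCRRAW noise before it.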


crsctl gives an error:

[root@grid1 ~]# /u01/grid/11.2.0/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

However, ASM and cssd are up and running.
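
If you want to detect this state from a script (crsd gone while the rest of the stack is still up), a rough Python sketch is to look for the CRS-4535 code in the captured crsctl output. The function name crsd_unreachable is hypothetical, and the sketch only inspects text you have already captured; it does not run crsctl itself.

```python
def crsd_unreachable(crsctl_output):
    """Return True if the captured crsctl output reports CRS-4535."""
    return any(
        line.strip().startswith("CRS-4535")
        for line in crsctl_output.splitlines()
    )

# Output copied from the crsctl run shown above.
SAMPLE = """CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors."""

print(crsd_unreachable(SAMPLE))
```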

So we have a completely different scenario: the same test gives two different
results on two operating systems. On AIX the Clusterware stayed up, while on
Linux crsd dies.


Thanks!



On Thu, Jan 21, 2010 at 2:36 PM, Bobak, Mark <Mark.Bobak@xxxxxxxxxxxx> wrote:

>  Yep, makes sense, I think.
>
>
>
> Clusterware starts, ASM serves up OCR and voting disk geometry, as it
> relates to raw devices that make up your OCRDATA diskgroup.  Clusterware
> caches that info, no longer needs to talk to ASM for it.
>
>
>
> You do the damage, including changing ownership of devices that make up
> OCRDATA diskgroup to root:root.  But, clusterware processes run as root, so,
> they can still read/write those raw devices.
>
>
>
> What happens if you chown the devices to root:root, then also chmod 000 all
> those devices?
>
>
>
> -Mark
>
>
>
> *From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:
> oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *LS Cheng
> *Sent:* Thursday, January 21, 2010 7:44 AM
> *To:* K Gopalakrishnan
> *Cc:* Oracle Mailinglist
> *Subject:* Re: Anyone tried kill ASM in 11gR2 RAC?
>
>
>
> Hi
>
> So even if the OCRDATA Disk Group is not mounted and the physical disks have
> root.root instead of grid.oinstall ownership, Clusterware will be up and
> running? So basically you mean Clusterware does not need ASM to be up to
> access the OCRDATA disks?
>
> My test was
>
>    - kill ASM
>    - change asm disks (OCRDATA) from grid.oinstall to root.root
>    - check clusterware status which was up and running
>
>
>
>
>
> Thanks
>
> On Thu, Jan 21, 2010 at 1:38 PM, K Gopalakrishnan <kaygopal@xxxxxxxxx>
> wrote:
>
> Clusterware failure will happen _only_ when it can not access the
> physical devices (disk timeout in css) and shutting down ASM does not
> revoke the access to disks. In your case clusterware _knows_ the
> location of ocr/voting information in ASM disks and it can continue
> reading/writing even if the ASM instance is down.
>
> -Gopal
>
>
>
>
>
> On Thu, Jan 21, 2010 at 2:51 AM, LS Cheng <exriscer@xxxxxxxxx> wrote:
> > Hi
> >
> > I was doing some cluster destructive tests on RAC 11gR2 a few days ago.
> >
> > One of the tests was to kill ASM and see how that affects Clusterware
> > operation, since the OCR and Voting Disks are located in ASM (OCRDATA
> > Disk Group). After killing ASM nothing happened, as it was quickly
> > started up again. So far so good. The next test was the same, but with
> > the ASM disks' ownership changed so that when ASM is restarted the OCR
> > Disk Group cannot be accessed. Surprisingly, ASM started up and the
> > Database Disk Group was mounted; the OCR Disk Group obviously did not
> > get mounted, but the cluster kept working without any problems.
> >
> > So how is this happening? Doesn't Clusterware need to read and write the
> > Voting Disk every second? I was expecting a Clusterware failure on the
> > node, but everything worked just as if everything were OK.
> >
> > Thanks!
> >
> > --
> > LSC
> >
> >
>
>
>
