RE: Trying to Simulate a disk failure for one of the disks used by ASM disk group

  • From: "Mark W. Farnham" <mwf@xxxxxxxx>
  • To: <hithanan@xxxxxxxxx>, "'Matthew Zito'" <matt@xxxxxxxxxxxxxxxxx>
  • Date: Thu, 28 Aug 2014 13:03:44 -0400

If this is Linux or Unix, then probably umount followed by a mount readonly
would do the trick if you're writing to that disk at all.

 

Possibly changing the permissions would intervene, but I think it varies
whether that will stop a running application that already has the file
open.
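
For what it's worth, on Linux the permission bits are checked when a file is
opened, not on each later write, which would explain why a process that
already holds the device open keeps going. A minimal sketch of that
behaviour, using a scratch temp file as a stand-in for the device node (the
temp file and descriptor number 3 are illustrative assumptions, not anything
from the ASM setup):

```shell
#!/bin/sh
# Scratch file standing in for a device node (hypothetical; not a real disk).
scratch=$(mktemp)

# Open the file for writing on descriptor 3 BEFORE touching permissions.
exec 3>"$scratch"

# Remove all permission bits, in the spirit of `chmod 000 /dev/sdd1`.
chmod 000 "$scratch"

# The descriptor opened earlier still works: access was checked at open time.
echo "written after chmod" >&3
exec 3>&-

# Restore permissions so the file can be read back and cleaned up.
chmod 600 "$scratch"
fd_result=$(cat "$scratch")
echo "$fd_result"
rm -f "$scratch"
```

So a chmod alone probably only bites the next time something tries to open
the device.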

 

Heh. It was easier when there was a button on each drive you could toggle to
make it read only.

 

mwf

 

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Hanan Hit
Sent: Thursday, August 28, 2014 12:41 PM
To: Matthew Zito
Cc: Chitale, Hemant K; ORACLE-L
Subject: Re: Trying to Simulate a disk failure for one of the disks used by
ASM disk group 

 

Okay thanks for the info. 

 

Will try that.

 

 

On Aug 28, 2014, at 9:38 AM, Matthew Zito <matt@xxxxxxxxxxxxxxxxx> wrote:





fdisk'ing the drive will *probably* not do what you want, since the OS will
keep the existing partition table in memory, and continue on cheerfully
writing to the correct partitions.

 

You'd have to reboot the box for it to "see" the updated partition table, I
suspect.

 

Similarly, if you overwrote the header with dd, Oracle wouldn't notice that
and would continue onward.

 

Best bet is to offline a LUN in the array itself, either by disabling that
LUN, or using the host restriction to remove access to it from that node.

 

On Thu, Aug 28, 2014 at 12:30 PM, Hanan Hit <hithanan@xxxxxxxxx> wrote:

Thanks Hemant K Chitale,

 

Yes I think my next step would be to just fdisk one of the drives -
basically the header.

 

I am not getting any real answer from Oracle though.

 

 

 

On Aug 27, 2014, at 10:52 PM, Chitale, Hemant K <Hemant-K.Chitale@xxxxxx>
wrote:





I would use "dd" to overwrite a disk (or just the header of it) to simulate
a failure.
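
A rough sketch of the dd approach, rehearsed on a scratch image file rather
than a real device. The 4 MiB image size is an arbitrary assumption; the
1 MiB overwrite matches the "first 1M" label area mentioned in the original
post. On a real system of= would be the device node, and the overwrite is
destructive.

```shell
#!/bin/sh
# Scratch image standing in for a disk (hypothetical; not a real device).
img=$(mktemp)

# Build a 4 MiB "disk" filled with non-zero bytes so the overwrite is visible.
dd if=/dev/zero bs=1048576 count=4 2>/dev/null | tr '\0' 'A' > "$img"

# Zero the first 1 MiB in place; conv=notrunc keeps the rest of the image.
dd if=/dev/zero of="$img" bs=1048576 count=1 conv=notrunc 2>/dev/null

# Total size must be unchanged by the in-place overwrite.
size=$(wc -c < "$img" | tr -d ' ')
# Non-zero bytes left in the first MiB (the "header").
header_nonzero=$(head -c 1048576 "$img" | tr -d '\0' | wc -c | tr -d ' ')
# Non-zero bytes left in the whole image (the untouched remainder).
total_nonzero=$(tr -d '\0' < "$img" | wc -c | tr -d ' ')

echo "size=$size header_nonzero=$header_nonzero total_nonzero=$total_nonzero"
rm -f "$img"
```

This only shows what dd does to the bytes on disk; it says nothing about
whether ASM will notice the change.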

 

Was your second test also about physically removing a disk? And the DG
didn't dismount merely because permissions on the device had been changed?

Or was the second test different in some other way?

 

Hemant K Chitale

 

 

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Hanan Hit
Sent: Thursday, August 28, 2014 7:16 AM
To: Oracle L
Cc: Hit Hanan Gmail
Subject: Trying to Simulate a disk failure for one of the disks used by ASM
disk group

 

Hi All,

 

I am sorry for the large distribution but I am somehow hitting a wall. 

 

I have a new 12c single instance install (Oracle Database 12c Enterprise
Edition Release 12.1.0.2.0 - 64bit Production) using ASM on RHEL 6.5. 

 

 

The underlying storage array that I am using is MD-1220 from HP. 

 

I have a total of 24 drives, each presented as a single drive. 

 

Using ASMlib. 

 

 

I was able to create the ASM instance using two disk groups (DATADG with 16
drives and FRADG with 8 disks). I am using Normal Redundancy.

 

All drives were labeled (first 1M).  

 

 

Here are the compatibility details of both disk groups:

 

GROUP_NUMBER NAME    COMPATIBILITY  DATABASE_COMPATIBILITY
------------ ------- -------------- ----------------------
           2 FRADG   12.1.0.0.0     12.1.0.0.0
           1 DATADG  12.1.0.0.0     12.1.0.0.0

 

 

I also modified the disk repair time for the given disk to 6 hours from the
default of 3.6 hours.

 

 

 

SQL> show parameter disk_

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      FRADG, DATADG
asm_diskstring                       string      ORCL:*

 

 

Now I am trying to simulate a failure of one disk (which of course
shouldn't fail the DATADG). 

 

In the first test we physically pulled out one drive (after finding the right
device) and the DATADG was dismounted (to my surprise). This obviously didn't
work as I expected. I don't think it's a fat finger issue. 

 

In the second test, after opening an SR with Oracle, I modified the
permissions for the device during the run and rescanned the drives, but
everything kept functioning well and no failed disks were encountered. 

 

# chmod 000  /dev/sdd1

 

# /etc/init.d/oracleasm scandisks

Scanning the system for Oracle ASMLib disks:               [  OK  ]

 

 

So finally, is there a better way to logically simulate a disk failure with
Normal Redundancy while using my infrastructure?

 

Any help would be highly appreciated. 

 

Best,

            Hanan

 

 

 

 

