Re: Oracle ASM disk corruption

  • From: Mladen Gogala <gogala.mladen@xxxxxxxxx>
  • To: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>, "Mark W. Farnham" <mwf@xxxxxxxx>, "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 27 Jul 2020 14:57:20 -0400

Well, if V$ASM_DISK says that the disk is not used but ASM says that it is used, then you have an inconsistency. Hopefully, you have a good backup. Personally, I would sacrifice the problem disk to Dionysus and keep working with what I have. An alternative is to drop and rebuild the GRID group. Judging by the name, this group probably houses the OCR and the performance database. And that means a rebuild of the cluster, including the always hilarious restore from the full backup.

On 7/27/20 1:26 PM, Hameed, Amir wrote:


Thanks Mark.

Please see the information below. I will follow up with Oracle and let the list know the action plan.

I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002 **might** fix that.

SQL> ALTER DISKGROUP GRID DROP DISK GRID_0002 ;

ALTER DISKGROUP GRID DROP DISK GRID_0002

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15054: disk "GRID_0002" does not exist in diskgroup "GRID"

Likewise, if it has that disk listed as a member of diskgroup GRID, what happens if you do an ALTER DISKGROUP GRID REBALANCE?

SQL> ALTER DISKGROUP GRID REBALANCE ;

Diskgroup altered.

From the ASM alert log file:

SQL> ALTER DISKGROUP GRID REBALANCE

Mon Jul 27 13:16:29 2020

NOTE: GroupBlock outside rolling migration privileged region

NOTE: requesting all-instance membership refresh for group=2

Mon Jul 27 13:16:29 2020

GMON updating for reconfiguration, group 2 at 30 for pid 31, osid 25903

NOTE: group GRID: updated PST location: disk 0000 (PST copy 0)

NOTE: group GRID: updated PST location: disk 0001 (PST copy 1)

Mon Jul 27 13:16:29 2020

NOTE: group 2 PST updated.

Mon Jul 27 13:16:29 2020

NOTE: membership refresh pending for group 2/0x88994cfc (GRID)

NOTE: Attempting voting file refresh on diskgroup GRID

NOTE: Refresh completed on diskgroup GRID

. Found 2 voting file(s).

NOTE: Voting file relocation is required in diskgroup GRID

Mon Jul 27 13:16:29 2020

GMON querying group 2 at 31 for pid 22, osid 25543

Mon Jul 27 13:16:29 2020

SUCCESS: refreshed membership for 2/0x88994cfc (GRID)

Mon Jul 27 13:16:29 2020

SUCCESS: ALTER DISKGROUP GRID REBALANCE

SQL> ALTER DISKGROUP GRID CHECK ;

Diskgroup altered.

From the ASM alert log file:

NOTE: starting check of diskgroup GRID

Mon Jul 27 13:19:46 2020

GMON querying group 2 at 37 for pid 31, osid 4062

GMON checking disk 0 for group 2 at 38 for pid 31, osid 4062

GMON querying group 2 at 39 for pid 31, osid 4062

GMON checking disk 1 for group 2 at 40 for pid 31, osid 4062

Mon Jul 27 13:19:46 2020

SUCCESS: check of diskgroup GRID found no errors

Mon Jul 27 13:19:46 2020

SUCCESS: ALTER DISKGROUP GRID CHECK

Thanks

*From:* Mark W. Farnham <mwf@xxxxxxxx>
*Sent:* Monday, July 27, 2020 9:39 AM
*To:* Hameed, Amir <Amir.Hameed@xxxxxxxxx>; gogala.mladen@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
*Subject:* RE: Oracle ASM disk corruption

Okay. So it is closed and a member, but ASM has it recorded as still belonging to diskgroup “GRID”.

Let’s see: if it is closed and throwing no errors, does that mean a former DROP DISK had finished rebalancing but was somehow interrupted before some chicklet in ASM was updated?

I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002 **might** fix that.

Have you sent the error message below along with the SR information? I would think this represents an inconsistency in the ASM dictionary and is therefore a bug, unless you hand-edited something at the OS level.

Likewise, if it has that disk listed as a member of diskgroup GRID, what happens if you do an ALTER DISKGROUP GRID REBALANCE?

Does that either a) work or b) fail to open the disk and give you some additional information?

IF a), great, right?

IF b), let us (and the SR folks) know the new information

IF neither a) nor b), I probably fubared the syntax in my semi-retired rust.

You might also report the results of

ALTER DISKGROUP GRID CHECK

Good luck; none of this should be difficult, and it should be 100% self-diagnostic.

PS: I seriously doubt MLADEN is WRONG about the meaning of the status information. Anything I’ve written could be wrong and based on how I asked them to do it rather than how they did it. Other than being a pain to Veritas, ASM was supposed to be easy to use and bulletproof. When one of my best friends from Oracle left ASM, I think it was.

mwf

*From:* oracle-l-bounce@xxxxxxxxxxxxx *On Behalf Of* Hameed, Amir
*Sent:* Sunday, July 26, 2020 11:04 PM
*To:* gogala.mladen@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
*Subject:* RE: Oracle ASM disk corruption

Hi Mladen!

Thank you for your input. I already tried that and got the following result.

-----

SQL> ALTER DISKGROUP GRID

ADD DISK '/dev/oracleasm/grid/asmgrid01' NAME GRID_0002

/

ALTER DISKGROUP GRID

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15033: disk '/dev/oracleasm/grid/asmgrid01' belongs to diskgroup "GRID"

-----

I also opened an SR and the analyst suggested the following action:

/Closed and member status of the disk means that the disk is already dropped from asm. The only thing you can do at this point is to format that disk and then add it back to asm./

Since it is a block device, I was thinking that overwriting the device header would reinitialize it. (I am using UDEV, not ASMLIB, and the disk is not partitioned.)
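A minimal sketch of that header wipe, demonstrated against a scratch file standing in for the block device (the 10 MB wipe size is an assumption; ASM keeps its metadata in the first blocks of the disk, and on the real device this is destructive, so back the header region up first):

```shell
# Sketch only: zeroing the header region should return the disk to
# CANDIDATE status. /tmp/fake_asm_disk is a scratch file standing in
# for the real device (/dev/oracleasm/grid/asmgrid01).
DISK=/tmp/fake_asm_disk

# Simulate a disk whose header region is non-zero:
dd if=/dev/urandom of="$DISK" bs=1M count=10 2>/dev/null

# 1. Back up the header region first (rollback insurance):
dd if="$DISK" of=/tmp/asm_disk.hdr bs=1M count=10 2>/dev/null

# 2. Overwrite the header region with zeros:
dd if=/dev/zero of="$DISK" bs=1M count=10 conv=notrunc 2>/dev/null

# Verify the region now reads back as zeros:
cmp -n 10485760 "$DISK" /dev/zero && echo "header wiped"
```

On the real device you would of course drop the `/tmp` stand-in, triple-check the device path, and only then wipe.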

Thank you,

Amir

*From:* oracle-l-bounce@xxxxxxxxxxxxx *On Behalf Of* Mladen Gogala
*Sent:* Sunday, July 26, 2020 10:44 PM
*To:* oracle-l@xxxxxxxxxxxxx
*Subject:* Re: Oracle ASM disk corruption

Hi Amir!

The status of CLOSED means that the disk is not being used by the ASM instance:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/V-ASM_DISK.html#GUID-8E2E5721-6D4E-48C2-8DF3-A0EEBD439606

MOUNT_STATUS (VARCHAR2(7)) - Per-instance status of the disk relative to group mounts:

  • MISSING - Oracle ASM metadata indicates that the disk is known to be part of the Oracle ASM disk group, but no disk in the storage system was found with the indicated name

  • CLOSED - Disk is present in the storage system but is not being accessed by Oracle ASM

  • OPENED - Disk is present in the storage system and is being accessed by Oracle ASM. This is the normal state for disks in a database instance that are part of a disk group being actively used by the instance.

  • CACHED - Disk is present in the storage system and is part of a disk group being accessed by the Oracle ASM instance. This is the normal state for disks in an Oracle ASM instance that are part of a mounted disk group.

  • IGNORED - Disk is present in the system but is ignored by Oracle ASM because of one of the following:

      • The disk is detected by the system library but is ignored because an Oracle ASM library discovered the same disk

      • Oracle ASM has determined that the membership claimed by the disk header is no longer valid

  • CLOSING - Oracle ASM is in the process of closing this disk

So, the disk is there but it's not used by ASM. You can add it to one of your disk groups or leave it as a reserve for a rainy day, whichever suits you better. No action is necessary; this is not an error condition.
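A quick way to see these states side by side is to query V$ASM_DISK from the ASM instance (standard columns only; use GV$ASM_DISK for all nodes at once):

```sql
-- Show mount/header status for every disk ASM has discovered:
SELECT group_number, disk_number, mount_status, header_status,
       mode_status, state, name, path
FROM   v$asm_disk
ORDER  BY group_number, disk_number;
```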

Regards

On 7/26/20 10:09 PM, Hameed, Amir wrote:

    Hi,

    I have a three-node Oracle 12.1.0.2 Grid Infrastructure setup.
    Multiple ASM disk groups are managed by this setup. One of the
    disk groups is called GRID, and it hosts the OCR and voting
    disks. Recently I noticed that one of the ASM disks in this
    group has MOUNT_STATUS='CLOSED' and HEADER_STATUS='MEMBER' as
    shown below:

    The following data was captured from V$ASM_DISK, but it is
    consistent on all nodes if queried from GV$ASM_DISK:

    Grp# Disk# Mount   Header Mode   State  OS Disk   Space     Space     ASM Disk  Failgroup Disk path                     Vote
               Status  Status Status        Size (MB) Tot. (MB) Free (MB) Name      Name                                    file
    ---- ----- ------- ------ ------ ------ --------- --------- --------- --------- --------- ----------------------------- ----
       0     0 CLOSED  MEMBER ONLINE NORMAL    20,490         0         0                     /dev/oracleasm/grid/asmgrid01 Y
       2     0 CACHED  MEMBER ONLINE NORMAL    20,490    20,480     9,987 GRID_0000 GRID_0000 /dev/oracleasm/grid/asmgrid03 Y
       2     1 CACHED  MEMBER ONLINE NORMAL    20,490    20,480     9,987 GRID_0001 GRID_0001 /dev/oracleasm/grid/asmgrid02 Y

    The disk that is not showing up is GRID_0002; its block device
    is /dev/oracleasm/grid/asmgrid01. The only recent change was
    that the OS on all three nodes was upgraded from RHEL6 to RHEL7.
    I tried to drop this disk from the disk group, but that failed
    with a message that the disk is not part of the GRID DG.

    What is the best way to resolve this issue? Should I overwrite the
    header of this device using dd so that it becomes a candidate
    disk? Any help will be appreciated.

    Thank you,

    Amir

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217

