Linux Native Multipath, ASM and Instance Failures

  • From: Guillermo Alan Bort <cicciuxdba@xxxxxxxxx>
  • To: oracle-l-freelists <oracle-l@xxxxxxxxxxxxx>
  • Date: Thu, 26 Jul 2012 13:44:12 -0300

Hi,
  I have an SR with Oracle for this, but perhaps some of you have
encountered this issue before.

  We have the following setup:

  1. RHEL 5.8 (standard RH kernel)
  2. Oracle RAC 11.2.0.3 (Jan PSU)
  3. Linux Native Multipath (/dev/mapper)
  4. 3PAR storage (don't know much about the storage layer, though).
  5. ASMLib is NOT used; the ASM disk string is /dev/mapper/*p1

  We were running some redundancy tests (pulling cables and seeing what
happens), and when the servers lost a path the instances crashed. I'm still
gathering logs, but the OS errors look like this:
Jul 26 09:40:41 tvl-p-orep001 kernel: end_request: I/O error, dev sdbg, sector 4151
Jul 26 09:40:41 tvl-p-orep001 kernel: sd 2:0:0:49: SCSI error: return code = 0x00010000
Jul 26 09:40:41 tvl-p-orep001 kernel: end_request: I/O error, dev sdcu, sector 2868761111
Jul 26 09:40:41 tvl-p-orep001 kernel: sd 2:0:0:49: SCSI error: return code = 0x00010000
Jul 26 09:40:41 tvl-p-orep001 kernel: end_request: I/O error, dev sdcu, sector 2868762711
Jul 26 09:40:41 tvl-p-orep001 kernel: sd 2:0:0:31: SCSI error: return code = 0x00010000
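
For reference, here is a minimal sketch of how the return code in these
messages breaks down, assuming the usual Linux SCSI midlayer layout (status,
message, host and driver bytes packed into one 32-bit value, low byte first).
The function and table names are my own, and only the lower host-byte codes
are listed:

```python
# Host-byte codes as defined by the Linux SCSI midlayer (partial list).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result into its four bytes and name the host byte."""
    host = (result >> 16) & 0xFF
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": host,
        "driver_byte": (result >> 24) & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    # The value reported in the kernel messages above.
    print(decode_scsi_result(0x00010000))
```

Under that assumption, 0x00010000 has a host byte of 0x01 (DID_NO_CONNECT) and
all other bytes zero, which is what one would expect after a cable pull.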

and then

Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 65:192.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 65:208.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 65:224.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 66:16.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 66:32.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 66:80.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 66:96.
Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 66:144.

....

Jul 26 09:40:51 tvl-p-orep001 multipathd: sdac: tur checker reports path is down
Jul 26 09:40:51 tvl-p-orep001 multipathd: checker failed path 65:192 in map <DB>_fg1_data_14
Jul 26 09:40:51 tvl-p-orep001 multipathd: ghtgmp_fg1_data_14: remaining active paths: 1
Jul 26 09:40:51 tvl-p-orep001 multipathd: sdad: tur checker reports path is down
Jul 26 09:40:51 tvl-p-orep001 multipathd: checker failed path 65:208 in map <DB>_fg1_data_15
Jul 26 09:40:51 tvl-p-orep001 multipathd: ghtgmp_fg1_data_15: remaining active paths: 1
Jul 26 09:40:51 tvl-p-orep001 multipathd: sdae: tur checker reports path is down
Jul 26 09:40:51 tvl-p-orep001 multipathd: checker failed path 65:224 in map <DB>_fg1_data_16
Jul 26 09:40:51 tvl-p-orep001 multipathd: ghtgmp_fg1_data_16: remaining active paths: 1
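
To check whether any map ever dropped to zero paths during the test, the
multipathd messages can be summarised with something like the sketch below.
It assumes unwrapped syslog lines (as they appear in /var/log/messages rather
than in this mail), and the function names are my own:

```python
import re

# Matches multipathd lines such as
# "... multipathd: ghtgmp_fg1_data_14: remaining active paths: 1"
REMAINING = re.compile(
    r"multipathd: (?P<map>\S+): remaining active paths: (?P<count>\d+)"
)


def lowest_remaining_paths(lines):
    """Return {map_name: lowest 'remaining active paths' value seen}."""
    lowest = {}
    for line in lines:
        match = REMAINING.search(line)
        if not match:
            continue
        name = match.group("map")
        count = int(match.group("count"))
        lowest[name] = min(count, lowest.get(name, count))
    return lowest


def maps_without_paths(lowest):
    """Return the sorted names of maps that reported zero active paths."""
    return sorted(name for name, count in lowest.items() if count == 0)


if __name__ == "__main__":
    sample = [
        "Jul 26 09:40:51 tvl-p-orep001 multipathd: sdac: tur checker reports path is down",
        "Jul 26 09:40:51 tvl-p-orep001 multipathd: ghtgmp_fg1_data_14: remaining active paths: 1",
        "Jul 26 09:40:51 tvl-p-orep001 multipathd: ghtgmp_fg1_data_15: remaining active paths: 1",
    ]
    summary = lowest_remaining_paths(sample)
    print(summary)
    print(maps_without_paths(summary))
```

In the excerpt above every map still reports one active path, so the second
list would be empty for these lines.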


In the meantime ASM logs show this:

WARNING: Read Failed. group:0 disk:22 AU:0 offset:0 size:4096
Errors in file /u01/ORAUTL/grid/base/diag/asm/+asm/+ASM/trace/+ASM_ora_18784.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:1 disk:2 AU:0 offset:0 size:4096
Errors in file /u01/ORAUTL/grid/base/diag/asm/+asm/+ASM/trace/+ASM_ora_18784.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:1 disk:1 AU:0 offset:0 size:4096
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/<DB>_FG1_REDOA_01)
NOTE: ASM client <DB> disconnected unexpectedly.
NOTE: ASM client <DB> disconnected unexpectedly.

I've looked in MOS and found a few relevant notes:

  1. Oracle ASM and Multi-Pathing Technologies [ID 294869.1]: seems to
     indicate that device mapper is supported by ASM.
  2. Database Instance Crashes In Case Of Path Offlined In Multipath
     Storage [ID 555371.1]: deals with ASMLib, so not really our test case.
  3. Configuration and Use of Device Mapper Multipathing on Oracle
     Enterprise Linux (OEL) [ID 555603.1]: an interesting note. It covers
     OEL rather than RHEL, but the two are fairly similar. Our
     path_grouping_policy differs from it: the note recommends failover
     and we have multibus. I'm not sure this is the issue, though.
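
To compare nodes against the note quickly, the explicitly configured policy
can be pulled out of multipath.conf with a small script. This is only a
sketch: the sample configuration is hypothetical (not our real file), the
function name is my own, and it only sees what is written in the file, not
any built-in per-device defaults that multipath may apply:

```python
import re

# One "path_grouping_policy <value>" setting per line, optionally quoted.
POLICY = re.compile(r'^\s*path_grouping_policy\s+"?([A-Za-z_]+)"?', re.MULTILINE)


def grouping_policies(conf_text):
    """Return every path_grouping_policy value set in the text, in file order."""
    # Drop '#' comments so commented-out settings are not reported.
    uncommented = "\n".join(
        line.split("#", 1)[0] for line in conf_text.splitlines()
    )
    return POLICY.findall(uncommented)


# Hypothetical multipath.conf content, for illustration only.
SAMPLE_CONF = """
defaults {
    user_friendly_names no
    # path_grouping_policy failover
    path_grouping_policy multibus
}
"""

if __name__ == "__main__":
    print(grouping_policies(SAMPLE_CONF))
```

On a real node the text would come from /etc/multipath.conf, and a result
other than failover would flag a difference from what the note recommends.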

The other notes I found were of no relevance to this issue.

Thanks in advance for any input

Cheers
Alan.-


--
//www.freelists.org/webpage/oracle-l

