RE: Linux Native Multipath, ASM and Instance Failures

  • From: "CRISLER, JON A" <JC1706@xxxxxxx>
  • To: "cichomitiko@xxxxxxxxx" <cichomitiko@xxxxxxxxx>, "cicciuxdba@xxxxxxxxx" <cicciuxdba@xxxxxxxxx>
  • Date: Thu, 26 Jul 2012 22:08:19 +0000

We used 3PAR extensively and had few problems in that area. Check your 
multipath.conf to make sure it has all the right options.  The time it takes 
to perform the failover might be outside the window during which ASM and 
multipath allow a disk I/O to be suspended (which I think is 70 seconds for 
clusterware, and might be tunable, AND I might be munging up that whole 
concept :) ).  With NetApp FC we had similar issues, and it usually took a 
serious tweak to multipath.conf to resolve them. There are also known bugs in 
the device-mapper and multipath RPMs, so make sure you are up to date in that 
area.  It's been a while since I looked at this stuff, and I am not a storage 
guru.  We did use ASMLib because it just makes life easier overall, and the 
scan order was /dev/dm*.  I would suggest opening a ticket with Red Hat 
Support specifying a multipath issue and seeing what they suggest; they were 
very helpful in our case.  You're lucky you are on 5.8, because RHEL 4 and 
earlier versions had more obscure issues of that type.  You don't specify 
which FC driver and adapter vendor you are using: sometimes firmware updates 
are helpful, and with QLogic there is also a choice between the Red 
Hat-supplied and the QLogic-supplied driver.  We used the Red Hat-supplied 
driver, but I always suspected that the QLogic driver might be better.
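
To make the timing concern concrete, here is a rough Python sketch of the 
arithmetic. It assumes that no_path_retry is set to a number (not "queue" or 
"fail") and that multipathd then keeps I/O queued for roughly 
polling_interval * no_path_retry seconds once every path is down. The 
70-second window is only the figure recalled from memory above, not a 
verified clusterware default, and the sample values are made up for 
illustration.

```python
def queue_seconds(polling_interval: int, no_path_retry: int) -> int:
    """Approximate time (s) I/O stays queued after all paths are lost.

    Assumes a numeric no_path_retry; the special values "queue" and
    "fail" are not modelled here.
    """
    return polling_interval * no_path_retry


def fits_window(polling_interval: int, no_path_retry: int,
                window_seconds: int) -> bool:
    """True if the queueing time stays within the allowed I/O window."""
    return queue_seconds(polling_interval, no_path_retry) <= window_seconds


if __name__ == "__main__":
    WINDOW = 70  # unverified figure quoted from memory in this thread
    # Hypothetical (polling_interval, no_path_retry) pairs to compare.
    for interval, retries in [(5, 12), (10, 12)]:
        total = queue_seconds(interval, retries)
        verdict = "within" if fits_window(interval, retries, WINDOW) else "outside"
        print(f"polling_interval={interval} no_path_retry={retries}: "
              f"{total}s, {verdict} the {WINDOW}s window")
```

The real failover time also depends on the HBA driver and SCSI timeouts, so 
treat this only as a first sanity check on the multipath.conf values.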

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Radoulov, Dimitre
Sent: Thursday, July 26, 2012 2:29 PM
To: cicciuxdba@xxxxxxxxx
Cc: oracle-l-freelists
Subject: Re: Linux Native Multipath, ASM and Instance Failures

Hi Guillermo,

On 26/07/2012 18:44, Guillermo Alan Bort wrote:
> We have the following set up:
>
>    1. RHEL 5.8 (standard RH kernel)
>    2. Oracle RAC 11.2.0.3 (Jan PSU)
>    3. Linux Native Multipath (/dev/mapper)
>    4. 3PAR storage (don't know much about the storage layer, though).
>    5. NO ASMLIB is used, the asm diskstring is /dev/mapper/*p1
>
>    We were running some redundancy tests (pulling cables and seeing what
> happens) and when the servers lost a path, the instances crashed. I'm
> still gathering logs, but OS errors looks like this:
> Jul 26 09:40:41 tvl-p-orep001 kernel: end_request: I/O error, dev sdbg, sector 4151
[...]
> and then
>
> Jul 26 09:40:43 tvl-p-orep001 kernel: device-mapper: multipath: Failing path 65:192.
[...]
> In the meantime ASM logs show this:
>
> WARNING: Read Failed. group:0 disk:22 AU:0 offset:0 size:4096
> Errors in file /u01/ORAUTL/grid/base/diag/asm/+asm/+ASM/trace/+ASM_ora_18784.trc:
> ORA-27061: waiting for async I/Os failed
> Linux-x86_64 Error: 5: Input/output error
> Additional information: -1
> Additional information: 4096
Just for your information:
we have no problems with RHEL 5.7 (RH kernel), RAC 11.2.0.3.2, 3PAR _and_ 
ASMLib.
We did the same tests and had no problems: there were messages for the 
failing paths in the OS logs (as expected), but the Oracle stack remained up 
and running, with no error messages at all in the various alert logs.

If I recall correctly, some MOS notes suggest setting ORACLEASM_SCANORDER to dm 
(/dev/dm-* as opposed to /dev/mapper/*).
As far as I know, the fact that the dm-* names are not persistent shouldn't be a 
problem when the clusterware files (voting/OCR) are in ASM disk groups (11.2).
I would try setting asm_diskstring to /dev/dm-* and then repeat the tests.
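
Before changing asm_diskstring, it may help to see which device nodes each 
pattern would pick up. The Python sketch below assumes ordinary shell-style 
wildcard matching and uses made-up device names; on a real host the list 
would come from the contents of /dev.

```python
from fnmatch import fnmatchcase


def discovered(pattern: str, device_nodes: list[str]) -> list[str]:
    """Return the device nodes that a shell-style discovery pattern matches."""
    return [node for node in device_nodes if fnmatchcase(node, pattern)]


# Hypothetical device nodes, for illustration only.
NODES = [
    "/dev/mapper/mpath0",    # whole multipath device
    "/dev/mapper/mpath0p1",  # first partition on it
    "/dev/dm-3",
    "/dev/dm-4",
    "/dev/sdbg",             # one underlying single path
]

if __name__ == "__main__":
    for pattern in ("/dev/mapper/*p1", "/dev/dm-*"):
        print(pattern, "->", discovered(pattern, NODES))
```

Note that /dev/dm-* matches every device-mapper node, not only the first 
partitions, so the wider pattern makes ASM examine more candidates than 
/dev/mapper/*p1 does.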


Regards
Dimitre
--
//www.freelists.org/webpage/oracle-l

