RE: Large ASM installation

  • From: "Amaral, Rui" <Rui.Amaral@xxxxxxxxxxxxxxxx>
  • To: 'Andrew Kerber' <andrew.kerber@xxxxxxxxx>
  • Date: Fri, 22 Oct 2010 13:08:41 -0400

Setup: at the time it was a 36 TB database on an XP10000 array. Daily loads
averaged some 500-700 GB, so quite a bit of I/O. We were running out of space
and I had to add some 30 TB of disk. I added the disks with no issue, and the
rebalance kicked off at power level 2 (I had to keep it low because monthly
reporting was running at the same time). The estimated time for the rebalance
to complete was some 36 hours. Everything was 11.1.0.7 across the board, on
RHEL 4 update 3 I believe. We were also using ASMLib.
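Here is where an estimate like "36 hours" comes from. ASM exposes rebalance progress in V$ASM_OPERATION: SOFAR and EST_WORK are counted in allocation units (AUs) and EST_RATE is AUs per minute. The time left is essentially the remaining work divided by the current rate. The view already reports this directly as EST_MINUTES, so the Python sketch below only shows the arithmetic. The class and function names are hypothetical, and the row is filled in by hand rather than queried. A lower power level generally means a lower rate, which is why holding power at 2 stretches the estimate.

```python
from dataclasses import dataclass


@dataclass
class RebalanceProgress:
    """Progress of one rebalance, shaped like a V$ASM_OPERATION row.

    sofar:    allocation units (AUs) moved so far       (SOFAR)
    est_work: AUs estimated to need moving in total     (EST_WORK)
    est_rate: AUs currently being moved per minute      (EST_RATE)
    """
    sofar: int
    est_work: int
    est_rate: int


def minutes_remaining(p: RebalanceProgress) -> float:
    """Linear extrapolation: remaining AUs divided by the current rate."""
    if p.est_rate <= 0:
        raise ValueError("est_rate must be positive to extrapolate")
    remaining = max(p.est_work - p.sofar, 0)
    return remaining / p.est_rate


if __name__ == "__main__":
    # Hypothetical figures chosen for illustration; they are not from the thread.
    p = RebalanceProgress(sofar=100_000, est_work=2_260_000, est_rate=1_000)
    print(f"{minutes_remaining(p) / 60:.1f} hours left")  # 36.0 hours left
```

The estimate is only as steady as the rate. If other I/O (month-end reporting, say) slows the rebalance down, the figure drifts upward.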

The first thing we noticed, about 12 hours after adding the disks, was that one
of the 4 nodes was no longer responding. I looked in the db alert log and found
nothing to indicate a problem (the OS was responding fine, by the way, with no
CPU-bound issues). Then I looked at the ASM alert log and saw an error about a
missing LUN. I queried ASMLib and, sure enough, one of the disks was missing on
that node; ASM also listed the disk as missing. I went to one of the other
nodes and ran the same checks, and there the disk was reported as present:
both ASMLib and ASM looked fine. But when I rescanned the ASM disks as a sanity
check (oracleasm scandisks), the scan came back with one disk missing. From
that point (beyond emailing the others in the group to say we had problems) I
thought I still had some time, so I tried dropping the disk from ASM and
letting ASM relocate the extents it found in memory. About 10 minutes after
doing that, the db and ASM crashed.
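The checks above boil down to comparing which disks each node can see. The Python sketch below does that comparison. It assumes the per-node label lists have already been collected; how they are gathered is left out, and the node names and labels are hypothetical. As the thread shows, a node's view can look healthy until it is rescanned, so the input is only as good as the last scan.

```python
from typing import Dict, Set


def disks_missing_per_node(visible: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each node, return the disk labels some other node sees but it does not.

    Nodes that see every disk are left out of the result. A disk that is
    missing on all nodes cannot be detected this way; that would need a
    comparison against an expected list instead.
    """
    all_disks: Set[str] = set().union(*visible.values())
    result: Dict[str, Set[str]] = {}
    for node, disks in visible.items():
        missing = all_disks - disks
        if missing:
            result[node] = missing
    return result


if __name__ == "__main__":
    # Hypothetical node names and ASMLib labels, e.g. gathered by running
    # `oracleasm scandisks` and then `oracleasm listdisks` on each node.
    seen = {
        "node1": {"DATA01", "DATA02", "DATA03"},
        "node2": {"DATA01", "DATA02", "DATA03"},
        "node3": {"DATA01", "DATA03"},
        "node4": {"DATA01", "DATA02", "DATA03"},
    }
    print(disks_missing_per_node(seen))  # {'node3': {'DATA02'}}
```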

The thing is, only one node reported a problem at the time. No OS errors were
logged, there were no other errors in the db, and nothing was reported on the
SAN either.

I do not know all the details of the post-mortem on the disk headers. Oracle
Support had us dump the disk headers via dd (a 200 MB dump) and send it over to
them. From their analysis it appeared that the headers had been manually zeroed
during some sort of disk operation (so they surmised, since there was no real
way to tell from the headers directly). One of the questions they did ask was
whether the SAN was shared. When we said yes, their response was that they had
seen something similar elsewhere.
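Support asked for the dump via dd. Something along the lines of `dd if=<device> of=<file> bs=1M count=200` gives 200 MB. For anyone who wants the same thing from a script, here is a rough Python equivalent that copies only the first N bytes of a device or file. The script's name and arguments are hypothetical. It only opens the source read-only, and reading a raw device still needs the appropriate OS privileges.

```python
def dump_head(src_path: str, dst_path: str, size_bytes: int,
              chunk: int = 1024 * 1024) -> int:
    """Copy the first size_bytes of src_path to dst_path, like a bounded dd.

    Reads in chunk-sized pieces and stops early if the source is shorter.
    Returns the number of bytes actually copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while copied < size_bytes:
            buf = src.read(min(chunk, size_bytes - copied))
            if not buf:  # reached end of source
                break
            dst.write(buf)
            copied += len(buf)
    return copied


if __name__ == "__main__":
    import sys

    # Usage: python dump_head.py <device-or-file> <output-file> [megabytes]
    # Nothing here is specific to ASM; the source path is whatever you pass in.
    if len(sys.argv) >= 3:
        mb = int(sys.argv[3]) if len(sys.argv) > 3 else 200
        n = dump_head(sys.argv[1], sys.argv[2], mb * 1024 * 1024)
        print(f"copied {n} bytes")
```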

I took a quick look at the disk header dump myself using bvi and it was blank.
Nothing to indicate that there had ever been anything on the disk. Using bvi on
the first 8k of a disk header will normally give you the fs type and block
size... nothing... zilch.
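The same eyeball check can be scripted when there are many LUNs to look at. The Python sketch below reads the first 8k of a device or dump file and reports whether it is entirely zero bytes. It also looks for the string ORCLDISK, which is assumed here to appear in a healthy ASM (ASMLib-stamped) disk header. Check that assumption against a known-good disk before relying on it. The function names are hypothetical.

```python
def classify_header(head: bytes) -> str:
    """Rough triage of the first block(s) of a disk or dump file."""
    if not head:
        return "empty"
    if not any(head):  # every byte is 0x00
        return "all zeros"
    if b"ORCLDISK" in head:  # assumed ASM/ASMLib header marker
        return "ASM marker found"
    return "non-zero, no ASM marker"


def inspect_header(path: str, nbytes: int = 8192) -> str:
    """Read the first nbytes (8k by default, as in the thread) and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(nbytes))


if __name__ == "__main__":
    import sys

    # Usage: python inspect_header.py <device-or-dump> [...]
    for p in sys.argv[1:]:
        print(p, "->", inspect_header(p))
```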

Was this similar to you Andrew?
________________________________
From: Andrew Kerber [mailto:andrew.kerber@xxxxxxxxx]
Sent: Friday, October 22, 2010 12:41 PM
To: Amaral, Rui
Cc: daniel.fink@xxxxxxxxxxxxxx; oracle-l
Subject: Re: Large ASM installation

Rui-

Can you expound a little on what happened to your disk headers?  That sounds 
strikingly similar to the problem we have had.
On Fri, Oct 22, 2010 at 10:40 AM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:
Hi Daniel,

I have had several RAC data warehouses on ASM (single ASM instance) in the
multi-terabyte range (20 TB up to 90 TB).

Pros - easy to manage from a DBA perspective (I did the installation myself:
OS, cluster, ASM and db, so for me it was a snap)
    - good performance
    - easy to use multiple arrays on the same ASM instance (my 90 TB one was
spread over 2 arrays, an XP10k and an XIV for instance; the different speeds of
the arrays would need to be taken into account in the physical db design, of
course)

Cons - need to be aware of the 2 TB limit on individual LUNs
    - extra steps need to be taken when maintaining the LUNs (i.e., take
regular backups of the ASM metadata)
    - on larger installations, ideally dedicate an array to ASM (we had the
large one go belly up because the SAN array was on shared infrastructure and
work being done for other systems had an impact on those LUNs, i.e., some disk
maintenance zeroed out the ASM headers on our LUNs even though ours was not the
system being worked on, or so Oracle Support told us).

Would I use it again? Yes, since the performance and ease of use outweighed the
cons in our situation. Besides, knowing the cons, it would be easier to add
processes to make sure we were covered.

HTH

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Daniel W. Fink
Sent: Friday, October 22, 2010 10:10 AM
To: oracle-l
Subject: Large ASM installation

We have a customer that is looking at ASM to handle their databases; the
total planned is about 8 TB for a single ASM instance. Has anyone on the
list worked on a large (5+ TB) ASM system? What have been the pros and
cons versus a regular LVM and storage? If you had the chance to go back
to decision time, would you make the same decision, and why?

I don't need nitty-gritty details right now, more of a high-level,
decision-making view.

Regards,
Daniel Fink
--
//www.freelists.org/webpage/oracle-l






--
Andrew W. Kerber

'If at first you don't succeed, don't take up skydiving.'
