RE: Large ASM installation

One thing you may find useful is the "ALTER DISKGROUP <name> CHECK ALL;"
command. I didn't know about this until I had used ASM for a couple of
years. There is a REPAIR clause that may be able to correct corruption it
finds.


That is pretty close.  We just got a corrupted header, not a zeroed out
header.  But ours was 11gR2 ASM, and it was the OCR ASM group.

Setup: At the time it was a 36 tb database. Storage was xp10000. Daily loads
averaged some 500 - 700 gigs, so quite a bit of i/o. We were running out
of space and I had to add some 30 tb of disk. I added the disks - no issue.
And the rebalancing took place - power level 2 (I had to keep it low because
month reporting was happening at the same time).  The estimated time for
rebalancing to complete was some 36 hours. All across the board -
RHEL 4 update 3, I believe. Also using asmlib.


The first thing we noticed about 12 hours after adding the disk was that one
of the 4 nodes was no longer responding. I took a look in the db alert log
and nothing indicated a problem (the os was responding fine by the way, no
cpu bound issues). Took a look at the asm alert log and saw the error about
a missing lun. I queried asmlib and sure enough one of the disks was missing
on that node. I queried asm and it had the disk listed as missing. Went to
one of the other nodes and did the same procedure and they reported that the
disk was there. Asmlib reported fine and so did asm. But I did a scan on the
asm disks as a sanity check (oracleasm scandisks) and they returned one disk
missing.  From that point (beyond emailing the others in the group saying
that we had problems) I thought I still had some time so I tried dropping the
disk from asm and let asm relocate the extents it found in memory. About 10
minutes after doing that the db and asm crashed.


The thing is only one node reported a problem at the time. No OS errors were
logged and no other errors on the db and nothing reported on the san either.


I do not know all of the details of the post mortem on the disk headers
precisely. Oracle support had us dump the disk headers via dd (200meg dump)
and send it over to them. From their analysis it appeared that the
headers were manually zeroed during some sort of disk operation (so they
surmised since there was no real way to tell from the headers directly). One
of the questions they did ask was if the san was shared. When we said yes
their response was that they had seen something similar in other places.


I took a quick look at the disk header dump myself using bvi and it was
blank. Nothing to indicate that there was anything on the disk - using bvi
on the first 8k of a disk header will give you the fs type and block size...
nothing... zilch.
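
As a rough illustration of the same "is it blank?" check without a hex
editor, here is a minimal Python sketch. It only tests whether the first 8k
of a dd dump is all zero bytes; it does not parse an ASM header. The function
name header_is_blank and the file name in the usage note are made up for
illustration, and the 8192-byte size simply mirrors the "first 8k" mentioned
above.

```python
def header_is_blank(path, size=8192):
    """Return True if the first `size` bytes of `path` are all zero bytes.

    `path` is assumed to be a dump file produced by dd (or a device you are
    allowed to read). This is a plain zero check, not an ASM header parser.
    """
    with open(path, "rb") as f:
        head = f.read(size)
    # A short read means we could not inspect the full region, so do not
    # report the header as blank in that case.
    if len(head) < size:
        return False
    # any() is False only when every byte in the buffer is zero.
    return not any(head)
```

Usage would be something like header_is_blank("disk_header.dd") against the
dump file, where disk_header.dd is a hypothetical name. A True result only
says the region is zeroed; it says nothing about how it got that way.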


Was this similar to what you saw, Andrew?


Can you expound a little on what happened to your disk headers?  That sounds
strikingly similar to the problem we have had.

Hi Daniel,

I have had several rac datawarehouses on asm (single asm instance) in the
multi terabyte range (20 tb up to 90tb).

Pros - easy to manage from a dba perspective (I did the installation myself
- OS, cluster, ASM, db so for me it was a snap)
    - good performance
    - easy to use multiple arrays on the same asm instance (my 90tb one was
spread over 2 arrays - an xp10k and XIV for instance - different speeds of
the arrays would need to be taken into account for the physical db design of

Cons - need to be aware of the 2 tb limit on individual luns
    - extra steps need to be taken in maintaining the luns (ie, take backups
of the asm metadata regularly)
    - ideally on larger instances having an array dedicated to asm is
best (we had the large one go belly up because the san array was on a shared
infrastructure and work being done for other systems had an impact on those
luns - ie, some disk maintenance zeroed out the asm header on our luns even
though that was not the system being worked on - or so oracle support told

Would I use it again? Yes, since the performance and ease of use outweighed
the cons for our situation. Besides, knowing the cons it would be easier to
add processes to make sure we would be covered.


We have a customer that is looking at ASM to handle their databases; the
total planned is about 8TB for a single ASM instance. Has anyone on the
list worked on a large (5+TB) ASM system? What have been the pros and
cons versus a regular LVM and storage? If you had the chance to go back
to the decision time, would you make the same decision and why?

I'm not needing nitty gritty details right now, more of a high level
decision making view.

Daniel Fink

