One thing you may find useful is the "ALTER DISKGROUP CHECK ALL;" command. I didn't know about this until I had used ASM for a couple of years. There is a REPAIR clause that may be able to correct corruption it finds.

-Randy

_____
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Andrew Kerber
Sent: Friday, October 22, 2010 12:18 PM
To: Amaral, Rui
Cc: daniel.fink@xxxxxxxxxxxxxx; oracle-l
Subject: Re: Large ASM installation

That is pretty close. We just got a corrupted header, not a zeroed-out header. But ours was 11gR2 ASM, and it was the OCR ASM group.

On Fri, Oct 22, 2010 at 12:08 PM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:

Setup: At the time it was a 36 TB database. Storage was an XP10000. Daily loads averaged some 500-700 GB, so quite a bit of I/O. We were running out of space and I had to add some 30 TB of disk. I added the disks with no issue, and the rebalancing took place at power level 2 (I had to keep it low because of month-end reporting happening at the same time). The estimated time for the rebalance to complete was some 36 hours. All 22.214.171.124 across the board; RHEL 4 update 3, I believe. Also using asmlib.

The first thing we noticed, about 12 hours after adding the disks, was that one of the 4 nodes was no longer responding. I took a look in the db alert log and found nothing to indicate a problem (the OS was responding fine, by the way; no CPU-bound issues). I took a look at the ASM alert log and saw the error about a missing LUN. I queried asmlib and sure enough one of the disks was missing on that node. I queried ASM and it had the disk listed as missing. I went to one of the other nodes, did the same procedure, and they reported that the disk was there; asmlib reported fine and so did ASM. But I did a rescan of the ASM disks as a sanity check (oracleasm scandisks) and it returned one disk missing.
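Beyond oracleasm scandisks, one quick low-level sanity check when a disk goes missing like this is to dump the first 8 KB of the candidate device, where the ASM/asmlib header lives, and see whether it still carries any data. A minimal sketch, not from the thread itself: it assumes an ASMLib-labelled disk, whose header carries the ORCLDISK marker, and is demonstrated here against scratch files standing in for LUNs since the real device paths are site-specific.

```shell
#!/bin/sh
# Sketch: report whether the first 8 KB of a disk (where the ASM/asmlib
# header lives) has been zeroed out. Device/file paths are placeholders.
check_asm_header() {
    # Dump the first 8 KB of the device (same idea as the dd header
    # dumps Oracle support asks for, just truncated to the header).
    dd if="$1" bs=8192 count=1 2>/dev/null > /tmp/hdr.dump
    if head -c 8192 /dev/zero | cmp -s - /tmp/hdr.dump; then
        echo "ZEROED: $1"                 # all zero bytes: header wiped
    elif grep -aq ORCLDISK /tmp/hdr.dump; then
        echo "LABELLED: $1"               # ASMLib marker still present
    else
        echo "OTHER DATA: $1"             # data, but no ASMLib label
    fi
}

# Demo against scratch files standing in for LUNs:
head -c 8192 /dev/zero > /tmp/wiped.img                       # wiped header
{ printf 'ORCLDISK'; head -c 8184 /dev/zero; } > /tmp/ok.img  # labelled header
check_asm_header /tmp/wiped.img   # -> ZEROED: /tmp/wiped.img
check_asm_header /tmp/ok.img      # -> LABELLED: /tmp/ok.img
```

The marker's position in the demo file is arbitrary, since grep scans the whole dump; on a real asmlib disk it sits inside the header structure rather than at offset zero.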
From that point (beyond emailing the others in the group to say we had problems) I thought I still had some time, so I tried dropping the disk from ASM and letting ASM relocate the extents it found in memory. About 10 minutes after doing that, the db and ASM crashed. The thing is, only one node reported a problem at the time. No OS errors were logged, no other errors on the db, and nothing reported on the SAN either.

I do not know all of the details of the post mortem on the disk headers precisely. Oracle support had us dump the disk headers via dd (a 200 MB dump) and send it over to them. From their analysis it appeared that the headers were manually zeroed during some sort of disk operation (so they surmised, since there was no real way to tell from the headers directly). One of the questions they did ask was whether the SAN was shared. When we said yes, their response was that they had seen something similar in other places. I took a quick look at the disk header dump myself using bvi and it was blank. Nothing to indicate that there was anything on the disk. Normally, using bvi on the first 8k of a disk header will give you the fs type and block size... here, nothing. Zilch.

Was this similar to you, Andrew?

_____
From: Andrew Kerber [mailto:andrew.kerber@xxxxxxxxx]
Sent: Friday, October 22, 2010 12:41 PM
To: Amaral, Rui
Cc: daniel.fink@xxxxxxxxxxxxxx; oracle-l
Subject: Re: Large ASM installation

Rui - Can you expound a little on what happened to your disk headers? That sounds strikingly similar to the problem we have had.

On Fri, Oct 22, 2010 at 10:40 AM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:

Hi Daniel,

I have had several RAC data warehouses on ASM (single ASM instance) in the multi-terabyte range (20 TB up to 90 TB).
Pros:
- Easy to manage from a dba perspective (I did the installation myself: OS, cluster, ASM, db, so for me it was a snap)
- Good performance
- Easy to use multiple arrays under the same ASM instance (my 90 TB one was spread over 2 arrays, an XP10K and an XIV, for instance; the different speeds of the arrays would need to be taken into account in the physical db design, of course)

Cons:
- Need to be aware of the 2 TB limit on individual LUNs
- Extra steps need to be taken to maintain the LUNs (i.e., take backups of the ASM metadata regularly)
- Ideally, on larger instances, a dedicated array for ASM is best (we had the large one go belly up because the SAN array was on shared infrastructure, and work being done for other systems had an impact on our LUNs; i.e., some disk maintenance zeroed out the ASM header on our LUNs even though ours was not the system being worked on, or so Oracle support told us)

Would I use it again? Yes, since the performance and ease of use outweighed the cons for our situation. Besides, knowing the cons, it would be easier to add processes to make sure we were covered.

HTH

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Daniel W. Fink
Sent: Friday, October 22, 2010 10:10 AM
To: oracle-l
Subject: Large ASM installation

We have a customer that is looking at ASM to handle their databases; the total planned is about 8 TB for a single ASM instance. Has anyone on the list worked on a large (5+ TB) ASM system? What have been the pros and cons versus regular LVM and storage? If you had the chance to go back to the decision point, would you make the same decision, and why? I'm not needing nitty-gritty details right now, more of a high-level decision-making view.

Regards,
Daniel Fink
--
http://www.freelists.org/webpage/oracle-l

NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, go to www.td.com/legal for instructions.
--
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'