That is pretty close. We just got a corrupted header, not a zeroed-out header. But ours was 11gR2 ASM, and it was the ASM disk group holding the OCR.

On Fri, Oct 22, 2010 at 12:08 PM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:

> Setup: at the time it was a 36 TB database. Storage was an XP10000. Daily
> loads averaged some 500-700 GB, so there was quite a bit of I/O. We were
> running out of space and I had to add some 30 TB of disk. I added the disks
> with no issue, and the rebalance started at power level 2 (I had to keep it
> low because monthly reporting was running at the same time). The estimated
> time for the rebalance to complete was some 36 hours. Everything was
> 11.1.0.7 across the board, on RHEL 4 Update 3 I believe. We were also using
> ASMLib.
>
> The first thing we noticed, about 12 hours after adding the disks, was that
> one of the 4 nodes was no longer responding. I looked in the database alert
> log and found nothing to indicate a problem (the OS was responding fine, by
> the way; no CPU-bound issues). Then I looked at the ASM alert log and saw
> the error about a missing LUN. I queried ASMLib and, sure enough, one of the
> disks was missing on that node. I queried ASM and it listed the disk as
> missing too. I went to one of the other nodes and ran the same checks, and
> both ASMLib and ASM reported that the disk was there. But when I rescanned
> the ASM disks as a sanity check (oracleasm scandisks), one disk came back
> missing. At that point (beyond emailing the others in the group to say we
> had problems) I thought I still had some time, so I tried dropping the disk
> from ASM to let ASM relocate the extents it found in memory. About 10
> minutes after doing that, the database and ASM crashed.
>
> The thing is, only one node reported a problem at the time. No OS errors
> were logged, there were no other errors in the database, and nothing was
> reported on the SAN either.
>
> I do not know the precise details of the post-mortem on the disk headers.
> Oracle Support had us dump the disk headers via dd (a 200 MB dump) and send
> it over to them. From their analysis it appeared that the headers had been
> manually zeroed during some sort of disk operation (so they surmised, since
> there was no real way to tell from the headers directly). One of the
> questions they asked was whether the SAN was shared. When we said yes, their
> response was that they had seen something similar at other sites.
>
> I took a quick look at the disk header dump myself using bvi and it was
> blank. Nothing to indicate that there was anything on the disk. Using bvi
> on the first 8 KB of a disk header will normally give you the fs type and
> block size... here, nothing... zilch.
>
> Was this similar to yours, Andrew?
>
> ------------------------------
>
> *From:* Andrew Kerber [mailto:andrew.kerber@xxxxxxxxx]
> *Sent:* Friday, October 22, 2010 12:41 PM
> *To:* Amaral, Rui
> *Cc:* daniel.fink@xxxxxxxxxxxxxx; oracle-l
> *Subject:* Re: Large ASM installation
>
> Rui-
>
> Can you expound a little on what happened to your disk headers? That
> sounds strikingly similar to the problem we have had.
>
> On Fri, Oct 22, 2010 at 10:40 AM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx>
> wrote:
>
> Hi Daniel,
>
> I have had several RAC data warehouses on ASM (single ASM instance) in the
> multi-terabyte range (20 TB up to 90 TB).
>
> Pros:
> - easy to manage from a DBA perspective (I did the installation myself:
>   OS, cluster, ASM, database, so for me it was a snap)
> - good performance
> - easy to use multiple arrays under the same ASM instance (my 90 TB one was
>   spread over 2 arrays, an XP10K and an XIV for instance; the different
>   speeds of the arrays need to be taken into account in the physical
>   database design, of course)
>
> Cons:
> - need to be aware of the 2 TB limit on individual LUNs
> - extra steps need to be taken in maintaining the LUNs (i.e., take backups
>   of the ASM metadata regularly)
> - ideally, on larger instances, an array dedicated to ASM is best (we had
>   the large one go belly up because the SAN array was on shared
>   infrastructure, and work being done for other systems had an impact on
>   our LUNs; i.e., some disk maintenance zeroed out the ASM headers on our
>   LUNs even though ours was not the system being worked on, or so Oracle
>   Support told us)
>
> Would I use it again? Yes, since the performance and ease of use outweighed
> the cons for our situation. Besides, knowing the cons, it would be easier to
> add processes to make sure we were covered.
>
> HTH
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Daniel W. Fink
> Sent: Friday, October 22, 2010 10:10 AM
> To: oracle-l
> Subject: Large ASM installation
>
> We have a customer that is looking at ASM to handle their databases; the
> total planned is about 8 TB for a single ASM instance. Has anyone on the
> list worked on a large (5+ TB) ASM system? What have been the pros and
> cons versus a regular LVM and storage? If you had the chance to go back
> to the decision time, would you make the same decision, and why?
>
> I don't need nitty-gritty details right now, more of a high-level
> decision-making view.
>
> Regards,
> Daniel Fink
> --
> //www.freelists.org/webpage/oracle-l
>
> --
> Andrew W. Kerber
>
> 'If at first you don't succeed, don't take up skydiving.'
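Rui's manual check of the header dump (looking at the first 8 KB in bvi and finding it blank) can also be scripted. The Python sketch below is illustrative only. It assumes the header region has already been captured to a regular file with dd. The dd invocation in the docstring, the dump file names and the 8192-byte window are hypothetical choices, not anything Oracle Support specified. The script only reports whether those bytes are all zero. It does not parse or validate an ASM disk header.

```python
"""Report whether the leading bytes of a raw disk-header dump are all zero.

Assumption (hypothetical): the dump was taken beforehand with something like
    dd if=/dev/<lun> of=header.dump bs=1M count=200
This script only reads the resulting file; it never touches the device.
"""
import sys

# First 8 KB: the region Rui inspected with bvi (illustrative window size).
HEADER_WINDOW = 8192


def header_is_blank(path: str, window: int = HEADER_WINDOW) -> bool:
    """Return True if the first `window` bytes of `path` are all 0x00.

    A file shorter than `window` is judged on the bytes it does contain;
    an empty file counts as blank.
    """
    with open(path, "rb") as f:
        data = f.read(window)
    # Iterating over bytes yields ints; any() is True if a nonzero byte exists.
    return not any(data)


if __name__ == "__main__":
    for dump in sys.argv[1:]:
        state = "BLANK (all zeros)" if header_is_blank(dump) else "has data"
        print(f"{dump}: first {HEADER_WINDOW} bytes {state}")
```

Run it as `python check_header.py header.dump` (script name hypothetical). A blank result would match what Rui saw. A corrupted but nonzero header, like the one Andrew describes, would report "has data". This check therefore cannot distinguish a healthy header from a damaged one.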