Re: Large ASM installation

  • From: Andrew Kerber <andrew.kerber@xxxxxxxxx>
  • To: "Amaral, Rui" <Rui.Amaral@xxxxxxxxxxxxxxxx>
  • Date: Fri, 22 Oct 2010 12:18:19 -0500

That is pretty close. We got a corrupted header rather than a zeroed-out
header. Ours was 11gR2 ASM, and it was the ASM disk group that held the OCR.

On Fri, Oct 22, 2010 at 12:08 PM, Amaral, Rui
<Rui.Amaral@xxxxxxxxxxxxxxxx> wrote:

>  Setup: At the time it was a 36 TB database. Storage was an XP10000. Daily
> loads averaged some 500-700 GB, so quite a bit of I/O. We were running out
> of space and I had to add some 30 TB of disk. I added the disks with no
> issue, and the rebalance ran at power level 2 (I had to keep it low because
> of monthly reporting happening at the same time). The estimated time for the
> rebalance to complete was some 36 hours. All 11.1.0.7 across the board, on
> RHEL 4 update 3 I believe. Also using ASMLib.
>
>
>
> The first thing we noticed, about 12 hours after adding the disk, was that
> one of the 4 nodes was no longer responding. I looked in the db alert log
> and found nothing to indicate a problem (the OS was responding fine, by the
> way, with no CPU-bound issues). I looked at the ASM alert log and saw an
> error about a missing LUN. I queried ASMLib and, sure enough, one of the
> disks was missing on that node. I queried ASM and it had the disk listed as
> missing. I went to one of the other nodes and ran the same checks, and there
> the disk was reported as present: ASMLib reported fine and so did ASM. But
> when I rescanned the ASM disks as a sanity check (oracleasm scandisks), the
> scan came back with one disk missing. From that point (beyond emailing the
> others in the group to say that we had problems) I thought I still had some
> time, so I tried dropping the disk from ASM to let ASM relocate the extents
> it found in memory. About 10 minutes after that, the db and ASM crashed.
>
>
>
> The thing is, only one node reported a problem at the time. No OS errors
> were logged, there were no other errors on the db, and nothing was reported
> on the SAN either.
>
>
>
>
> I do not know all of the details of the post mortem on the disk headers
> precisely. Oracle support had us dump the disk headers via dd (a 200 MB dump)
> and send it over to them. From their analysis it appeared that the headers
> were manually zeroed during some sort of disk operation (so they surmised,
> since there was no real way to tell from the headers directly). One of the
> questions they asked was whether the SAN was shared. When we said yes, their
> response was that they had seen something similar in other places.
>
>
>
> I took a quick look at the disk header dump myself using bvi and it was
> blank. Nothing to indicate that there was anything on the disk … using bvi
> on the first 8k of a disk header will give you the fs type and block size…
> nothing… zilch.
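
The "is the header blank?" check described above can also be scripted against a dd dump. What follows is only a minimal sketch in Python, not what Oracle support ran: the function name, the 8 KB constant, and the temporary file names are all made up for illustration. It does not parse any ASM metadata; it only reports whether the first 8 KB of a dump file is entirely zero bytes and, if not, where the first non-zero byte sits.

```python
import os
import tempfile

# Assumption for illustration: inspect the "first 8k" mentioned above.
HEADER_BYTES = 8 * 1024


def first_nonzero_offset(path, length=HEADER_BYTES, chunk_size=4096):
    """Return the offset of the first non-zero byte within the first
    `length` bytes of `path`, or None if that region is all zeros
    (or the file is empty)."""
    offset = 0
    with open(path, "rb") as f:
        while offset < length:
            chunk = f.read(min(chunk_size, length - offset))
            if not chunk:
                break  # file is shorter than `length`
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                # Leading zeros in this chunk = len(chunk) - len(stripped).
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)
    return None


def describe(path):
    """Summarise the leading region of a dump file in one short string."""
    pos = first_nonzero_offset(path)
    if pos is None:
        return "zeroed"
    return f"data at offset {pos}"


if __name__ == "__main__":
    # Demo on two throwaway files standing in for real header dumps.
    with tempfile.TemporaryDirectory() as tmp:
        blank = os.path.join(tmp, "blank.dd")
        with open(blank, "wb") as f:
            f.write(b"\x00" * HEADER_BYTES)
        marked = os.path.join(tmp, "marked.dd")
        with open(marked, "wb") as f:
            f.write(b"\x00" * 32 + b"SOMEDATA" + b"\x00" * 100)
        print(describe(blank))
        print(describe(marked))
```

A dump that comes back as "zeroed" matches the blank header described above. Any non-zero result only means something is there, not that it is valid; a corrupted header would still need a proper look at the contents.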
>
>
>
> Was this similar to what you saw, Andrew?
>  ------------------------------
>
> *From:* Andrew Kerber [mailto:andrew.kerber@xxxxxxxxx]
> *Sent:* Friday, October 22, 2010 12:41 PM
> *To:* Amaral, Rui
> *Cc:* daniel.fink@xxxxxxxxxxxxxx; oracle-l
> *Subject:* Re: Large ASM installation
>
>
>
> Rui-
>
> Can you expound a little on what happened to your disk headers?  That
> sounds strikingly similar to the problem we have had.
>
> On Fri, Oct 22, 2010 at 10:40 AM, Amaral, Rui <Rui.Amaral@xxxxxxxxxxxxxxxx>
> wrote:
>
> Hi Daniel,
>
> I have had several RAC data warehouses on ASM (single ASM instance) in the
> multi-terabyte range (20 TB up to 90 TB).
>
> Pros - easy to manage from a dba perspective (I did the installation myself
> - OS, cluster, ASM, db so for me it was a snap)
>     - good performance
>     - easy to use multiple arrays on the same asm instance (my 90tb one was
> spread over 2 arrays - an xp10k and XIV for instance - different speeds of
> the arrays would need to be taken into account for the physical db design of
> course)
>
> Cons - need to be aware of the 2 TB limit on individual LUNs
>     - extra steps need to be taken to maintain the LUNs (i.e., take backups
> of the ASM metadata regularly)
>     - ideally, on larger instances, having an array dedicated to ASM is
> best (we had the large one go belly up because the SAN array was on a shared
> infrastructure and work being done for other systems had an impact on those
> LUNs, i.e., some disk maintenance zeroed out the ASM header on our LUNs even
> though that was not the system being worked on, or so Oracle support told
> us).
>
> Would I use it again? Yes, since the performance and ease of use outweighed
> the cons for our situation. Besides, knowing the cons it would be easier to
> add processes to make sure we would be covered.
>
> HTH
>
>
> -----Original Message-----
> From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
> On Behalf Of Daniel W. Fink
> Sent: Friday, October 22, 2010 10:10 AM
> To: oracle-l
> Subject: Large ASM installation
>
> We have a customer that is looking at ASM to handle their databases, the
> total planned is about 8TB for a single ASM instance. Has anyone on the
> list worked on a large (5+TB) ASM system? What have been the pros and
> cons versus a regular LVM and storage? If you had the chance to go back
> to the decision time, would you make the same decision and why?
>
> I'm not needing nitty gritty details right now, more of a high level
> decision making view.
>
> Regards,
> Daniel Fink
> --
> //www.freelists.org/webpage/oracle-l
>
>
>
>
>
>
> --
> Andrew W. Kerber
>
> 'If at first you dont succeed, dont take up skydiving.'
>



-- 
Andrew W. Kerber

'If at first you dont succeed, dont take up skydiving.'
