Re: RAC Vs Standby Database between Primary and Secondary Data Centers

  • From: Andrey Kriushin <Andrey.Kriushin@xxxxxxxx>
  • To: dannorris@xxxxxxxxxxxxx
  • Date: Tue, 22 Jan 2008 22:53:55 +0300


Hi,
comments inline

--Andrey

Dan Norris wrote:
Dick,

Here's where I think we need to make clear what defines "high availability" versus what becomes "disaster recovery". Many sites want/need both. In my dictionary, I define high availability as a system that can tolerate a failure of a single component without affecting the application availability. There's also "fault tolerance", but that starts to get into a whole other world, so let's put that out of scope for now.
IMHO, mentioning "fault tolerance" (FT) is very appropriate, because of a widespread misconception among those who are new to Oracle RAC, or not informed enough to resist the marketroid's push: HA is often read as "FT". That is, many believe that a long-running query, or a batch job which modifies data but doesn't make its own save points (not to be confused with the SAVEPOINT SQL command :-)), would just continue its work from the point of failure after the failover to the surviving node. It won't - the failed session's uncommitted work is rolled back, and the application has to reconnect and redo or resume it itself.
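To make that concrete: the batch job has to implement its own restartability, e.g. by committing in chunks and recording its progress, so that after a failover it can simply be re-run and pick up where it left off. A minimal PL/SQL sketch (the orders and batch_progress tables, and all names in it, are made up for illustration):

  DECLARE
    v_last_id  batch_progress.last_id%TYPE;
    v_max_id   orders.order_id%TYPE;
    c_chunk    CONSTANT PLS_INTEGER := 10000;
  BEGIN
    -- where did the previous (possibly killed) run stop?
    SELECT last_id INTO v_last_id
      FROM batch_progress
     WHERE job_name = 'MY_BATCH';

    SELECT MAX(order_id) INTO v_max_id FROM orders;

    WHILE v_last_id < v_max_id LOOP
      UPDATE orders
         SET status = 'PROCESSED'
       WHERE order_id >  v_last_id
         AND order_id <= v_last_id + c_chunk;

      v_last_id := v_last_id + c_chunk;

      -- record the progress in the same transaction as the work...
      UPDATE batch_progress
         SET last_id = v_last_id
       WHERE job_name = 'MY_BATCH';

      COMMIT;  -- ...and this commit is the application-level "save point"
    END LOOP;
  END;
  /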

... skipped
As another poster mentioned, RAC does have some support for "stretch clusters", but they are not widely used and the MAA still recommends a standby database in combination with RAC (at least the last time I read it).

The terminology is not very stable here ("stretch clusters"). What is definite is that the nodes of the cluster are distributed among several (at least two) data centres and each data centre has its own storage. Usually people consider two configurations:

1. (Most commonly used) There is synchronous replication of disk blocks between the storages via the hardware capabilities of the storage arrays, and all the nodes of a particular site work with the local storage (see the sketch after this list). In this case there is one cluster-critical point - the quorum disk (if it is used by the underlying clusterware). When an entire data centre fails, or just its storage does, then either the nodes of that centre are considered dead until the failure is resolved, or they switch to the storage of the other data centre as just another "local" storage (if the proper capability exists).

2. All nodes of the stretch cluster use only one storage, at a chosen data centre; to the nodes at any site that storage looks "local". The other storages contain the standby(-ies), maintained by Data Guard, by synchronous replication in the storage HW, or by a combination of synchronous (for critical files) and asynchronous (for datafiles) replication in the storage HW. I've also heard of the H.A.R.D. initiative, though I have no practical experience and no good docs on it. I would be interested if experienced colleagues could point me to the right docs.
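To give a flavour of "each site reads from its local storage" in configuration 1: one possible implementation (host-based ASM mirroring in 11g, instead of array-based replication; the diskgroup, failure group and path names are made up) could look like

  -- Diskgroup DATA is mirrored across two failure groups, one per site
  CREATE DISKGROUP data NORMAL REDUNDANCY
    FAILGROUP site_a DISK '/dev/mapper/siteA_*'
    FAILGROUP site_b DISK '/dev/mapper/siteB_*';

  -- Each ASM instance prefers to read the copy in its own data centre
  ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITE_A'
    SID = '+ASM1';  -- instance at site A
  ALTER SYSTEM SET asm_preferred_read_failure_groups = 'DATA.SITE_B'
    SID = '+ASM2';  -- instance at site B

Writes still go to both failure groups (that is the synchronous-replication cost); only the reads are localized.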

The first configuration of "stretch cluster" gives higher availability, as the second one requires some manual steps on failover (redefinition of symbolic links etc.). The second is usually cheaper. The first configuration might also provide better throughput - more storage arrays running in parallel - especially when the modification rate is moderate.

Jared also mentioned human error... Well... Uhgg... That can be better tolerated with Data Guard, thanks to its ability to delay the applying of archived logs on the standby.
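For reference, a minimal sketch of such a delay (the standby service name stby_db is made up; DELAY is in minutes):

  -- Primary: ship redo, but have the standby wait 60 minutes before applying it
  ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby_db DELAY=60' SCOPE=BOTH;

  -- Standby: managed recovery honours the delay by default...
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
  -- ...and can bypass it in an emergency:
  -- ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;

So an erroneous DROP or mass UPDATE that slips through on the primary has not yet been applied on the standby, and there is a window in which to stop the apply and rescue the data.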
