If the business requirement is truly to continue operating through multiple
simultaneous site disasters, you have a difficult task.
First, try to build an understanding among the stakeholders that IF you are
guarding against multiple data center disasters (otherwise Data Guard or a
remote standby catch-up and failover seems sufficient), that implies a third
repository of the data far away from the first two, most likely with an
agreement with a third party to spin up recovery hardware at their site.
Very likely they will then understand that your current failover setup is
sufficient for the requirement.
IF I am wrong about that, then the most likely solution is to introduce
time-based partitioning of all the data that has a date after which it is not
allowed to change AND is not required for the operations that must stay
available for business continuation. (Rarely are old transaction histories
required with the same immediacy as current inventory quantities, and so forth.)
IF enough data meeting those characteristics can be identified to keep you
permanently within your physical reload recovery window, then you also need to
be in a position to shuffle partitions (probably shrinking out free space and
applying useful attribute clustering permanently along the way) into “slower
recovery okay” tablespaces.
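A minimal sketch of that shuffle, assuming a hypothetical range-partitioned
SALES table with a closed-out partition P2019 and a “slower recovery okay”
tablespace HIST_SLOW (all names are illustrations, not from this thread):

```sql
-- Hypothetical names: adjust table, partition, and tablespace to your schema.
-- MOVE compacts the segment; COMPRESS and ONLINE are available in 12.1.
ALTER TABLE sales MOVE PARTITION p2019
  TABLESPACE hist_slow
  COMPRESS
  UPDATE INDEXES ONLINE;

-- Once a historical tablespace is fully populated, making it read-only
-- means it needs to be backed up only once.
ALTER TABLESPACE hist_slow READ ONLY;
```

Read-only tablespaces also let RMAN skip them in routine backups
(BACKUP ... SKIP READONLY), which shortens both the backup and the critical
restore path.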
Then you can practice the plan of immediately bringing up only those
tablespaces required for the operations that need the stated business
continuation immediacy (continuing the reload and recovery of the other
tablespaces after the critical business functions resume).
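A rough RMAN sketch of that staged restore, assuming hypothetical tablespaces
OPS_HOT (critical) and HIST_SLOW (deferrable); SYSTEM, SYSAUX, and UNDO must
always be in the first wave, and the datafile numbers are placeholders:

```sql
-- Run from RMAN (12c accepts these SQL statements without a SQL prefix).
STARTUP MOUNT;
-- Take the deferrable files offline so the database can open without them.
ALTER DATABASE DATAFILE 7, 8 OFFLINE;
RESTORE TABLESPACE system, sysaux, undotbs1, ops_hot;
RECOVER TABLESPACE system, sysaux, undotbs1, ops_hot;
ALTER DATABASE OPEN;
-- Critical functions resume here; continue in the background:
RESTORE TABLESPACE hist_slow;
RECOVER TABLESPACE hist_slow;
ALTER TABLESPACE hist_slow ONLINE;
```

The point of practicing it is to know, not guess, which file numbers and
tablespaces belong in each wave on the day you need them.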
Avoiding this entire race dance is the point of online recovery mechanisms:
modern systems, often on SSD, quite often grow far too large to “back up” onto
more persistent storage, in the sense that reading that persistent storage back
onto storage connected to your machine takes longer than any useful recovery
window.
Another possible approach is to plug multiple SANs into your machine(s).
This, of course, does not handle the multiple site disaster problem. You don’t
keep any “current” data on the alternating SANs (of which you have a minimum of
two), because you never start overwriting your only complete backup file set.
After a backup is complete, the relevant SAN is physically disconnected.
Then, in a “storage disaster”, after you clean up the host from the software
hack that was likely the cause, you connect your most recent backup SAN and
away you go.
Not all machines have connections for plugging in multiple SANs, and of course
you can’t make these SANs “virtual” storage. You’re unplugging one to make it
air gapped from attack. You might have an air gapped machine to plug it into to
run full surface scans and memory checks (SSD), but that entire set-up is
non-networked.
When they balk at the cost, perhaps it is time to engage a certified actuary to
explain to them what rare case they are insuring against (and that they
probably don’t have everything they would need for the plan to possibly
succeed).
And, of course, any plan you have that you don’t test regularly is just wishful
thinking. Testing plug-in replacement storage is probably a bigger risk than
relying on something like dataguard or storage snapshots.
If they are worried about this, do they have multiple physically independent
communications infrastructure? How about power generators?
Good luck,
mwf
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Lok P
Sent: Wednesday, April 27, 2022 7:21 AM
To: Andy Sayer
Cc: Oracle L
Subject: Re: Minimize recovery time
Yes, they are in different data centers, and those are in different locations.
And backups are taken at both primary and secondary through ZDLRA, and I
believe the respective backups are kept in the respective data center in its
configured ZDLRA storage.
On Wed, 27 Apr 2022, 4:06 pm Andy Sayer, <andysayer@xxxxxxxxx> wrote:
Your dataguard is using the same storage as your primary? Usually it would be a
whole different data centre. Where are your backups going?
On Wed, 27 Apr 2022 at 11:35, Lok P <loknath.73@xxxxxxxxx> wrote:
Yes, we have a Data Guard setup, but this agreement is in place in case both
the primary and the Data Guard DB fail because of disaster, corruption, etc.
On Wed, 27 Apr 2022, 3:30 pm Andy Sayer, <andysayer@xxxxxxxxx> wrote:
Have you considered Dataguard? You’d have a secondary database always ready to
failover to.
Thanks,
Andy
On Wed, 27 Apr 2022 at 10:50, Lok P <loknath.73@xxxxxxxxx> wrote:
Hello Listers, We have an Oracle Exadata (X7) database on 12.1.0.2.0, and it
has now grown to 12TB. Per the client agreement and the criticality of this
application, the RTO (recovery time objective) has to be within ~4hrs. The team
looking after backup and recovery has communicated an RTO of ~1hr per ~2TB of
data with the current infrastructure. So going by that, the current size of the
database gives an RTO of ~6hrs, which is more than the client agreement
(~4hrs).
Going through the top space consumers, we see those are table/index
sub-partitions and non-partitioned indexes. Should we look into table/index
compression here? But then I think there is also a downside of that on DML
performance.
I wanted to understand: is there any other option (apart from exploring a
possible data purge) to make this RTO faster or bring it under the service
agreement? How should we approach this?
Regards
Lok