Re: Lightweight method for testing database backup processes

  • From: Mladen Gogala <gogala.mladen@xxxxxxxxx>
  • To: Matthew Parker <dimensional.dba@xxxxxxxxxxx>, nenad.noveljic@xxxxxxxxxxx, cstephens16@xxxxxxxxx, oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 21 Aug 2017 14:37:16 -0400

I am not working with tapes much these days, mostly with things like Glacier, cheap remote storage like Isilon, or a combination of both. So, backups are kept less than a week on the primary site, around a month on the Isilon and almost forever in Amazon Glacier. Every modern backup suite has a built-in verification mechanism, which can verify whether a backup is good or not. You can also run "restore validate" on a regular schedule. I don't see a big science here. I have restored a TB-sized database from Glacier, no problems at all. There are also non-RMAN mechanisms like SRDF, HUR and SnapVault which can be used for backing up databases. At the last stage, a file backup of the snapshot is performed and stored in Glacier, to meet regulatory obligations. How do you propose to validate those on a weekly basis?

Regards


On 08/21/2017 02:25 PM, Matthew Parker wrote:


I have to disagree with you; in most organizations it is not a DR test. A DR test serves a different purpose than ensuring that your backups and processes are good by testing them on a regular basis.

It is not faith-based testing.

There is a variety of tests that can be performed.

First, there are offsite backups besides just database backups.

I have been in a variety of organizations that have quarterly SOX audits, where we pull back a set of tapes based on a random selection of files made by the auditor to verify that backups are statistically good.
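
As an illustration of the random-selection idea, here is a minimal Python sketch. It assumes the backup catalog is available as a plain list of file names; the function name `pick_audit_sample`, the `seed` parameter and the example file names are hypothetical and not taken from any backup product.

```python
import random


def pick_audit_sample(backup_files, sample_size, seed=None):
    """Return a random sample of backup files to pull back and restore-test.

    `seed` is an assumed convenience so that a selection can be
    reproduced later, for example when documenting an audit.
    """
    if sample_size > len(backup_files):
        raise ValueError("sample_size exceeds the number of backup files")
    rng = random.Random(seed)
    # Sort first so the same seed gives the same sample regardless of
    # the order in which the catalog happened to be listed.
    return rng.sample(sorted(backup_files), sample_size)


if __name__ == "__main__":
    # Hypothetical catalog entries, for illustration only.
    catalog = [f"backup_piece_{n:03d}.bkp" for n in range(200)]
    for name in pick_audit_sample(catalog, 5, seed=2017):
        print(name)
```

Recording the seed alongside the audit results is one way to show afterwards that the files were not hand-picked.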

There are also some organizations I have worked for where the requirement was to test all backups yearly. It was not a single yearly test; we were testing backups throughout the year to verify that the system was working the whole time, not just at one selected point in time, and by the end of the year we had recovered each of the several thousand databases at least once. You normally set up automation to perform the onsite-based backups, but the selection of offsite backups to prove those processes too normally involves some manual intervention.
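
One way to picture spreading the restore tests across the year is a simple round-robin plan. The sketch below is an assumption about how such a schedule could be built, not a description of the automation used at those organizations; `yearly_restore_plan`, the weekly granularity and the database names are all hypothetical.

```python
def yearly_restore_plan(databases, periods=52):
    """Assign every database to exactly one test window in the year.

    `periods` is the number of test windows (52 assumes weekly runs).
    Databases are dealt out round-robin, so window sizes differ by at
    most one and every database is restored at least once per year.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    plan = [[] for _ in range(periods)]
    for index, name in enumerate(sorted(databases)):
        plan[index % periods].append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical inventory of 2,500 databases.
    inventory = [f"db{n:04d}" for n in range(2500)]
    plan = yearly_restore_plan(inventory)
    print("databases in week 1:", len(plan[0]))
```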

Testing a single tablespace is a viability test of the database if you fully recover it to the point of opening the database. This is how lots of organizations that have Oracle databases of 100 TB to 1 PB in size test the viability of their backups. They don’t necessarily have enough space to restore the whole database at once, but they can restore pieces at a time.

I have also restored SYSTEM, SYSAUX, undo and one tablespace per cycle, over multiple cycles, so that in the end the complete database was restore-tested.
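
The cycle approach can be sketched as follows. This is only an illustration under stated assumptions: the undo tablespace name `UNDOTBS1` and the application tablespace names are made up, and the helper names `restore_cycles` and `covered_tablespaces` are hypothetical. The sketch only plans which tablespaces go into each cycle; it does not run any restore.

```python
# Core set restored in every cycle. UNDOTBS1 is an assumed undo
# tablespace name; substitute whatever the database actually uses.
CORE_TABLESPACES = ("SYSTEM", "SYSAUX", "UNDOTBS1")


def restore_cycles(app_tablespaces, core=CORE_TABLESPACES):
    """Return one tablespace list per cycle: the core set plus one
    application tablespace."""
    return [list(core) + [name] for name in app_tablespaces]


def covered_tablespaces(cycles):
    """Return the set of all tablespaces restored across the cycles."""
    covered = set()
    for cycle in cycles:
        covered.update(cycle)
    return covered


if __name__ == "__main__":
    # Hypothetical application tablespaces.
    cycles = restore_cycles(["USERS", "DATA01", "INDEX01"])
    for number, cycle in enumerate(cycles, start=1):
        print(f"cycle {number}: {', '.join(cycle)}")
```

Comparing `covered_tablespaces(cycles)` against the full tablespace list of the database is a quick check that nothing was left out of the rotation.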

Having all your DBAs test restores also keeps them practiced in the process and increases the interaction between the DBA and backup teams, which is always a good thing. Yes, I have been at organizations where they do not test backups at all. Then, when the on-call DBA is pinged to do a restore because something is wrong, they fumble through SOPs to try to figure out what needs to be done, and the recovery takes longer than it should, or others have to become involved, because the DBAs are not practicing their craft.

It also helps your team capture changes in the process, as sometimes the different teams don’t communicate well with each other, and it is better to discover a change that could be detrimental to you during a test than when you really need the backup.

When I first started out as a DBA, the senior DBA in our org basically set up a test system and put me through 30 days of disaster recovery training. He would destroy the database, and it was my job to restore it and explain how he had destroyed or broken it. It was invaluable training.

Matthew Parker
Chief Technologist
Dimensional DBA
425-891-7934 (cell)
D&B 047931344
CAGE 7J5S7
Dimensional.dba@xxxxxxxxxxx
http://www.linkedin.com/pub/matthew-parker/6/51b/944/
www.dimensionaldba.com

From: Mladen Gogala [mailto:gogala.mladen@xxxxxxxxx]
Sent: Monday, August 21, 2017 9:32 AM
To: Matthew Parker <dimensional.dba@xxxxxxxxxxx>; nenad.noveljic@xxxxxxxxxxx; cstephens16@xxxxxxxxx; oracle-l@xxxxxxxxxxxxx
Subject: Re: Lightweight method for testing database backup processes

On 08/21/2017 11:04 AM, Matthew Parker wrote:

    Most organizations who have to participate in any type of
    compliance requirements such as SOX Compliance are required to
    test their backups.


And most organizations do perform such testing. Such practice is called a "DR test" and usually occurs once or twice per year. Testing on a daily or weekly schedule is unusual, even if it's only done using "restore validate".
Furthermore, the OP proposed testing backup/restore of a specific tablespace. I don't see how correctness of the tablespace backup guarantees the correctness of the full or incremental database backup. That looks like a faith-based testing strategy.
Regards

--
Mladen Gogala
Oracle DBA
Tel: (347) 321-1217

