RE: Backups versus snapshots

  • From: "Dimensional DBA" <dimensional.dba@xxxxxxxxxxx>
  • To: <yparesh@xxxxxxxxx>, <iggy_fernandez@xxxxxxxxxxx>
  • Date: Thu, 18 Sep 2014 23:19:45 -0700

We didn’t follow the version route. The Global Backups team at Amazon was 1
manager and 3 engineers, responsible for tape systems and backup SW worldwide
(6 PB/month). The objective at Amazon was automation rather than scaling the
human workforce, and solving the problems of today and the future with good
design.

 

The new process was simply to upgrade and uplift media on a 7-year cycle. The
7-year cycle was based on calculations of tape media failure in the more
hostile 90 degree Fahrenheit environment and on the 3-week overwrite reuse of
the tapes in most instances. The objective was to eliminate the equipment
version problem: as newer versions of the LTO tape drives came into play, once
we reached the second version down the road (which could still read media 2
versions back) we started retrofitting the old media onto newer-version media,
then basically dumped all the old equipment. You do what you have to do, and
then modify process, procedures and architecture to eliminate your problems.
We also did a lot with process to eliminate humans interacting with the media,
which carries the greatest potential for tape loss, and we monitored the
equipment to ensure proper operation of the tape drives, using rate detection
to adjust backups and eliminate shoe-shining of the tapes.
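The media-migration rule above can be sketched as a small planning check in Python. This is a minimal sketch, not the tooling we actually used: it assumes a drive reads media up to two generations back (the LTO rule through LTO-7; LTO-8 and later read only one generation back, so `READ_BACK` is an assumption to set per drive family), and the function names and inventory numbers are hypothetical.

```python
# Assumption: a drive reads media up to READ_BACK generations older than itself.
# This matches LTO-1 through LTO-7; LTO-8 and later read only one generation back.
READ_BACK = 2


def can_read(drive_gen: int, media_gen: int, read_back: int = READ_BACK) -> bool:
    """True if a drive of generation drive_gen can read media written at media_gen."""
    return drive_gen - read_back <= media_gen <= drive_gen


def last_readable_drive(media_gen: int, read_back: int = READ_BACK) -> int:
    """Newest drive generation that can still read media of media_gen."""
    return media_gen + read_back


def media_to_migrate(inventory: dict[int, int], fleet_gen: int,
                     read_back: int = READ_BACK) -> dict[int, int]:
    """Return {media generation: tape count} that must be copied forward now.

    Once the newest drives in the fleet are the last generation able to read
    a tape, that tape has to be rewritten onto current media before the next
    drive upgrade leaves it unreadable. Media already past that point is also
    returned, since it needs attention immediately.
    """
    return {gen: count for gen, count in inventory.items()
            if last_readable_drive(gen, read_back) <= fleet_gen}
```

With a hypothetical inventory of `{3: 1200, 4: 5000, 5: 8000}` tapes by generation and a drive fleet at generation 5, `media_to_migrate` flags only the generation-3 tapes, because generation 5 is the last drive family that can read them.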

 

For the specific recovery, I had external disk copies of Oracle going back to
version 7. I more or less maintain my own copies of old versions at home. (If
anyone has versions older than 7.x, I would love to have them.)

I also maintain older copies of Linux, HP-UX, Solaris and MS Windows. For
hardware, since no planning had been done at the time for maintaining older
equipment, or for what you would really need in order to restore, I fell back
on eBay for the actual tape drive, as I already had an old server tucked away.

 

The 1997 copy was actually an export rather than a regular backup, as someone
at the time had the foresight to think about the complexity of a full database
restore if you were not storing all the other components, although I later
found CDs of a backup too. The recovery itself was off of 8mm tape. A few of
the later ones I had came from CDs.

Realistically speaking, some of the recoveries succeeded simply by luck, given
how little care had been taken with the media.

The longer time you may be thinking of comes from most people thinking
linearly instead of performing multiple tasks in parallel where possible. I
have seen a backup team not start any work until the tape was actually in
their hands, when there was associated work that could have been performed,
like prepping the server and installing the relevant SW, so the restore could
start as soon as the media arrived. I have had to push some teams at certain
companies where the media was sitting idle waiting for the DBA team to formally
request that the backup team start the restore.
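The difference between linear and parallel restore work is a critical-path calculation. The sketch below models a restore as tasks with prerequisites and compares the two approaches; the task names and hour values are hypothetical placeholders, not measurements from this recovery.

```python
from functools import cache

# Hypothetical restore plan: task -> (duration in hours, prerequisite tasks).
# Durations are illustrative placeholders only.
TASKS: dict[str, tuple[float, tuple[str, ...]]] = {
    "request_restore": (4, ()),
    "retrieve_media": (24, ("request_restore",)),
    "kickstart_server": (2, ()),
    "install_os": (2, ("kickstart_server",)),
    "install_oracle": (2, ("install_os",)),
    "restore_data": (8, ("retrieve_media", "install_oracle")),
}


def linear_hours(tasks) -> float:
    """Elapsed time if every task waits for the previous one to finish."""
    return sum(duration for duration, _ in tasks.values())


def parallel_hours(tasks) -> float:
    """Elapsed time when independent tasks overlap (critical-path length)."""
    @cache
    def finish(name: str) -> float:
        # A task finishes its own duration after its latest prerequisite finishes.
        duration, deps = tasks[name]
        return duration + max((finish(d) for d in deps), default=0)

    return max(finish(name) for name in tasks)
```

With these placeholder numbers, `linear_hours(TASKS)` gives 42 and `parallel_hours(TASKS)` gives 36: the server preparation disappears behind the media retrieval, and what remains on the critical path is the formal request, the media retrieval and the restore itself.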

 

In this case the 8mm tape was actually in a desk drawer I took over from the
previous manager, so retrieving the item in question was a simple check off
the list, as it was at my desk. If I had had to go through the 12K+ tapes that
were stored at Iron Mountain prior to 2004, it would have taken much longer:
the previous team had upgraded the Symantec SW in 2003 by simply installing
net new, and the previous catalog was lost. Also, in 2000 Iron Mountain had
upgraded their systems, and everything prior to 12/2000 was listed as ingested
by Iron Mountain on that date. So if I really wanted a specific tape at Iron
Mountain from before 2003, we would have had to retrieve every tape from Iron
Mountain prior to 2004 and read them all to rebuild a catalog before we could
find anything (estimated time: 9 months). There are lots of things that can go
wrong in the infrastructure if you are not thinking about the long term. That
includes disk storage, if your vendor is not using some technology to counter
bit rot and to verify data moves from point A to point B as they move data to
upgrade equipment.
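Verifying a data move from point A to point B amounts to comparing checksums computed independently on both sides. A minimal Python sketch using SHA-256 from the standard library; `sha256_of` and `verified_copy` are hypothetical helper names for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst, re-read both, and fail loudly if the digests differ."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if actual != expected:
        raise OSError(f"checksum mismatch copying {src} -> {dst}")
    # Return the digest so it can be stored for later scrubbing.
    return expected
```

One caveat: reading the destination immediately after writing it may be served from the OS page cache rather than the physical media, so a later re-read against the saved digest (a periodic scrub) is what actually catches bit rot.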

 

Because I had the media immediately available, it was just a matter of
kickstarting the server, installing the OS and then installing Oracle: less
than 6 hours. The longest wait was 2 days for the arrival of the drive. (That
saved me from having to dig through hundreds of boxes in my storage shed,
where I know tape drives of all sorts remain buried to this day.)

 

Once we got the processes down, we had tape backups of the OS kickstart servers
with copies of all the OS images in use, along with all the installable
database SW homes, so we could restore to any specific OS and version. You
still sometimes have to deal with driver problems in the OS or relink problems
with the Oracle homes. In some cases this is why a complete system image,
including the database, may be stored. (Every situation differs somewhat
depending on what you are trying to do.)

 

As to small business or large business, you have to have a process and
understand the technology. Tape systems are not expensive for a small business
if you use the smaller systems from the smaller tape system vendors. For
example, you can buy a single-drive desktop tape unit with 8 slots (I had one
of the first ones back in 1996) that currently costs only about $4K, whereas
anyone who purchases from the large-scale system vendors knows that the list
price of a single drive, say an LTO6, is 5 times that. There are also
removable disk units, or, as I have seen at some small companies, they simply
attach a USB drive to the back of the server and then send it off to Iron
Mountain for storage. Over the longer term, that media needs to be pulled back
and converted if necessary. If you are a business under regulatory
compliance, you will do what is necessary to accomplish the task. How well you
accomplish it, sadly, varies with the humans involved.

 

I remember writing backups to tape systems with mt and tar, and manipulating
the library directly with shell scripts, before all the nice backup SW existed.
It is all doable even for small companies, but you must have a process. Yes,
it takes some extra human effort to perform the job. I have seen some really
small operations, such as a local community center (a non-profit, which means
spend no money if possible, as it should all be spent to help the community),
with a single DAT drive on their Windows server, where the tape was pulled out
every morning after the previous night's backup as a safety measure for their
data. Yes, the kindly old lady at the computer dutifully took the previous
night's backup tape home every day in her purse and kept it in a cabinet for a
couple of months before bringing the old tapes back.
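The mt-and-tar style of nightly backup can be sketched with Python's standard `tarfile` module. This is an illustration under assumptions: it writes a dated archive to an ordinary directory rather than driving a tape device with `mt`, and `nightly_backup` and `verify_backup` are hypothetical names. The verification step reads every file back out of the archive and compares it with the source, a small-scale version of a restore test.

```python
import tarfile
from datetime import date
from pathlib import Path


def nightly_backup(source_dir: Path, dest_dir: Path, day: date) -> Path:
    """Write source_dir into a dated, gzip-compressed tar archive in dest_dir."""
    archive = dest_dir / f"backup-{day.isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the parent, e.g. "data/sub/file.txt".
        tar.add(source_dir, arcname=source_dir.name)
    return archive


def verify_backup(archive: Path, source_dir: Path) -> bool:
    """Read every file back out of the archive and compare it byte-for-byte."""
    expected = {
        f"{source_dir.name}/{p.relative_to(source_dir).as_posix()}": p.read_bytes()
        for p in source_dir.rglob("*") if p.is_file()
    }
    with tarfile.open(archive, "r:gz") as tar:
        restored = {
            m.name: tar.extractfile(m).read()
            for m in tar.getmembers() if m.isfile()
        }
    return restored == expected
```

Because `verify_backup` holds every file in memory, it only suits small data sets; larger ones would compare stored checksums instead.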

 

You have a variety of choices to make relative to cost, simplicity, risk, etc.
There is no one right answer for every business, as each choice (disk, tape,
cloud, nothing) carries different priorities for the business.

 

At a lot of companies I worked closely with the CFO on financial systems, and
from the CFO's perspective the concept was: do what is necessary to ensure
compliance and keep them out of jail. You present options with a risk analysis,
and they choose the level of risk versus cost they are willing to take. Your
job is to implement the system with proper monitoring and processes to ensure
reliability, including testing restores on a regular basis. Even the best
system can be undermined by humans or by neglect.

 

 

 

Matthew Parker

Chief Technologist

425-891-7934 (cell)

Dimensional.dba@xxxxxxxxxxx

View Matthew Parker's profile on LinkedIn:
<http://www.linkedin.com/pub/matthew-parker/6/51b/944/>

 

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Paresh Yadav
Sent: Thursday, September 18, 2014 9:15 PM
To: iggy_fernandez@xxxxxxxxxxx
Cc: kmoore@xxxxxxxxxxxx; Oracle-L@xxxxxxxxxxxxx
Subject: Re: Backups versus snapshots

 

Thanks Matthew for sharing your valuable experience at Amazon. As Hemant
mentioned, you must have preserved all the associated tech (tape library to
read the old tapes, machine and OS version that can run the old DB software
version, DB software at a version that can restore the backups, etc.). And
this needs to be done for all the tech (hardware and software) and versions
that get used over a period of time. Amazon can afford the infrastructure and
manpower required to maintain this, but how does an SMB meet a 7-year
regulatory retention requirement?

 

What was the typical time to recover a 1997 Oracle DB backup (probably Oracle
version 7.x) in 2010, after having to install Oracle 7.x software on a
compatible OS and hardware? This involves locating not only the backups but
also the software install media and the hardware that can run the software.




Thanks

Paresh

416-688-1003

 

 

On Fri, Sep 19, 2014 at 12:04 AM, Iggy Fernandez <iggy_fernandez@xxxxxxxxxxx> 
wrote:

snapshots or backups are just means to an end; that is, meeting the 
availability and regulatory requirements within the available budget. if, for 
example, you have regulatory requirements to store data for a certain number of 
years, then you could copy the contents of the snapshots to tape.

 

re:  if the database goes poof then the snapshot is gone as well

if the database goes poof, then the snapshot remains

 

iggy

 

> To be clear, the snapshots are not physical copies of the database. They only 
> track the differences between the database at the time of the snapshot and 
> the current time. So if the database goes poof then the snapshot is gone as 
> well.

 
