RMAN and NetBackup Performance

  • From: "Mark Strickland" <strickland.mark@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Tue, 6 Mar 2007 16:16:42 -0800

Oracle 10.1.0.5 RAC on Solaris9
Symantec/Veritas NetBackup 5

Anyone out there very experienced with managing RMAN with NetBackup?

Starting the evening of February 19th, the duration of the tape backups of
the Production Flash Recovery Area suddenly jumped from about 1/2 hour on
average to 2-3 hours.  These are level 1 incrementals.  The actual backup
time for each backupset is still 1-2 minutes as always, but there is now a
5-7 minute delay between backupsets.  Nothing has changed in the number or
size of the backupsets or in the total size of the backups.  We opened a
ticket with Symantec and were instructed to turn on verbose logging for the
various NetBackup processes on the RMAN client database server and the
NetBackup master server.
I've become obscenely intimate with verbose NetBackup logs over the last two
weeks.  So far, we haven't made much progress with Symantec.  Their one
contribution has been to suggest that we explicitly set the format for the
RMAN backups with a %t at the end, which is apparently supposed to improve
the performance of NetBackup catalog lookups.  The RMAN docs say that if
the FORMAT parameter is used, Oracle will not manage the Flash Recovery Area
automatically, so that idea's out.  I don't want to manage the FRA
manually.  After poring over the NetBackup logs, we've determined the
following:

The NetBackup image database for this particular RMAN client is quite large,
with about 42,000 image files totalling 47 GB.  During the backup of each
RMAN backupset, the image database is searched to see whether a record for
that backupset already exists.  The search starts with the most recent image
file and works backward, one by one, through all 42,000 image files to the
oldest one (90 days back, the retention period), even after it has already
found the record for the backupset.  That takes about 6 minutes, which is a
long enough pause for the NetBackup Scheduler to wake up and seize the
opportunity to back up the NetBackup catalog.  Also, during those 6 minutes
the Media_Unmount_Delay reaches its default 180-second timeout, so NetBackup
decides that the tape is no longer needed in the drive and ejects it.  By
the time the catalog search has returned and the same tape has been
remounted, 15 minutes have passed.  Symantec had us disable automatic
catalog backups and schedule them explicitly instead, which cut 6-8 minutes
from that delay.  What is left is the 6-7 minutes of searching the image
database and remounting the tape.
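The search behavior described above, and Symantec's reasoning about %t, can be sketched in Python. This is a hypothetical model, not NetBackup's actual lookup code: the catalog is assumed to be a list of (time, name) pairs, newest first; piece names are assumed to follow a bk_%s_%p_%t style pattern; and the %t value and image times are treated as one shared clock, which a real lookup could not assume. full_scan walks every image as the logs show; bounded_scan illustrates how a timestamp at the end of the name could let a lookup stop early. scan_seconds and tape_ejected scale the observed 6 minutes per 42,000 images linearly (also an assumption) and compare the result against the 180-second Media_Unmount_Delay.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical image record: (image_time, piece_name), newest first.
# Simplification: image times and the RMAN %t value share one integer clock;
# in reality %t counts seconds from an Oracle reference time.
Image = Tuple[int, str]

TRAILING_T = re.compile(r"_(\d+)$")  # assumes a FORMAT that ends in _%t

# Figures from the logs: ~6 minutes to walk 42,000 images, and the default
# Media_Unmount_Delay of 180 seconds. Cost per image is assumed constant.
IMAGES, SCAN_SECONDS, UNMOUNT_DELAY = 42_000, 360.0, 180.0


def full_scan(images: List[Image], piece: str) -> Tuple[Optional[Image], int]:
    """Walk every image, newest first, even after a match (as in the logs)."""
    found, examined = None, 0
    for img in images:
        examined += 1
        if found is None and img[1] == piece:
            found = img
    return found, examined


def bounded_scan(images: List[Image], piece: str) -> Tuple[Optional[Image], int]:
    """Stop at the match, or at the first image older than the name's timestamp."""
    m = TRAILING_T.search(piece)
    if m is None:  # no trailing timestamp: nothing to bound the search with
        return full_scan(images, piece)
    created = int(m.group(1))
    examined = 0
    for img in images:
        if img[0] < created:  # older than the piece itself, so it cannot hold it
            break
        examined += 1
        if img[1] == piece:
            return img, examined
    return None, examined


def scan_seconds(examined: int) -> float:
    """Estimated search time, scaling the observed 6-minute full scan linearly."""
    return examined * SCAN_SECONDS / IMAGES


def tape_ejected(examined: int) -> bool:
    """True if the idle drive would hit Media_Unmount_Delay during the search."""
    return scan_seconds(examined) > UNMOUNT_DELAY


if __name__ == "__main__":
    catalog = [(t, f"bk_{t}_1_{t}") for t in range(IMAGES, 0, -1)]
    new_piece = f"bk_{IMAGES + 1}_1_{IMAGES + 1}"  # not in the catalog yet
    _, n_full = full_scan(catalog, new_piece)
    _, n_bounded = bounded_scan(catalog, new_piece)
    print(n_full, scan_seconds(n_full), tape_ejected(n_full))           # 42000 360.0 True
    print(n_bounded, scan_seconds(n_bounded), tape_ejected(n_bounded))  # 0 0.0 False
```

Under these assumptions the tape is ejected about halfway through every full scan (after roughly 21,000 images), while a lookup bounded by the name's timestamp would stop almost immediately for a backup piece that has never been cataloged.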

I can find nothing that changed in our environment on February 19th.  The
morning backups that day were of normal duration, the evening backups were
not, and the backups have been slow ever since.  This is especially annoying
because in early February I increased MAXSETSIZE to get more data files into
each backupset, reducing the number of backupsets from 100 to 15, which cut
the backups to tape from 3-1/2 hours to 30 minutes.  We were quite enjoying
that improvement.
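A back-of-the-envelope check of these durations shows why the number of backupsets matters so much. This is a rough model under a stated assumption: each backupset is taken to cost its write time plus a fixed gap, and nothing else in the job is counted. The function names are illustrative only.

```python
def job_minutes(backupsets: int, write_min: float, delay_min: float) -> float:
    """Elapsed minutes if every backupset costs its write time plus a fixed gap."""
    return backupsets * (write_min + delay_min)


def per_set_minutes(total_min: float, backupsets: int) -> float:
    """Average minutes per backupset implied by a total job duration."""
    return total_min / backupsets


if __name__ == "__main__":
    # Before February 19th: roughly 2 minutes per backupset either way.
    print(per_set_minutes(210, 100))  # 100 backupsets in 3-1/2 hours -> 2.1
    print(per_set_minutes(30, 15))    # 15 backupsets in 30 minutes -> 2.0

    # Since February 19th: 1-2 minute writes plus a 5-7 minute gap each.
    print(job_minutes(15, 1, 5) / 60, job_minutes(15, 2, 7) / 60)    # 1.5 2.25
    # The same gap applied to the old 100-backupset layout, in hours.
    print(job_minutes(100, 1, 5) / 60, job_minutes(100, 2, 7) / 60)  # 10.0 15.0
```

Both layouts imply about 2 minutes per backupset before the slowdown, so total time tracked the number of backupsets. Since the 19th the 5-7 minute gap is paid once per backupset, giving an estimate of 1.5 to 2.25 hours for 15 backupsets, in the same range as the 2-3 hours observed.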

I've Googled and I've searched the Symantec site for clues.  Nothing so
far.  Any ideas?

Regards,
Mark Strickland
Seattle, WA
