Re: RMAN and NetBackup Performance

  • From: "Mark Strickland" <strickland.mark@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Mon, 12 Mar 2007 10:03:45 -0700

PARTIAL RESOLUTION

After much drama with Symantec and much testing on my end, I figured out
that you CAN use the "%t" format AND use an Oracle-managed Flash Recovery
Area.  The format has to be specified for the tape backups, not for the
backups to the FRA.  I was trying to use it in my backups to the FRA.  So,
when you do RMAN backups to a Flash Recovery Area, don't specify a format
for the RMAN backupset names.  Just use the RMAN-supplied defaults.  When
you subsequently back up the FRA to tape, specify a format with "%t" at the
end.  Specifically, Symantec recommends this:

   'bk_%s_%p_%t'

Whether you configure the tape channel(s) with this format permanently or
specify it in the allocate command or in the backup command probably makes
no difference.  I'm doing it in the allocate command:

   allocate channel t1 type SBT format 'bk_%s_%p_%t';
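
For anyone wanting to see the two steps side by side, here is a minimal
sketch of how they might fit together.  It is not my actual script: it
assumes disk is the default device type, that db_recovery_file_dest points
at the FRA, and it leaves out any site-specific NetBackup channel
parameters.  Only the channel name t1 and the format string come from the
text above.

```rman
# Step 1: disk backup into the FRA.  No FORMAT clause here, so RMAN uses
# its default names and the FRA stays Oracle-managed.
backup incremental level 1 database;

# Step 2: back up the FRA to tape.  The %t format goes on the tape channel
# only.  (NetBackup-specific PARMS are omitted in this sketch.)
run {
  allocate channel t1 type SBT format 'bk_%s_%p_%t';
  backup recovery area;
  release channel t1;
}

# Alternative to the allocate command: set the format persistently.
# configure channel device type sbt format 'bk_%s_%p_%t';
```

In the format string, %s is the backup set number, %p is the piece number
within the set, and %t is the backup set timestamp.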

My backups are now going as fast as they used to (or faster).  With the "%t"
format, NetBackup is able to search the image catalog quickly.  Without it,
NetBackup starts at the most recent image file and works its way backwards
in time all the way to the oldest image file.  It does this because it has
to verify uniqueness for the name of the backupset.  In our case, for 90
days' worth of backups, we have 42,000 image files that have to be searched
before and after the backup of the backupset.  I did a simple test grep-ing
through the image files for the name of one of my backupsets.  It took 8
minutes.  NetBackup can do it in about 6 minutes (without the benefit of the
%t format).  By that time, the tape has gotten ejected and has to be
re-mounted.  Another 1-2 minutes.

What I still don't understand (and maybe never will) is why my backups
suddenly started taking a long time after 2/19.  I'm still sending logs to
Symantec.  I can find nothing that changed on 2/19.  The number of image
files in the NetBackup catalog didn't suddenly grow exponentially.  It was
actually dropping, day by day.  At any rate, I do have a
solution that works well.  So, if you use RMAN and NetBackup, you might
consider that same solution.  Your tape backups may be taking much longer
than they need to.  I now know more about NetBackup logs than I ever cared
to.  If anyone would like a quick run-down of the logs, contact me offline.
I'll be happy to share the misery.

BTW, compared to Symantec's support site, Metalink is a thing of beauty.

HTH,
Mark Strickland
Seattle, WA


On 3/6/07, Mark Strickland <strickland.mark@xxxxxxxxx> wrote:

Oracle 10.1.0.5 RAC on Solaris 9
Symantec/Veritas NetBackup 5

Anyone out there very experienced with managing RMAN with NetBackup?

Starting the evening of February 19th, duration of the backups to tape of
the Production Flash Recovery Area suddenly jumped from about 1/2 hour on
average to 2-3 hours.  These are level 1 incrementals.   The actual backup
time for each backupset is still 1-2 minutes as always, but there is a 5-7
minute delay in between.  Nothing has changed in terms of number/size of
backupsets and total size of the backups.  We opened a ticket with Symantec
and we were instructed to turn on verbose logging for the various NetBackup
processes on the RMAN client database server and NetBackup master server.
I've become obscenely intimate with verbose NetBackup logs over the last two
weeks.  So far, not getting very far with Symantec.  Their one contribution
has been to suggest that we explicitly set the format for the RMAN backups
with a %t at the end.  This apparently is supposed to improve performance of
NetBackup catalog lookups.  In the RMAN docs, it says that if the format
statement is used, Oracle will not manage the Flash Recovery Area
automatically.  So, that idea's out.  I don't want to manage the FRA
manually.  After poring over NetBackup logs, we've determined that:

The NetBackup image database for this particular RMAN client is quite
large with about 42,000 image files totalling 47 GB.  During each backup of
an RMAN backupset, the image database is searched to see if a record for the
RMAN backupset already exists.  It starts with the most recent image file
and works backward sequentially one-by-one through the 42,000 image files to
the oldest image file (90 days ago, the retention period) even after it has
already found the record for the backupset.  That takes about 6 minutes,
which is a long enough sleep for the NetBackup Scheduler to wake up and grab
the opportunity to do a backup of the NetBackup catalog.  Also, during this
6 minutes, the Media_Unmount_Delay has reached its default 180-second
timeout, so NetBackup determines that the tape is no longer needed in the
drive and ejects it.  Finally, after the catalog search has come back and
the same tape is re-mounted, 15 minutes have passed.  Symantec had us
disable automatic catalog backups and explicitly schedule them instead,
which removed 6-8 minutes from the duration of the backup.  What is left is
the 6-7 minutes of searching the image database and remounting the tape.

I can find nothing that changed on February 19th in our environment.  The
morning backups were of normal duration, the evening backups were not, and
the backups have been slow ever since.  This is especially annoying because
in early February, I increased MAXSETSIZE to get more data files into each
backupset and reduce the number of backupsets from 100 to 15, which decreased
the backups to tape from 3-1/2 hours to 30 minutes.  We were quite enjoying
that improvement.

I've Googled and I've searched the Symantec site for clues.  Nothing so
far.  Any ideas?

Regards,
Mark Strickland
Seattle, WA

