Re: Tuning RMAN backup and recovery

  • From: "Mark Brinsmead" <pythianbrinsmead@xxxxxxxxx>
  • To: don@xxxxxxxxx
  • Date: Sun, 25 Nov 2007 08:58:17 -0700

I may be joining this thread a little late, but oh well.  Perhaps I can
still add something to the discussion.

Just to summarize Don's situation:

-------------------------
Don is using RMAN to back up a database of about 860GB.  The backups take
more than 10 hours;  that is less than 86GB/hr, or roughly 24 MB/s of
aggregate throughput (see the arithmetic below).

The backup is written to disk in /rman, a Veritas filesystem.

The RMAN backup uses 4 concurrent threads, with compression.

Don is unsure of the underlying disk configuration (RAID-1 vs. RAID-5, how
many spindles, etc.) but is reasonably sure that /rman shares physical
spindles with the database.

Don's "sar" statistics show that during the backup, the system is completely
"busy", spending about 30% of its time in CPU, and 70% waiting on I/O.
--------------------------
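
For the record, here is the arithmetic behind those throughput numbers
(assuming the full 860GB really is read over the 10 hours):

    860 GB / 10 hr                     =  86 GB/hr
    86 GB/hr * 1024 MB/GB / 3600 s/hr ~=  24 MB/s aggregate
    24 MB/s / 4 backup threads        ~=   6 MB/s per thread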

Okay, so it looks pretty clear that these backups are I/O bound.  It is also
highly likely from what we have been told that there is substantial I/O
contention.  There are four concurrent backup threads reading from and
writing to the same set of disks.  This might also be aggravated by the cost
of software-based RAID-5, but we do not actually *know* whether this is
the case.

With 10g, RMAN compression can be either a blessing or a curse.  In this
case, where we are probably (badly) I/O bound, the compression is
*probably* beneficial.  I think Don has done tests to confirm that, but I'm
not certain I have seen that in this thread.
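
If not, a quick (admittedly crude) test is to time a one-datafile backup
both ways and compare the two runs in V$RMAN_BACKUP_JOB_DETAILS (available
in 10g).  This is stock syntax, but the datafile number and tags here are
just placeholders:

    $ rman target / <<'EOF'
    BACKUP AS BACKUPSET DATAFILE 1 TAG 'uncomp_test';
    BACKUP AS COMPRESSED BACKUPSET DATAFILE 1 TAG 'comp_test';
    EOF

    $ sqlplus -s "/ as sysdba" <<'EOF'
    SELECT status, time_taken_display,
           input_bytes_display, output_bytes_display
      FROM v$rman_backup_job_details
     ORDER BY start_time;
    EOF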

Based on what we have seen, I would think that the very best (or at least,
*first*) "optimization" we can apply here is to separate the backup storage
from the database storage, on separate sets of spindles.  Do not use RAID-5
for the /rman filesystem, except *maybe* with high end hardware-supported
RAID-5 where sequential writes are recognised and optimised.

Don already plans to re-arrange the /rman storage.  This should be done
sooner rather than later, I think.
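
Once the new spindles are in place, repointing RMAN at them is a one-liner
(assuming the filesystem keeps its /rman mount point;  %U is RMAN's
substitution variable for a unique backup piece name):

    $ rman target / <<'EOF'
    CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/rman/%U';
    EOF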

(Note: there are better reasons for rearranging this storage configuration
than just performance.  In the event of a storage failure, there is a
significant risk of losing *both* the database *and* the backups.  That
would be a "bad thing (tm)".)

While I/O contention remains the main limiting factor for backup
performance, RMAN compression is probably going to be a net benefit;  the
fewer disk blocks *written* by the backup, the fewer counter-productive
disk seeks; this leads to less contention and faster throughput.

There is, however, a second potential source of I/O contention -- the
parallelism of the backup.  In cases where backup parallelism is not well
matched to the storage configuration, additional parallelism *harms* throughput.

Don, have you tried your backups with *fewer* parallel threads?  This could
be a tough thing to balance, but you may find that at least until you
separate the backup and database storage, the reduced I/O contention might
actually allow you to do your backups faster...
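
The knob is easy to turn between runs.  A sketch, assuming the current
disk-plus-compression setup -- vary the PARALLELISM value and compare the
elapsed times:

    $ rman target / <<'EOF'
    CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO COMPRESSED BACKUPSET;
    BACKUP DATABASE;
    EOF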

Back in the 90's, a typical CPU could "gzip" data with a throughput of
around 1.0 MB/s.  Current CPUs can do much better.  But your backup threads
(unless I have botched my arithmetic) are averaging only somewhere around
6 MB/s each.  Ignoring RMAN for the moment, how fast can you gzip a 1 GB
file?  Until your backups are achieving *at least* four times that rate,
you can probably assume they are I/O bound.
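
Something like the following would give a ballpark per-CPU compression
rate.  The datafile path is just a placeholder, and ideally the file should
not already be sitting in the filesystem cache:

    $ time gzip -c /u01/oradata/foo/users01.dbf > /dev/null

If that takes, say, 50 seconds for a 1 GB file, then that CPU can compress
at roughly 1024 MB / 50 s, or about 20 MB/s per core.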

Anyway, these are a few thoughts on your situation;  I hope they are not too
random or disjointed.  I hope even more that they are helpful.  :-)

I think someone earlier in this thread asked about methods to optimize
disk-based backups.  Aside from the observations offered above, I have only
come across one *really* reliable way of doing this -- buy a tape drive!
:-)  There are very affordable tape drives out there that are capable of
sustaining throughputs well in excess of 100MB/s.  That's 360 GB/hr.  In
this particular situation, a $5000 tape drive *could* completely transform
your backups.  Your only challenge then will be to find a way to keep the
tape drive "fed" -- it is common for tape-based backups to suffer
performance-wise when data cannot be delivered as fast as the tape drive can
take it.
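
(For what it is worth, pointing RMAN at tape is mostly a channel change.
The PARMS string below is only a placeholder in NetBackup's style;  your
media-management software dictates the real value:)

    $ rman target / <<'EOF'
    CONFIGURE CHANNEL DEVICE TYPE SBT_TAPE
      PARMS 'ENV=(NB_ORA_SERV=mediaserver)';
    BACKUP DEVICE TYPE SBT_TAPE DATABASE;
    EOF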

But that is a different discussion, perhaps for a different day...


On Nov 16, 2007 3:22 PM, Don Seiler <don@xxxxxxxxx> wrote:

> Here's the "sar -u" output from Saturday night and Sunday morning of
> this past weekend when the level 0 database backup was running.  I'm
> not sure if you're interested in the -d output, or if you'd rather see
> iostat output.
>
> root@foo:/var/log/sa # sar -u -f sa10 -s 22:30:00 -i 900
> Linux 2.6.9-55.0.6.ELsmp (foo.bar.com)  11/10/2007
>
> 10:30:01 PM       CPU     %user     %nice   %system   %iowait     %idle
> 10:45:01 PM       all     29.56      0.00      0.90      0.16     69.37
> 11:00:01 PM       all     28.41      0.00      0.80      0.11     70.68
> 11:15:01 PM       all     29.75      0.00      0.89      0.11     69.25
> 11:30:01 PM       all     29.04      0.00      0.87      0.10     69.98
> 11:45:01 PM       all     31.25      0.00      0.95      0.10     67.71
> Average:          all     29.60      0.00      0.88      0.12     69.40
>
> root@foo:/var/log/sa # sar -u -f sa11 -e 06:00:00 -i 900
> Linux 2.6.9-55.0.6.ELsmp (foo.bar.com)  11/11/2007
>
> 12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
> 12:15:01 AM       all     29.38      0.00      0.99      0.12     69.52
> 12:30:01 AM       all     29.57      0.00      0.99      0.24     69.20
> 12:45:01 AM       all     27.11      0.00      3.28      5.73     63.88
> 01:00:01 AM       all     33.61      0.00      3.82      5.01     57.55
> 01:15:01 AM       all     31.57      0.00      3.49      5.60     59.35
> 01:30:01 AM       all     27.54      0.00      2.50      4.16     65.80
> 01:45:02 AM       all     25.33      0.00      0.95      0.14     73.59
> 02:00:01 AM       all     24.30      0.00      0.91      0.12     74.67
> 02:15:01 AM       all     25.23      0.00      0.91      0.11     73.75
> 02:30:01 AM       all     25.19      0.00      0.94      0.13     73.74
> 02:45:01 AM       all     25.77      0.00      2.77      4.45     67.01
> 03:00:01 AM       all     26.14      0.00      3.17      5.82     64.87
> 03:15:01 AM       all     25.99      0.00      1.84      2.45     69.72
> 03:30:01 AM       all     25.67      0.00      0.97      0.13     73.23
> 03:45:01 AM       all     24.40      0.00      0.97      0.12     74.51
> 04:00:01 AM       all     25.76      0.00      0.97      0.13     73.14
> 04:15:01 AM       all     31.83      0.01      1.22      0.49     66.44
> 04:30:01 AM       all     27.24      0.00      1.70      0.24     70.82
> 04:45:01 AM       all     26.65      0.00      2.59      4.89     65.87
> 05:00:01 AM       all     27.05      0.00      3.14      5.93     63.88
> 05:15:01 AM       all     26.45      0.00      2.94      5.45     65.16
> 05:30:01 AM       all     25.99      0.00      1.05      0.13     72.83
> 05:45:02 AM       all     23.22      0.00      0.95      0.13     75.70
> Average:          all     27.00      0.00      1.87      2.25     68.88
>
>
> --
> Don Seiler
> http://seilerwerks.wordpress.com
> ultimate: http://www.mufc.us
> --
> //www.freelists.org/webpage/oracle-l
>
>
>


-- 
Cheers,
-- Mark Brinsmead
  Senior DBA,
  The Pythian Group
  http://www.pythian.com/blogs
