Re: Tuning RMAN backup and recovery

  • From: "Don Seiler" <don@xxxxxxxxx>
  • To: "Mark Brinsmead" <pythianbrinsmead@xxxxxxxxx>
  • Date: Mon, 26 Nov 2007 16:49:19 -0600

Apologies in advance for the top-posted reply.  Just to be clear,
/rman and all other disks are RAID 10.  Due to a lack of free disk, we
had to move /rman onto the same spindles that hold our datafiles.
This happened in September.  The plan is to separate them again early
in 2008, when we have more disk scheduled to be free.  When this
happens, should I be writing to four different paths (e.g. /rman01,
/rman02, /rman03, /rman04)?  As I've already admitted, I don't speak
storage too fluently.
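
Assuming we do end up with four separate mounts, here is roughly how I
picture the backup script looking -- one channel per path.  The channel
names, FILESPERSET value and tag below are placeholders I made up, and
I haven't tested any of this yet:

rman target / <<EOF
RUN {
  # one channel per mount point, so each channel's backupset
  # pieces land on their own set of spindles
  ALLOCATE CHANNEL d1 DEVICE TYPE DISK FORMAT '/rman01/%U';
  ALLOCATE CHANNEL d2 DEVICE TYPE DISK FORMAT '/rman02/%U';
  ALLOCATE CHANNEL d3 DEVICE TYPE DISK FORMAT '/rman03/%U';
  ALLOCATE CHANNEL d4 DEVICE TYPE DISK FORMAT '/rman04/%U';
  BACKUP AS COMPRESSED BACKUPSET
    INCREMENTAL LEVEL 0 DATABASE
    FILESPERSET 4
    TAG 'level0_weekly';
}
EOF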

If I might think out loud for a moment just to get the jumble out of my head:

* Each "channel" in RMAN is writing its own backupset piece, which
(can) contain multiple datafiles.
* I want each channel to write to it's own disk (or LUN).  4 channels
means, ideally, four disks.
** Showing my naivete: Should these disks be on their own spindles?
Would doing otherwise completely defeat the purpose?
* RMAN parallelism refers to having multiple channels.

Right now I have 4 channels, all writing to /rman.  Each channel is
writing compressed backupset pieces that contain 4 datafiles.
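
In RMAN terms I believe the current setup boils down to something like
the following (paraphrasing from memory, so the exact wording may be
slightly off):

CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/rman/%U';
# the weekly level 0 then runs roughly:
BACKUP INCREMENTAL LEVEL 0 DATABASE FILESPERSET 4;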

Mark: our backups did take much less time.  I attributed it (guessing
at the time, of course) to the fact that RMAN also does quality checks
on the data being written, as opposed to putting a tablespace in
BACKUP mode and then gzipping the file to a new dir.
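
Spelling that old routine out, it was basically the following for each
tablespace (the tablespace name and paths here are just illustrative).
The point being that gzip never inspects the blocks it compresses,
while RMAN checks blocks as it reads them:

sqlplus -s "/ as sysdba" <<EOF
ALTER TABLESPACE users BEGIN BACKUP;
EOF

# copy-and-compress the datafile by hand; no block validation happens here
gzip -c /u02/oradata/PROD/users01.dbf > /rman/users01.dbf.gz

sqlplus -s "/ as sysdba" <<EOF
ALTER TABLESPACE users END BACKUP;
EOF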

Also, one bit of sadder news: I performed an RMAN duplication the
Friday before last that took over 25 hours.  However, I really don't
want to even start any kind of diagnosis until we at least move the
RMAN storage to different disks.

Don.

On Nov 25, 2007 9:58 AM, Mark Brinsmead <pythianbrinsmead@xxxxxxxxx> wrote:
> I may be joining this thread a little late, but oh well.  Perhaps I can
> still add something to the discussion.
>
> Just to summarize Don's situation:
>
> -------------------------
> Don is using RMAN to back up a database of about 860 GB.  The backups take
> more than 10 hours; that is less than 86 GB/hr, or roughly 24 MB/s.
>
> The backup is written to disk in /rman, a Veritas filesystem.
>
> The RMAN backup uses 4 concurrent threads, with compression.
>
> Don is unsure of the underlying disk configuration (RAID-1 vs. RAID-5, how
> many spindles, etc.) but is reasonably sure that /rman shares physical
> spindles with the database.
>
> Don's "sar" statistics show that during the backup, the system is completely
> "busy", spending about 30% of its time in CPU, and 70% waiting on I/O.
> --------------------------
>
> Okay, so it looks pretty clear that these backups are I/O bound.  It is also
> highly likely from what we have been told that there is substantial I/O
> contention.  There are four concurrent backup threads reading from and
> writing to the same set of disks.  This might also be aggravated by the cost
> of software-based RAID-5, but we do not actually know whether this is the
> case.
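>
> (If you want to see that contention directly, per-device numbers during
> the backup window would show it -- something along these lines, with the
> device list obviously depending on your layout:
>
>   sar -d -f /var/log/sa/sa10 -s 22:30:00 -i 900   # per-device transfer rates
>   iostat -x 30 10                                 # extended stats: await, %util
>
> Heavy utilisation and queueing on the same devices for both the datafiles
> and /rman would confirm the picture.)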
>
> With 10g, RMAN compression can be either a blessing or a curse.  In this
> case, where we are probably (badly) I/O bound, the compression is
> probably beneficial.  I think Don has done tests to confirm that, but I'm
> not certain I have seen that in this thread.
>
> Based on what we have seen, I would think that the very best (or at least,
> first) "optimization" we can apply here is to separate the back storage from
> the database storage, on separate sets of spindles.  Do not use RAID-5 for
> the /rman filesystem, except maybe with high end hardware-supported RAID-5
> where sequential writes are recognised and optimised.
>
> Don already plans to re-arrange the /rman storage.  This should be done
> sooner rather than later, I think.
>
>  (Note: there are better reasons for rearranging this storage configuration
> than just performance.  In the event of a storage failure, there is a
> significant risk of losing both the database and the backups.  That would be
> a "bad thing (tm)".)
>
> While I/O contention remains the main limiting factor for backup
> performance, RMAN compression is probably going to be a net benefit;  the
> fewer disk blocks written by the backup, the fewer counter-productive disk
> seeks; this leads to less contention and faster throughput.
>
> There is, however, a second potential source of I/O contention -- the
> parallelism of the backup.  In cases where backup parallelism is not well
> matched to the storage configuration, additional parallelism harms
> throughput.
>
> Don, have you tried your backups with fewer parallel threads?  This could be
> a tough thing to balance, but you may find that at least until you separate
> the backup and database storage, the reduced I/O contention might actually
> allow you to do your backups faster...
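>
> Just to sketch what I mean -- the channel names are arbitrary, and
> allocating channels inside a RUN block overrides the configured
> parallelism for that one run, so nothing persistent changes:
>
> rman target / <<EOF
> RUN {
>   # two channels instead of four, purely as an experiment
>   ALLOCATE CHANNEL t1 DEVICE TYPE DISK FORMAT '/rman/%U';
>   ALLOCATE CHANNEL t2 DEVICE TYPE DISK FORMAT '/rman/%U';
>   BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL 0 DATABASE;
> }
> EOF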
>
> Back in the 90's, a typical CPU could "gzip" data with a throughput of
> around 1.0 MB/s.  Current CPUs can do much better.  But your backup threads
> (unless I have botched my arithmetic) are averaging only somewhere around
> 6 MB/s each.  Ignoring RMAN for the moment, how fast can you gzip a 1 GB file?
> Until your backups are achieving at least four times that rate, you can
> probably assume they are I/O bound.
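>
> As a crude test -- the path below is just a placeholder; any file of
> roughly a gigabyte will do:
>
>   time gzip -c /u02/oradata/PROD/some_1gb_file.dbf > /dev/null
>
> Writing to /dev/null keeps the output I/O out of the measurement, so the
> elapsed time is dominated by the sequential read plus the compression;
> divide the file size by that time to get your per-CPU gzip rate.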
>
> Anyway, these are a few thoughts on your situation;  I hope they are not too
> random or disjointed.  I hope even more that they are helpful.  :-)
>
> I think someone earlier in this thread asked about methods to optimize
> disk-based backups.  Aside from the observations offered above, I have only
> come across one really reliable way of doing this -- buy a tape drive!  :-)
> There are very affordable tape drives out there that are capable of
> sustaining throughputs well in excess of 100MB/s.  That's 360 GB/hr.  In
> this particular situation, a $5000 tape drive could completely transform
> your backups.  Your only challenge then will be to find a way to keep the
> tape drive "fed" -- it is common for tape-based backups to suffer
> performance-wise when data cannot be delivered as fast as the tape drive can
> take it.
>
> But that is a different discussion, perhaps for a different day...
>
> On Nov 16, 2007 3:22 PM, Don Seiler <don@xxxxxxxxx> wrote:
> > Here's the "sar -u" output from Saturday night and Sunday morning of
> > this past weekend when the level 0 database backup was running.  I'm
> > not sure if you're interested in the -d output, or if you'd rather see
> > iostat output.
> >
> > root@foo:/var/log/sa # sar -u -f sa10 -s 22:30:00 -i 900
> > Linux 2.6.9-55.0.6.ELsmp (foo.bar.com)  11/10/2007
> >
> > 10:30:01 PM       CPU     %user     %nice   %system   %iowait     %idle
> > 10:45:01 PM       all     29.56      0.00      0.90      0.16     69.37
> > 11:00:01 PM       all     28.41      0.00      0.80      0.11     70.68
> > 11:15:01 PM       all     29.75      0.00      0.89      0.11     69.25
> > 11:30:01 PM       all     29.04      0.00      0.87      0.10     69.98
> > 11:45:01 PM       all     31.25      0.00      0.95      0.10     67.71
> > Average:          all     29.60      0.00      0.88      0.12     69.40
> >
> > root@foo:/var/log/sa # sar -u -f sa11 -e 06:00:00 -i 900
> > Linux 2.6.9-55.0.6.ELsmp (foo.bar.com)  11/11/2007
> >
> > 12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
> > 12:15:01 AM       all     29.38      0.00      0.99      0.12     69.52
> > 12:30:01 AM       all     29.57      0.00      0.99      0.24     69.20
> > 12:45:01 AM       all     27.11      0.00      3.28      5.73     63.88
> > 01:00:01 AM       all     33.61      0.00      3.82      5.01     57.55
> > 01:15:01 AM       all     31.57      0.00      3.49      5.60     59.35
> > 01:30:01 AM       all     27.54      0.00      2.50      4.16     65.80
> > 01:45:02 AM       all     25.33      0.00      0.95      0.14     73.59
> > 02:00:01 AM       all     24.30      0.00      0.91      0.12     74.67
> > 02:15:01 AM       all     25.23      0.00      0.91      0.11     73.75
> > 02:30:01 AM       all     25.19      0.00      0.94      0.13     73.74
> > 02:45:01 AM       all     25.77      0.00      2.77      4.45     67.01
> > 03:00:01 AM       all     26.14      0.00      3.17      5.82     64.87
> > 03:15:01 AM       all     25.99      0.00      1.84      2.45     69.72
> > 03:30:01 AM       all     25.67      0.00      0.97      0.13     73.23
> > 03:45:01 AM       all     24.40      0.00      0.97      0.12     74.51
> > 04:00:01 AM       all     25.76      0.00      0.97      0.13     73.14
> > 04:15:01 AM       all     31.83      0.01      1.22      0.49     66.44
> > 04:30:01 AM       all     27.24      0.00      1.70      0.24     70.82
> > 04:45:01 AM       all     26.65      0.00      2.59      4.89     65.87
> > 05:00:01 AM       all     27.05      0.00      3.14      5.93     63.88
> > 05:15:01 AM       all     26.45      0.00      2.94      5.45     65.16
> > 05:30:01 AM       all     25.99      0.00      1.05      0.13     72.83
> > 05:45:02 AM       all     23.22      0.00      0.95      0.13     75.70
> > Average:          all     27.00      0.00      1.87      2.25     68.88
> >
> >
> > --
> > Don Seiler
> > http://seilerwerks.wordpress.com
> > ultimate: http://www.mufc.us
>
>
>
> --
> Cheers,
> -- Mark Brinsmead
>    Senior DBA,
>    The Pythian Group
>
>
>    http://www.pythian.com/blogs



-- 
Don Seiler
http://seilerwerks.wordpress.com
ultimate: http://www.mufc.us
--
//www.freelists.org/webpage/oracle-l

