Re: RMAN impact
- From: "Steve Perry" <sperry@xxxxxxxxxxx>
- To: "Mladen Gogala" <gogala@xxxxxxxxxxxxx>
- Date: Fri, 10 Mar 2006 06:32:24 -0600
Feel free to berate, it's not my doing :)
I wasn't saying that it was a good configuration. I didn't set it up and I'm
only watching from the sidelines. I said a lot of what you did, but some
people know better. I didn't want to get into why it's configured this way
or why it hasn't been changed. In fact, "they" won't allow any changes.
It would take too long to explain it all, but you'd have a good laugh :)
The box is a 4-way Dell 6850. Side note: it has two 2Gb/sec HBAs coming in,
but can only do 1Gb/sec to the storage node.
What is interesting to me is that this RMAN backup has this kind of impact.
At my last job, I had BCVs attached to a single Windows backup host (HP DL580)
and ran 3 concurrent backups (2TB, 1TB and 250GB), and it ran/responded
fine. They completed in a little over 4 hours. They were file backups (we
did hot backups and split the BCVs).
The backup host had HBAs attached to 2 SANs (1 was tape and the other was
the data). The RMAN backup has data coming off the SAN and being written
over the network.
To sum up what you said: "You get what you pay for." I totally agree.
----- Original Message -----
From: "Mladen Gogala" <gogala@xxxxxxxxxxxxx>
Cc: <litanli@xxxxxxxxx>; <Oracle-L@xxxxxxxxxxxxx>
Sent: Thursday, March 09, 2006 10:45 PM
Subject: Re: RMAN impact
On 03/09/2006 09:14:21 PM, Steve Perry wrote:
We have an RMAN backup for a 500 GB 9.2 RAC on RHEL3 (3 LTO II drives) that
takes 3 hours to complete, but kills the node it runs on.
I/O wait % goes to 60%, CPU is low. The server is pretty much unresponsive
until it completes.
We allocate 4 channels and are not using the large_pool or tape slaves.
We tried the same thing but used disk (a separate LUN from the database) and
the server crashed.
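For scale, it may help to see what the quoted numbers work out to. This is my own back-of-the-envelope arithmetic, not a figure from the thread:

```python
# Sustained rate needed to move 500 GB through 4 RMAN channels in 3 hours
# (backup size, duration and channel count are from the message above).
backup_gb = 500
hours = 3
channels = 4

total_mb_per_sec = backup_gb * 1024 / (hours * 3600)
per_channel_mb_per_sec = total_mb_per_sec / channels

print(f"aggregate:   {total_mb_per_sec:.0f} MB/s")        # ~47 MB/s
print(f"per channel: {per_channel_mb_per_sec:.0f} MB/s")  # ~12 MB/s
```

Roughly 47 MB/s aggregate is modest by SAN standards, which makes the impact on the node all the more striking.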
Steve, it is not my intention to berate you or anything of the sort, but this
is some weird stuff. PC equipment, even the kind with SMP motherboards, is
not made for high-volume I/O. Now you are discovering the difference between
PC and mini-computers like HP 9000/rp4400 or IBM p520: those can do massive
amounts of I/O, while not even the best Dell PC can do that. The problem is
the fact that the PC bus, even with the best SMP motherboard, doesn't have the
capacity to allow simultaneous traffic between multiple CPU boards,
devices and memory. Disks normally use DMA and deposit the result of I/O
directly into the memory. Disks also notify CPU that I/O is done by sending
interrupts which must be handled. With 4 channels, you have 4 active RMAN
processes, each performing reads from your disks, depositing the result into
memory and notifying any available CPU that it has completed I/O. Each one
also communicates with Oracle processes and sends data to and from the
network, which results in some more interrupts and DMA traffic between the NIC
and memory, and your system bus is saturated. The system is unresponsive
because simple tasks, like pressing Enter, must wait to be handled. Steve,
there is a reason why PC equipment is so much cheaper: it cannot do massive
amounts of I/O. What you pay for when you buy a p595 is a massive backplane
which can sustain almost a
TB/second and will allow your system to operate normally, even if you are
writing 300MB/sec. The secret is in the fact that fiber channel adapter for
p595 is attached to memory and not the central system bus. When you issue an
I/O request, you deposit IORB on one location in the memory where smart I/O
controller reads it, executes it and deposits the results into memory. It then
sends a single interrupt saying that it's done. The central system bus, the
one used to carry data between CPU and RAM, isn't used at all. Drivers for
that kind of equipment are standard on AIX or HP-UX and require some work on
Linux, where
they're known as I2O. Also, architecturally speaking, those machines are
more balanced. When you have 4 screaming 3GHz Intels inside, it is tough to
feed them with memory. The fastest available memories are 30-50ns. That means
that a CPU can ask memory for more data less than 40 million times per second.
Translated into megahertz, it corresponds to a bus frequency of 40MHz, needed,
of course, to feed each of the processors. Unless we can feed them faster, our
screaming 3GHz chip is useless as it, like the angel in the movie
"Barbarella", has no memory to work on.
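The latency arithmetic above can be checked directly (the 30-50ns figures are taken from the text; the rest is my own calculation):

```python
# A CPU can issue at most one uncached memory fetch per latency period,
# so the latency caps the effective fetch rate regardless of clock speed.
for latency_ns in (30, 50):
    fetches_per_sec = 1e9 / latency_ns  # fetches per second at this latency
    print(f"{latency_ns} ns -> {fetches_per_sec / 1e6:.1f} million fetches/sec")
# Even the 30 ns case gives about 33 million fetches/sec, an effective
# pace of ~33 MHz per processor -- roughly a hundredfold below a 3 GHz clock.
```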
The situation can be improved by large L2 and L1 caches as well as TLB
buffers. The efficiency of those is severely impacted by things like long
jumps across the address space and context switches, like, for instance, the
ones caused by interrupts or normal multiprocess work. Add, on that same bus,
the lines for cache synchronization, needed to keep caches coherent, which are
on a separate bus on the proper minis. Those lines must exist between each L1
and L2 cache, so that it doesn't happen that one address has one value in one
cache and another value in another. Now, add peripheral devices: video
controllers,
disk controllers and NICs. Your central bus has a frequency of 233MHz and is
64 wires wide ("64 bits"). When you calculate the maximum speed, it gives you
about 1,864 megabytes/second for EVERYTHING. That is the theoretical data
transfer speed. The real one, due to retransmits and synchronizations between
devices (which device should transmit first? This is resolved by so-called
"bus arbitration", as well as the question of which CPU should handle an
interrupt), is significantly lower, only around 1.2 GB/sec. That is called the
"sustained data rate".
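The bus figures can be recomputed from the clock and width quoted above (the 233MHz and 64-bit numbers are from the text; the multiplication is mine):

```python
# Theoretical bus bandwidth = clock rate * bytes transferred per cycle.
bus_mhz = 233
width_bytes = 64 // 8                     # 64 wires = 8 bytes per transfer
theoretical_mb_s = bus_mhz * width_bytes  # MHz * bytes/cycle = MB/s

print(f"theoretical: {theoretical_mb_s} MB/s")  # 1864 MB/s, about 1.8 GB/s
# The ~1.2 GB/s "sustained data rate" quoted above is what survives
# bus arbitration, retransmits and synchronization overhead.
```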
The massive amount of I/O that you are trying to make your poor PC perform
will simply consume the central bus and nothing will work. In addition to
that, when interrupt handlers and software components start to detect
timeouts, your machine will think that there is something wrong with the
motherboard and crash. Having only two channels would probably finish sooner
and with fewer problems than having 4 channels. The machine doesn't crash with
the tape as tapes are slower and cannot do I/O fast enough to endanger the
central bus. Disk drives are just about fast enough to crash the system.
Detecting problems like this is precisely the purpose of benchmarking and
testing before you buy a machine. For things like that you should use a proper
mini. Do you know why
they call them Infinitely Boring Machines? Nothing ever happens. They don't
crash, they don't go down, they just quietly work and don't provide any
excitement or adventure in your life.