Re: DBWR - How Many is Too Many?

  • From: Chris Dunscombe <chris.dunscombe@xxxxxxxxxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 28 Feb 2008 12:18:21 +0000

Hi,

Just a few ideas etc.

1. I don't know the IBM DS4800 in any detail, but you say you're using a "128k
element size". Is this the stripe width? If it is, then maybe it's a bit small.
I believe ASM uses a 1MB stripe width, for comparison.

2. Identify a known part of the application, e.g. a specific report, that is
performing poorly, i.e. worse on the new SAN than previously. Then trace it
with a 10046 trace at level 8. This will give you the actual wait times for a
specific part of the application. You can then see what's taking the time, as
Oracle sees it. Is it buffer busy waits, db file scattered reads, etc.? This
will help you find the real cause. (I know it's SAP, but under the covers it's
still an Oracle database no matter what SAP say.) Keep an open mind about the
cause of the problem while you gather good evidence that can lead you to the
real cause. A minimal sketch of enabling the trace follows below.
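
A rough sketch of enabling the trace, assuming you can identify the SAP work
process's session in v$session first (the sid/serial# values below are
placeholders, 'SAPR3' is just a guess at the SAP schema name, and dbms_system
is undocumented, so treat this as illustrative only):

  -- Find the session running the slow report
  SELECT sid, serial#, program, module
  FROM   v$session
  WHERE  username = 'SAPR3';

  -- Start a 10046 level 8 trace (waits included) in that session
  EXEC sys.dbms_system.set_ev(123, 4567, 10046, 8, '');

  -- ... run the slow report, then switch tracing off again ...
  EXEC sys.dbms_system.set_ev(123, 4567, 10046, 0, '');

The trace file ends up in user_dump_dest on the database server; tkprof, or
just reading the raw file, will show where the time actually goes.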

3. Try to get as much info as possible from the OS and the IBM DS4800 about
real physical I/O rates, disk utilisation, I/O queue lengths etc. This isn't
easy, as many I/Os are satisfied from the SAN cache. Compare all this with
what's realistic in terms of physical I/O given your 12 15K RPM disks.
Remember that queuing time climbs very steeply as disk utilisation approaches
100% (see Cary's book).
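
As a rough rule of thumb (the textbook single-server queuing approximation,
not a measurement from your system): if a disk's service time is S and its
utilisation is U, response time is roughly S / (1 - U). At 50% busy that is
2 x S, at 80% busy 5 x S, and at 90% busy 10 x S - so a disk that looks
"only" 80-90% busy can already be the whole problem.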

4. Is the data striped over all the disks in such a way that the I/O load is
evenly spread across all of them? If not, you may have 1 or 2 "hot" disks that
are causing the problem. The per-datafile statistics sketched below are one
way to check this from the database side.
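
A rough sketch of checking this from the Oracle side (the v$filestat counters
are cumulative since instance startup, so ideally take two snapshots and diff
them, or just look at the relative spread across files):

  SELECT f.file_name, s.phyrds, s.phywrts, s.readtim, s.writetim
  FROM   v$filestat s, dba_data_files f
  WHERE  s.file# = f.file_id
  ORDER  BY s.phyrds + s.phywrts DESC;

If one or two files dominate, check which physical disks their filesystems
actually map onto.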

5. You say you're using a SAN. Is this a real SAN or a direct-attached disk
array? If the former, then you may also need to consider the actual storage
network itself, including the switches.

HTH

Cheers,

Chris

Quoting David Barbour <david.barbour1@xxxxxxxxx>:

We recently moved our database to a new SAN.  Performance has just tanked.
Here's the environment:
AIX5.3L
Oracle 9.2.0.7
SAN - IBM DS4800

We've got 8 filesystems for Oracle data files.  Redo, Archive, Undo and Temp
are all on separate disks/filesystems from the data files.

All the Oracle datafiles are on RAID5 LUNs with 12 15K RPM 73 (68 usable) GB
drives.  SAN Read and Write Caching are both enabled.

A statspack report (generally for any given interval - this one was for a
period of "light" processing) shows our biggest hit is:
Buffer wait Statistics for DB: PR1  Instance: PR1  Snaps: 12609 -12615
-> ordered by wait time desc, waits desc

                                Tot Wait    Avg
Class                    Waits   Time (s) Time (ms)
------------------ ----------- ---------- ---------
data block             278,194     20,811        75
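
(A rough sketch of one way I could tie these waits back to specific segments
- v$session_wait only shows waits in progress, so it needs sampling
repeatedly during a busy period, and the file_id/block_id values fed into the
second query are whatever the first one returns:)

  SELECT sid, p1 AS file_id, p2 AS block_id
  FROM   v$session_wait
  WHERE  event = 'buffer busy waits';

  SELECT owner, segment_name, segment_type
  FROM   dba_extents
  WHERE  file_id = &file_id
  AND    &block_id BETWEEN block_id AND block_id + blocks - 1;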

sar is scary (just a small portion)

AIX r3prdci1 3 5 00CE0B8A4C00    02/27/08

System configuration: lcpu=8

00:00:00    %usr    %sys    %wio   %idle   physc
02:15:01      19      19      42      19    4.00
02:20:00      21      25      40      14    4.00
02:25:00      19      18      43      20    4.00
02:30:00      18      18      43      21    4.00
02:35:00      20      24      40      16    4.00

We're running JFS2 filesystems with CIO enabled, a 128k element size on the
SAN, and AIO servers set at minservers = 220 and maxservers = 440.
We've got 32GB of RAM on the server and 4 CPUs (which are dual core for all
intents and purposes - they show up as eight).  We're running SAP, which has
its own memory requirements.  I've configured my SGA and PGA using Automatic
Memory Management and the SGA currently looks like:
SQL> show sga

Total System Global Area 1.0739E+10 bytes
Fixed Size                   757152 bytes
Variable Size            8589934592 bytes
Database Buffers         2147483648 bytes
Redo Buffers                1323008 bytes

filesystemio_options = setall

I'm thinking the data block waits are the result of too many modified blocks
in the buffer cache.  The solution would be to increase the number of
db_writer_processes, but we've already got 4.  Metalink, manuals, training
guides, Google, etc. seem to suggest two answers.

1.  One db writer for each database disk - in our case that would be 8
2.  CPUs/8 adjusted for multiples of CPU groups - in our case that would be 4
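
(For reference, a rough sketch of what I could check and change - the value 8
simply mirrors option 1 above rather than being a recommendation, the ALTER
assumes we're running from an spfile, and db_writer_processes is static on
9.2, so it only takes effect after a restart:)

  -- If DBWR really can't keep up, sessions should be showing these waits:
  SELECT event, total_waits, time_waited
  FROM   v$system_event
  WHERE  event IN ('free buffer waits', 'write complete waits');

  ALTER SYSTEM SET db_writer_processes = 8 SCOPE = SPFILE;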

Any thoughts?


Chris Dunscombe

Christallize Ltd
www.christallize.com

Tel: 01903-714200
Mob: 07906-941850
Email: chris.dunscombe@xxxxxxxxxxxxxxxx

--
//www.freelists.org/webpage/oracle-l

