Re: DBWR - How Many is Too Many?

  • From: "Finn Jorgensen" <finn.oracledba@xxxxxxxxx>
  • To: david.barbour1@xxxxxxxxx
  • Date: Wed, 27 Feb 2008 21:18:07 -0500

It doesn't make a lot of sense that the SAN setup is the same but performance
is slower with more disks and more mount points. It definitely sounds like
something is not configured right.

That said, the last time I saw WIO that high it was on a Sun/Veritas system,
and vmstat showed very high spin mutex counts. The problem was synchronous IO
on VxFS. The solution was either to set mincache=direct or to implement Quick
IO. Both are Veritas-specific solutions, so I'm not sure what the AIX
equivalent would be, but the effect is that the datafiles are accessed without
going through the filesystem buffer cache (much like raw devices).
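
I don't know the AIX syntax for that offhand, but from the Oracle side you can
at least confirm what kind of file IO the instance is requesting (a quick
sketch, assuming you can get into SQL*Plus on the 9.2 instance):

-- does the instance even ask for direct and/or async IO on filesystem files?
show parameter filesystemio_options
show parameter disk_asynch_io

If those come back as SETALL and TRUE, the database is at least asking for it,
and the question becomes whether AIX/JFS2 is actually honoring it.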

I would definitely gather evidence that the SAN itself is slow so you can make
a case with the storage team, but I would also look into the direct IO angle.
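
For the evidence, something like the query below is a starting point (just a
sketch - READTIM is in centiseconds and only populated when timed_statistics
is true). It gives the average read latency per datafile since instance
startup, which you can put next to whatever the old SAN was delivering:

-- average read latency per datafile since startup (readtim is centiseconds,
-- so *10 converts the per-read average to milliseconds)
select d.name,
       f.phyrds,
       f.readtim,
       round(f.readtim * 10 / decode(f.phyrds, 0, null, f.phyrds), 1) avg_read_ms
  from v$filestat f, v$datafile d
 where f.file# = d.file#
 order by avg_read_ms desc;

If those averages are way above what 15K drives behind read/write cache should
deliver, that's your case for the storage team.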

Finn

On Wed, Feb 27, 2008 at 1:53 PM, David Barbour <david.barbour1@xxxxxxxxx>
wrote:

> We recently moved our database to a new SAN.  Performance has just
> tanked.  Here's the environment:
> AIX5.3L
> Oracle 9.2.0.7
> SAN - IBM DS4800
>
> We've got 8 filesystems for Oracle data files.  Redo, Archive, Undo and
> Temp are all on separate disks/filesystems from the data files.
>
> All the Oracle datafiles are on RAID5 LUNs built from twelve 15K RPM 73GB
> (68GB usable) drives.  SAN read and write caching are both enabled.
>
> A statspack report (representative of pretty much any interval; this one
> covers a period of "light" processing) shows our biggest hit is:
> Buffer wait Statistics for DB: PR1  Instance: PR1  Snaps: 12609 -12615
> -> ordered by wait time desc, waits desc
>
>                                  Tot Wait    Avg
> Class                    Waits   Time (s) Time (ms)
> ------------------ ----------- ---------- ---------
> data block             278,194     20,811        75
>
> sar is scary (just a small portion)
>
> AIX r3prdci1 3 5 00CE0B8A4C00    02/27/08
>
> System configuration: lcpu=8
>
> 00:00:00    %usr    %sys    %wio   %idle   physc
> 02:15:01      19      19      42      19    4.00
> 02:20:00      21      25      40      14    4.00
> 02:25:00      19      18      43      20    4.00
> 02:30:00      18      18      43      21    4.00
> 02:35:00      20      24      40      16    4.00
>
> We're running JFS2 filesystems with CIO enabled and a 128k element size on
> the SAN, and the AIO servers are set at minservers = 220 and maxservers =
> 440.  We've got 32GB of RAM on the server and 4 CPUs (dual-core for all
> intents and purposes - they show up as eight).  We're running SAP, which
> has its own memory requirements.  I've configured my SGA and PGA using
> Automatic Memory Management, and the SGA currently looks like:
> SQL> show sga
>
> Total System Global Area 1.0739E+10 bytes
> Fixed Size                   757152 bytes
> Variable Size            8589934592 bytes
> Database Buffers         2147483648 bytes
> Redo Buffers                1323008 bytes
>
> filesystemio_options = setall
>
> I'm thinking the data block waits are the result of too many modified
> blocks in the buffer cache.  The solution would be to increase the number
> of db_writer_processes, but we've already got 4.  Metalink, manuals,
> training guides, Google, etc. seem to suggest two answers:
>
> 1.  One db writer for each database disk - in our case that would be 8
> 2.  CPUs/8, adjusted for multiples of CPU groups - in our case that would
> be 4
>
> Any thoughts?
>
>
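
On the db_writer_processes question itself: before adding more writers I'd
first confirm DBWR is actually the bottleneck. A DBWR that can't keep up tends
to show up as 'free buffer waits' (and 'write complete waits'), whereas 'data
block' buffer busy waits are usually sessions piling up behind slow reads or
hot blocks. A quick check (same sketch-level caveat about timed_statistics):

-- compare writer-related waits against the buffer busy waits
select event, total_waits, time_waited
  from v$system_event
 where event in ('free buffer waits', 'write complete waits', 'buffer busy waits')
 order by time_waited desc;

If 'free buffer waits' isn't near the top of that list, more DBWR processes
probably won't buy much - a 75ms average on the data block waits looks more
like slow IO underneath than too few writers.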
