Hi, Just a few ideas etc. 1. I don't know the IBM DS4800 in any detail but you say your using a "128k element size". Is this the stripe width? If it is then maybe it's a bit small. I believe ASM uses 1MB for stripe width. 2. Identify a known part of the application e.g. a specific report, that is performing poorly i.e. worse now with the new SAN than previously. Then trace it with a 10046 trace level 8. This will give you the actual wait times for a specific part of the application. You can then see what's taking the time, asOracle sees it. Is it buffer busy waits, db file scattered reads etc. This will
help you in finding the real cause. (I know it's SAP but under the covers it's still an Oracle database no matter what SAP say.) Keep an open mind about the cause of the problem while you gather good evidence that can lead you to the real cause. 3. Try and get as much info as possible from the OS and IBM DS4800 about realphysical I/O rates, disk utilisation, I/O queue lengths etc. This isn't easy as
many I/Os are satisfied from the SAN cache. Compare all this with what's realistic in terms of physical I/O given your 12 15K RPM disks. Remember that queuing time increases exponentially with utilisation (see Cary's book).4. Is the data "stripped" over all the disks in such a way that the I/O load is
evenly spread across all disks? If not then you may have 1 or 2 "hot" disks that are causing the problem.5. You say your using a SAN. Is this a real SAN or a direct attached disk array?
If the former then you may also need to consider the actual storage network itself including the switches. HTH Cheers, Chris Quoting David Barbour <david.barbour1@xxxxxxxxx>:
We recently moved our database to a new SAN. Performance has just tanked. Here's the environment: AIX5.3L Oracle 220.127.116.11 SAN - IBM DS4800 We've got 8 filesystems for Oracle data files. Redo, Archive, Undo and Temp are all on seperate disk/filesystems from the data files. All the Oracle datafiles are on RAID5 LUNs with 12 15K RPM 73 (68 usable) GB drives. SAN Read and Write Caching are both enabled. A statspack (generally for any given interval - this was for a period of "light" processing) shows me our biggest hit is: Buffer wait Statistics for DB: PR1 Instance: PR1 Snaps: 12609 -12615 -> ordered by wait time desc, waits desc Tot Wait Avg Class Waits Time (s) Time (ms) ------------------ ----------- ---------- --------- data block 278,194 20,811 75 sar is scary (just a small portion) AIX r3prdci1 3 5 00CE0B8A4C00 02/27/08 System configuration: lcpu=8 00:00:00 %usr %sys %wio %idle physc 02:15:01 19 19 42 19 4.00 02:20:00 21 25 40 14 4.00 02:25:00 19 18 43 20 4.00 02:30:00 18 18 43 21 4.00 02:35:00 20 24 40 16 4.00 We're running JFS2 filesystems with CIO enabled, 128k element size on the SAN and AIO Servers are set at minservers = 220 and maxservers = 440 We've got 32GB of RAM on the server and 4 CPUs (which are dual core for all intents and purposes - they show up as eight). We're running SAP which has it's own memory requirements. I've configured my SGA and PGA using Automatic Memory Management and the SGA currently looks like: SQL> show sga Total System Global Area 1.0739E+10 bytes Fixed Size 757152 bytes Variable Size 8589934592 bytes Database Buffers 2147483648 bytes Redo Buffers 1323008 bytes filesystemio_options = setall I'm thinking the data block waits is the result of too many modified blocks in the buffer cache. Solution would be to increase the number of db_writer_processes, but we've already got 4. Metalink, manuals, training guides, Google, etc. seem to suggest two answers. 1. One db writer for each database disk - in our case that would be 8 2. CPUs/8 adjusted for multiples of CPU groups - in our case that would be 4 Any thoughts?
Chris Dunscombe Christallize Ltd www.christallize.com Tel: 01903-714200 Mob: 07906-941850 Email: chris.dunscombe@xxxxxxxxxxxxxxxx -- http://www.freelists.org/webpage/oracle-l