Latency of "direct path reads"

  • From: "Hameed, Amir" <Amir.Hameed@xxxxxxxxx>
  • To: "oracle-l@xxxxxxxxxxxxx" <oracle-l@xxxxxxxxxxxxx>
  • Date: Mon, 19 Aug 2013 16:45:27 +0000

Hi Folks,
We migrated our single instance ERP database to a four-node RAC system about a 
week ago. As part of this migration, we also moved from SAN to NAS. Here is a 
high-level architecture of the environment:
Ø  RDBMS and Grid versions.2.0.3 with April PSU

Ø  Each node has two 10Gbe paths to the NAS.

Ø  Database is configured with DNFS.

Ø  Storage vendor is EMC. NAS Data Movers are connected to the EMC VMAX storage.

Ø  Database files are sitting on SSD drives with RAID-5 configuration.

Ø  Redo log files are on RAID-10

Since migrating to RAC, we are seeing high latencies on some batch jobs. Jobs 
that used to run in 45 minutes are now taking a few hours to run. We have a 
copy of production that is setup the same way as production (same disk layout, 
same hardware, etc.) with the only difference that non-production is a 
three-node RAC environment. If I take a simple statement to force a full scan 
in parallel on a large table and run it against production and the 
production-sized test environment, the average wait of "direct path reads" from 
10046 trace is almost three times higher in production than in non-production. 
Here is the statement that I am running:

select /*+ full(MMT) parallel(MMT,8) */ count(*) from 
apps.mtl_material_transactions MMT ;

Production
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.02       0.02          0         31          0           0
Fetch        2    185.89     888.70    7706356    7706545          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    185.91     888.73    7706356    7706576          0           1

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  reliable message                                2        0.00          0.00
  enq: KO - fast object checkpoint                4        0.00          0.00
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                      117        0.00          0.00
  direct path read                            83856        3.03        465.14
  db file sequential read                        14        0.00          0.02
  SQL*Net message from client                     2        0.00          0.00
********************************************************************************

Non-production
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          2          0           0
Execute      1      0.03       0.03          0         31          0           0
Fetch        2    148.36     224.13    7103122    7103268          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    148.39     224.17    7103122    7103301          0           1

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  enq: KO - fast object checkpoint                6        0.00          0.00
  reliable message                                2        0.00          0.00
  enq: PS - contention                           52        0.00          0.01
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                      106        0.00          0.00
  direct path read                             3773        1.49         67.53
  SQL*Net message from client                     2      845.16        845.16
********************************************************************************

The raw trace file shows that some IOs in production are taking over a second 
to complete as shown below:

WAIT #18446744071452723008: nam='direct path read' ela= 1109814 file number25 
first dba"2080 block cnt8 obj#r04 tim•0081082913

We have EMC and Oracle engaged and are trying to diagnose the issue. Since, 
this is DNFS and all IOs are going over the network layer, we had our network 
folks look at the network layer and according to them all is clean there, 
however I am not convinced. EMC is reporting that from the NAS side, they are 
seeing an average service response of 2-4ms.

My question is, is there a way to send a large packet from the RAC host side 
and gauge its latency? Ping can work for up to 32k size packet but because 
Oracle tries to do a 1M IO for direct path reads, I am trying to see if there 
is a tool available that can allow me to send larger packets and gauge its 
latency.

Any help will be appreciated.

Thanks,
Amir



--
//www.freelists.org/webpage/oracle-l


Other related posts: