Re: direct reads and writes on Solaris

  • From: David Miller <David.J.Miller@xxxxxxx>
  • To: dannorris@xxxxxxxxxxxxx
  • Date: Fri, 25 Jan 2008 11:35:07 -0600

Hi Dan,

A couple of explanations.  First, the main reason to do directio is to get
around the POSIX single-writer lock.  This lock is a mechanism that prevents
two processes from writing to the same file at the same time, mainly to keep
them from both trying to write the same block and getting unpredictable results.

Since Oracle is already handling the coordination of writes, this lock is not
needed.  But the filesystems and OS enforce it automatically.  On larger systems
this can cause contention, since many processes may want to write to the same
Oracle datafile at the same time and will be forced to be single-threaded.

So methods were introduced to get around those semantics.  The 4 that work on
Solaris are using raw devices directly, UFS with directio (either through
filesystemio_options = setall or the forcedirectio mount option), VxFS with QIO
or ODM, and QFS with samaio.  Note that VxFS with mincache=direct is NOT included
here because it does NOT eliminate the single-writer lock.  You have to have QIO
or ODM with VxFS to avoid the lock.
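
As an aside, on Solaris there is no O_DIRECT flag for open(2) (that's a
Linux-ism); for UFS the direct I/O request is made per file descriptor with
directio(3C), which as far as I know is what filesystemio_options = setall
boils down to on UFS.  A minimal sketch, with a made-up file name:

/* Minimal sketch: requesting direct I/O on Solaris UFS via directio(3C).
 * The file name is a placeholder and error handling is kept to a minimum. */
#include <sys/types.h>
#include <sys/fcntl.h>          /* directio(), DIRECTIO_ON */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/db51/oradata/test.dbf", O_RDWR | O_DSYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Ask the filesystem to bypass the buffer cache for this descriptor. */
    if (directio(fd, DIRECTIO_ON) != 0)
        perror("directio");

    /* ... reads and writes on fd now go direct where the fs supports it ... */
    close(fd);
    return 0;
}

(VxFS doesn't use this call as far as I know; there it's the mount options or
QIO/ODM as above.)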

A second benefit of directio is bypassing the buffer cache, which can help on
writes by reducing the code path (no extra copy through the page cache),
although this is usually not as big a benefit as removing the lock.

In particular, your test with a single dd does NOT hit the single-writer lock
and so is not representative of Oracle write performance.  Plus it's sequential, and
most of Oracle's I/O in OLTP contexts will be random.  Still, what you're seeing
is the cost of going through the buffer cache, which hurts efficiency here.
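
If you want a crude test that does hit the single-writer lock, something along
these lines is closer to what Oracle's writers see: several processes writing
into the same file at the same time.  This is only a sketch; the file name,
process count and sizes are made up, so adjust for your box:

/* Rough sketch of a concurrent-writer test: NPROC processes pwrite()ing
 * into the same file at disjoint offsets.  File name, process count and
 * sizes are placeholders. */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NPROC   8               /* concurrent writers              */
#define BLKSZ   8192            /* 8 KB, roughly an Oracle block   */
#define NWRITES 4096            /* 32 MB per writer                */

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/db51/oradata/writetest.dat";
    struct timeval t0, t1;
    int p;

    gettimeofday(&t0, NULL);

    for (p = 0; p < NPROC; p++) {
        if (fork() == 0) {                      /* child: one writer */
            int fd = open(path, O_RDWR | O_CREAT | O_DSYNC, 0644);
            char *buf = memalign(BLKSZ, BLKSZ); /* aligned, in case direct I/O wants it */
            int i;

            if (fd < 0 || buf == NULL) { perror("setup"); _exit(1); }
            memset(buf, 0, BLKSZ);

            for (i = 0; i < NWRITES; i++) {
                /* each writer owns every NPROC-th block of the file */
                off_t off = (off_t)(i * NPROC + p) * BLKSZ;
                if (pwrite(fd, buf, BLKSZ, off) != BLKSZ) {
                    perror("pwrite");
                    _exit(1);
                }
            }
            close(fd);
            _exit(0);
        }
    }

    while (wait(NULL) > 0)                      /* reap all writers  */
        ;

    gettimeofday(&t1, NULL);
    printf("elapsed: %.2f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    return 0;
}

Run it once against a buffered mount and once against a forcedirectio mount
(or a raw volume); the writers should serialize in the first case but not the
second, and the elapsed times should show it.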

If you want to see the benefits of directio, you'll need to convert to one
of the 4 options mentioned above.

Regards,

Dave Miller

Dan Norris wrote, On 01/24/08 18:10:
Thanks, looks like that confirms my theory below. (Not sure how I didn't find those references myself--sorry.) I then have one related question.

We did some specific testing where we used a crude method to test I/O (specifically, write) performance. The test was this:

timex dd if=/dev/zero of=<device> bs=1024k count=2048

For the <device> we tried many different things. The interesting part (and here's where I'd like some input) is that the results for testing the same device via the buffered (block) device were much, much slower than the results for the unbuffered (character) device. All things being equal, here are some sample tests:

/dev/vx/dsk/testdg/test

real 25.12
usr 0.02
sys 24.94

/dev/vx/rdsk/testdg/test

real 10.35
usr 0.01
sys 1.55

So, basically, it took more than 2x as long to do the dd to the buffered device as compared to the unbuffered device. I was sort of expecting that writes to the buffered device would be possibly a little faster or maybe about equal. I never expected to have such a big delta and I also didn't expect that so much system time would be spent just writing to a buffered device.

Any of you I/O gurus see anything interesting in these results? Are the testing methods even valid? My conclusion is that since we're likely doing buffered I/O now (because we're not doing directIO), switching to directIO (which is unbuffered by definition) should give us a considerable performance gain--at least for writes, since my test was only for writes. I would presume that reads might show a similar ratio, though.
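
In case anyone wants to reproduce or vary this, here is roughly what the dd
test boils down to in C (the default device path is just a placeholder); it
reports the same real/usr/sys split via getrusage(), and you can change the
chunk size or add O_DSYNC to the open() to see how the numbers move:

/* Rough C equivalent of "timex dd if=/dev/zero of=<device> bs=1024k count=2048":
 * write 2 GB of zeros in 1 MB chunks and report real/usr/sys time.
 * The default device path is a placeholder.  On a 32-bit build, add the
 * flags from `getconf LFS_CFLAGS` so the offset can pass 2 GB. */
#include <sys/resource.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)     /* bs=1024k   */
#define COUNT 2048              /* count=2048 */

static double secs(struct timeval t) { return t.tv_sec + t.tv_usec / 1e6; }

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/vx/rdsk/testdg/test";
    struct timeval t0, t1;
    struct rusage ru;
    char *buf;
    int fd, i;

    buf = memalign(8192, CHUNK);        /* aligned buffer of zeros */
    fd = open(dev, O_WRONLY);
    if (fd < 0 || buf == NULL) { perror("setup"); return 1; }
    memset(buf, 0, CHUNK);

    gettimeofday(&t0, NULL);
    for (i = 0; i < COUNT; i++)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
    close(fd);
    gettimeofday(&t1, NULL);

    getrusage(RUSAGE_SELF, &ru);
    printf("real %.2f\nusr  %.2f\nsys  %.2f\n",
           secs(t1) - secs(t0), secs(ru.ru_utime), secs(ru.ru_stime));
    return 0;
}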

Dan

----- Original Message ----
From: Ukja.dion <ukja.dion@xxxxxxxxx>
To: dannorris@xxxxxxxxxxxxx; Oracle L <oracle-l@xxxxxxxxxxxxx>
Sent: Thursday, January 24, 2008 5:55:18 PM
Subject: RE: direct reads and writes on Solaris

Visit the following URLs:

http://www.solarisinternals.com/wiki/index.php/Direct_I/O

http://www.ixora.com.au/notes/filesystemio_options.htm

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On Behalf Of Dan Norris
Sent: Friday, January 25, 2008 7:14 AM
To: Oracle L
Subject: direct reads and writes on Solaris

Can someone help me interpret this set of data correctly?

The (vxfs) filesystem is mounted with these options:
/db51 on /dev/vx/dsk/oracledg/db18 read/write/setuid/mincache=direct/delaylog/largefiles/ioerror=mwdisable/dev=3ac36c1

This is 9.2.0.8 on Solaris 9 (V490, Generic_122300-07) with VxFS 4.1.

I have the following line in a truss of a dedicated server process:

open("/db51/oradata/tccrt1/member_questions_d01.dbf", O_RDWR|O_DSYNC) = 9

I also have the following settings in the DB:

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE
filesystemio_options                 string      ASYNCH

The question(s):
I was expecting to see O_DIRECT in there somehow, but I'm thinking that maybe that's just on Linux, not Solaris. I don't see O_DIRECT listed in the open(2) manual page. I am also wondering if filesystemio_options needs to be "setall" instead of the current setting of "ASYNCH" in order to achieve directIO. Or, am I looking at the wrong thing to determine if directIO is enabled?

Thanks in advance!

Dan

