Re: Moving db to linux

----- Original Message ----- 
From: "Mladen Gogala" <mgogala@xxxxxxxxxxxx>

> I get EINVAL for the next I/O operation. Exactly the same thing happens
> to Oracle 10 when I set the DIRECTIO flag. I ran strace on the CKPT
> process, and that was precisely what I saw. I chose CKPT because

Bugger!


> > The problem is with your test case.  The buffer passed to read and write
> > must be aligned to the file system's block size.  From the open(2) man
> > page:
> >
> > "Transfer sizes, and the alignment of user buffer and file offset must
> > all be multiples of the logical block size of the file system."
> >
> > Attached is a modified version of the test case which succeeds on the
> > 2.4.25 kernel.
>
> The modified version is the one that you saw earlier.

Very much so.  Thanks for sharing that.  I'll bet someone at Oracle
forgot about the alignment requirement when building the Linux port.
Buffer alignment can be handled either via a compiler option or at
runtime through the allocator (e.g. posix_memalign() rather than plain
malloc()).  Either they forgot, or they hard-coded a fixed alignment
that you'll have to guess.  Try 1/2/4/8K for your f/s block size and
Oracle DIO might (just) work fine with one of them...
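The "try 1/2/4/8K" approach can be automated.  A minimal sketch, assuming
Linux and glibc: probe_dio_alignment() is a hypothetical helper (not an
Oracle or libc API) that opens a file with O_DIRECT and tries one-block
reads at 512 bytes through 8K, returning the first size that does not
fail with EINVAL.  Each buffer comes from posix_memalign() and is offset
by one block inside a double-sized allocation, so its address is aligned
to exactly that size and not, by accident, to a larger one.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not an Oracle or libc API.
 * Returns the smallest of 512/1K/2K/4K/8K for which an O_DIRECT read of one
 * block at offset 0 of `path` succeeds, or 0 if none does (including when
 * open() itself fails, e.g. the file is missing or O_DIRECT is refused). */
size_t probe_dio_alignment(const char *path)
{
    static const size_t sizes[] = {512, 1024, 2048, 4096, 8192};
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return 0;

    size_t found = 0;
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        size_t blk = sizes[i];
        void *base = NULL;
        /* Allocate 2*blk aligned to 2*blk and use the second half: its
         * address is a multiple of blk but NOT of 2*blk, so a success
         * really means blk alignment is enough. */
        if (posix_memalign(&base, 2 * blk, 2 * blk) != 0)
            break;
        char *buf = (char *)base + blk;

        /* File offset 0 and transfer size blk are multiples of blk too. */
        ssize_t n = pread(fd, buf, blk, 0);
        int err = errno;       /* save before free() */
        free(base);

        if (n >= 0) {
            found = blk;
            break;
        }
        if (err != EINVAL)     /* a real error such as EIO: stop probing */
            break;
    }
    close(fd);
    return found;
}
```

Run it against an existing, non-empty file on the file system in
question.  A result of 0 means either that no tried size worked or that
open() itself failed.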


Why is it that DIO has to go to buffers aligned in memory to the
f/s block size?  Because DIO is done by the disk controller's DMA
(Direct Memory Access) straight into the target buffer in memory,
which for Oracle is its own buffer cache.  It does not go through an
interim fixed buffer that then gets copied somewhere else.
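For contrast, this is what going through an interim buffer looks like.
A minimal sketch, assuming POSIX pread() and posix_memalign(): the
hypothetical bounce_pread() reads an arbitrary, unaligned range by doing
an aligned read into a temporary ("bounce") buffer and then memcpy()-ing
the wanted bytes out.  That extra copy is the step DIO is meant to skip;
buffered I/O pays a similar copy out of the kernel's page cache, though
the real kernel path is considerably more involved.

```c
#define _POSIX_C_SOURCE 200809L   /* pread(), posix_memalign() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: read `len` bytes at any offset into any buffer by going
 * through an aligned bounce buffer and copying out of it.  The pread() itself
 * only ever sees a blk-aligned address, offset and length.  blk must be a
 * power of two and at least sizeof(void *).  A single pread() is issued, so
 * a short read simply yields fewer bytes. */
ssize_t bounce_pread(int fd, void *dst, size_t len, off_t off, size_t blk)
{
    assert(blk >= sizeof(void *) && (blk & (blk - 1)) == 0);
    assert(off >= 0);

    off_t start = off & ~(off_t)(blk - 1);                     /* round offset down */
    size_t head = (size_t)(off - start);                        /* bytes before `off` */
    size_t span = (head + len + blk - 1) & ~(size_t)(blk - 1);  /* round length up */

    void *bounce = NULL;
    if (posix_memalign(&bounce, blk, span) != 0)
        return -1;

    ssize_t got = pread(fd, bounce, span, start);
    ssize_t copied = -1;
    if (got >= 0) {
        size_t avail = (size_t)got > head ? (size_t)got - head : 0;
        size_t n = avail < len ? avail : len;
        memcpy(dst, (char *)bounce + head, n);   /* the extra copy DIO avoids */
        copied = (ssize_t)n;
    }
    free(bounce);
    return copied;
}
```

The same function also works on a descriptor opened without O_DIRECT,
which is handy for checking the offset arithmetic on its own.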

And I/O controllers do not have the same number of address lines as
normal memory.  For example, they NEVER need to address anything smaller
than 512 bytes (2**9) because no disk currently exists that can read or
write less than that!  And their size counter works in increments of
512 for the same reason.
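Since 512 = 2**9, "aligned to 512" just means the low 9 bits are zero,
and the same mask test works for any power-of-two block size.  A minimal
sketch of checking the three conditions quoted from open(2) (buffer
address, file offset, transfer size) before issuing a direct read or
write; dio_request_ok() is a hypothetical name, and blk is assumed to be
a power of two such as 512 or 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* True if x is a multiple of blk.  blk must be a power of two, so the test
 * is just "are the low log2(blk) bits all zero?" (9 bits for 512). */
static bool is_multiple(uintmax_t x, size_t blk)
{
    assert(blk != 0 && (blk & (blk - 1)) == 0);
    return (x & (uintmax_t)(blk - 1)) == 0;
}

/* Hypothetical pre-flight check mirroring the open(2) rule: buffer address,
 * file offset and transfer size must all be multiples of blk. */
bool dio_request_ok(const void *buf, off_t offset, size_t len, size_t blk)
{
    return offset >= 0
        && is_multiple((uintptr_t)buf, blk)
        && is_multiple((uintmax_t)offset, blk)
        && is_multiple(len, blk);
}
```

For example, a buffer at address 0x2200 passes with blk = 512 but fails
with blk = 4096, because 0x2200 is a multiple of 0x200 but not of 0x1000.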

So, when a disk controller wants to write directly to, say, a 4K
buffer in memory, it doesn't want a 4K buffer starting ANYWHERE in
memory.  Instead, it wants a 4K buffer STARTING at a 4K memory
boundary.  It can't address anything less precise than that for that
size of I/O.  There is more to this than I can fit here.  Reading the
internals documentation on Seagate's and a few controller makers' sites
is very educational, for those who want to be bothered with this level
of detail.
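Where an aligned allocator isn't available, a common portable trick is
to over-allocate with malloc() and round the pointer up to the next
boundary, which yields exactly a 4K buffer STARTING at a 4K boundary.
A minimal sketch, assuming a power-of-two boundary; align_up() and the
aligned_buf helpers are illustrative names, and the original malloc()
pointer has to be kept for free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Illustrative holder: `data` is the aligned buffer handed to the I/O,
 * `raw` is what malloc() returned and what must later be passed to free(). */
struct aligned_buf {
    void *raw;
    void *data;
};

/* Over-allocate by align-1 bytes so an aligned run of `len` bytes always fits. */
int aligned_buf_alloc(struct aligned_buf *b, size_t len, size_t align)
{
    b->raw = malloc(len + align - 1);
    if (b->raw == NULL)
        return -1;
    b->data = (void *)align_up((uintptr_t)b->raw, align);
    return 0;
}

void aligned_buf_free(struct aligned_buf *b)
{
    free(b->raw);
    b->raw = b->data = NULL;
}
```

The cost is up to align-1 wasted bytes per buffer; posix_memalign()
avoids the bookkeeping of carrying two pointers around.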

DIO is low-level I/O, and that means a compromise with the hardware
characteristics.  It won't always work like a simple s/w option.

Cheers
Nuno Souto
in sunny Sydney, Australia
dbvision@xxxxxxxxxxxxxxx

----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request@xxxxxxxxxxxxx
put 'unsubscribe' in the subject line.
--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
