Re: Moving db to linux
- From: Mladen Gogala <mgogala@xxxxxxxxxxxx>
- To: oracle-l@xxxxxxxxxxxxx
- Date: Sat, 28 Feb 2004 13:38:22 -0500
Nuno, here's an excerpt from the IBM JFS manual:
*********************************************
File System operations logged by JFS
The following list of file system operations
changes meta-data of the file system so they
must be logged.
· File creation (create)
· Linking (link)
· Making directory (mkdir)
· Making node (mknod)
· Removing file (unlink)
· Rename (rename)
· Removing directory (rmdir)
· Symbolic link (symlink)
· Set ACL (setacl)
· Writing File (write) (not on normal conditions)
· Truncating regular file
*******************************************
You are right that logging covers metadata only, but not quite right
about direct I/O. Most file systems simply ignore a request to
open with O_DIRECT. XFS reports an error on Linux (a subsequent
read/write fails with EINVAL), but works as advertised on Irix.
Below is a little program that I used to test direct I/O:
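#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#define BUFFSIZE 65536
#define ALIGN 4096
int main(void) {
    void *buff;
    ssize_t stat1, stat2;
    int stat3;
    int fd1, fd2;
    /* O_DIRECT needs an aligned buffer; posix_memalign returns the error code */
    if ((stat3 = posix_memalign(&buff, ALIGN, BUFFSIZE)) != 0) {
        fprintf(stderr, "ALIGN ERR:%s\n", strerror(stat3));
        exit(1);
    }
    if ((fd1 = open("xxx", O_RDONLY|O_DIRECT)) < 0) {
        fprintf(stderr, "OPEN ERR (xxx):%s\n", strerror(errno));
        exit(1);
    }
    if ((fd2 = open("yyy", O_CREAT|O_WRONLY|O_DIRECT, S_IRWXU)) < 0) {
        fprintf(stderr, "OPEN ERR (yyy):%s\n", strerror(errno));
        exit(1);
    }
    /* Copy until EOF (read returns 0); test return values, not a stale errno */
    while ((stat1 = read(fd1, buff, BUFFSIZE)) != 0) {
        if (stat1 < 0) {
            fprintf(stderr, "READ ERR:%s\n", strerror(errno));
            exit(1);
        }
        stat2 = write(fd2, buff, (size_t) stat1);
        if (stat2 < 0) {
            fprintf(stderr, "WRITE ERR:%s\n", strerror(errno));
            exit(1);
        }
    }
    close(fd1);
    close(fd2);
    free(buff);
    return 0;
}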
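Since the quoted discussion below turns on synchronous writes rather than journalling, here is a minimal sketch of the two usual ways to get them on any POSIX file system: opening with O_SYNC, or writing normally and then calling fsync(). The function names write_all, write_osync and write_fsync are made up for this illustration, and the sketch assumes a Linux/POSIX system; whether the data survives a power cut still depends on the drive's own write cache.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error (errno is set by write). */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Variant 1: O_SYNC at open time, so each write() returns only after
   the data has been handed to the underlying device. */
int write_osync(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_SYNC, 0600);
    if (fd < 0) return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) < 0) rc = -1;
    return rc;
}

/* Variant 2: ordinary buffered write, then an explicit fsync() to
   force the cached blocks out before reporting success. */
int write_fsync(const char *path, const char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0) return -1;
    int rc = write_all(fd, buf, len);
    if (rc == 0 && fsync(fd) < 0) rc = -1;
    if (close(fd) < 0) rc = -1;
    return rc;
}
```

Either function can be called as write_fsync("somefile", data, length). Neither variant depends on the file system being journalled.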
On 02/28/2004 10:13:12 AM, Nuno Souto wrote:
> ----- Original Message -----
> From: "Mladen Gogala" <mgogala@xxxxxxxxxxxx>
>
> > Journalling for files is a concept similar to redo in the world
> > of oracle.
>
> No, it MOST DEFINITELY is not. Journalled file systems are similar
> to redo ONLY for file system metadata. NOT for the data itself!
>
> > With JFS, you get the process called jfsCommit running,
> > which "commits" buffer operations. Each filehandle operation like
> > "flush" or "close" is a "commit".
>
> So it is in a non-journalled file system. "flush" has existed in
> normal file systems since the year dot and does exactly and precisely that.
> There is also a background process in non-JFS file systems that flushes
> every 30 seconds or so: it's called "sync".
>
> > Basically, journalled FS guarantees
> > that the data written down synchronously will really be written down
> > to the disk device(s).
>
> ANY file system guarantees that data written synchronously
> is really written to the disk device.
> Synchronous access is NOT a synonym for journalling.
>
> > If you can do DIO, your data is a little bit
> > safer.
>
> Most file systems can do DIO. It's got nothing to do with
> journalling itself.
>
> > What a journalling FS protects you against is a huge data loss
> > of blocks that were in the buffer cache.
>
> NO WAY! If you do NOT write synchronously in a JFS, you WILL
> lose ANY data blocks in the cache!
>
> And to write synchronously you have to use synchronous I/O,
> DIO or frequent "flushes". Which you can equally do in ANY file
> system, be it journalled or not.
>
> I repeat: Synchronous writing has NOTHING to do with journalling.
>
>
>
> What a JFS really does is to automatically (like it or not) write
> - synchronously - to a journal file, ANY changes to file system METADATA.
> IOW, any changes that involve creation/delete files, allocation of
> disk space or freeing of disk space.
>
> Those and ONLY those are recovered after a system crash, by simply
> reading from the journal file. Instead of inspecting the ENTIRE file
> system looking for broken metadata. Which is what fsck does in a
> non-journalled file system.
>
> With the result (in a JFS) that you do not lose large chunks of a file.
> This is the problem that fsck has with non-journaled file systems:
> sometimes it cannot recover the metadata and it loses track of an entire
> space allocation for a file. Which can be a substantial part of the file.
> This happens mostly when files are very volatile or constantly changing in
> allocation.
>
> Which is NOT the case for Oracle datafiles. They are pre-allocated
> and do not often change in size.
>
>
> It's high time this myth of journalled file systems "protecting"
> data is exposed. A run-of-the-mill JFS does NOT protect data blocks inside
> files, it protects ONLY the file system's own meta data! That is certainly
> the case of ext3, JFS, NTFS and many other journalled f/s. Veritas
> is the only JFS I know of that can ALSO protect the data but that is
> an add-on, not a characteristic of JFS.
>
>
>
> Historical note:
> This f/s metadata thing is the major factor why I never lost a benchmark
> against Ingres: journalled file systems were unknown back then and Ingres
> did not use the concept of pre-allocated datafiles like Oracle. Their tables
> were stored one table per file, with dynamic space management done by the
> file system itself. With the result that if you specified a benchmark where
> tables were dropped/re-created and inserted/deleted from and you pulled the
> plug half way through, you'd have a very high probability fsck would NOT
> recover the file system where the Ingres database was.
>
> While Oracle would quietly just rollback the last transaction and keep
> going. After the fsck was finished, of course. Remember: no JFS back then!
> Not once did I have to use the redo log. Datafiles were pre-allocated and
> the f/s metadata never changed, no matter how busy the system was.
>
>
> As well, not ONCE did Ingres survive this little "technique"!
> Cheers
> Nuno Souto
> in sunny Sydney, Australia
> dbvision@xxxxxxxxxxxxxxx
>
> ----------------------------------------------------------------
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> ----------------------------------------------------------------
> To unsubscribe send email to: oracle-l-request@xxxxxxxxxxxxx
> put 'unsubscribe' in the subject line.
> --
> Archives are at http://www.freelists.org/archives/oracle-l/
> FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
> -----------------------------------------------------------------
>
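On the pre-allocation point quoted above: a minimal sketch of reserving a file's space up front, so that later writes into it do not change the file system's allocation metadata. This is only an illustration of the idea, not how Oracle itself creates datafiles. The names preallocate_file and file_size are hypothetical, and the sketch assumes a POSIX system that provides posix_fallocate().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path (it must not already exist) and
   reserve size bytes for it. Returns 0 on success, or an errno-style
   error code on failure. */
int preallocate_file(const char *path, off_t size) {
    assert(path != NULL && size > 0);
    int fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd < 0) return errno;
    /* posix_fallocate returns the error number directly; it does not set errno */
    int rc = posix_fallocate(fd, 0, size);
    /* push the new allocation metadata to disk before reporting success */
    if (rc == 0 && fsync(fd) < 0) rc = errno;
    if (close(fd) < 0 && rc == 0) rc = errno;
    return rc;
}

/* Hypothetical helper: size of path in bytes, or -1 if stat() fails. */
off_t file_size(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t) -1;
}
```

After a call such as preallocate_file("datafile.tmp", 1 << 20), writes that stay inside the first megabyte overwrite blocks that are already allocated, which is the situation described for pre-allocated datafiles.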
--
Mladen Gogala
Oracle DBA
- Follow-Ups:
- Re: Moving db to linux
- From: Nuno Souto
- References:
- RE: Moving db to linux
- From: Jesse, Rich
- Re: Moving db to linux
- From: Nuno Souto
- Re: Moving db to linux
- From: Mladen Gogala
- Re: Moving db to linux
- From: Nuno Souto