Re: Moving db to linux

  • From: "Nuno Souto" <dbvision@xxxxxxxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Sun, 29 Feb 2004 02:13:12 +1100

----- Original Message ----- 
From: "Mladen Gogala" <mgogala@xxxxxxxxxxxx>

> Journalling for files is a concept similar to redo in the world
> of oracle.

No, it MOST DEFINITELY is not.  Journalled file systems are similar
to redo ONLY for file system metadata.  NOT for the data itself!

> With JFS, you get the process called jfsCommit running,
> which "commits" buffer operations. Each filehandle operation like
> "flush" or "close" is a "commit".

So it is in a non-journalled file system.  "flush" has existed in
normal file systems since the year dot and does exactly and precisely that.
There is also a background process in non-journalled file systems that
flushes dirty buffers every 30 seconds or so by calling "sync"
(traditionally the "update" daemon).

> Basically, journalled FS guarantees
> that the data written down synchronously will really be written down
> to the disk device(s).

ANY file system guarantees that data written synchronously
is really written to the disk device.
Synchronous access is NOT a synonym for journalling.

> If you can do DIO, your data is a little bit
> safer.

Most file systems can do DIO.  It's got nothing to do with
journalling itself.

> What a journalling FS protects you against is a huge data loss
> of blocks that were in the buffer cache.

NO WAY! If you do NOT write synchronously in a journalled f/s, you WILL
lose ANY data blocks still in the cache when the system crashes!

And to write synchronously you have to use synchronous I/O,
DIO or frequent "flushes".  Which you can equally do in ANY file
system, be it journalled or not.

I repeat: Synchronous writing has NOTHING to do with journalling.
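
To make the point concrete, here is a minimal sketch of the two usual ways
to write synchronously.  It assumes a POSIX system (os.O_SYNC is not
available everywhere) and the function names are mine, purely for
illustration.  Nothing in it knows or cares whether the underlying file
system is journalled.

```python
import os


def _write_all(fd: int, payload: bytes) -> None:
    """Keep calling write() until every byte has been handed to the kernel."""
    view = memoryview(payload)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path: str, payload: bytes) -> None:
    """Buffered write followed by an explicit flush of this file."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        _write_all(fd, payload)
        # fsync() returns only after the kernel reports the data as written.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_o_sync(path: str, payload: bytes) -> None:
    """Open with O_SYNC so that each write() call is itself synchronous."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        _write_all(fd, payload)
    finally:
        os.close(fd)
```

Both calls behave the same way on a journalled and on a non-journalled
file system: the guarantee comes from the synchronous write, not from the
journal.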



What a journalled file system really does is automatically (like it or not)
write - synchronously - to a journal file ANY changes to file system
METADATA.  IOW, any changes that involve creating or deleting files,
allocating disk space or freeing disk space.

Those and ONLY those are recovered after a system crash, by simply
reading from the journal file. Instead of inspecting the ENTIRE file
system looking for broken metadata.  Which is what fsck does in a
non-journalled file system.
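
A toy model may help here.  This is only a sketch of the idea, not the
on-disk format or recovery code of any real file system; the class and
method names are invented for the example.  Metadata changes go to a
durable journal, data blocks sit in a volatile cache, and recovery replays
the journal only.

```python
class ToyJournalledFS:
    """Toy model: the journal covers metadata only, never data blocks."""

    def __init__(self) -> None:
        self.journal = []        # durable: metadata records, written synchronously
        self.disk_metadata = {}  # durable: file name -> allocated block count
        self.disk_data = {}      # durable: (file name, block number) -> bytes
        self.cached_data = {}    # volatile: dirty data blocks not yet flushed

    def extend(self, name: str, blocks: int) -> None:
        """Allocate space: a metadata change, so it is journalled at once."""
        self.journal.append((name, blocks))

    def write(self, name: str, block_no: int, data: bytes) -> None:
        """Ordinary buffered write: the data only reaches the cache."""
        self.cached_data[(name, block_no)] = data

    def flush(self) -> None:
        """Synchronous flush: the only thing that makes data durable."""
        self.disk_data.update(self.cached_data)
        self.cached_data.clear()

    def crash(self) -> None:
        """Power failure: everything volatile is gone."""
        self.cached_data.clear()

    def recover(self) -> None:
        """Replay the journal instead of scanning the whole file system."""
        for name, blocks in self.journal:
            self.disk_metadata[name] = self.disk_metadata.get(name, 0) + blocks
        self.journal.clear()


fs = ToyJournalledFS()
fs.extend("t1.dat", 4)
fs.write("t1.dat", 0, b"row data")
fs.crash()
fs.recover()
```

After recover() the allocation for t1.dat is intact, but the data block
that was never flushed is gone.  That is all the journal promises.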

With the result (in a journalled f/s) that you do not lose large chunks of
a file.  This is the problem that fsck has with non-journalled file systems:
sometimes it cannot recover the metadata and it loses track of an entire
space allocation for a file.  Which can be a substantial part of the file.
This happens mostly when files are very volatile or constantly changing in
allocation.

Which is NOT the case for Oracle datafiles.  They are pre-allocated
and do not often change in size.
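
As a rough illustration of the access pattern (not of how Oracle itself
does it), the sketch below creates a file at its full size once and then
only overwrites blocks in place.  The 8K block size and the function names
are assumptions for the example, os.pwrite needs a POSIX system, and a
copy-on-write file system would behave differently.

```python
import os

BLOCK_SIZE = 8192  # assumed block size, for the example only


def preallocate(path: str, blocks: int) -> None:
    """Create the file at full size by writing real zero-filled blocks."""
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(b"\0" * BLOCK_SIZE)
        f.flush()
        os.fsync(f.fileno())


def overwrite_block(path: str, block_no: int, data: bytes) -> None:
    """Rewrite one existing block in place; the file never grows."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    if (block_no + 1) * BLOCK_SIZE > os.path.getsize(path):
        raise ValueError("block lies outside the pre-allocated file")
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, block_no * BLOCK_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)
```

However many times overwrite_block() runs, the file size stays the same, so
there are no allocation changes for fsck or a journal to worry about
(timestamps aside).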


It's high time this myth of journalled file systems "protecting"
data was exposed.  A run-of-the-mill journalled f/s does NOT protect data
blocks inside files, it protects ONLY the file system's own metadata!  That
is certainly the case for ext3 (in its default data=ordered mode), JFS, NTFS
and many other journalled f/s.  Veritas is the only journalled f/s I know of
that can ALSO protect the data but that is an add-on, not a characteristic
of journalling as such.



Historical note:
This f/s metadata thing is the main reason why I never lost a benchmark
against Ingres: journalled file systems were unknown back then and Ingres
did not use the concept of pre-allocated datafiles like Oracle.  Their
tables were stored one table per file, with dynamic space management done
by the file system itself.  With the result that if you specified a
benchmark where tables were dropped/re-created and inserted into/deleted
from, and you pulled the plug halfway through, you'd have a very high
probability that fsck would NOT recover the file system where the Ingres
database was.

While Oracle would quietly just roll back the last transaction and keep
going.  After the fsck was finished, of course.  Remember: no journalled
f/s back then!  Not once did I have to use the redo log.  Datafiles were
pre-allocated and the f/s metadata never changed, no matter how busy the
system was.


As well, not ONCE did Ingres survive this little "technique"!

Cheers
Nuno Souto
in sunny Sydney, Australia
dbvision@xxxxxxxxxxxxxxx
