Re: are redo records always flushed in order?

by introducing batch nowait, oracle has left tx durability in the users' hands. i believe the db will still be physically consistent (engineers at oracle ain't that bad ;o) ), but logically the data could be corrupted.

tx1 and tx2 below belong to different sessions. that's the point: if some sessions run in batch nowait and some in immediate wait, and their data overlaps or depends on each other, is there any chance of data corruption?
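To make the question concrete, here is a toy model in Python. It is only a sketch under a loud assumption: there is a single ordered redo stream, and a flush always writes a prefix of it. It does not model private strands or log parallelism, and the names (ToyRedoLog, scenario) are made up for illustration, not anything from Oracle.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRedoLog:
    """Toy model: one ordered redo buffer; records [0, flushed) are on disk."""
    buffer: list = field(default_factory=list)
    flushed: int = 0

    def commit(self, tx: str, wait: bool) -> None:
        # The commit record always goes into the buffer in generation order.
        self.buffer.append(("COMMIT", tx))
        if wait:
            # "immediate wait": flush everything up to and including this record.
            self.flushed = len(self.buffer)
        # "batch nowait": return without flushing; a later flush picks it up.

    def durable_after_crash(self) -> set:
        # Only commit records that reached disk survive a crash.
        return {tx for kind, tx in self.buffer[:self.flushed] if kind == "COMMIT"}


def scenario(order):
    """Run commits given as (tx, wait) pairs, then crash; return surviving txs."""
    log = ToyRedoLog()
    for tx, wait in order:
        log.commit(tx, wait)
    return log.durable_after_crash()


if __name__ == "__main__":
    # tx1 commits nowait, then dependent tx2 commits with wait.
    print(sorted(scenario([("tx1", False), ("tx2", True)])))
    # tx2 commits with wait first, then tx1 commits nowait and the instance dies.
    print(sorted(scenario([("tx2", True), ("tx1", False)])))
```

In this model a waiting commit can never become durable without every earlier commit record, so a dependent tx2 cannot survive a crash that loses tx1. Whether that prefix property holds once multiple strands are involved is exactly the open question in this thread.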

coming to redo writes and physical writes: many platforms can handle 1MB per physical write if the db is on raw devices, so that should be enough. but if the db is on a file system, it could be as small as 8K per write and is subject to tuning. (i didn't test with lgwr, but did test i/o size using simple os cmds.) but today our storage specialist pointed out that os/storage have rollback too! they should be able to make sure one big write request from lgwr is either completed, or failed and then cleaned up. so i can probably put my worry about corruption from a partially flushed redo write to rest.
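Independent of what the OS or storage guarantees, a log can also protect itself against a partially flushed write by checksumming each record, so that recovery stops at the first record that did not make it to disk intact. The Python sketch below shows that generic write-ahead-log idea only; the record layout (length + CRC32 + payload) and the function names are assumptions for illustration and are not Oracle's redo block format.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the in-memory 'log file'."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def recover(log: bytes) -> list:
    """Return the records of the longest valid prefix; stop at a torn record."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or corrupted record: treat as end of log
        records.append(payload)
        pos = start + length
    return records


if __name__ == "__main__":
    log = bytearray()
    for payload in (b"tx1 change", b"tx1 commit", b"tx2 change"):
        append_record(log, payload)
    torn = bytes(log[:-4])  # simulate a write that lost its last 4 bytes
    print(recover(torn))
```

With the last record cut short, recovery returns only the first two records and ignores the torn tail, so a partial write shows up as lost redo at the end of the log rather than as garbage applied to the data.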

p.s. it wasn't our choice to join, and we're still getting lost on the campus. i'll come back once i have something (that's not confidential) ;o)

p.p.s. thanks for pointing to the very interesting thread. exactly why i chose to post my Qs here.

-jessica


Jeremy Paul Schneider wrote, On 4/26/2007 6:22 AM:
FWIW here's a good discussion of private redo strands (aka zero-copy redo): http://www.freelists.org/archives/oracle-l/02-2005/threads.html#00630 - The thread is called "latch-free SCN scheme (10.1.0.3)"

On 4/26/07, Jeremy Paul Schneider <jeremy.schneider@xxxxxxxxxxxxxx> wrote:

    Yeah...  I wasn't thinking about nowait or private strands...  and
    I don't (yet) know a lot about the specifics of how these work
    internally.  Also, in addition to private strands which were
    apparently introduced in 10g there's also log parallelism which
    was introduced in 9i allowing multiple processes to write to
    different areas of the main redo buffer simultaneously.

    I don't know what the implications are of this; but as I said
    before I have a hunch that this has already been carefully worked
    through by the engineers at Oracle - considering the fanfare with
    the release of COMMIT NOWAIT and considering the importance of
    crash recoverability in Oracle.

    A few other thoughts - based on my understanding of redo and crash
    recovery my guess is the opposite of yours - that in your example
    using COMMIT NOWAIT *any* records whose COMMIT made it into the
    redo log will not be rolled back.  But another thought - from what
    I can gather (based on reading a few old oracle-l emails,
    presentations, and my own guesses) - private redo strands and
    individual buffer latches (when using parallelism) are allocated
    per-process; so assuming that TX1 and TX2 are happening in the
    same session, I think that their log entries would probably be
    written out in order to the logfiles even if private strands or
    parallelism were enabled.  But that's just conjecture on my part.

    Hmmm... maybe you could make a test tablespace and a test table
    with a few rows and one row per block, then put the tablespace in
    backup mode and spawn a few processes that update the table.  Then
    strace (or truss on sun) the LGWR process and see if the writes
    are sequential and how big the writes are...  also it's worth
    pointing out that even if we're issuing 1MB writes to the OS we'd
    still want to ensure that that OS is writing that data in order
    (if the device itself doesn't support 1MB writes).  I think it
    does but I can't prove this either at the moment.
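
The captured trace from such a test could be checked with a few lines of Python. This is a sketch that assumes LGWR shows up in strace as plain pwrite64(fd, buf, count, offset) calls; if the instance uses asynchronous I/O the writes may appear as other system calls instead and the pattern would need adjusting. The function name and the sample lines are made up for illustration and are not real LGWR output.

```python
import re

# Matches strace lines of the form: pwrite64(fd, <buffer>, count, offset) = written
PWRITE = re.compile(r'pwrite64\((\d+), .*, (\d+), (\d+)\)\s+=\s+(\d+)')


def check_writes(lines):
    """Return (fd, offset, bytes_written, sequential) for each pwrite64 line.

    'sequential' is True when the write starts exactly where the previous
    write on the same file descriptor ended (or is the first one seen).
    """
    last_end = {}  # fd -> offset just past the previous write
    report = []
    for line in lines:
        m = PWRITE.search(line)
        if not m:
            continue  # ignore other system calls
        fd, count, offset, written = map(int, m.groups())
        sequential = fd not in last_end or offset == last_end[fd]
        report.append((fd, offset, written, sequential))
        last_end[fd] = offset + written
    return report


if __name__ == "__main__":
    # Made-up sample lines, not captured from a real LGWR process.
    sample = [
        'pwrite64(21, "..."..., 1024, 512) = 1024',
        'pwrite64(21, "..."..., 2048, 1536) = 2048',
        'pwrite64(21, "..."..., 512, 8192) = 512',
    ]
    for row in check_writes(sample):
        print(row)
```

Keep in mind that online redo logs are reused in a circle and may have several members, so a jump in offset or a second file descriptor does not by itself mean redo was written out of order.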

    -Jeremy

    PS - considering the domain name of your email address, if this is
    such a critical question for your "bosses" then is there any way
    they can make an inquiry to some of the engineers who actually
    work on this stuff?

    PPS - maybe someone who's got a lot more experience than I will
    add their thoughts...  then I could learn a bit more about this
    too.  :)

--
http://www.freelists.org/webpage/oracle-l

