
Re: Ordered Mode vs Journaled Mode

On Tue, Oct 02, 2001 at 04:59:32PM -0600, Andreas Dilger wrote:
> On Oct 02, 2001  14:55 -0700, Mike Fedyk wrote:
> > I've been wondering exactly what you gain by using journaled mode over
> > ordered mode.
> > 
> > Are there any known cases where journaled mode could recover where ordered
> > mode wouldn't?
> I don't think it's as much an issue of recoverability (from the fs metadata
> point of view) as it is one of data integrity.
> With ordered or writeback mode, it is possible that you are in the middle
> of writing data into your file when you get a crash.  What is in the file?
> Half new data and half old data.  
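To make the "half new, half old" case concrete, here is a toy Python model of a file being overwritten in place one block at a time, with power failing partway through. It is not ext3 code; the function name and the list-of-blocks representation are made up for illustration.

```python
# Toy model, not ext3 code: a file is a list of blocks that are overwritten
# in place, one block at a time, as ordered/writeback mode does for overwrites.

def overwrite_in_place(old_blocks, new_blocks, crash_after):
    """Return the on-disk file if power fails after `crash_after` block writes."""
    disk = list(old_blocks)          # contents on disk before the write starts
    for i, block in enumerate(new_blocks):
        if i == crash_after:         # simulated crash: later writes never land
            break
        disk[i] = block              # each block write reaches disk on its own
    return disk

old = ["old0", "old1", "old2", "old3"]
new = ["new0", "new1", "new2", "new3"]
print(overwrite_in_place(old, new, crash_after=2))
# ['new0', 'new1', 'old2', 'old3']
```

Nothing in this model records which blocks were updated, so after the crash the file is simply whatever mix of blocks happened to reach the disk.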

SCT described in a recent LKML thread that, during a truncate-and-write
transaction, the deleted blocks won't be written to until the transaction
has completed.  This seems to imply that the previous data would be
recoverable if the transaction didn't complete.

I replied on LKML asking for clarification, but haven't received an answer
yet...  Is this true?

>If you have journaled data mode, then
> either the data made it into the journal along with the metadata (in which
> case it ALL makes it into the file) or the transaction is incomplete (in
> which case none of the data is written to your file).


> While this isn't 100% true (i.e. if you have very large writes they will
> be split into multiple transactions) it is mostly true.
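A toy model of the data-journaling behaviour described above, assuming a simplified journal in which each transaction is a dict of logged blocks plus a flag standing in for the commit record. All names here are hypothetical, not the JBD API. In this sketch recovery replays committed transactions in order and stops at the first one without a commit record, so a small write is all-or-nothing, while a large write split across transactions can end up partially applied.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    tid: int
    blocks: dict             # home block number -> logged contents (data + metadata)
    committed: bool = False  # stands in for "commit record is on disk"

def journal_write(journal, first_tid, blocks, max_per_txn, crash_in_txn=None):
    """Log `blocks` as transactions of at most `max_per_txn` blocks each.
    If `crash_in_txn` is an index, power fails before that transaction commits."""
    items = sorted(blocks.items())
    for n, start in enumerate(range(0, len(items), max_per_txn)):
        txn = Transaction(first_tid + n, dict(items[start:start + max_per_txn]))
        journal.append(txn)
        if n == crash_in_txn:
            return               # crash: no commit record, later txns never start
        txn.committed = True

def recover(journal, fs):
    """Replay committed transactions in order; stop at the first incomplete one."""
    for txn in journal:
        if not txn.committed:
            break                # nothing from an uncommitted txn reaches the file
        fs.update(txn.blocks)    # copy logged blocks to their home locations
    return fs

old = {0: "old0", 1: "old1", 2: "old2", 3: "old3"}
new = {0: "new0", 1: "new1", 2: "new2", 3: "new3"}

# One transaction, crash before commit: the file keeps all of its old data.
j = []
journal_write(j, 1, new, max_per_txn=4, crash_in_txn=0)
print(recover(j, dict(old)))
# {0: 'old0', 1: 'old1', 2: 'old2', 3: 'old3'}

# Large write split into two transactions, crash in the second: first half applied.
j = []
journal_write(j, 1, new, max_per_txn=2, crash_in_txn=1)
print(recover(j, dict(old)))
# {0: 'new0', 1: 'new1', 2: 'old2', 3: 'old3'}
```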

Hmm, this may be a candidate for ordered mode: keep track of transactions
that belong to the same file and make sure that the deleted blocks aren't
reused until the entire set of transactions has completed.
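A rough sketch of that proposal as a hypothetical in-memory allocator. The class and method names are invented, and ext3's real block allocator works quite differently. Blocks freed by a truncate are parked per inode and only returned to the free pool once the whole set of transactions for that file has committed.

```python
class DeferredFreeAllocator:
    """Hypothetical allocator: blocks freed while a multi-transaction operation
    on a file is in progress stay unusable until that whole operation commits."""

    def __init__(self, free_blocks):
        self.free = set(free_blocks)  # blocks that may be handed out right now
        self.pending = {}             # inode number -> freed blocks held back

    def release(self, inode, block):
        # A truncate freed `block`: park it instead of making it reusable, so
        # the old contents stay intact until the operation has fully committed.
        self.pending.setdefault(inode, set()).add(block)

    def allocate(self):
        # Hand out the lowest truly free block; parked blocks are never chosen.
        if not self.free:
            return None
        block = min(self.free)
        self.free.remove(block)
        return block

    def operation_committed(self, inode):
        # The last transaction in the set for `inode` has committed, so its
        # parked blocks can finally be reused.
        self.free |= self.pending.pop(inode, set())


alloc = DeferredFreeAllocator([10])
alloc.release(inode=5, block=3)   # truncate frees block 3 of inode 5
print(alloc.allocate())           # 10
print(alloc.allocate())           # None: block 3 is still parked
alloc.operation_committed(5)
print(alloc.allocate())           # 3
```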

Hmm, in *journaled* mode the data could even be flushed out of the journal,
leaving the metadata for the related transactions in the journal while still
keeping the previous blocks from being overwritten...

What do you think?

> You are also protected from disk hardware problems, where they write
> garbage to the disk in a powerfail situation.  If garbage goes to disk,
> either it is in the journal (where it is discarded) or it will be written
> to the filesystem again at journal recovery time.

Or the garbage is written again?

What happens if the drive decides to write garbage to completed transaction
areas?  What about surrounding blocks?

> Finally, data journalling is _way_ faster if you are doing synchronous I/O
> like for a mail server.  Both the data and metadata are written in one
> pass to the journal, so no seeking, and you can return control to the
> application knowing the data is safe.  It can then re-order writes to the
> fs safely, since it will be recovered in case of a crash.
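A back-of-the-envelope model of that argument. The seek time, transfer rate and sizes below are assumed round numbers, not measurements, and the model ignores rotational latency as well as the later checkpoint writes, which data journaling still has to do, just not while the application is waiting.

```python
def transfer_ms(kib, mib_per_s):
    """Time to stream `kib` KiB at `mib_per_s` MiB/s, in milliseconds."""
    return kib * 1000 / (mib_per_s * 1024)

def sync_write_ms(data_kib, meta_kib, seek_ms, mib_per_s, data_journaling):
    """Crude latency of one synchronous write under an assumed simple model."""
    if data_journaling:
        # Data and metadata go to the journal in one sequential pass;
        # assume the head is already at the journal tail, so no seek.
        return transfer_ms(data_kib + meta_kib, mib_per_s)
    # Ordered mode: seek to the data's home location and write it first,
    # then seek back to the journal to commit the metadata.
    return (2 * seek_ms
            + transfer_ms(data_kib, mib_per_s)
            + transfer_ms(meta_kib, mib_per_s))

# Assumed numbers: 4 KiB message, 8 KiB of journaled metadata,
# 8 ms average seek, 20 MiB/s sequential transfer.
for mode in (False, True):
    t = sync_write_ms(4, 8, seek_ms=8.0, mib_per_s=20.0, data_journaling=mode)
    print("data journaling" if mode else "ordered", f"{t:.2f} ms")
# ordered 16.59 ms
# data journaling 0.59 ms
```

Under these assumptions the two seeks dominate ordered mode, which is why writing everything sequentially into the journal can return to the application so much sooner.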

How long do you get before it has to start seeking?  (That may be what the
flush time means...)  If so, you could end up with a journal flush happening
just as a new email comes in...  Hopefully the VM would let the flush
complete before writing to the journal again...  But then again, the
elevator doesn't seem to do a very good job of balancing multiple requests...

> Obviously, data journaling adds a lot more overhead to a system, so it
> isn't for everyone.  In most cases, however, you don't use your disks at
> full speed anyways, so there is little real impact.

It would be nice to do what software RAID does during a resync: slowly
write data out from the journal to the FS while the disk is idle, rather
than waiting until the journal fills up to flush it...
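A sketch of such a policy, with made-up names and thresholds: trickle a small batch of journal blocks out to the filesystem whenever the disk is idle, and only force a larger flush when the journal passes a high-water mark.

```python
def blocks_to_checkpoint(used, size, disk_idle, idle_batch=64, high_water=0.75):
    """Hypothetical policy: how many journal blocks to write back to the FS now."""
    limit = int(size * high_water)
    if used > limit:
        return used - limit            # journal nearly full: must flush now
    if disk_idle:
        return min(idle_batch, used)   # trickle out a small batch in idle time
    return 0                           # disk busy and journal has room: wait

# Simulated ticks of (blocks logged this tick, is the disk idle?)
used = 0
for incoming, idle in [(300, False), (0, True), (0, True), (600, False)]:
    used += incoming
    used -= blocks_to_checkpoint(used, 1024, idle)
    print(used)
# 300
# 236
# 172
# 768
```

In this run the idle-time trickling keeps the forced flush at the end down to 4 blocks; without it the same burst would have pushed the journal to 900 blocks and forced 132 blocks out at the worst moment.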

