OT - Journaling File Systems?

Fri Jul 2 17:15:07 UTC 2004

On Fri, Jul 02, 2004 at 11:22:20AM -0500, Edwards, Scott (MED, Kelly IT Resouces) wrote:
> Ext3 has an almost perfect record with the write cache off:  I have
> run over 300 cycles on the two drives and only had two corrupted lines
> in the output files.  So out of 600 total cycles on the two drives there
> were only two lines with bad data, I think that is a pretty good record.
> 
> None of the other journaling file systems have come anywhere near this
> performance.  After 3 or 4 power cycles, ReiserFS became corrupted to
> the point that the system would not boot up (the fsck failed and the
> bootup stopped there).  XFS never got corrupted to the point it wouldn't
> boot, but with approximately 100 power cycles on each drive, one drive
> had 73 corrupted lines and the other had 82.  With JFS after 15 power
> cycles one of the drives was corrupted and the system would no longer
> boot up (fsck failed again).

You need to distinguish meta-data consistency from file data consistency.
Aside from Ext3, the other journaling filesystems usually guarantee only
meta-data consistency.  (Reiserfs just got data journaling with
ChangeSet 1.1804, 2004/06/18 07:55:25-07:00.)  Corrupted file contents
are therefore expected with non-Ext3 filesystems after a power failure.
If fsck fails on those filesystems, however, that indicates a meta-data
consistency problem.
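
To make "corrupted lines" measurable independently of fsck, here is a
rough sketch of a per-line checker.  It is not the test harness used in
the quoted message; the line format (sequence number, payload, CRC32 in
hex) and the function names are assumptions chosen only for illustration.

```python
import zlib

def make_line(seq: int, payload: str) -> str:
    """Build one test line: sequence number, payload, CRC32 of both (hypothetical format)."""
    body = f"{seq:08d} {payload}"
    crc = zlib.crc32(body.encode("utf-8")) & 0xFFFFFFFF
    return f"{body} {crc:08x}"

def line_is_intact(line: str) -> bool:
    """Return True if the trailing CRC32 matches the rest of the line."""
    body, sep, crc_hex = line.rpartition(" ")
    if not sep or len(crc_hex) != 8:
        return False          # no checksum field at all: treat as garbage
    try:
        expected = int(crc_hex, 16)
    except ValueError:
        return False          # checksum field is not hexadecimal
    actual = zlib.crc32(body.encode("utf-8", errors="replace")) & 0xFFFFFFFF
    return actual == expected

def count_corrupted(lines) -> int:
    """Count lines whose contents do not match their checksum."""
    return sum(1 for line in lines if not line_is_intact(line.rstrip("\n")))
```

A nonzero count on a filesystem that still passes fsck is a file data
problem; an fsck failure is a meta-data problem, whatever the count says.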

Here is a comment that I wrote a long time ago in reply to a comparison
of Reiserfs to Ext3.

- Ext3 has three journaling modes:

    data=writeback  Journals meta-data only.  This is traditionally
                    the (only) form of journaling provided by the other
                    filesystems.  It is most appropriate for databases
                    and other applications which assure data integrity
                    with their own mechanisms (using fsync(), etc.).
                    This mode contains a security hole, though, because a
                    file can be extended before the blocks at the end of
                    the file are committed.  After an unscheduled shutdown,
                    the file then exposes whatever the uninitialized blocks
                    happen to contain, e.g., the previous version of
                    /etc/shadow.

    data=ordered    This is the default mode.  In this mode, Ext3
                    guarantees that data blocks at the end of a file
                    are written before the new file length is committed.
                    This eliminates the security hole, and also provides
                    the guarantees of data journaling for files that
                    are written sequentially, i.e., the file may
                    be truncated, but won't contain random garbage.
                    (And as you are no doubt aware, the *vast* majority
                    of files for non-database applications are written
                    sequentially).  Since data is written only once in
                    this mode, it can provide a substantial speedup
                    over full data journaling with an internal journal,
                    but the write ordering requirements interfere somewhat
                    with sorting and merging of the write requests.

    data=journal    This mode provides full data journaling.  Since data
                    is written to both the journal and its final place
                    in the filesystem, double the disk bandwidth is
                    consumed.  It can, however, improve the latency of
                    synchronous writes, as the write can be acknowledged
                    as soon as the blocks hit the (sequential) journal,
                    while the blocks are written back to their final
                    location asynchronously.  With an external journal
                    on a separate spindle or in NVRAM, seeking can be
                    avoided, and write speed is limited by the speed
                    of sequential writes to the journal, while preserving
                    the desirable low latency.
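
When comparing test results it helps to record which of the three modes
each filesystem was actually mounted with.  A minimal sketch, assuming
mounts-style text (device, mount point, type, comma-separated options) is
passed in as a string; the function name is hypothetical, and a missing
data= option is reported as "ordered" only because that is the default
described above.

```python
def ext3_data_modes(mounts_text: str) -> dict:
    """Map each ext3 mount point to its data= journaling mode."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mount point, fs type, options.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mount_point = fields[1]
        mode = "ordered"      # assumed default when no data= option is listed
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[mount_point] = mode
    return modes
```

For example, given lines for / mounted with "rw,data=journal" and /home
mounted with plain "rw", the result maps "/" to "journal" and "/home" to
"ordered"; non-ext3 entries are skipped.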

Since strictly conforming NFS (and potentially other network file
systems) requires synchronous data writes, the ability of Ext3 to journal
data while providing low-latency write acknowledgements makes it a
natural choice among Linux journaled filesystems for this task.
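
The latency in question is the time between issuing a write and having
it acknowledged as being on stable storage.  A rough way to observe it
from user space is sketched below, assuming a POSIX system; the function
name is made up for illustration, and this measures a local write
followed by fsync(), not an NFS server's actual code path.

```python
import os
import time

def timed_sync_append(path: str, data: bytes) -> float:
    """Append data, force it to stable storage, and return the elapsed seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        start = time.perf_counter()
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)          # returns only after the kernel reports the data as flushed
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Running this in a loop against the same disk mounted in each data= mode
would give a crude comparison of acknowledgement latency, though the
drive's own write cache can distort the numbers.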

Regards,

	Bill Rugolsky