OT - Journaling File Systems?
Bill Rugolsky Jr.
brugolsky at telemetry-investments.com
Fri Jul 2 17:15:07 UTC 2004
On Fri, Jul 02, 2004 at 11:22:20AM -0500, Edwards, Scott (MED, Kelly IT Resouces) wrote:
> Ext3 has an almost perfect record with the write cache off: I have
> run over 300 cycles on the two drives and only had two corrupted lines
> in the output files. So out of 600 total cycles on the two drives there
> were only two lines with bad data, I think that is a pretty good record.
> None of the other journaling file systems have come anywhere near this
> performance. After 3 or 4 power cycles, ReiserFS became corrupted to
> the point that the system would not boot up (the fsck failed and the
> bootup stopped there). XFS never got corrupted to the point it wouldn't
> boot, but with approximately 100 power cycles on each drive, one drive
> had 73 corrupted lines and the other had 82. With JFS after 15 power
> cycles one of the drives was corrupted and the system would no longer
> boot up (fsck failed again).
You need to distinguish meta-data consistency from file data consistency.
Aside from Ext3, the other journaling filesystems usually guarantee only
meta-data consistency. (Reiserfs just got data journaling with
ChangeSet 1.1804, 2004/06/18 07:55:25-07:00.) Corrupted file contents are
therefore to be expected with the non-Ext3 filesystems. If fsck fails
on those filesystems, however, that indicates a meta-data consistency problem.
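Because most journaling filesystems protect only meta-data, an application that needs its file contents to survive a power cut has to force the data out itself. The sketch below is the common write-to-temporary, fsync, rename pattern. The post itself does not prescribe it. It assumes POSIX rename semantics and a drive that honours cache flushes (the tests above were run with the write cache off). The helper name `durable_write` is hypothetical.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so that after a crash either the old or the
    new contents are found, not a partially written mix (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # force the file data (and its size) to disk
        os.replace(tmp, path)     # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # Make the directory entry that now points at the new file durable too.
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

A test program that logs lines across power cycles could call `durable_write` for each checkpoint instead of appending in place. The cost is at least two synchronous flushes per update.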
Here is a comment that I wrote a long time ago in reply to a comparison
of Reiserfs to Ext3.
- Ext3 has three journaling modes:
  data=writeback
    Journals meta-data only. This is traditionally the (only) form of
    journaling provided by the other filesystems. It is most appropriate
    for databases and other applications that ensure data integrity
    through their own mechanisms (using fsync(), etc.). This mode
    contains a security hole, though: the new file length can be
    committed before the data blocks at the end of the file are written,
    so after an unscheduled shutdown the file exposes whatever those
    newly allocated blocks previously held, e.g., a previous version of
    /etc/shadow.
  data=ordered
    This is the default mode. In this mode, Ext3 guarantees that data
    blocks at the end of a file are written before the new file length
    is committed. This eliminates the security hole, and also provides
    the guarantees of data journaling for files that are written
    sequentially, i.e., the file may be truncated, but won't contain
    random garbage. (And as you are no doubt aware, the *vast* majority
    of files for non-database applications are written sequentially.)
    Since data is written only once in this mode, it can provide a
    substantial speedup over full data journaling with an internal
    journal, but the write ordering requirements interfere somewhat
    with sorting and merging of the write requests.
  data=journal
    This mode provides full data journaling. Since data is written both
    to the journal and to its final place in the filesystem, double the
    disk bandwidth is consumed. It can, however, improve the latency of
    synchronous writes, as the write can be acknowledged as soon as the
    blocks hit the (sequential) journal, while the blocks are written
    back to their final location asynchronously. With an external
    journal on a separate spindle or in NVRAM, seeking can be avoided,
    and write speed is limited by the speed of sequential writes to the
    journal, while preserving the desirable low latency.
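The ordering differences among the three modes can be illustrated with a toy model. It is a simplification, not ext3's implementation. It models one append of a few blocks and a power failure after an arbitrary number of disk writes. A reader then inspects the file after journal recovery. The function names and the "stale"/"new" block labels are hypothetical.

```python
# Toy model of the three ext3 data= modes. This is NOT ext3's real code: each
# mode is reduced to the order in which it issues disk writes for one append.

OLD = "stale"  # whatever the newly allocated blocks held before
NEW = "new"    # the data the application just appended


def write_plan(mode, nblocks):
    """Ordered list of disk writes issued for appending `nblocks` blocks."""
    home = [("data", i) for i in range(nblocks)]
    commit = [("commit_size", nblocks)]  # journal commit of the new file length
    if mode == "writeback":
        # Only meta-data is journaled; data may reach disk after the commit.
        return commit + home
    if mode == "ordered":
        # Data blocks are forced out before the new length is committed.
        return home + commit
    if mode == "journal":
        # Data goes to the journal, is committed, then is copied home later.
        return [("journal_data", i) for i in range(nblocks)] + commit + home
    raise ValueError(f"unknown mode: {mode}")


def visible_after_crash(mode, nblocks, k):
    """Blocks a reader sees if power fails after the first `k` writes."""
    disk = [OLD] * nblocks
    journal = {}
    size = 0  # the file starts empty
    for kind, arg in write_plan(mode, nblocks)[:k]:
        if kind == "data":
            disk[arg] = NEW
        elif kind == "journal_data":
            journal[arg] = NEW
        else:  # "commit_size"
            size = arg
    if size:
        # Recovery replays journaled data only for a committed transaction.
        for i, value in journal.items():
            disk[i] = value
    return disk[:size]


def data_block_writes(mode, nblocks):
    """Number of data-block writes, ignoring the commit record."""
    return sum(1 for kind, _ in write_plan(mode, nblocks) if kind != "commit_size")


if __name__ == "__main__":
    n = 2
    for mode in ("writeback", "ordered", "journal"):
        steps = len(write_plan(mode, n))
        exposed = any(OLD in visible_after_crash(mode, n, k) for k in range(steps + 1))
        print(f"{mode:<9}  stale data exposed: {exposed!s:<5}  "
              f"data-block writes: {data_block_writes(mode, n)}")
```

In this model only data=writeback can expose stale blocks after a crash. With data=ordered the file is either short or complete. data=journal gives the same safety but issues twice as many data-block writes, which matches the bandwidth trade-off described above.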
Since a strictly conforming NFS server (and potentially other network
file systems) requires synchronous data writes, Ext3's ability to journal
data while still acknowledging writes with low latency makes it a
natural choice among Linux journaling filesystems for this task.
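One way to observe synchronous-write latency on a particular mount is to time individual `O_DSYNC` writes. The sketch below is a rough micro-benchmark, not a model of an NFS server. It assumes a POSIX system where `os.O_DSYNC` is available, falling back to `os.O_SYNC`. The scratch file name and function name are hypothetical. Comparing runs on the same disk with different `data=` mount options is one way to see the effect described above. A volatile drive write cache can make the numbers look better than the durability actually is.

```python
import os
import statistics
import sys
import tempfile
import time


def sync_write_latencies(directory, count=50, size=4096):
    """Time `count` synchronous writes of `size` bytes; return seconds per write."""
    payload = b"x" * size
    path = os.path.join(directory, "syncwrite.tmp")  # hypothetical scratch file
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, flags, 0o600)
    latencies = []
    try:
        for _ in range(count):
            start = time.perf_counter()
            # With O_DSYNC, write() returns only after the kernel has sent the
            # data to the device (a volatile drive cache can still hold it).
            os.write(fd, payload)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


if __name__ == "__main__":
    # Pass a directory on the filesystem under test; defaults to the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    with tempfile.TemporaryDirectory(dir=target) as scratch:
        lat = sync_write_latencies(scratch)
    print(f"median {statistics.median(lat) * 1e3:.3f} ms over {len(lat)} synchronous writes")
```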