Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time

Peter Grandi pg_ext3 at ext3.for.sabi.co.UK
Mon Mar 28 16:43:20 UTC 2011

[ ... ]

>> When executing an fsync(), in data=ordered mode you have to
>> write the data blocks to their final locations and wait for
>> the data blocks to be written.  This generally will require
>> extra seeks.  In data=journal mode, the data blocks can be
>> written directly into the journal without needing to seek.

>> Of course eventually the data and metadata blocks will need
>> to be written to their permanent locations before the journal
>> space can be reused.  But for short bursty write patterns,
>> the fsync() latency will be much smaller in data=journal
>> mode.

>  [ ... ]

> In this case, if we conduct the experiment in data=journal
> mode and data=ordered mode respectively,

That experiment is not necessarily demonstrative: it depends on
RAM caching, the elevator, ...

> since write latency is much smaller in data=journal mode,

Write latency is actually much longer: because it requires *two*
writes instead of one. It is *fsync* latency as mentioned above
that is smaller, because it depends only on the first write to
what is in effect a small log based filesystem. This distinction
matters a great deal, because it is the reason why "short bursty
write patterns" is the qualification above. For long write
patterns things are very different as the journal eventually
fills up. For any given size it will also fill up a lot faster
for 'data=journal'.
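
To put rough numbers on "fills up a lot faster", here is a toy
model, not a measurement. It assumes a steady write stream and
that nothing is checkpointed out of the journal in the meantime,
which is pessimistic. The function name, the parameters and the
figures are all made up for illustration.

```python
def seconds_until_journal_full(journal_mib, data_mib_s, metadata_mib_s, mode):
    """Toy model: time for a sustained write stream to fill the journal.

    Assumes no journal space is reclaimed while writing; real behaviour
    depends on the flusher, the elevator and the checkpointing policy.
    """
    if mode == "journal":
        # data=journal: data and metadata blocks both go through the journal
        rate = data_mib_s + metadata_mib_s
    elif mode == "ordered":
        # data=ordered: only metadata blocks go through the journal
        rate = metadata_mib_s
    else:
        raise ValueError("mode must be 'journal' or 'ordered'")
    return journal_mib / rate


# Hypothetical workload: 128 MiB journal, 40 MiB/s data, 1 MiB/s metadata
for mode in ("journal", "ordered"):
    print(mode, seconds_until_journal_full(128, 40, 1, mode))
```

With those invented figures the journal lasts a few seconds in
'data=journal' and a couple of minutes in 'data=ordered', which
is why the "short bursty" qualification matters.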

Ahhh while writing that I have just realized that large journals
can be a bad idea especially for metadata operations. Will have
to think more about that.

> the disk will focus more on the read operation, hence, the
> read operation will also finish earlier than it does in the
> data=ordered mode. Am I understanding correctly?

That again depends on a lot of things, including caching, the
elevator, flusher behaviour, exactly where the files are...

Also, whether the journal is on the same drive as the filesystem
or another drive can matter enormously; also whether for example
the journal is on SSD or battery backed RAM. There are reasons
why 'ext2' still quite outperforms 'ext3' on simple tests.
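
For anyone who wants to look at the fsync() side of this on their
own setup, a minimal sketch of a bursty-write timer follows. It
times write() and fsync() separately for each burst; the function
names, burst size and burst count are arbitrary choices of mine,
and the result is only meaningful when the same script is run on
filesystems mounted with each data= mode, on otherwise idle disks.

```python
import os
import time


def time_fsync_bursts(path, bursts=20, burst_bytes=64 * 1024, pause_s=0.0):
    """Return a list of (write_seconds, fsync_seconds), one pair per burst."""
    payload = b"x" * burst_bytes
    results = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(bursts):
            t0 = time.perf_counter()
            os.write(fd, payload)  # typically returns once data is in the page cache
            t1 = time.perf_counter()
            os.fsync(fd)           # returns once the kernel reports the data as stable
            t2 = time.perf_counter()
            results.append((t1 - t0, t2 - t1))
            time.sleep(pause_s)    # optional gap between bursts
    finally:
        os.close(fd)
    return results


def median_fsync_ms(results):
    """Median fsync() time in milliseconds over all bursts."""
    fsyncs = sorted(fsync_s for _, fsync_s in results)
    return fsyncs[len(fsyncs) // 2] * 1e3
```

Calling median_fsync_ms(time_fsync_bursts("/some/mount/testfile"))
on each mount gives one number per mode to compare. The path is a
placeholder. The write() column mostly reflects RAM caching, so
it says little about when the data reaches its final location.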
