Write ordering in Ext4

Andreas Dilger adilger at dilger.ca
Mon Jun 3 14:47:38 UTC 2013


On 2013-06-02, at 23:33, "Arul Selvan" <Rarul at novell.com> wrote:
> Greetings. I am Arul Selvan, and I work for Novell. I am exploring the
> Ext4 architecture; more specifically, I would like to understand write
> ordering: when the same blocks are modified more than once, how are the
> writes ordered? Could you point me to the documentation or the specific
> source file to look at?

Writes in memory to the same file are serialized by i_mutex, but may
modify the same page in memory repeatedly.
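
As a userspace illustration (a minimal sketch, not ext4 code; the helper
name overwrite_same_block and the 4096-byte page size are assumptions made
for the example), two sequential writes to the same range modify the same
cached page, and a later read sees the last one:

```python
import os
import tempfile

PAGE = 4096  # assumed page size, for illustration only


def overwrite_same_block(path):
    """Write the same page-sized range twice and return what a read sees."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, b"A" * PAGE, 0)  # first write dirties the page
        os.pwrite(fd, b"B" * PAGE, 0)  # second write modifies the same page
        return os.pread(fd, PAGE, 0)   # buffered read returns the latest data
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = overwrite_same_block(os.path.join(d, "demo.bin"))
        print(data[:4])
```

Whether the page reaches the disk once or twice here is up to writeback;
the application only sees that the second write wins.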

When that page is being written to disk, it is marked with the
page writeback flag in order to stabilize its content and allow consistent
checksums (e.g. for MD RAID or disks with T10-DIF). Depending on the
kernel version and the requirements of the underlying storage, this may
block any further writes from modifying the same page while it is being
submitted to disk. Once the disk write has finished, the writeback bit
is cleared and the page can be modified again.

In all cases, the writes to a single page are ordered, but there is no
_guarantee_ about writes to different data blocks being ordered.
In practice, however, the ext4 journal does impose some order on data
writes, by ensuring that the data from all writes associated with a
transaction is flushed before the data for the next transaction.
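
If an application needs one block to be on disk before another, the
portable way is to ask for that explicitly. A rough sketch of the idea,
assuming a hypothetical helper write_then_commit and a made-up "COMMIT"
marker, with an fsync() acting as the ordering point between the two
writes:

```python
import os
import tempfile


def write_then_commit(path, payload, marker=b"COMMIT"):
    """Write payload, make it durable, then write a marker after it.

    Short writes are not handled in this sketch.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, payload, 0)
        os.fsync(fd)  # payload must be on stable storage before we go on
        os.pwrite(fd, marker, len(payload))
        os.fsync(fd)  # make the marker durable as well
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "record.bin")
        write_then_commit(target, b"x" * 8192)
        print(os.path.getsize(target))
```

After a crash, a reader that finds the marker can then assume the payload
before it was written completely; without the first fsync() the two
blocks could reach the disk in either order.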

Since fsync() of any file commits the current transaction, this has
the side effect that any fsync() causes all older writes to be committed.
This is NOT required by POSIX, and applications that depend on this
behavior are neither portable to nor safe on other filesystems.
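
A portable application therefore syncs every file it cares about rather
than counting on one fsync() to flush the rest. A minimal sketch (the
function names write_durably and write_all_durably are made up for this
example):

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and fsync that specific file before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # short writes are not handled in this sketch
        os.fsync(fd)        # sync this file; do not rely on a later fsync
    finally:
        os.close(fd)


def write_all_durably(files):
    """files maps path -> bytes; each file gets its own fsync()."""
    for path, data in files.items():
        write_durably(path, data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_all_durably({
            os.path.join(d, "a.dat"): b"first",
            os.path.join(d, "b.dat"): b"second",
        })
        print(sorted(os.listdir(d)))
```

On ext4 the second fsync() may have little left to do, but on other
filesystems it is the only thing that makes the second file durable.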

Cheers, Andreas




More information about the Ext3-users mailing list