[PATCH] BUG on fsync/fdatasync with Ext3 data=journal
akpm at osdl.org
Thu Sep 16 21:50:59 UTC 2004
Seiji Kihara <kihara.seiji at lab.ntt.co.jp> wrote:
> We found that fsync and fdatasync syscalls sometimes don't sync
> data in an ext3 file system under the following conditions.
> 1. Kernel version is 2.6.6 or later (including 184.108.40.206 and 2.6.9-rc2).
> 2. Ext3's journalling mode is "data=journal".
> 3. Create a file (whose size is 1Mbytes) and execute umount/mount.
> 4. lseek to a random position within the file, write 8192 bytes
> data, and fsync or fdatasync.
> We presume the data was not written to the corresponding disk
> before returning from fsync or fdatasync syscall on the evidence
> as follows:
> 1. The response time of fsync() and fdatasync() was extremely
> We use the "diskio" tool, which is downloadable from OSDL page
> (http://developer.osdl.jp/projects/doubt/). The program showed
> that the response time was under 10 microseconds. This time
> cannot be achieved with data transfer on IDE and PCI bus!
> 2. The IDE writing routine ide_start_dma() was not called under
> DMA enabled.
> We inserted the print messages in the sys_write(), sys_fsync()
> and ide_start_dma() by the attached patch. Sometimes the
> "ide_start_dma: ..." message was not shown between "write: in
> ..." and "fsync: out ...".
> The problem was occurred since 2.6.5-bk1, which includes the patch
> "[PATCH] ext3 fsync() and fdatasync() speedup". We found that the
> problem was solved by deleting the part of the patch which
> modifies ext3_sync_file(). Maybe, i_state is not correctly set to
> I_DIRTY when the related page cache is dirty (is it true?)
I forgot about this one.
> Attached file is tarball (tar + bzip2) which contains following
> files. The patches are for 220.127.116.11-kernel (applicable to
> 2.6.9-rc2), and the results were also produced with
We really don't need a 100k tarball to communicate a three line patch :(
Yes, the I_DIRTY test is bogus because data pages are not marked dirty at
write() time when the filesystem is mounted in data=journal mode.
However your patch will disable the above optimisation for data=writeback
and data-ordered modes as well. I don't think that's necessary?
How about this?
--- 25/fs/ext3/fsync.c~ext3-journal-data-fsync-fix Thu Sep 16 14:47:21 2004
+++ 25-akpm/fs/ext3/fsync.c Thu Sep 16 14:47:33 2004
@@ -49,10 +49,6 @@ int ext3_sync_file(struct file * file, s
J_ASSERT(ext3_journal_current_handle() == 0);
- smp_mb(); /* prepare for lockless i_state read */
- if (!(inode->i_state & I_DIRTY))
- goto out;
* The caller's filemap_fdatawrite()/wait will sync the data.
@@ -76,6 +72,10 @@ int ext3_sync_file(struct file * file, s
+ smp_mb(); /* prepare for lockless i_state read */
+ if (!(inode->i_state & I_DIRTY))
+ goto out;
* The VFS has written the file data. If the inode is unaltered
* then we need not start a commit.
More information about the Ext3-users