[Ext2-devel] Re: Linux performance bug: fsync() for files with zero links

Stephen C. Tweedie sct at redhat.com
Tue Feb 28 17:58:17 UTC 2006


Hi,

On Tue, 2006-02-28 at 17:30 +0100, Erik Mouw wrote:

> > From man write(2):
> > 
> >        write  writes  up  to  count  bytes  to the file referenced by the file
> >        descriptor fd from the buffer starting at buf.  POSIX requires  that  a
> >        read()  which  can  be  proved  to  occur  after a write() has returned
> >        returns the new data.  Note that not all file systems  are  POSIX  con-
> >        forming.
> 
> AFAIK that's read() from the same process, not read() from another
> process.

No, it's read() from any process.  fsync() has absolutely no effect in
the scenario you describe.  This is different from fflush() of buffered
IO written by fwrite(): the fflush() *is* needed when using buffered IO
if you want to make this guarantee. 
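
A minimal sketch of that distinction, assuming a POSIX-like system. A second
file descriptor stands in for the other process here, and the function names
are made up for illustration. The first function reads back data written with
a plain write() and no fsync(); the second shows that data sitting in a
user-space buffer is not visible until it is flushed.

```python
import os
import tempfile


def visible_without_fsync():
    """Return what a second descriptor reads right after write(), with no fsync()."""
    fd_w, path = tempfile.mkstemp()
    try:
        fd_r = os.open(path, os.O_RDONLY)  # independent descriptor on the same file
        try:
            os.write(fd_w, b"hello")       # plain write(2); fsync() is never called
            return os.read(fd_r, 16)
        finally:
            os.close(fd_r)
    finally:
        os.close(fd_w)
        os.unlink(path)


def buffered_needs_flush():
    """Return (bytes seen before flush, bytes seen after flush) for buffered IO."""
    fd_w, path = tempfile.mkstemp()
    try:
        fd_r = os.open(path, os.O_RDONLY)
        try:
            # The buffered writer plays the role of fwrite()/fflush().
            with os.fdopen(fd_w, "wb", buffering=4096) as f:
                f.write(b"hello")           # still in the user-space buffer
                before = os.read(fd_r, 16)  # the kernel has not seen the data yet
                f.flush()                   # hands the data to write(2)
                after = os.read(fd_r, 16)
            return before, after
        finally:
            os.close(fd_r)
    finally:
        os.unlink(path)
```

In neither case does fsync() play any part in what the reader sees.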

>  Otherwise there would be no need for fsync()/fdatasync().

No -- f[data]sync() is there only to force the flush to disk.  The
effects of fsync() are completely invisible to running processes (apart
from some indirect effects, such as performance side-effects incurred
due to the disk accesses).  But we still need fsync() to be able to
guarantee that data is stable on disk, if we want to support
applications that have guaranteed consistency properties over power
failure (e.g. a mail spooler should not tell a remote mail-sending host
that an email has been accepted until an fsync() or similar syscall has
guaranteed that it's on disk).
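
The mail spooler case might look roughly like the following. This is only a
sketch: spool_message() and its return value are hypothetical, and a real
spooler would have its own file naming and error handling. The point is the
ordering: the caller may acknowledge the message only after fsync() has
returned.

```python
import os


def spool_message(path, data):
    """Write data to a new spool file; return True only after fsync() succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:                 # write() may accept fewer bytes than offered
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                # force the data to stable storage
    finally:
        os.close(fd)
    return True                     # only now is it safe to tell the sender "accepted"
```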

> But look at my example. tail(1) uses fstat64() to figure out if
> /var/log/messages changed. Your proposal for a patch will break that.

No, it won't.
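
As an illustration of why a tail-style reader does not depend on fsync(),
here is a small sketch (the function name is made up, and a second descriptor
again stands in for the reading process) that polls the file size with
fstat() around a plain write():

```python
import os
import tempfile


def size_seen_by_fstat():
    """Return the file size reported by fstat() before and after a write()."""
    fd_w, path = tempfile.mkstemp()
    try:
        fd_r = os.open(path, os.O_RDONLY)
        try:
            before = os.fstat(fd_r).st_size
            os.write(fd_w, b"log line\n")    # 9 bytes, never fsync()ed
            after = os.fstat(fd_r).st_size   # size is updated as soon as write() returns
            return before, after
        finally:
            os.close(fd_r)
    finally:
        os.close(fd_w)
        os.unlink(path)
```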

> Again: the number of links of an inode is not a reason to break
> established semantics.

Correct.  And the semantics *will* change with this patch, but in a
subtle way.

Ext3 happens to guarantee that after fsync(), *all* metadata for a file
--- including directory metadata --- are synchronised to disk.  So if
you unlink an open file and then fsync() it, you are guaranteed that the
unlink has been committed to disk.  This is not, strictly speaking, a
behaviour required by POSIX; but it's still useful, and would be broken
if we disabled fsync() for files with i_nlink==0.
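
The sequence in question, unlinking an open file and then calling fsync() on
it, can be sketched as below, assuming a POSIX system where an open file may
be unlinked. The function name is hypothetical. Note that the sketch can only
show that the call sequence is valid and that the link count is zero; whether
the unlink has actually reached the disk is, as discussed above, not
observable from a running process.

```python
import os
import tempfile


def unlink_then_fsync():
    """Unlink an open file, fsync() it, and return (link count, data read back)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"scratch data")
        os.unlink(path)                  # the inode now has zero links
        nlink = os.fstat(fd).st_nlink
        os.fsync(fd)                     # still a valid call on the open descriptor
        data = os.pread(fd, 7, 0)        # the file stays usable until it is closed
        return nlink, data
    finally:
        os.close(fd)
```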

--Stephen