[Ext2-devel] Re: Linux performance bug: fsync() for files with zero links
Stephen C. Tweedie
sct at redhat.com
Tue Feb 28 17:58:17 UTC 2006
On Tue, 2006-02-28 at 17:30 +0100, Erik Mouw wrote:
> > From man write(2):
> > write writes up to count bytes to the file referenced by the file
> > descriptor fd from the buffer starting at buf. POSIX requires that a
> > read() which can be proved to occur after a write() has returned
> > returns the new data. Note that not all file systems are POSIX
> > conforming.
> AFAIK that's read() from the same process, not read() from another
No, it's read() from any process. fsync() has absolutely no effect in
the scenario you describe. This is different from fflush() of buffered
IO written by fwrite(): the fflush() *is* needed when using buffered IO
if you want to make this guarantee.
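
A minimal sketch of that distinction, assuming a POSIX system and a scratch path supplied by the caller. The helper names visible_bytes(), demo_write() and demo_stdio() are made up for illustration, and the stdio case assumes the five bytes fit in the stream's buffer (true for a default, fully buffered stream on a regular file):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Bytes readable from `path` through a fresh descriptor, or -1 on error.
 * A fresh open() stands in for "some other reader" of the file. */
static long visible_bytes(const char *path)
{
    char buf[256];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* write(2) path: bytes visible right after write() returns, no fsync(). */
static long demo_write(const char *path)
{
    long seen;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    seen = visible_bytes(path);
    close(fd);
    return seen;
}

/* stdio path: bytes visible before and after fflush(). */
static int demo_stdio(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");

    if (!fp)
        return -1;
    if (fwrite("hello", 1, 5, fp) != 5) {
        fclose(fp);
        return -1;
    }
    *before = visible_bytes(path);  /* data is still in the stdio buffer */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    *after = visible_bytes(path);   /* fflush() handed it to write() */
    fclose(fp);
    return 0;
}
```

demo_write() should report all five bytes with no fsync() anywhere, while demo_stdio() should report nothing until the fflush().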
> Otherwise there would be no need for fsync()/fdatasync().
No -- f[data]sync() is there only to force the flush to disk. The
effects of fsync() are completely invisible to running processes (apart
from some indirect effects, such as the performance side-effects incurred
by the disk accesses). But we still need fsync() to be able to
guarantee that data is stable on disk, if we want to support
applications that have guaranteed consistency properties over power
failure (e.g. a mail spooler should not tell a remote mail-sending host
that an email has been accepted until an fsync() or similar syscall has
guaranteed that it's on disk).
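
The spooler case might look roughly like this. It is only a sketch: spool_message() is a hypothetical helper, not taken from any real mail server, and EINTR handling is left out for brevity. The point is the ordering, where success is reported only after fsync() has returned 0:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: store one message in a new file and return 0 only
 * once fsync() has reported the data stable. The caller should tell the
 * remote host "accepted" only when this returns 0. */
static int spool_message(const char *path, const char *msg, size_t len)
{
    size_t off = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;

    while (off < len) {                 /* write() may be partial */
        ssize_t n = write(fd, msg + off, len - off);
        if (n < 0)
            goto fail;
        off += (size_t)n;
    }
    if (fsync(fd) != 0)                 /* force the data to disk */
        goto fail;
    if (close(fd) != 0) {
        unlink(path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(path);                       /* do not leave a partial message */
    return -1;
}
```

Whether the new directory entry is also on disk at that point depends on the filesystem; a cautious portable program would probably fsync() the containing directory as well.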
> But look at my example. tail(1) uses fstat64() to figure out if
> /var/log/messages changed. Your proposal for a patch will break that.
No, it won't.
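
As a rough illustration of why fsync() does not enter into it: a size check of the kind tail(1) makes sees the result of a write() as soon as the write() has returned. The sketch below assumes a POSIX system; size_via_fstat() and append_and_watch() are made-up names, and the second descriptor merely stands in for the watching process:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* File size as reported by fstat() on `fd`, or -1 on error. */
static long size_via_fstat(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? (long)st.st_size : -1;
}

/* Appends one line through a writer descriptor and records the size a
 * separate reader descriptor sees before and after. No fsync() is used. */
static int append_and_watch(const char *path, long *before, long *after)
{
    int rc = -1;
    int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    int rfd = (wfd < 0) ? -1 : open(path, O_RDONLY);

    if (rfd >= 0) {
        *before = size_via_fstat(rfd);
        if (write(wfd, "log line\n", 9) == 9) {   /* no fsync() here */
            *after = size_via_fstat(rfd);
            rc = 0;
        }
        close(rfd);
    }
    if (wfd >= 0)
        close(wfd);
    return rc;
}
```

The size seen by the reader should grow by the nine bytes written even though nothing has been forced to disk.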
> Again: the number of links of an inode is not a reason to break
> established semantics.
Correct. And the semantics *will* change with this patch, but in a
Ext3 happens to guarantee that after fsync(), *all* metadata for a file
--- including directory metadata --- are synchronised to disk. So if
you unlink an open file and then fsync() it, you are guaranteed that the
unlink has been committed to disk. This is not, strictly speaking, a
behaviour required by POSIX; but it's still useful, and would be broken
if we disabled fsync() for files with i_nlink==0.
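
The sequence in question, as a minimal sketch assuming a POSIX system and a local filesystem. unlink_then_fsync() is a made-up name; the code only shows that fsync() is still a valid, successful call on a descriptor whose link count is zero, and it cannot by itself demonstrate what has reached the disk:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates `path`, unlinks it while it is still open, then fsync()s the
 * descriptor. Returns the link count seen by fstat() after the unlink
 * (expected to be 0), or -1 on any error. */
static long unlink_then_fsync(const char *path)
{
    struct stat st;
    long nlink = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "scratch", 7) != 7)
        goto out;
    if (unlink(path) != 0)              /* i_nlink drops to 0 */
        goto out;
    if (fstat(fd, &st) != 0)
        goto out;
    if (fsync(fd) != 0)                 /* still valid on the open fd */
        goto out;
    nlink = (long)st.st_nlink;
out:
    close(fd);
    return nlink;
}
```

On ext3, per the guarantee described above, the fsync() in this sequence is what commits the unlink itself.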