
Re: inconsistent file content after killing nfs daemon


I am now looking at the same problem in synchronous mode, based on the Linux NFS sources.

Even in synchronous mode (O_SYNC ...), it seems that the NFS client sends as many WRITE requests to the server as there are client-side cache pages covering the user data (ref. nfs_writepage_sync(), nfs_writepage() in fs/nfs/write.c, nfs_updatepage() in fs/nfs/write.c, nfs_commit_write() in fs/nfs/file.c, generic_file_write() in mm/filemap.c, nfs_file_write() in fs/nfs/file.c).

For instance, if I request a synchronous write of 16 bytes at offset PAGE_SIZE - 8 of my file, I think the NFS client
will send two WRITE messages to the server. Even if it uses "stable = NFS_FILE_SYNC" for both messages,
the server can fail after ext3 has written the first one to stable storage but before it has written the second.
Then, if the client restarts after the server failure (as is the case in some projects), it finds the file with
only 8 bytes updated instead of 16.
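
To make the example above concrete, here is a minimal sketch of how a write range gets cut at page boundaries. It only models the per-page behaviour I describe; it is not the kernel code. PAGE_SIZE = 4096 and the helper name split_into_page_writes are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumption: a common page size; the real value is platform dependent


def split_into_page_writes(offset, length, page_size=PAGE_SIZE):
    """Return the (offset, length) pieces of a write, cut at page boundaries.

    Hypothetical helper: each piece stands for one WRITE request under the
    assumption that the client issues one request per cache page touched.
    """
    pieces = []
    end = offset + length
    while offset < end:
        # First byte of the page following the one that contains `offset`.
        page_end = (offset // page_size + 1) * page_size
        chunk = min(end, page_end) - offset
        pieces.append((offset, chunk))
        offset += chunk
    return pieces


# 16 bytes starting 8 bytes before the end of the first page.
print(split_into_page_writes(PAGE_SIZE - 8, 16))  # [(4088, 8), (4096, 8)]
```

If the server fails between the two pieces, only the first 8 bytes reach stable storage, which is the case described above.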

I propose the following conditions to provide atomic writes through NFS + ext3 with the current implementation:
- ext3 journaled mode
- wsize mount option >= PAGE_SIZE
- O_SYNC on open() 
- data size <= PAGE_SIZE
- (file offset % PAGE_SIZE) + data size <= PAGE_SIZE
Another possibility would be to modify the NFS implementation so that it sends only one WRITE message when
the total size is less than wsize (whatever the number of cache pages used)?
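
The size and offset conditions in the list above can be checked in the application before each write. The following is only a sketch: the function names are hypothetical, wsize has to be supplied by the caller (it is a mount option, not something this code discovers), and the ext3 journaling mode on the server cannot be checked from the client at all.

```python
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE  # page size of the local machine


def single_write_expected(offset, size, wsize, page_size=PAGE_SIZE):
    """True if the write stays inside one cache page and fits in wsize.

    Hypothetical check for the conditions listed above, assuming the client
    sends one WRITE request per cache page touched.
    """
    return (
        wsize >= page_size
        and 0 < size <= page_size
        and (offset % page_size) + size <= page_size
    )


def sync_write_within_page(path, offset, data, wsize):
    """Write `data` at `offset` with O_SYNC, refusing writes that cross a page."""
    if not single_write_expected(offset, len(data), wsize):
        raise ValueError("write would span more than one cache page")
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        # Returns the number of bytes actually written.
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
```

With a 4096-byte page, a 16-byte write at offset 4088 is refused, while the same write at offset 4080 is accepted.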


"Stephen C. Tweedie" wrote:


On Fri, Jan 11, 2002 at 10:37:37AM +0100, eric chacron wrote:

> To answer your question, the problem seems to be reproducible only in
> asynchronous mode (without O_SYNC).
> I have reproduced the case (without O_SYNC) using different record sizes:
> from 1 K to 64 K, but not with 512 bytes.
> It makes sense that the zeroed holes in the file are caused by the NFS
> client's lack of serialisation/ordering, as the file is being extended.
> With O_SYNC I haven't reproduced the same problem so far.

Right --- that's standard unix semantics for writeback.  Writes to
backing store are completely unordered unless you request ordering
with O_SYNC or f[data]sync.


Ext3-users mailing list
Ext3-users redhat com
