how does ext3 handle no communication to storage
tytso at mit.edu
Mon Aug 28 20:58:22 UTC 2006
On Mon, Aug 28, 2006 at 03:04:26PM -0400, Sev Binello wrote:
> Can anyone tell us what the expected behavior is,
> in the event that ext3 loses total contact with the storage system ?
> We have found that the file system is put into read only mode,
> it is then found to contain errors, and requires an fsck.
> Sometimes the fsck finds numerous (some serious looking) errors,
> and that running without fsck doesn't seem like a safe option.
> We are trying to understand why exactly this is.
> Why do we get errors ? Why serious ones ?
The filesystem should go read-only when you try to modify it.
HOWEVER, the problem comes when connectivity is restored. When an
attempt to modify the filesystem fails, the journal is aborted and an
I/O error is returned. However, there may be modified blocks left hanging
about in the buffer cache from before the kernel realized that connectivity
had been lost, and what we need to do is to make sure that all dirty
blocks in the buffer cache and page cache are dropped.
Basically, if I'm right, this is a bug, which we need to fix. The
patch would require flushing (that is, discarding) all modified buffers
and page cache pages when the filesystem goes read-only. The modified
buffers are the more important thing, since that's what causes the
filesystem corruption, although for correctness's sake we should be
flushing any modified page cache pages as well. I don't have time to
code this right now,
but I'll try to get a patch out relatively soonish, if you're
willing to try it to see if it addresses your observed problem.
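
To make the suspected sequence concrete, here is a toy model in Python. It is only a sketch and not kernel code: ToyDisk, ToyFilesystem, simulate and the drop_dirty_on_abort switch are hypothetical names invented for illustration. It also assumes that leftover dirty buffers get written back once the storage link returns, which is one reading of the explanation above and not something confirmed here.

```python
from dataclasses import dataclass, field


class StorageOffline(Exception):
    """Raised by the toy disk while the storage link is down."""


@dataclass
class ToyDisk:
    blocks: dict = field(default_factory=dict)
    online: bool = True

    def write(self, blockno, data):
        if not self.online:
            raise StorageOffline(blockno)
        self.blocks[blockno] = data


@dataclass
class ToyFilesystem:
    disk: ToyDisk
    drop_dirty_on_abort: bool                  # True models the proposed fix
    dirty: dict = field(default_factory=dict)  # stand-in for dirty buffers
    read_only: bool = False

    def modify(self, blockno, data):
        """Stage a change in the cache; nothing reaches the disk yet."""
        if self.read_only:
            raise OSError("read-only filesystem")
        self.dirty[blockno] = data

    def abort(self):
        """Model the journal abort: go read-only, optionally drop dirty data."""
        self.read_only = True
        if self.drop_dirty_on_abort:
            self.dirty.clear()

    def writeback(self):
        """Write dirty buffers to disk; abort on the first I/O failure."""
        for blockno in sorted(self.dirty):
            try:
                self.disk.write(blockno, self.dirty[blockno])
            except StorageOffline:
                self.abort()
                return
            del self.dirty[blockno]


def simulate(drop_dirty_on_abort):
    disk = ToyDisk(blocks={1: "old-A", 2: "old-B"})
    fs = ToyFilesystem(disk, drop_dirty_on_abort)
    fs.modify(1, "new-A")
    fs.modify(2, "new-B")
    disk.online = False   # contact with storage is lost
    fs.writeback()        # write fails, filesystem goes read-only
    disk.online = True    # connectivity is restored
    fs.writeback()        # late writeback of whatever is still dirty
    return fs.read_only, disk.blocks
```

In this model, simulate(False) ends with the filesystem read-only but with both "new" blocks on disk, written after the abort; simulate(True) leaves the on-disk blocks untouched. The real buffer cache, page cache and journal are far more involved, so treat this purely as an illustration of why dropping dirty buffers at abort time would matter.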