
Re: part of files in another file after crash


On Mon, Sep 24, 2001 at 07:52:08PM +0200, Guenther Starnberger wrote:

> the problem is that after 3 crashes at startup, when my notebook finally 
> worked i got the msg:
> Sep 23 23:29:17 blackbox kernel: EXT3-fs warning (device ide0(3,3)): 
> ext3_clear_journal_err: detected journal error -5 from previous mount

Nice, that's the first report I've had of that code working in the
field. :-)

What has happened is that ext3 has, on a previous mount, detected a
fatal IO error (EIO) during journal operations, and has taken the
entire journal offline as a result, marking the journal with the error.

On the subsequent mount, e2fsck detected the error marked in the
journal, and forced a full fsck of the filesystem.  So far, that's
all working as expected.
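
For anyone wondering how the "-5" in the quoted kernel message maps to
EIO: the kernel reports errors as negated errno values.  A minimal
sketch of decoding it from user space, assuming the usual Linux errno
numbering (the variable names are only illustrative):

```python
import errno
import os

# Value taken from the log line "detected journal error -5".
kernel_code = -5

# Kernel return codes are negated errno values, so flip the sign.
err = -kernel_code

# Look up the symbolic name and the human-readable description.
name = errno.errorcode[err]
print(name, "-", os.strerror(err))
```

The exact description string printed depends on the C library, so only
the symbolic name should be relied upon.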

> the problem is that i found a part of my dpkg package list in my motd and 
> the first line of the motd in my resolv.conf :/ (i haven't found any other 
> corrupted files yet)

I have had precisely 3 other reports of weird data corruption on
recent kernels, all of which were on laptops.  It really sounds as if
there's dodgy hardware involved here.  Some of those prior cases seem
to go away if you avoid the suspend-to-ram function, by the way.

> if ext3 detects a journal error - why does it still use the journal (it did a 
> fsck after the recovering)?

The journal, up to the point at which the error occurred, should still
be valid and may well contain information which is _much_ more
up to date than that in the rest of the filesystem.  After the error
is detected in the journal, we absolutely refuse to generate any new
commit records, so there is no way for the IO which was in progress at
the time to be replayed.  We're therefore not in danger of recovering
the data being logged at the time of the error.
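
To make that rule concrete, here is a toy model of it.  This is only a
sketch of the idea, not the ext3/JBD code: the Transaction and Journal
classes, the replay() function and the block contents are all invented
for illustration.  The point is that recovery only applies transactions
which have a commit record, and an aborted journal never writes one.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    tid: int
    blocks: dict              # block number -> data logged in the journal
    committed: bool = False   # True once a commit record has been written

@dataclass
class Journal:
    transactions: list = field(default_factory=list)
    aborted: bool = False     # set when a fatal IO error is detected

    def log(self, tid, blocks):
        # Record the blocks of a transaction; no commit record yet.
        self.transactions.append(Transaction(tid, dict(blocks)))

    def commit(self, tid):
        # Once the journal is aborted, refuse to write commit records.
        if self.aborted:
            return False
        for t in self.transactions:
            if t.tid == tid:
                t.committed = True
                return True
        return False

    def abort(self):
        self.aborted = True

def replay(journal, disk):
    # Recovery: apply transactions in order, stopping at the first one
    # that has no commit record.
    for t in journal.transactions:
        if not t.committed:
            break
        disk.update(t.blocks)
    return disk

disk = {10: "old motd", 11: "old resolv.conf"}
j = Journal()
j.log(1, {10: "new motd"})
j.commit(1)                      # fully committed before the error
j.log(2, {11: "half-written"})   # IO in progress when the error hits
j.abort()
refused = not j.commit(2)        # no commit record after the abort
replay(j, disk)
print(disk)
```

In this model block 10 is updated from the journal while block 11 keeps
its old on-disk contents, because transaction 2 never got a commit
record.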

> why are files corrupted which i don't edit very often (motd, the dpkg list, i 
> changed the resolv.conf before the crashes).

I've got fairly cast-iron evidence of at least one laptop disk drive
writing data to the wrong blocks on disk under load.  It could be
that.  If you're getting weird crashes then basically any part of
memory might be getting corrupted, which can confuse any filesystem
about where to write to disk.  It's not usually particularly easy to
track down a specific chain of events leading to the corruption when
there is bad hardware involved.

