2.4.24 I/O error breakage

Stephen C. Tweedie sct at redhat.com
Tue Jul 6 16:24:36 UTC 2004


Hi,

On Tue, 2004-07-06 at 16:10, Alex Bligh wrote:

> > Of course, if this hits the partition with /var on it, your logs stop
> > being recorded too.
> 
> Can this happen due to a /single/ corruption? I would have thought I
> would see the controller/drive being reset and a retry or two before
> this happened.

"corruption" and "drive retry" indicate two completely different
things.  Corruption can happen silently whenever anything on the _long_
path between platter and memory doesn't get copied right --- it can be
software, or bad CPU, memory, cache, controller, cable, disk etc.

If the drive is unable to read a sector, then yes, you'd expect a
retry.  But any other sort of corruption isn't detectable at the time so
there's no retry involved.
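
To make the distinction concrete, here is a minimal sketch in Python. It is
a simulation only, not anything ext3 or the kernel does: FlakyDevice,
read_with_retry and is_corrupt are hypothetical names made up for this
illustration, and the CRC check stands in for some out-of-band way of
noticing bad data after the fact. A reported EIO can be retried; a silently
flipped bit comes back as a "successful" read, so nothing triggers a retry.

```python
import errno
import zlib


class FlakyDevice:
    """Hypothetical in-memory 'disk' used only for this illustration."""

    def __init__(self, blocks, eio_failures=0, silent_flip=False):
        self.blocks = blocks
        self.eio_failures = eio_failures  # number of reads that fail with EIO
        self.silent_flip = silent_flip    # flip one bit without reporting it
        self.reads = 0

    def read_block(self, n):
        self.reads += 1
        if self.eio_failures > 0:
            self.eio_failures -= 1
            raise OSError(errno.EIO, "simulated unreadable sector")
        data = self.blocks[n]
        if self.silent_flip:
            # Corruption somewhere on the path: the read still "succeeds".
            data = bytes([data[0] ^ 0x01]) + data[1:]
        return data


def read_with_retry(read_block, block_no, retries=2):
    """Retry only when the device reports a failure (EIO)."""
    for attempt in range(retries + 1):
        try:
            return read_block(block_no)
        except OSError as e:
            if e.errno != errno.EIO or attempt == retries:
                raise


def is_corrupt(data, expected_crc):
    """Assumed out-of-band check; the read path itself never sees this."""
    return zlib.crc32(data) != expected_crc
```

With a device that fails once with EIO, read_with_retry reads twice and
returns good data. With silent_flip=True it reads once, returns wrong data,
and only the separate checksum comparison reveals the problem.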

Corruption that hits the journal itself can cause an instant journal
abort.  One or two other cases can, too.  Most cases where the drive is
simply unable to read a block will _not_ cause an abort (EIO hitting the
journal itself is an exception, though.)

--Stephen

More information about the Ext3-users mailing list