How does ext3 handle drive failures?

Philip Molter philip at staff.texas.net
Thu Mar 18 01:15:09 UTC 2004


We want to run our multi-drive systems in JBOD mode, where
each drive is basically a filesystem to itself.  With the drives
we currently have, we expect multiple failures, primarily
unrecoverable ECC read errors, or sometimes a drive just dying
altogether.

How does ext[23] handle these two primary conditions?  Using the
drives in a software RAID mode, I have sometimes seen problem disks
hang all access to the filesystem and even the entire system, but
I'm not sure at what level that's happening (low-level driver?
SCSI layer?  RAID layer?  filesystem layer?).

If a drive fails and takes out the entire ext3 filesystem, will
I be able to stop using the filesystem (say, my application gets
an error from the fs in whatever system call it has made; which
call doesn't matter), forcibly unmount the filesystem, and replace
the drive?  Or will the system panic?  Or worse, will my
application just enter an uninterruptible sleep, never returning
success or error?
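
To make the first outcome concrete, here is a rough sketch of what
the application side might look like.  It is Python, the helper
names probe_read and should_retire are made up for illustration, and
it assumes a failed drive surfaces as EIO from the system call;
whether ext3 actually behaves that way is exactly the open question.

```python
import errno
import os

# Assumption: a dead or unreadable drive shows up to the application as EIO.
RETIRE_ERRNOS = {errno.EIO}

def probe_read(path, nbytes=4096):
    """Attempt one read and report how it ended (hypothetical helper)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return ("error", e.errno)
    try:
        os.read(fd, nbytes)
        return ("ok", None)
    except OSError as e:
        return ("error", e.errno)
    finally:
        os.close(fd)

def should_retire(result):
    """True if the result suggests we should stop using this filesystem."""
    status, err = result
    return status == "error" and err in RETIRE_ERRNOS
```

This only covers the case where the call returns at all.  If the
process ends up in an uninterruptible sleep, probe_read never comes
back, so that case would have to be watched for from outside the
process.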

Obviously, we'll be doing our own testing, but any knowledge of
these scenarios would be most appreciated.

Philip

* Philip Molter
* Texas.Net Internet
* http://www.texas.net/
* philip at texas.net

More information about the Ext3-users mailing list