How does ext3 handle drive failures?

Andreas Dilger adilger at
Fri Mar 19 05:46:51 UTC 2004

On Mar 17, 2004  19:15 -0600, Philip Molter wrote:
> We want to run multi-drive systems we have in a JBOD mode, where
> each drive is basically a filesystem to itself.  With the drives
> we currently have, we expect to have multiple failures, primarily
> unrecoverable ECC read errors or sometimes the drive just dying
> altogether.
> How does ext[23] handle these two primary conditions?  Using them
> in a software RAID mode, I have sometimes seen problems with disks
> hang all access to the filesystem and even the entire system, but
> I'm not sure at what level that's happening (low-level driver?
> scsi layer?  raid layer?  filesystem layer?).

This is entirely an issue with the bus or SCSI layer, and not the filesystem.

> If I have a drive fail taking out the entire ext3 filesystem, will
> I be able to stop using the filesystem (say, my application gets
> the error from the fs indicating some sort of problem in whatever
> system call it's made, who cares what), forcibly unmount the
> filesystem, and replace the drive?  Or will the system panic?  Or
> worse, will my application just enter an uninterruptible sleep
> never to return success or error?
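On the application side, the question is what a program can do once a system call on one of the JBOD filesystems fails. Below is a minimal Python sketch. It assumes the failure reaches the application as an OSError with errno EIO, or EROFS if the filesystem was remounted read-only. The hypothetical DriveGuard class marks a mountpoint as failed after the first such error, so the application stops using it and the drive can be unmounted and replaced. It cannot help if the process is stuck in uninterruptible sleep inside the driver, because in that case the system call never returns to user space.

```python
# Sketch: stop using a JBOD filesystem once it starts returning I/O errors.
# DriveGuard and DriveFailed are hypothetical helpers, not part of any library.
import errno
from typing import Callable, TypeVar

T = TypeVar("T")

# EIO: media/device error passed up from the block layer.
# EROFS: the filesystem was remounted read-only (e.g. errors=remount-ro).
FATAL_ERRNOS = {errno.EIO, errno.EROFS}


class DriveFailed(Exception):
    """Raised once a filesystem has been marked as failed."""


class DriveGuard:
    def __init__(self, mountpoint: str) -> None:
        self.mountpoint = mountpoint
        self.failed = False

    def run(self, op: Callable[[], T]) -> T:
        """Run one filesystem operation; mark the drive failed on EIO/EROFS."""
        if self.failed:
            # Refuse further work on a drive we already consider dead.
            raise DriveFailed(self.mountpoint)
        try:
            return op()
        except OSError as exc:
            if exc.errno in FATAL_ERRNOS:
                self.failed = True
                raise DriveFailed(self.mountpoint) from exc
            raise  # other errors (ENOENT, EACCES, ...) are not drive failures
```

Each file operation is wrapped in guard.run(...), for example guard.run(lambda: os.stat(path)). Errors such as ENOENT are re-raised unchanged and do not mark the drive as failed.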

Of all Linux filesystems, I think you'll find that ext2/ext3 probably
handle media and device errors the most gracefully (i.e. not panicking
because of cascading errors, unless you want that with errors=panic).
Whether you'll be able to unmount really depends on a lot of
factors, so it's hard to comment.  When our storage servers (running
ext3) have some catastrophic disk problem, we can usually unmount.
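The ext2/ext3 errors= mount option accepts continue, remount-ro or panic. That choice decides whether an application sees per-operation errors, a read-only filesystem, or a kernel panic. Below is a small Python sketch that reports the policy and read-only state of each ext2/ext3 mount. It assumes the /proc/mounts line format (device, mountpoint, type, options, dump, pass), and the function name ext_error_policies is hypothetical. When no errors= option is listed, the on-disk superblock default applies, and the sketch simply reports that as unknown.

```python
# Sketch: report the ext2/ext3 error policy and read-only state of each mount.
# Assumes /proc/mounts format: <device> <mountpoint> <fstype> <options> <dump> <pass>
# ext_error_policies is a hypothetical helper, not a standard API.
import os
from typing import Dict, List


def ext_error_policies(mounts_text: str) -> List[Dict[str, object]]:
    """Return one record per ext2/ext3 mount found in mounts_text."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("ext2", "ext3"):
            continue
        opts = options.split(",")
        # errors=continue | remount-ro | panic; if absent, the superblock
        # default applies, which this text alone does not reveal.
        policy = "unknown (superblock default)"
        for opt in opts:
            if opt.startswith("errors="):
                policy = opt[len("errors="):]
        results.append({
            "device": device,
            "mountpoint": mountpoint,
            "errors": policy,
            "read_only": "ro" in opts,  # "ro" after an error means remount-ro kicked in
        })
    return results


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for rec in ext_error_policies(f.read()):
                print(rec)
```

A mount that shows "ro" even though it was mounted read-write is a sign that errors=remount-ro has already fired on that drive.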

Cheers, Andreas
Andreas Dilger

More information about the Ext3-users mailing list