How does ext3 handle drive failures?
Andreas Dilger
adilger at clusterfs.com
Fri Mar 19 05:46:51 UTC 2004
On Mar 17, 2004 19:15 -0600, Philip Molter wrote:
> We want to run multi-drive systems we have in a JBOD mode, where
> each drive is basically a filesystem to itself. With the drives
> we currently have, we expect to have multiple failures, primarily
> unrecoverable ECC read errors or sometimes the drive just dying
> altogether.
>
> How does ext[23] handle these two primary conditions? Using them
> in a software RAID mode, I have sometimes seen problems with disks
> hang all access to the filesystem and even the entire system, but
> I'm not sure at what level that's happening (low-level driver?
> scsi layer? raid layer? filesystem layer?).
This is entirely an issue with the bus or SCSI layer, and not the
filesystem.
> If I have a drive fail taking out the entire ext3 filesystem, will
> I be able to stop using the filesystem (say, my application gets
> the error from the fs indicating some sort of problem in whatever
> system call it's made, who cares what), forcibly unmount the
> filesystem, and replace the drive? Or will the system panic? Or
> worse, will my application just enter an uninterruptible sleep
> never to return success or error?
Of all Linux filesystems, I think you'll find that ext2/ext3 probably
handle media and device errors the most gracefully (i.e. not panicking
because of cascading errors, unless you want that with errors=panic).
Whether you'll be able to unmount really depends on a lot of
factors, so it's hard to say in general. When our storage servers
(running ext3) have some catastrophic disk problem, we can usually unmount.
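For the application side of the question, here is a minimal sketch (Python, not part of the original thread) of how a program might notice a failed drive and stop touching that filesystem. It assumes the failing system call actually returns an error: EIO for a media or device error, or EROFS if the filesystem has been remounted read-only (for example with errors=remount-ro). It cannot help with a process stuck in uninterruptible sleep below the filesystem. The names is_device_failure, read_or_mark_failed and failed_mounts are hypothetical, chosen only for illustration.

```python
import errno
import os

# Errno values this sketch treats as "the drive is unusable" (an assumption;
# adjust to whatever the application actually observes).
DEVICE_FAILURE_ERRNOS = {errno.EIO, errno.EROFS}


def is_device_failure(exc):
    """Return True if an OSError looks like a media/device failure."""
    return isinstance(exc, OSError) and exc.errno in DEVICE_FAILURE_ERRNOS


def read_or_mark_failed(path, mount_point, failed_mounts):
    """Read a file, or mark its mount point as failed and return None.

    failed_mounts is a set of mount points the application has given up on,
    so that it stops issuing I/O to them and an operator can unmount and
    replace the drive.
    """
    if mount_point in failed_mounts:
        return None  # already marked bad; do not touch it again
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as exc:
        if is_device_failure(exc):
            failed_mounts.add(mount_point)
            return None
        raise  # unrelated errors (ENOENT, EACCES, ...) stay visible
```

Once a mount point is in failed_mounts, the application should also close any file descriptors it still holds there, since open files are one of the factors that can keep an unmount from succeeding.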
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/