[Linux-cluster] I/O Error management in GFS
David Teigland
teigland at redhat.com
Mon May 21 15:12:01 UTC 2007
On Fri, May 18, 2007 at 03:49:14PM +0200, Mathieu Avila wrote:
> Sorry for my late reply,
>
> I've performed the following tests with cluster-1.03:
> - mount GFS on more than 1 node, using Gulm as the lock manager.
> - cp'ing something big (a kernel) into it on each node,
> - while it does that, make the device return I/O errors.
> The result is not what you described: sometimes my "cp" finishes with
> I/O errors (that's good), but most of the time it is blocked in the
> kernel. I cannot perform any action, including umount. Syscalls like
> "df" are blocked, too.
>
> I've done the same test with DLM and got the same results.
Is there anything about "withdraw" in dmesg or /var/log/messages after you
cause the i/o errors? If not, then the i/o errors are not being reported
back to gfs for some reason. Perhaps there are some block/scsi drivers
that don't properly return i/o errors to the fs? Once gfs sees i/o errors
and does the withdraw, it should usually work, although it does have
problems occasionally.
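
A minimal sketch of the check described above, assuming Python is available on the node: it scans the output of dmesg and the contents of /var/log/messages for any line mentioning "withdraw". The function names are hypothetical, and the exact wording of the gfs withdraw message is not assumed, only that it contains the word "withdraw".

```python
import subprocess
from pathlib import Path


def find_withdraw_lines(text):
    """Return every line of `text` that mentions "withdraw" (case-insensitive)."""
    return [line for line in text.splitlines() if "withdraw" in line.lower()]


def read_kernel_log():
    """Return the kernel ring buffer as text, or "" if dmesg cannot be run."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout


def read_syslog(path="/var/log/messages"):
    """Return the syslog file as text, or "" if it is missing or unreadable."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return ""


if __name__ == "__main__":
    hits = find_withdraw_lines(read_kernel_log()) + find_withdraw_lines(read_syslog())
    if hits:
        print("withdraw messages found:")
        for line in hits:
            print("  " + line)
    else:
        # No match suggests the I/O errors never reached the filesystem.
        print("no withdraw messages found")
```

Run it on each node right after triggering the i/o errors. Reading both sources may need root.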
Dave