[Linux-cluster] GFS/SCSI Lost

Robert Peterson rpeterso at redhat.com
Mon Nov 6 15:25:34 UTC 2006


isplist at logicore.net wrote:
> I posted about this last night but found some additional info. My first post 
> was not very useful, showing a paste of SCSI errors after it was disconnected.
>
> I see that something just times out and the storage is lost. I find that I can 
> just get on the node, unmount the lost mount, remount and it's back. I also 
> notice that the mount is set as non-permanent?
>
> Do I need a keepalive script, or is there a configuration somewhere I've 
> missed? Here is a snippet from when the SCSI errors started overnight.
>
> Nov  5 21:16:02 qm250 kernel: SCSI error : <0 0 2 1> return code = 0x10000
> Nov  5 21:16:02 qm250 kernel: end_request: I/O error, dev sdf, sector 655
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: fatal: I/O error
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   block = 26
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   function = gfs_dreread
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   file = /home/xos/gen/updates-2006-08/xlrpm21122/rpm/BUILD/gfs-kernel-2.6.9-58/up/src/gfs/dio.c, line = 576
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0:   time = 1162782962
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: about to withdraw from 
> the cluster
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: waiting for outstanding 
> I/O
> Nov  5 21:16:02 qm250 kernel: GFS: fsid=vgcomp:qm.0: telling LM to withdraw
> Nov  5 21:16:05 qm250 kernel: lock_dlm: withdraw abandoned memory
> Nov  5 21:16:05 qm250 kernel: GFS: fsid=vgcomp:qm.0: withdrawn
>   
Hi Mike,

This one can't be blamed on GFS or the cluster infrastructure.
The messages indicate that GFS withdrew because of underlying SCSI errors,
which could point to a number of things beneath GFS: flaky hardware, bad
cables, the storage adapter itself, or possibly its device driver.
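
If it helps narrow things down, I'd start by checking how widespread the
errors are and whether the kernel still sees the device.  Something along
these lines, where sdf is just taken from your log and the paths may need
adjusting for your setup:

  # look at recent SCSI / I/O errors in the system log
  grep -i "SCSI error\|I/O error" /var/log/messages | tail -50
  # confirm the kernel still sees the device and its SCSI path
  cat /proc/scsi/scsi
  dmesg | grep -i sdf

If the errors always hit the same target, I'd suspect that path (cable,
switch port, controller) before anything else.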

Either way, the problem is not that your mount is temporary, and as far
as I know you shouldn't need any kind of keepalive script.
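
That said, if you do want the file system mounted automatically at boot
rather than by hand, a normal /etc/fstab entry of type gfs is the usual
way.  Something like the following, where the device path and mount point
are only guesses based on your fsid and need to match your actual setup:

  # /etc/fstab -- device and mount point below are guesses; use your own
  /dev/vgcomp/qm   /mnt/qm   gfs   defaults   0 0

If I remember right, the gfs init script mounts fstab entries of type gfs
once the cluster infrastructure is up, so no keepalive script should be
needed for that either.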

Regards,

Bob Peterson
Red Hat Cluster Suite



