[Linux-cluster] GFS failure

Thu Jun 15 18:26:25 UTC 2006

On Thu, Jun 15, 2006 at 07:05:39PM +0200, Anthony wrote:
> Hello,
> 
> Yesterday we had a complete GFS failure:
> all partitions became inaccessible from all 32 nodes,
> and now the whole cluster is inaccessible.
> Has anyone seen this problem before?
> 
> 
> GFS: Trying to join cluster "lock_gulm", "gen:ir"
> GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS...
> GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock...
> GFS: fsid=gen:ir.32: jid=32: Looking at journal...
> GFS: fsid=gen:ir.32: jid=32: Done
> 
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> 
> GFS: fsid=gen:ir.32: fatal: filesystem consistency error
> GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
> GFS: fsid=gen:ir.32:   file = /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542
> GFS: fsid=gen:ir.32:   time = 1150223491
> GFS: fsid=gen:ir.32: about to withdraw from the cluster
> GFS: fsid=gen:ir.32: waiting for outstanding I/O
> GFS: fsid=gen:ir.32: telling LM to withdraw

This looks like
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164331

which was fixed back in March; the fix should be in the latest RPMs or
source tarball.
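
When matching a withdraw against a bug report it can help to pull the
function name and timestamp out of the log. A minimal sketch, assuming the
"time =" field is seconds since the Unix epoch (the log does not say so
explicitly); parse_withdraw and FIELD are hypothetical names, not part of
any GFS tool:

```python
import re
from datetime import datetime, timezone

# Excerpt of the withdraw message quoted above (file/line entry left out).
LOG = """\
> GFS: fsid=gen:ir.32: fatal: filesystem consistency error
> GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
> GFS: fsid=gen:ir.32:   time = 1150223491
"""

# Matches "GFS: fsid=<fsid>:   <key> = <value>" lines only.
FIELD = re.compile(r"^GFS: fsid=(?P<fsid>\S+):\s+(?P<key>\w+) = (?P<value>.+)$")

def parse_withdraw(text):
    """Collect the key = value fields of a GFS fatal-error message."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("> ").rstrip()  # drop mail quote markers
        m = FIELD.match(line)
        if m:
            fields["fsid"] = m.group("fsid")
            fields[m.group("key")] = m.group("value")
    return fields

fields = parse_withdraw(LOG)
# Assumption: "time" is a Unix timestamp in seconds.
when = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)
print(fields["fsid"], fields["function"], when.isoformat())
```

Under that assumption the timestamp decodes to 2006-06-13 18:31:31 UTC.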

Dave