[Linux-cluster] GFS join hang

David Teigland teigland at redhat.com
Thu Apr 20 14:32:47 UTC 2006


On Thu, Apr 20, 2006 at 09:56:20AM +0200, Fernando Nino wrote:
> I am running GFS 6.1 with dlm on a cluster (4 nodes + front-end) of 
> dual-headed Opterons and RHEL4U3. Because of some problems (kernel 
> panic...) I had to hard boot some nodes of the cluster.  Now, some gfs 
> partitions won't mount.  They will simply keep waiting forever for the 
> "join" of the GFS group:
> 
> So... three questions:
> 
> - What is the join exactly doing ? Cluster status is fine, everybody is 
> member ...

From all 5 nodes it would be good to see:
- cman_tool services
- /var/log/messages
- /proc/cluster/lock_dlm/debug
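
Gathering those three items from every node by hand is tedious, so here is a minimal sketch of one way to do it. It assumes passwordless ssh from the machine running it, and the hostnames in NODES are hypothetical placeholders. Limiting /var/log/messages to the last 500 lines is also my assumption, not something requested above.

```python
import subprocess

# Hypothetical hostnames; replace with the real five cluster members.
NODES = ["node1", "node2", "node3", "node4", "frontend"]

# The three requested items, expressed as shell commands.
# The 500-line limit on the log is an arbitrary assumption.
COMMANDS = [
    "cman_tool services",
    "tail -n 500 /var/log/messages",
    "cat /proc/cluster/lock_dlm/debug",
]


def build_ssh_commands(nodes, commands):
    """Return one (node, argv) pair for every node and command."""
    return [(node, ["ssh", node, cmd]) for node in nodes for cmd in commands]


def collect(nodes=NODES, commands=COMMANDS, timeout=30):
    """Run each command over ssh and return {node: {command: output}}."""
    results = {}
    for node, argv in build_ssh_commands(nodes, commands):
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            output = proc.stdout + proc.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            # A node that does not answer is itself useful information.
            output = f"failed: {exc}"
        results.setdefault(node, {})[argv[2]] = output
    return results
```

Calling collect() returns a nested dictionary that can be written to one file per node. A node that times out is recorded rather than skipped, since a node that does not respond may be relevant to the hang.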

> - What does the status code mean in the cman_tool output ?
> S-2,2,4

S-2: join event state is SEST_JOIN_ACKWAIT
,2: join event flag is SEFL_ALLOW_JOIN
,4: number of acks to our join request is 4

So, the node is waiting for acks to its join request.  It needs 5 but has
received only 4; one node hasn't sent a reply for some reason.  We might be
able to figure out which node and why, given all the info from the other
nodes.  Rebooting the node that hasn't replied might resolve things.
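
The decoding above can be sketched as a small parser. This is only an illustration: it assumes the three fields are decimal integers, and it names only the two values explained in this message (state 2 and flag 2). Every other value is reported as unknown rather than guessed.

```python
# Names taken from the explanation above; other values are left unnamed.
KNOWN_STATES = {2: "SEST_JOIN_ACKWAIT"}
KNOWN_FLAGS = {2: "SEFL_ALLOW_JOIN"}


def parse_status(code):
    """Split a status string such as 'S-2,2,4' into its three fields."""
    prefix, _, rest = code.strip().partition("-")
    if prefix != "S" or not rest:
        raise ValueError(f"unexpected status code: {code!r}")
    fields = rest.split(",")
    if len(fields) != 3:
        raise ValueError(f"expected three fields in {code!r}")
    # Assumption: all three fields are plain decimal integers.
    state, flags, acks = (int(field) for field in fields)
    return {
        "state": state,
        "state_name": KNOWN_STATES.get(state, "unknown"),
        "flags": flags,
        "flags_name": KNOWN_FLAGS.get(flags, "unknown"),
        "acks": acks,
    }


def missing_acks(status, expected):
    """Number of acks still outstanding, given how many are expected."""
    return max(expected - status["acks"], 0)


status = parse_status("S-2,2,4")
print(status["state_name"], status["flags_name"], missing_acks(status, 5))
```

With 5 acks expected, the example reports one reply still outstanding, which matches the situation described above.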

Dave



