[Linux-cluster] gfs on 2.6.8.1 -- tar hung

Daniel McNeil daniel at osdl.org
Mon Oct 18 22:51:41 UTC 2004


On Fri, 2004-10-15 at 16:21, Daniel McNeil wrote:
> Hey all,
> 
> I am testing gfs on 2.6.8.1.  I have 3 machines connected
> to shared fibre channel storage.  Currently I have 2 nodes
> in the cluster and gfs on a 5 disk stripe mounted on 2 nodes.
> 
> I was running 'tar xvf /views/linux-2.6.8.1' on each
> node in separate directories of the same gfs file system.
> 
> 1 tar finished, but the other is stuck.
> 
> cat /proc/6601/wchan shows glock_wait_internal
> 
> Is there any way to pull off more info that might be useful
> in debugging this?

I have a bit more info, but not as much as I wanted.

I left the cluster with the tar hung over the weekend.

The cluster was configured for 3 nodes, but with only
2 nodes up with gfs mounted on each.

Sometime over the weekend, the nodes lost communication
(maybe a network glitch). Both nodes got:

CMAN: quorum lost, blocking activity

I did a cman_tool join on the 3rd node and it joined with
the node that happened to have the tar hung and it got:
CMAN: quorum regained, resuming activity
However, the tar remained hung.

I rebooted the 1st node and had it join the cluster.

cat /proc/cluster/{status,nodes} showed all three had
joined the cluster, but the tar was still hung.
On the node with the hung tar, cat /proc/cluster/services
also hung.

I started a 2nd mount of the gfs file system on the 2nd node,
and the mount hung.

I then rebooted the node with the hung tar.  The mount on the 2nd
node completed.

I did not have SYSRQ enabled, so I could not get stack traces
from the hung tar.  I'm rebooting on a kernel with SYSRQ enabled
and will keep testing.
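
For reference, a minimal sketch of how this information could be
collected on the next hang, assuming a root shell and a kernel built
with CONFIG_MAGIC_SYSRQ.  The function names wchan_of and
dump_task_stacks and the PROC_ROOT override are illustrative only;
they are not part of gfs or cman.

```shell
#!/bin/sh
# Sketch only: wchan_of, dump_task_stacks and PROC_ROOT are hypothetical
# names chosen for illustration.

# Print "<pid> <wait channel>" for each PID given (e.g. the hung tar).
wchan_of() {
    proc_root="${PROC_ROOT:-/proc}"
    for pid in "$@"; do
        if [ -r "$proc_root/$pid/wchan" ]; then
            printf '%s %s\n' "$pid" "$(cat "$proc_root/$pid/wchan")"
        else
            printf '%s (wchan not readable)\n' "$pid"
        fi
    done
}

# Enable SysRq and ask the kernel to log a stack trace for every task.
# Needs root and CONFIG_MAGIC_SYSRQ; the traces land in the kernel log.
dump_task_stacks() {
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger
}
```

With these defined, 'wchan_of 6601' reports where the process is
sleeping, and 'dump_task_stacks' followed by 'dmesg' should show the
stack of the hung tar among the other tasks.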

Daniel






