[Linux-cluster] gfs on 2.6.8.1 -- tar hung
Daniel McNeil
daniel at osdl.org
Mon Oct 18 22:51:41 UTC 2004
On Fri, 2004-10-15 at 16:21, Daniel McNeil wrote:
> Hey all,
>
> I am testing gfs on 2.6.8.1. I have 3 machines connected
> to shared fibre channel storage. Currently I have 2 nodes
> in the cluster and gfs on a 5 disk stripe mounted on 2 nodes.
>
> I was running 'tar xvf /views/linux-2.6.8.1' on each
> node in separate directories of the same gfs file system.
>
> 1 tar finished, but the other is stuck.
>
> cat /proc/6601/wchan shows glock_wait_internal
>
> Is there any way to pull off more info that might be useful
> in debugging this?
I have a bit more info, but not as much as I wanted.
I left the cluster with the tar hung over the weekend.
The cluster was configured for 3 nodes, but with only
2 nodes up with gfs mounted on each.
Sometime over the weekend, the nodes lost communication
(maybe a network glitch). Both nodes got:
CMAN: quorum lost, blocking activity
I ran cman_tool join on the 3rd node; it joined with the node
that happened to have the hung tar, and got:
CMAN: quorum regained, resuming activity
However, the tar remained hung.
I rebooted the 1st node and had it rejoin the cluster.
cat /proc/cluster/{status,nodes} showed all three had
joined the cluster, but the tar was still hung.
On the node with the hung tar, cat /proc/cluster/services
also hung.
I started a 2nd mount of the gfs file system on the 2nd node,
and the mount hung.
I rebooted the node with the hung tar, and the mount on the 2nd
node then completed.
I did not have SYSRQ enabled, so I could not get stack traces
from the hung tar. I'm rebooting on a kernel with SYSRQ enabled
and will keep testing.
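Until SysRq stack traces are available, a quick way to collect more of the
wchan-style information asked about above is to scan /proc for every task in
uninterruptible sleep (state "D") and print the kernel symbol it is waiting in.
This is a minimal sketch, assuming the standard /proc/<pid>/stat and
/proc/<pid>/wchan files; the function name blocked_tasks is hypothetical.
wchan gives only one symbol per task, not a full stack, so it complements
rather than replaces a SysRq task dump.

```python
import os


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep ('D').

    Hypothetical helper; reads the standard /proc/<pid>/stat and
    /proc/<pid>/wchan files.
    """
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        base = os.path.join(proc_root, name)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # task exited or file unreadable
        # comm is wrapped in parentheses and may itself contain spaces or ')',
        # so locate the last ')' and read the state letter after it.
        lpar = stat.index("(")
        rpar = stat.rindex(")")
        comm = stat[lpar + 1:rpar]
        state = stat[rpar + 2:rpar + 3]
        if state == "D":
            yield int(name), comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Running it on the node with the hung tar would list the tar process (and any
other blocked tasks, such as a hung cat of /proc/cluster/services) alongside
the wait channel each one is sleeping in.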
Daniel