[Linux-cluster] How to determine what is causing GFS to hang?
Brett Cave
brettcave at gmail.com
Mon Aug 4 09:02:51 UTC 2008
Hi,
I have a GFS cluster set up on a fibre SAN. The cluster.conf is:
<?xml version="1.0"?>
<cluster name="mydisk1" config_version="8">
<quorumd interval="3" tko="10" label="myqdisk" votes="5"/>
<cman expected_votes="11" port="6809">
</cman>
<fence_daemon post_join_delay="60" post_fail_delay="30">
</fence_daemon>
<clusternodes>
<clusternode name="worker1" nodeid="1">
<fence>
<method name="fabric">
<device name="ilo-worker1"/>
</method>
</fence>
</clusternode>
<!-- repeated for nodes through to worker6 -->
</clusternodes>
<fencedevices>
<fencedevice name="ilo-worker1" agent="fence_ilo"
hostname="192.168.0.101" login="fence" passwd="fencerPass"/>
<!-- repeated through to ilo-worker6 -->
</fencedevices>
</cluster>
Selected output from cman_tool status:
Membership state: Cluster-Member
Nodes: 6
Expected votes: 11
Total votes: 11
Quorum: 6
Active subsystems: 7
Flags:
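As a sanity check on the vote counts above, here is a minimal Python sketch. It assumes each node carries one vote (no votes attribute is set on the clusternode entries) and that quorum is a simple majority of the expected votes. The function and constant names are made up for illustration.

```python
# Sanity-check the vote arithmetic reported by cman_tool status.
# Assumption: each clusternode has 1 vote, since cluster.conf sets none.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1

NODE_COUNT = 6        # worker1 .. worker6
VOTES_PER_NODE = 1    # assumed default
QDISK_VOTES = 5       # votes="5" on the quorumd line

expected = NODE_COUNT * VOTES_PER_NODE + QDISK_VOTES
quorum = quorum_threshold(expected)

print("expected votes:", expected)
print("quorum:", quorum)
```

With these numbers the quorum disk plus a single node (5 + 1 votes) is already enough to keep the cluster quorate.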
cman_tool nodes (0 = qdisk):
Node Sts Inc Joined Name
0 M 0 2008-07-25 03:00:29 /dev/sda1
1 M 1156 2008-07-25 02:59:16 worker1
2 M 1160 2008-07-25 02:59:20 worker2
# ...and so on through worker6: every Sts column is M, every node has a
valid Joined time, and each node has a different Inc value.
cman_tool services: I think there might be something here, but I'm not
sure what to make of it. Is this fencing trying to take place?
[root@hecate ~]# cman_tool services
type level name id state
fence 0 default 00010001 none
[1 2 3 4 5 6]
dlm 1 storage 00030001 none
[1 2 3 4 5 6]
dlm 1 cache1 00050001 none
[1 2 3 4 5 6]
gfs 2 storage 00020001 none
[1 2 3 4 5 6]
gfs 2 cache1 00040001 none
[1 2 3 4 5 6]
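One way to read this output mechanically is sketched below in Python. It rests on an assumption: that a state of "none" means no join, leave or recovery is in progress for that group, so a group stuck mid-fence or mid-recovery would show some other state or be missing a member. The parser and its names are hypothetical and only handle the layout pasted above.

```python
# Parse cman_tool services output and flag groups that look unsettled.
# Assumption: state "none" means the group is idle (nothing in progress).

SAMPLE = """\
type             level name     id       state
fence            0     default  00010001 none
[1 2 3 4 5 6]
dlm              1     storage  00030001 none
[1 2 3 4 5 6]
dlm              1     cache1   00050001 none
[1 2 3 4 5 6]
gfs              2     storage  00020001 none
[1 2 3 4 5 6]
gfs              2     cache1   00040001 none
[1 2 3 4 5 6]
"""

def parse_services(text):
    """Return one dict per group: type, level, name, id, state, members."""
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the header row
        if line.startswith("["):
            # Member list belongs to the group line just before it.
            if current is not None:
                current["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        fields = line.split()
        if len(fields) >= 5:
            current = {
                "type": fields[0],
                "level": int(fields[1]),
                "name": fields[2],
                "id": fields[3],
                "state": fields[4],
                "members": [],
            }
            groups.append(current)
    return groups

def unsettled(groups, expected_members):
    """Groups whose state is not 'none' or whose member set is incomplete."""
    return [
        g for g in groups
        if g["state"] != "none" or set(g["members"]) != set(expected_members)
    ]

groups = parse_services(SAMPLE)
problems = unsettled(groups, range(1, 7))
print(len(groups), "groups,", len(problems), "unsettled")
```

For the output pasted here the check finds nothing unsettled, which would suggest that fencing is not in progress and that the hang has to be looked for elsewhere, for example in the locking of the storage volume itself.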
cache1 and storage are the two GFS volumes in the cluster.
When I run "ls" on a directory in storage, it just hangs. How can I
find out what is causing the hang, and how would I get GFS to recover
from it?
Regards.
Brett