[Linux-cluster] How to determine what is causing GFS to hang?

Mon Aug 4 09:02:51 UTC 2008

Hi,

I have a six-node GFS cluster set up on a fibre SAN. The cluster.conf is:
<?xml version="1.0"?>
<cluster name="mydisk1" config_version="8">

<quorumd interval="3" tko="10" label="myqdisk" votes="5"/>
<cman expected_votes="11" port="6809">
</cman>

<fence_daemon post_join_delay="60" post_fail_delay="30">
</fence_daemon>

<clusternodes>
        <clusternode name="worker1" nodeid="1">
                <fence>
                        <method name="fabric">
                                <device name="ilo-worker1"/>
                        </method>
                </fence>
        </clusternode>
        <!-- repeated for nodes through to worker6 -->
</clusternodes>
<fencedevices>
        <fencedevice name="ilo-worker1" agent="fence_ilo"
                     hostname="192.168.0.101" login="fence" passwd="fencerPass"/>
        <!-- repeated through to ilo-worker6 -->
</fencedevices>
</cluster>

Selected output from cman_tool status:
Membership state: Cluster-Member
Nodes: 6
Expected votes: 11
Total votes: 11
Quorum: 6
Active subsystems: 7
Flags:
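
For reference, the vote arithmetic behind these numbers can be sketched as below. This is only an illustration: it assumes each node carries the default of one vote (no votes attribute is set on the clusternode entries above) and that quorum is a simple majority of expected_votes. The function name quorum_threshold is made up for the example.

```python
# Sketch of the vote arithmetic, under the assumptions stated above.

def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: more than half of the expected votes."""
    return expected_votes // 2 + 1

NODE_COUNT = 6        # worker1 .. worker6
VOTES_PER_NODE = 1    # assumption: default, no votes= attribute in cluster.conf
QDISK_VOTES = 5       # from <quorumd votes="5"/>

total_votes = NODE_COUNT * VOTES_PER_NODE + QDISK_VOTES
quorum = quorum_threshold(total_votes)

print("Total votes:", total_votes)
print("Quorum:", quorum)

# With these numbers, one surviving node plus the quorum disk
# (1 + 5 = 6 votes) would still meet the threshold.
single_node_with_qdisk = VOTES_PER_NODE + QDISK_VOTES
print("One node + qdisk quorate:", single_node_with_qdisk >= quorum)
```

Both values agree with the "Total votes: 11" and "Quorum: 6" lines reported by cman_tool status.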

cman_tool nodes (0 = qdisk):
Node  Sts   Inc   Joined               Name
   0   M      0   2008-07-25 03:00:29  /dev/sda1
   1   M   1156   2008-07-25 02:59:16  worker1
   2   M   1160   2008-07-25 02:59:20  worker2
# ...and so on through worker6: every Sts column is M, every node has a
# valid Joined time, and each node has a different Inc value.

cman_tool services: I think there might be something here, but I am not
sure what to make of it. Does this show fencing trying to take place?
[root@hecate ~]# cman_tool services
type             level name     id       state
fence            0     default  00010001 none
[1 2 3 4 5 6]
dlm              1     storage  00030001 none
[1 2 3 4 5 6]
dlm              1     cache1   00050001 none
[1 2 3 4 5 6]
gfs              2     storage  00020001 none
[1 2 3 4 5 6]
gfs              2     cache1   00040001 none
[1 2 3 4 5 6]
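
One way to read this output mechanically is sketched below. It is a rough illustration, not a diagnostic tool: it assumes the two-line layout shown above (a group line followed by a bracketed member list) and assumes that a state of "none" means the group is not in the middle of a membership change. The helper names parse_services and busy_groups are invented for the example.

```python
# Sketch: parse "cman_tool services" text and list groups whose state
# is anything other than "none". Layout and meaning of "none" are
# assumptions, as described in the lead-in.

SAMPLE = """\
type             level name     id       state
fence            0     default  00010001 none
[1 2 3 4 5 6]
dlm              1     storage  00030001 none
[1 2 3 4 5 6]
dlm              1     cache1   00050001 none
[1 2 3 4 5 6]
gfs              2     storage  00020001 none
[1 2 3 4 5 6]
gfs              2     cache1   00040001 none
[1 2 3 4 5 6]
"""

def parse_services(text):
    """Return one dict per group, with its member node IDs attached."""
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        if line.startswith("["):
            # Member list belongs to the most recent group line.
            if current is not None:
                current["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        fields = line.split()
        if len(fields) >= 5:
            current = {
                "type": fields[0],
                "level": int(fields[1]),
                "name": fields[2],
                "id": fields[3],
                "state": fields[4],
                "members": [],
            }
            groups.append(current)
    return groups

def busy_groups(groups):
    """Groups whose state is not 'none'."""
    return [g for g in groups if g["state"] != "none"]

groups = parse_services(SAMPLE)
for g in busy_groups(groups):
    print(g["type"], g["name"], "is in state", g["state"])
print(len(groups), "groups,", len(busy_groups(groups)), "not in state 'none'")
```

On the output pasted above, every group is in state "none" and lists all six node IDs, so under that assumption nothing here shows a fence or recovery operation in progress.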

cache1 and storage are the 2 GFS volumes in the cluster.

When I run "ls" on a directory in storage, it just hangs. How can I
determine what is causing the hang, and how would I get GFS to recover
from it?

Regards,
Brett