[Linux-cluster] Two-node clusters using GFS and shared storage

Fri Jul 13 16:18:36 UTC 2007

Greetings,

I've been trying to set up a two-node cluster using a shared SAN (via
Fibre Channel) and GFS. I've previously tried OCFS2, and I don't want to
use NFS yet. The cluster must be active-active, and it runs on
Itanium2 machines with Debian 4.0. I'm using cman 1.03.00.

I've set up a cluster using the Red Hat tools, and my
/etc/cluster/cluster.conf looks like this:

-- my cluster.conf --

<?xml version="1.0"?>
<cluster name="correo" config_version="1">

  <cman two_node="1" expected_votes="1">
  </cman>

  <clusternodes>

    <clusternode name="node1" votes="1">
    </clusternode>

    <clusternode name="node2" votes="1">
    </clusternode>

  </clusternodes>

</cluster>

-- end my cluster.conf --

Note that I've removed the entries related to fencing, but I previously had
a 'manual' fencing method. I have an LVM volume containing a GFS
filesystem, and I'm able to start ccsd, cman, fenced, clvmd and all the
other related services.
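For reference, a 'manual' fencing method of the kind mentioned above is usually declared per node in cluster.conf, plus a shared fence device. The sketch below is only an illustration of that layout, assuming the cman 1.x cluster.conf schema with the fence_manual agent; the device name "human", the method name "single" and the bumped config_version are hypothetical choices, not the configuration actually used here.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch: manual fencing for the two-node "correo" cluster -->
<cluster name="correo" config_version="2">

  <cman two_node="1" expected_votes="1">
  </cman>

  <clusternodes>

    <clusternode name="node1" votes="1">
      <!-- each node names the method/device used to fence it -->
      <fence>
        <method name="single">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="node2" votes="1">
      <fence>
        <method name="single">
          <device name="human" nodename="node2"/>
        </method>
      </fence>
    </clusternode>

  </clusternodes>

  <!-- one shared device backed by the fence_manual agent -->
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>

</cluster>
```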

Syslog reports that the cluster is quorate, and I'm able to mount the
filesystem on both nodes. Both need to write to the shared storage
in an active-active fashion.

I expect that unplugging the network cable from node1 would do the following:

a) node1 would be disabled (of course, it no longer has network connectivity)
b) node2 would notice that node1 is gone and would keep writing to the
shared storage
c) Eventually node1 would come back, node2 would notice it, and node1
would hopefully start writing again

And this is what actually happens when I unplug the network cable:

a) node1 is disabled (no connectivity)
b) node2 is also disabled! (trying to write to /home and /var/mail
stalls the machine, and then logins and other processes stall too)
c) Plugging the cable back in does nothing (both machines are hung by now,
so I need to reboot them)

I'm probably missing something, since the same setup using OCFS2 has
the same problem! Our last-resort solution is active-active NFS using
Heartbeat, but then we wouldn't be writing to the SAN through FC (2 Gbps)
but through Ethernet (1 Gbps), since we don't have any other media around ATM.

Is this a configuration-related problem? Or is this by design in
both GFS and OCFS2? Or maybe I'm just missing the whole picture...

Thank you very much for any advice,
Jose