[Linux-cluster] CLVM/GFS will not mount or communicate with cluster

Robert Peterson rpeterso at redhat.com
Mon Dec 4 15:03:42 UTC 2006


Barry Brimer wrote:
> This is a repeat of the post I made a few minutes ago.  I thought adding a
> subject would be helpful.
>
>
> I have a 2 node cluster for a shared GFS filesystem.  One of the nodes fenced
> the other, and the node that got fenced is no longer able to communicate with
> the cluster.
>
> While booting the problem node, I receive the following error message:
> Setting up Logical Volume Management:  Locking inactive: ignoring clustered
> volume group vg00
>
> I have compared /etc/lvm/lvm.conf files on both nodes.  They are identical.  The
> disk (/dev/sda1) is listed when typing "fdisk -l"
>
> There are no iptables firewalls active (although /etc/sysconfig/iptables exists,
> iptables is chkconfig'd off).  I have written a simple iptables logging rule
> (iptables -I INPUT -s <problem node> -j LOG) on the working node to verify that
> packets are reaching the working node, but no messages are being logged in
> /var/log/messages on the working node that acknowledge any cluster activity
> from the problem node.
>
> Both machines have the same RH packages installed and are mostly up to date;
> they are missing the same packages, none of which involve the kernel, RHCS or
> GFS.
>
> When I boot the problem node, it successfully starts ccsd, but it fails after a
> while on cman and fails after a while on fenced.  I have given the clvmd
> process an hour, and it still will not start.
>
> vgchange -ay on the problem node returns:
>
> # vgchange -ay
>   connect() failed on local socket: Connection refused
>   Locking type 2 initialisation failed.
>
> I have the contents of /var/log/messages on the working node and the problem
> node at the time of the fence, if that would be helpful.
>
> Any help is greatly appreciated.
>
> Thanks,
> Barry
>   
Hi Barry,

Well, vgchange and other lvm functions won't work on the clustered volume
unless clvmd is running, and clvmd won't run properly until the node is
talking happily through the cluster infrastructure.  So as I see it, your
problem is that cman is not starting properly.  Unfortunately, you haven't
told us much about the system, so it's hard to say exactly why; there can
be many reasons.
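
As a first step, it's worth seeing how far the cluster infrastructure
actually gets on the problem node.  A minimal check, assuming a RHEL4-era
cman (the /proc/cluster files below don't exist on other versions), would
be something like:

cman_tool status
cman_tool nodes
cat /proc/cluster/status
cat /proc/cluster/services

If cman_tool status can't connect or shows the node still trying to join,
then everything above it (fenced, clvmd, GFS) will hang just like you're
seeing.
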
For now, let me assume that the two nodes were working properly in the
cluster before one was fenced, and therefore that the software and
configurations are all okay.  One reason this might happen is if you're
using manual fencing and haven't yet run:

fence_ack_manual -n <fenced_node>

on the remaining node to acknowledge that the reboot actually happened.
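
If you're not sure whether a manual fence is still pending, a quick check
on the surviving node (treat this as a sketch; the exact log wording
varies by version) is:

grep -i fence /var/log/messages | tail
cat /proc/cluster/services

fence_manual normally logs a message telling you to run fence_ack_manual,
and the fence domain won't go back to its normal running state until you
do.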

Also, you might want to verify that the two boxes can still communicate
with each other in general.
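
For example (use the node names exactly as they appear in your
/etc/cluster/cluster.conf):

ping -c 3 <other_node>
getent hosts <other_node>

In particular, make sure the cluster.conf node names resolve to the real
interface addresses on both boxes, and not to 127.0.0.1 through /etc/hosts.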

You might also get this kind of problem if you updated the cluster
software, so that the cman on one node is incompatible with the cman on
the other.  Ordinarily there are no incompatibilities when upgrading, but
if you upgraded cman from RHEL4U1 to RHEL4U4, for example, you might hit
this because the cman protocol changed slightly between RHEL4U1 and U2.
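
Comparing versions on both nodes would rule that out.  For example (the
package names below assume the usual RHEL4 RPMs; yours may differ
slightly, e.g. cman-kernel-smp):

rpm -q ccs cman cman-kernel fence GFS GFS-kernel lvm2-cluster
cman_tool status | grep -i version

If I recall the output correctly, the "Protocol version" and "Config
version" lines from cman_tool status should match on both nodes.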

Next time, it would also be helpful to post what version of the cluster
software you're running and possibly snippets from /var/log/messages
showing why cman is not connecting.
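
Something along these lines, run on each node, would capture most of what
we'd need to see (adjust the grep patterns as you like):

rpm -qa | grep -Ei 'ccs|cman|fence|gfs|clvm'
grep -Ei 'ccsd|cman|fenced|clvmd' /var/log/messages | tail -100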

Regards,

Bob Peterson
Red Hat Cluster Suite
