[Linux-cluster] CLVM/GFS will not mount or communicate with cluster
Patrick Caulfield
pcaulfie at redhat.com
Tue Dec 5 14:34:31 UTC 2006
Barry Brimer wrote:
>
>
> On Mon, 4 Dec 2006, Robert Peterson wrote:
>
>> Barry Brimer wrote:
>>> This is a repeat of the post I made a few minutes ago. I thought
>>> adding a
>>> subject would be helpful.
>>>
>>>
>>> I have a 2 node cluster for a shared GFS filesystem. One of the
>>> nodes fenced
>>> the other, and the node that got fenced is no longer able to
>>> communicate with
>>> the cluster.
>>>
>>> While booting the problem node, I receive the following error message:
>>> Setting up Logical Volume Management: Locking inactive: ignoring
>>> clustered
>>> volume group vg00
>>>
>>> I have compared /etc/lvm/lvm.conf files on both nodes. They are
>>> identical. The
>>> disk (/dev/sda1) is listed when typing "fdisk -l"
>>>
>>> There are no iptables firewalls active (although
>>> /etc/sysconfig/iptables exists,
>>> iptables is chkconfig'd off). I have written a simple iptables
>>> logging rule
>>> (iptables -I INPUT -s <problem node> -j LOG) on the working node to
>>> verify that
>>> packets are reaching the working node, but no messages are being
>>> logged in
>>> /var/log/messages on the working node that acknowledge any cluster
>>> activity
>>> from the problem node.
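[A sketch of the logging technique Barry describes, with hypothetical addresses: the iptables LOG target writes each matched packet to the kernel log, which syslog delivers to /var/log/messages. A cman membership packet on this release travels over UDP port 6809, so extracting the DPT field from a logged line confirms cluster traffic is reaching the node.]

```shell
# Rule on the working node (run as root; 192.168.1.2 stands in for the
# problem node's address -- both addresses here are hypothetical):
#   iptables -I INPUT -s 192.168.1.2 -j LOG --log-prefix "from-node2: "
#
# A matched cman packet then appears in /var/log/messages roughly like the
# sample line below; extracting DPT shows the destination port reached.
sample='Dec  4 10:00:01 node1 kernel: from-node2: IN=eth0 OUT= SRC=192.168.1.2 DST=192.168.1.1 PROTO=UDP SPT=6809 DPT=6809'
echo "$sample" | grep -o 'DPT=[0-9]*'
```

Seeing such lines proves only that packets arrive at the working node's interface; whether cman processes them is a separate question, which is exactly the gap Barry observes.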
>>>
>>> Both machines have the same RH packages installed and are mostly up
>>> to date,
>>> they are missing the same packages, none of which involve the kernel,
>>> RHCS or
>>> GFS.
>>>
>>> When I boot the problem node, it successfully starts ccsd, but it
>>> fails after a
>>> while on cman and fails after a while on fenced. I have given the clvmd
>>> process an hour, and it still will not start.
>>>
>>> vgchange -ay on the problem node returns:
>>>
>>> # vgchange -ay
>>> connect() failed on local socket: Connection refused
>>> Locking type 2 initialisation failed.
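[The "Locking type 2 initialisation failed" error means LVM is configured for clustered locking in /etc/lvm/lvm.conf and is trying to reach clvmd over a local socket; the "Connection refused" comes from clvmd not running. A minimal sketch of what the configuration looks like, using a simulated sample file rather than the node's real lvm.conf:]

```shell
# Simulated /etc/lvm/lvm.conf fragment: locking_type = 2 selects the
# external (clustered) locking library, which talks to the clvmd daemon
# over a local socket.  If clvmd is down, every LVM command that needs a
# cluster lock fails with "connect() failed on local socket".
cat > /tmp/lvm.conf.sample <<'EOF'
global {
    locking_type = 2
    locking_library = "liblvm2clusterlock.so"
}
EOF
grep -E 'locking_type *= *[23]' /tmp/lvm.conf.sample && echo "clustered locking configured"
```

Since clvmd in turn refuses to start until cman has joined the cluster, this error is a downstream symptom of the cman failure rather than an LVM problem in its own right.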
>>>
>>> I have the contents of /var/log/messages on the working node and the
>>> problem
>>> node at the time of the fence, if that would be helpful.
>>>
>>> Any help is greatly appreciated.
>>>
>>> Thanks,
>>> Barry
>>>
>> Hi Barry,
>>
>> Well, vgchange and other lvm functions won't work on the clustered volume
>> unless clvmd is running, and clvmd won't run properly until the node
>> is talking
>> happily through the cluster infrastructure. So as I see it, your
>> problem is that
>> cman is not starting properly. Unfortunately, you haven't told us
>> enough about the system for us to determine why. There can be many
>> reasons.
>
> Agreed. Although it did not seem relevant at the time of the post,
> there were network outages around the time of the failure. What happens
> now is that on the problem node, ccsd starts, but when cman starts it
> sends membership requests that are never acknowledged by the working
> node. Again, my iptables logging rule shows packets arriving on UDP 6809
> from the problem node in /var/log/messages on the working node, but cman
> never acknowledges them.
>
> The problem node had this in its /var/log/messages at the time of the
> problem:
>
> Dec 1 14:29:38 server1 kernel: CMAN: Being told to leave the cluster by
> node 1
> Dec 1 14:29:38 server1 kernel: CMAN: we are leaving the cluster.
If you're running the cman from RHEL4 Update 3, there's a bug in it that you might be hitting.
You'll need to upgrade all the nodes in the cluster to get rid of it. I can't tell for sure whether
that is the problem you're having without seeing more kernel messages, though.
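[One way to act on this advice is to compare the installed cluster packages across nodes with `rpm -q cman cman-kernel`. The version strings in the sketch below are made up for illustration, as is the version of the fixed release; a version-aware sort shows whether an upgrade is needed:]

```shell
# Hypothetical comparison of an installed cman version against the release
# carrying the fix (both version strings are invented for this sketch):
installed="1.0.4"      # e.g. from: rpm -q --qf '%{VERSION}\n' cman-kernel
fixed="1.0.11"
oldest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$oldest" = "$installed" ] && [ "$installed" != "$fixed" ]; then
    echo "upgrade needed"
fi
```

A plain string sort would wrongly rank 1.0.11 before 1.0.4, which is why the version-aware `sort -V` is used.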
--
patrick