[Linux-cluster] quorum device not getting a vote causes 2-node cluster to be inquorate
bergman at merctech.com
Tue Mar 15 15:42:09 UTC 2011
The pithy ruminations from "Fabio M. Di Nitto" <fdinitto at redhat.com> on "Re: [Linux-cluster] quorum device not getting a vote causes 2-node cluster to be inquorate" were:
=> On 03/15/2011 05:11 AM, bergman at merctech.com wrote:
=> > I have been using a 2-node cluster with a quorum disk successfully for
=> > about 2 years. Beginning today, the cluster will not boot correctly.
=> > The RHCS services start, but fencing fails with:
=> > dlm: no local IP address has been set
=> > dlm: cannot start dlm lowcomms -107
=> > This seems to be a symptom of the fact that the cluster votes do not
=> > include votes from the quorum device:
=> > # clustat
=> > Cluster Status for example-infra @ Tue Mar 15 00:02:35 2011
=> > Member Status: Inquorate
=> >  Member Name                         ID   Status
=> >  ------ ----                         ---- ------
=> >  example-infr2-admin.domain.com         1 Online, Local
=> >  example-infr1-admin.domain.com         2 Offline
=> >  /dev/mpath/quorum                      0 Offline
=> > [root@example-infr2 ~]# cman_tool status
=> > Version: 6.2.0
=> > Config Version: 239
=> > Cluster Name: example-infra
=> > Cluster Id: 42813
=> > Cluster Member: Yes
=> > Cluster Generation: 676844
=> > Membership state: Cluster-Member
=> > Nodes: 1
=> > Expected votes: 2
=> > Total votes: 1
=> > Quorum: 2 Activity blocked
=> > Active subsystems: 7
=> > Flags:
=> > Ports Bound: 0
=> > Node name: example-infr2-admin.domain.com
=> > Node ID: 1
=> > Multicast addresses: 126.96.36.199
=> > Node addresses: 192.168.110.3
=> You should check the output from cman_tool nodes. It appears that the
=> nodes are not seeing each other at all.
That's correct. At the time I ran cman_tool and clustat, one node was down (deliberately, in an attempt to troubleshoot the issue, though the same would be true after a hardware failure).
As I see it, the problem is not with the inter-node communication, but with the quorum device. Note that there is only one vote registered--there are no votes from the quorum device. The quorum device should provide sufficient votes to make the "cluster" quorate if only one node is running.
If I understand it correctly, this should also let the "cluster" start with a single node (as long as that node can write to the quorum device). If my understanding is wrong, then how can a 2-node cluster start if one node is down?
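
To make the vote arithmetic concrete, here is a rough sketch in Python. It assumes cman derives the quorum threshold as floor(expected_votes / 2) + 1, which is consistent with the "Expected votes: 2" and "Quorum: 2" lines in the cman_tool output above. The function names are hypothetical, and the 3-vote layout (one vote per node plus one for the quorum disk) is an assumed illustration, not taken from the actual cluster.conf.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum, assuming a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Values reported by cman_tool status: 1 vote present, 2 expected.
    print("as reported:", is_quorate(total_votes=1, expected_votes=2))

    # Assumed layout: 2 nodes + quorum disk, 1 vote each (3 expected).
    # One live node plus the quorum disk would contribute 2 votes.
    print("with qdisk vote:", is_quorate(total_votes=2, expected_votes=3))
```

Under those assumptions, a single node is inquorate on its own, but the same node plus a registered quorum-disk vote would meet the threshold, which is the behaviour I expected from this cluster.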
=> The first things I would check are iptables, whether node names resolve
=> to the correct IP addresses, SELinux, and finally whether the switch
=> between the nodes supports multicast.
SELinux is disabled (as it has been for the 2 years this cluster has been operational).
There have been no switch changes.
Node names & IPs resolve correctly.
iptables permits all communication between the "admin" addresses on the servers.