[Linux-cluster] qdiskd master election and loss of quorum
Alain RICHARD
alain.richard at equation.fr
Mon Nov 2 16:59:18 UTC 2009
I am currently using an n-node configuration with a qdiskd process so that
the cluster can sustain the failure of n-1 nodes.
The simplest case is a two-node cluster:
<cluster config_version="79" name="xxx">
<totem token="42000"/>
<clusternodes>
<cman expected_votes="3" two_node="0"/>
<clusternode name="n1" nodeid="1" votes="1">
<fence>
...
</fence>
</clusternode>
<clusternode name="n2" nodeid="2" votes="1">
<fence>
...
</fence>
</clusternode>
</clusternodes>
<quorumd cman_label="qdisk1" device="/dev/yyy" interval="2"
tko="10" votes="1" reboot="0" allow_kill="0" status_file="/qdiskstat">
</quorumd>
<rm>
...
</rm>
</cluster>
I sometimes experience a loss of quorum on the other node when I
gracefully shut down a node using the following sequence:
# service rgmanager stop
# service gfs2 stop
# service clvmd stop
# service qdiskd stop
# service cman stop
After looking at the problem more closely, I discovered that it occurs
when the node I shut down is the qdisk master. When I stop qdiskd and
cman on the first node, the second node loses the qdisk vote (because
it sees that the qdisk master is no longer available and starts the
election of a new master) and, almost simultaneously, loses the first
node's vote because that node has left the cluster.
The effect is that the second node loses quorum for about 20 seconds,
the time it takes to elect itself qdisk master. The problem is that
rgmanager sees the loss of quorum and shuts down all the virtual
machines under its control!
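The vote arithmetic behind this can be sketched as follows. This is only an illustration, assuming the usual majority rule (quorum threshold = floor(expected_votes / 2) + 1); the function names are hypothetical and not part of cman or qdiskd.

```shell
#!/bin/sh
# Assumption: quorum threshold is floor(expected_votes / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Prints "quorate" or "inquorate" for a given vote total and expected_votes.
quorum_state() {
    votes=$1
    expected=$2
    if [ "$votes" -ge "$(quorum_threshold "$expected")" ]; then
        echo quorate
    else
        echo inquorate
    fi
}

expected=3                    # expected_votes from the config above
quorum_state 3 "$expected"    # n1 + n2 + qdisk
quorum_state 2 "$expected"    # n2 + qdisk, after n1 has left
quorum_state 1 "$expected"    # n2 alone, while the qdisk vote is withdrawn
```

With expected_votes="3", a single remaining vote is below the threshold, which matches the temporary loss of quorum seen on the second node.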
If I wait 20 seconds between "service qdiskd stop" and "service
cman stop", the problem does not occur, because the second node has
time to elect itself master.
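A minimal sketch of this workaround as a script is shown below. It assumes that the wait of about 20 seconds corresponds to interval * tko from the quorumd line (2 * 10); the helper names (run, qdisk_wait_seconds, stop_node) and the DRY_RUN variable are hypothetical. Only the service commands come from the sequence above. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper names; only the service commands come from the post.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: the election delay is about interval * tko seconds.
qdisk_wait_seconds() {
    echo $(( $1 * $2 ))
}

stop_node() {
    interval=$1
    tko=$2
    for svc in rgmanager gfs2 clvmd qdiskd; do
        run service "$svc" stop || return 1
    done
    # Give the surviving node time to elect itself qdisk master
    # before this node's own vote disappears.
    run sleep "$(qdisk_wait_seconds "$interval" "$tko")"
    run service cman stop
}

stop_node 2 10   # interval="2" tko="10" from the quorumd line
```

Setting DRY_RUN=0 would run the commands for real; the fixed sleep is a crude stand-in for checking that the other node has actually become master.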
I thought qdiskd was supposed to maintain quorum independently of
cman communication.
Either I have made a mistake or am misusing qdiskd, or something needs
to change in the handling of qdiskd votes.
One solution might be for a node that was not the qdiskd master, and that
was contributing the qdisk vote to cman, to keep that vote until a new
master election succeeds, instead of withdrawing it until the
re-election completes?
Regards,
--
Alain RICHARD <mailto:alain.richard at equation.fr>
EQUATION SA <http://www.equation.fr/>
Tel : +33 477 79 48 00 Fax : +33 477 79 48 01
E-Liance, Opérateur des entreprises et collectivités,
Liaisons Fibre optique, SDSL et ADSL <http://www.e-liance.fr>