[Linux-cluster] qdiskd master election and loss of quorum

Alain RICHARD alain.richard at equation.fr
Mon Nov 2 16:59:18 UTC 2009


I am currently using an n-node configuration with a qdiskd process so
that the cluster can survive the failure of n-1 nodes.

The simplest case is a two-node cluster:

<cluster config_version="79" name="xxx">
         <totem token="42000"/>
         <clusternodes>
         <cman expected_votes="3" two_node="0"/>
                 <clusternode name="n1" nodeid="1" votes="1">
                         <fence>
...
                         </fence>
                 </clusternode>
                 <clusternode name="n2" nodeid="2" votes="1">
                         <fence>
...
                         </fence>
                 </clusternode>
         </clusternodes>
         <quorumd cman_label="qdisk1" device="/dev/yyy" interval="2" tko="10" votes="1" reboot="0" allow_kill="0" status_file="/qdiskstat">
         </quorumd>
         <rm>
...
         </rm>
</cluster>

I sometimes experience a loss of quorum on the other node when I
gracefully shut down a node using the following sequence:
# service rgmanager stop
# service gfs2 stop
# service clvmd stop
# service qdiskd stop
# service cman stop


Looking at the problem more closely, I discovered that it occurs when
the node I shut down is the qdisk master. When I stop qdiskd and cman
on the first node, the second node loses the qdisk vote (because it
sees that the qdisk master is no longer available and starts the
election of a new master) and, almost simultaneously, loses the first
node's vote because that node has left the cluster.
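
To make the vote arithmetic explicit, here is a small sketch. It
assumes that cman considers the cluster quorate when the votes present
reach expected_votes / 2 + 1 (integer division); that formula is my
understanding, not something I have checked in the source. The function
names are only illustrative.

```shell
#!/bin/sh
# Sketch only: the quorum formula below is an assumption.

# quorum_needed EXPECTED_VOTES -> minimum votes required to be quorate
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum EXPECTED_VOTES CURRENT_VOTES -> exit status 0 if quorate
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

EXPECTED=3   # expected_votes from the cluster.conf above

# 3 votes: n1 + n2 + qdisk (normal operation)
# 2 votes: n1 has left, n2 still counts the qdisk vote
# 1 vote : n1 has left and the qdisk vote is withdrawn during the election
for votes in 3 2 1; do
    if has_quorum "$EXPECTED" "$votes"; then
        echo "votes=$votes quorate"
    else
        echo "votes=$votes inquorate"
    fi
done
```

Under that assumption only the last case (1 vote) is inquorate, which
is the situation the second node is in while the election is running.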

The effect is that the second node loses quorum for about 20 seconds,
the time it takes to elect itself qdisk master. The problem is that
rgmanager sees the loss of quorum and shuts down all the virtual
machines under its control!

If I wait 20 seconds between "service qdiskd stop" and "service cman
stop", the problem does not occur, because the second node has time to
elect itself master.
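
As a stopgap, the shutdown sequence can be wrapped in a small script
that inserts this pause. This is only a sketch: the default delay is
computed as interval * tko from my quorumd settings (2 * 10 = 20
seconds), which matches what I observe but is my assumption about what
governs the election time. The names graceful_stop, run, DRY_RUN and
ELECTION_WAIT are mine, not part of any cluster tool.

```shell
#!/bin/sh
# Sketch of the workaround: same stop order as above, with a pause
# between qdiskd and cman so the other node can become qdisk master.

QDISK_INTERVAL=2   # interval= from <quorumd> in cluster.conf
QDISK_TKO=10       # tko= from <quorumd> in cluster.conf

# Assumption: the election needs roughly interval * tko seconds.
ELECTION_WAIT="${ELECTION_WAIT:-$(( QDISK_INTERVAL * QDISK_TKO ))}"

# run CMD...: execute the command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

graceful_stop() {
    for svc in rgmanager gfs2 clvmd qdiskd; do
        run service "$svc" stop || return 1
    done
    # Give the surviving node time to elect itself qdisk master
    # before this node's own vote disappears.
    run sleep "$ELECTION_WAIT"
    run service cman stop
}
```

After sourcing the file, calling graceful_stop with DRY_RUN=1 only
prints the commands; without it, the services are really stopped.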

I thought qdiskd was supposed to maintain quorum independently of cman
communication.

Either I am making a mistake or misusing qdiskd, or something needs to
change in the way qdiskd votes are handled.

One solution might be for a node that was not the qdiskd master, and
was contributing the qdisk vote to cman, to keep that vote until the
election of a new master completes, instead of withdrawing it for the
duration of the election?

Regards,

-- 
Alain RICHARD <mailto:alain.richard at equation.fr>
EQUATION SA <http://www.equation.fr/>
Tel : +33 477 79 48 00     Fax : +33 477 79 48 01
E-Liance, Opérateur des entreprises et collectivités,
Liaisons Fibre optique, SDSL et ADSL <http://www.e-liance.fr>

