[Linux-cluster] Rebooting qdisk master causes quorum to dissolve.

Peter Tiggerdine peter.tiggerdine at uq.edu.au
Mon Dec 21 03:26:38 UTC 2009


Hi,

I have a five-node cluster with a shared quorum disk and no heuristics.
Because of a hardware problem I needed to move the services off the
host in question and replace some RAM. The services moved without a
hitch, but as soon as I rebooted the node the cluster came down.

The relevant configuration is:

<cluster alias="Services" config_version="150" name="Services">
        <quorumd interval="5" tko="12" device="/dev/emcpowere" votes="3" log_level="9" log_facility="local4" status_file="/qdisk_status"/>
        <fence_daemon clean_start="1" post_fail_delay="15" post_join_delay="30"/>
        <cman deadnode_timeout="90" expected_nodes="4"/>
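
For reference, the timing these settings imply can be worked out with a minimal sketch like the one below. It only uses the interval, tko and deadnode_timeout values from the configuration above; the function name is made up for illustration, and the sketch says nothing about how long a qdisk master re-election takes.

```python
# Values copied from the <quorumd> and <cman> lines in the configuration.
INTERVAL_S = 5            # quorumd interval: seconds between qdisk cycles
TKO = 12                  # quorumd tko: missed cycles before a node is declared dead
DEADNODE_TIMEOUT_S = 90   # cman deadnode_timeout, in seconds


def qdisk_eviction_window(interval_s: int, tko: int) -> int:
    """Seconds of missed qdisk updates before qdiskd declares a node dead."""
    return interval_s * tko


window_s = qdisk_eviction_window(INTERVAL_S, TKO)
headroom_s = DEADNODE_TIMEOUT_S - window_s

print(f"qdisk eviction window: {window_s}s")
print(f"cman deadnode_timeout: {DEADNODE_TIMEOUT_S}s")
print(f"headroom between the two: {headroom_s}s")
```

So qdiskd should take 60 seconds of missed updates to declare a node dead, which is inside the 90 second cman timeout.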

The relevant logs are below from an adjacent node:

Dec 21 11:40:15 io2 clurgmgrd[7271]: <notice> Member 1 shutting down 
Dec 21 11:40:40 io2 qdiskd[6820]: <info> Node 1 shutdown 
Dec 21 11:40:47 io2 openais[6801]: [CMAN ] lost contact with quorum device
Dec 21 11:40:47 io2 openais[6801]: [CMAN ] quorum lost, blocking activity
Dec 21 11:40:47 io2 clurgmgrd[7271]: <emerg> #1: Quorum Dissolved 
Dec 21 11:40:47 io2 kernel: dlm: closing connection to node 1
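
As a rough sanity check of the vote arithmetic behind these messages, here is a minimal sketch. It assumes each of the five nodes carries one vote (the per-node votes are not in the excerpt above), that the expected votes are the sum of node and qdisk votes, and that quorum is a simple majority of expected votes. The function names are hypothetical, and the expected_nodes="4" setting is not modelled.

```python
NODE_COUNT = 5
VOTES_PER_NODE = 1   # assumption: per-node votes are not shown in the config excerpt
QDISK_VOTES = 3      # votes="3" on the <quorumd> line

# Assumption: expected votes = all node votes plus the quorum disk votes.
EXPECTED_VOTES = NODE_COUNT * VOTES_PER_NODE + QDISK_VOTES


def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(nodes_up: int, qdisk_visible: bool) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    votes = nodes_up * VOTES_PER_NODE + (QDISK_VOTES if qdisk_visible else 0)
    return votes >= quorum_threshold(EXPECTED_VOTES)


print("threshold:", quorum_threshold(EXPECTED_VOTES))
print("4 nodes up, qdisk visible:", is_quorate(4, True))
print("4 nodes up, qdisk lost:   ", is_quorate(4, False))
```

Under those assumptions, four surviving nodes are quorate only while they can still count the quorum disk votes, which would fit the "lost contact with quorum device" line being followed immediately by "quorum lost".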

Have I configured this incorrectly, or is this a known problem with
rebooting the qdisk master? It has just occurred to me that I did lock
the resource groups to prevent the moved services from returning to the
node.

Thanks in advance; I look forward to your replies,

Peter Tiggerdine
HPC & eResearch Specialist
High Performance Computing Group
Information Technology Services
University of Queensland




