[Linux-cluster] clurgmgrd doesn't work with quorum disk

Jos Vos jos at xos.nl
Sat Sep 22 16:03:56 UTC 2007


Hi,

After adding a quorum disk to a two-node configuration (and restarting
and rebooting the cluster), everything *seems* to be OK, except that
clurgmgrd no longer works properly and no services are started.
When clustat asks for the status, the request to clurgmgrd times out;
clurgmgrd can't get any service status anymore, and so it doesn't start
any services either...

This is on RHEL 5.0.

Some info (I masked most names and IP addresses with ****):

Old config:

     <cman expected_votes="1" two_node="1"/>

New config (this is the only change in the config!):

     <cman expected_votes="3" two_node="0"/>
     <quorumd interval="1" tko="10" votes="1" label="qdisk1">
             <heuristic program="ping ****** -c1 -t1" score="1" interval="2"/>
             <heuristic program="ping ****** -c1 -t1" score="1" interval="2"/>
             <heuristic program="ping ****** -c1 -t1" score="1" interval="2"/>
     </quorumd>
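
The vote arithmetic behind this change can be sketched roughly as follows.
This is only an illustration: it assumes each node has the default of one
vote, that quorum is a simple majority of the expected votes, and that
two_node mode makes a single vote sufficient. The function names are made
up for this example and are not part of cman.

```python
def expected_votes(node_votes, qdisk_votes=0):
    """Sum of all node votes plus the quorum disk's votes."""
    return sum(node_votes) + qdisk_votes


def quorum(expected, two_node=False):
    """Assumed rule: simple majority, except in two_node mode,
    where one vote is enough."""
    if two_node:
        return 1
    return expected // 2 + 1


# New config: two nodes with one vote each, quorum disk with votes="1".
new_expected = expected_votes([1, 1], qdisk_votes=1)
print("expected votes:", new_expected)
print("quorum:", quorum(new_expected))

# Old config: expected_votes="1" with two_node="1".
print("old quorum:", quorum(1, two_node=True))
```

Under these assumptions the numbers agree with expected_votes="3" in the
new config and with the "Quorum" lines that cman_tool status reports.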

cman_tool status (old config):

Version: 6.0.1
Config Version: 36
Cluster Name: ***********
Cluster Id: 21428
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1  
Active subsystems: 8
Flags: 2node 
Ports Bound: 0 11 177  
Node name: *******
Node ID: 1
Multicast addresses: 239.192.83.8 
Node addresses: ************** 

cman_tool status (new config):

Version: 6.0.1
Config Version: 35
Cluster Name: ***********
Cluster Id: 21428
Cluster Member: Yes
Cluster Generation: 8
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 3
Quorum: 2  
Active subsystems: 9
Flags: 
Ports Bound: 0 11 177  
Node name: *******
Node ID: 1
Multicast addresses: 239.192.83.8 
Node addresses: ************** 
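
To make the differences between the two status outputs easier to spot, a
small script along these lines could compare them. It is a sketch that
assumes the "Key: value" layout shown above; the sample text is a trimmed
copy of the unmasked fields, and parse_status/diff_status are hypothetical
helper names.

```python
def parse_status(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_status(old, new):
    """Return {key: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


OLD = """\
Config Version: 36
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node
"""

NEW = """\
Config Version: 35
Nodes: 2
Expected votes: 3
Total votes: 3
Quorum: 2
Active subsystems: 9
Flags:
"""

changes = diff_status(parse_status(OLD), parse_status(NEW))
for key, (before, after) in changes.items():
    print(f"{key}: {before!r} -> {after!r}")
```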

clustat (old config):

Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  *******                               1 Online, rgmanager
  *******                               2 Online, Local, rgmanager

  Service Name         Owner (Last)                   State         
  ------- ----         ----- ------                   -----         
  service:***********  *******                        started         
  service:***********  *******                        started         
  service:***********  *******                        started         
  service:***********  *******                        started      


clustat (new config):

Timeout waiting for a response from Resource Group Manager
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  *******                               1 Online
  *******                               2 Online, Local
  /dev/sdc                              0 Online, Quorum Disk
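
Note that neither node shows the "rgmanager" flag anymore. A check for this
could be scripted roughly as below. It is a sketch that assumes the member
table layout shown in this mail; the node names node1 and node2 are
placeholders for the masked names, and the function name is made up.

```python
def members_without_rgmanager(clustat_text):
    """Return the names of cluster nodes whose status lacks 'rgmanager'."""
    missing = []
    in_members = False
    for line in clustat_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Member Name"):
            in_members = True          # header of the member table
            continue
        if stripped.startswith("Service Name"):
            break                      # service table starts, stop parsing
        if not in_members or not stripped or stripped.startswith("-"):
            continue                   # skip blanks and the dashed ruler
        parts = stripped.split(None, 2)  # name, ID, comma-separated flags
        if len(parts) < 3:
            continue
        name, _node_id, status = parts
        flags = [flag.strip() for flag in status.split(",")]
        if "Quorum Disk" in flags:
            continue                   # the quorum disk never runs rgmanager
        if "rgmanager" not in flags:
            missing.append(name)
    return missing


NEW_CLUSTAT = """\
Member Status: Quorate

  Member Name                        ID   Status
  ------ ----                        ---- ------
  node1                                 1 Online
  node2                                 2 Online, Local
  /dev/sdc                              0 Online, Quorum Disk
"""

print(members_without_rgmanager(NEW_CLUSTAT))
```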

With the new config, I see the following in the messages file, among other things:

Sep 22 17:15:15 ******* clurgmgrd[4465]: <err> #34: Cannot get status for service service:***********

Also, clurgmgrd can't be stopped by its service script in that case;
it just doesn't react.

The quorum disk is seen correctly on both systems with "mkqdisk -L"
and it looks like cman etc. do take it into account.

Any clues on what's going on here?
Thanks,

--
--    Jos Vos <jos at xos.nl>
--    X/OS Experts in Open Systems BV   |   Phone: +31 20 6938364
--    Amsterdam, The Netherlands        |     Fax: +31 20 6948204



