[Linux-cluster] cman & qdiskd

Tue Oct 17 20:43:37 UTC 2006

Okay, my bad.
Using older cman/dlm kernel modules with newer cman/qdisk binaries
appears to work, but I guess the older kernel module doesn't know how to
accept a "quorum node", so the quorum disk's votes never showed up.
After switching to matching versions of everything (kernel modules and
binaries), it works.

Thanks for the help
Katriel

Josef Whiter wrote:
> Hmm, well, that's odd.  Stop qdiskd on both nodes, then start it on one and
> watch /var/log/messages to see what it spits out.  If it doesn't spit out any
> obvious errors, do the same on the other node.  If it still doesn't work, copy the
> relevant parts of the logs into something like http://pastebin.com and post the
> URL so I can take a look.
> 
> Josef
> 
> On Tue, Oct 17, 2006 at 08:42:59PM +0200, Katriel Traum wrote:
>> qdisk is running:
>> [root@n1 ~]# service qdiskd status
>> qdiskd (pid 1199) is running...
>> [root@n2 ~]# service qdiskd status
>> qdiskd (pid 873) is running...
>>
>> /tmp/qdisk-status:
>> [root@n1 ~]# cat /tmp/qdisk-status
>> Node ID: 1
>> Score (current / min req. / max allowed): 3 / 2 / 3
>> Current state: Master
>> Current disk state: None
>> Visible Set: { 1 2 }
>> Master Node ID: 1
>> Quorate Set: { 1 2 }
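The "3 / 2 / 3" score line can be reproduced from the three ping heuristics in the `<quorumd>` block of the cluster.conf below: each heuristic that succeeds contributes its `score="1"`, so the maximum is 3. Assuming qdiskd's documented default when `min_score` is not set, floor((n + 1) / 2) where n is the sum of all heuristic scores, the minimum required is 2. The following is a minimal sketch of that arithmetic only; the `Heuristic` class and `score_line` helper are hypothetical and are not part of qdiskd.

```python
from dataclasses import dataclass


@dataclass
class Heuristic:
    """One <heuristic> element: the command run and the score it adds on success."""
    program: str
    score: int


def score_line(heuristics, passed, min_score=0):
    """Return (current, min required, max allowed), as in the qdisk status file.

    `passed` is the set of program strings whose last run succeeded.
    Assumption: min_score <= 0 means "not set", so floor((n + 1) / 2) applies.
    """
    max_score = sum(h.score for h in heuristics)
    current = sum(h.score for h in heuristics if h.program in passed)
    if min_score <= 0:
        min_score = (max_score + 1) // 2
    return current, min_score, max_score


# The three heuristics from the cluster.conf in this thread.
heuristics = [
    Heuristic("ping 192.168.22.1 -c1 -t1", 1),
    Heuristic("ping 192.168.22.60 -c1 -t1", 1),
    Heuristic("ping 192.168.22.100 -c1 -t1", 1),
]

all_up = {h.program for h in heuristics}
print(score_line(heuristics, all_up))                   # (3, 2, 3)
print(score_line(heuristics, {heuristics[0].program}))  # (1, 2, 3)
```

With only one of the three pinged hosts answering, the current score (1) would fall below the required minimum (2).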
>>
>>
>> Both nodes see /dev/etherd/e0.0 and can access it (tcpdump shows both
>> nodes accessing it, presumably to write their timestamps).
>> /proc/cluster/nodes shows the same as "cman_tool nodes":
>> [root@n1 ~]# cman_tool nodes
>> Node  Votes Exp Sts  Name
>>    1    1    2   M   n1
>>    2    1    2   M   n2
>>
>> Everything looks OK; it's just not working.
>>
>> cluster.conf:
>> <?xml version="1.0"?>
>> <cluster config_version="9" name="alpha_cluster">
>>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>         <clusternodes>
>>                 <clusternode name="n1" votes="1" nodeid="1">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="man_fence" nodename="n1"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>                 <clusternode name="n2" votes="1" nodeid="2">
>>                         <fence>
>>                                 <method name="1">
>>                                         <device name="man_fence" nodename="n2"/>
>>                                 </method>
>>                         </fence>
>>                 </clusternode>
>>         </clusternodes>
>>         <cman/>
>>         <fencedevices>
>>                 <fencedevice agent="fence_manual" name="man_fence"/>
>>         </fencedevices>
>>         <rm log_level="7">
>>                 <failoverdomains/>
>>                 <resources>
>>                         <ip address="192.168.22.250" monitor_link="1"/>
>>                         <script file="/etc/init.d/httpd" name="httpd"/>
>>                 </resources>
>>                 <service autostart="1" name="apache" recovery="relocate">
>>                         <ip ref="192.168.22.250"/>
>>                         <script ref="httpd"/>
>>                 </service>
>>         </rm>
>>         <quorumd interval="1" tko="5" votes="3" log_level="7" device="/dev/etherd/e0.0" status_file="/tmp/qdisk-status">
>>                 <heuristic program="ping 192.168.22.1 -c1 -t1" score="1" interval="2"/>
>>                 <heuristic program="ping 192.168.22.60 -c1 -t1" score="1" interval="2"/>
>>                 <heuristic program="ping 192.168.22.100 -c1 -t1" score="1" interval="2"/>
>>         </quorumd>
>> </cluster>
>>
>> Katriel
>>
>> Josef Whiter wrote:
>>> What does your cluster.conf look like?  What about /proc/cluster/nodes?  Are
>>> you sure qdiskd is starting?  Your quorum stuff looks fine.  Do both nodes see
>>> /dev/etherd/e0.0 as the same disk?
>>>
>>> Josef
>>>
>>> On Tue, Oct 17, 2006 at 08:11:42PM +0200, Katriel Traum wrote:
>>> Hello.
>>>
>>> I've seen this subject on the list, but no real solutions.
>>> I'm using Cluster 4 update 4, with qdiskd and a shared disk.
>>> I've understood from the documentation and list that a "cman_tool
>>> status" should reflect the number of votes the quorum daemon holds.
>>>
>>> My setup is pretty straightforward: a 2-node cluster with shared storage
>>> (AoE for testing).
>>> qdiskd configuration:
>>> <quorumd interval="1" tko="5" votes="3" log_level="7" device="/dev/etherd/e0.0" status_file="/tmp/qdisk-status">
>>>                 <heuristic program="ping 192.168.22.1 -c1 -t1" score="1" interval="2"/>
>>>                 <heuristic program="ping 192.168.22.60 -c1 -t1" score="1" interval="2"/>
>>>                 <heuristic program="ping 192.168.22.100 -c1 -t1" score="1" interval="2"/>
>>>         </quorumd>
>>>
>>> cman_tool status shows:
>>> [root@n1 ~]# cman_tool status
>>> Protocol version: 5.0.1
>>> Config version: 8
>>> Cluster name: alpha_cluster
>>> Cluster ID: 50356
>>> Cluster Member: Yes
>>> Membership state: Cluster-Member
>>> Nodes: 2
>>> Expected_votes: 2
>>> Total_votes: 2
>>> Quorum: 2
>>> Active subsystems: 4
>>> Node name: n1
>>> Node addresses: 192.168.22.201
>>>
>>> qdiskd is running and scoring a perfect 3 out of 3, but it contributes no votes.
>>> When I disconnect one of the nodes, the other loses quorum. Am I
>>> missing something?
>>>
>>> Any insight appreciated.
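The `cman_tool status` output above accounts for the symptom. Assuming cman's plain majority rule, quorum = floor(expected_votes / 2) + 1, which is consistent with the reported `Expected_votes: 2` and `Quorum: 2`, a single surviving node holds only 1 vote and drops below quorum. If qdiskd's `votes="3"` were registered and expected votes covered them, one node plus the quorum disk would remain quorate. This is a minimal sketch of the vote arithmetic only; `quorum` and `is_quorate` are hypothetical helpers, and the combined expected-votes figure of 5 is an illustrative assumption, not a value taken from the thread.

```python
def quorum(expected_votes):
    """Majority threshold: floor(expected_votes / 2) + 1.

    Assumption: plain majority rule; cman's special two_node mode is ignored.
    """
    return expected_votes // 2 + 1


def is_quorate(present_votes, expected_votes):
    """True if the votes still present reach the majority threshold."""
    return present_votes >= quorum(expected_votes)


NODE_VOTES = 1   # votes="1" on each <clusternode>
QDISK_VOTES = 3  # votes="3" on <quorumd>

# What cman_tool reported: only the two node votes are counted.
expected = 2 * NODE_VOTES
print(quorum(expected))                  # 2
print(is_quorate(NODE_VOTES, expected))  # False: one node left

# Hypothetical: qdiskd's votes registered and included in expected votes.
expected_with_qdisk = 2 * NODE_VOTES + QDISK_VOTES
print(quorum(expected_with_qdisk))                                # 3
print(is_quorate(NODE_VOTES + QDISK_VOTES, expected_with_qdisk))  # True
```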
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>> -- 
>> Katriel Traum, PenguinIT
>> RHCE, CLP
>> Mobile: 054-6789953
>>
> 

-- 
Katriel Traum, PenguinIT
RHCE, CLP
Mobile: 054-6789953