[Linux-cluster] Two-Node Cluster Problem
Marco Nietz
m.nietz-redhat at iplabs.de
Thu May 28 08:06:03 UTC 2009
Just gave the whole configuration a new try and set up the whole
cluster once again.
This is the resulting cluster.conf with a very basic configuration:
[root@ipsdb01 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ips_database" config_version="7" name="ips_database">
    <fence_daemon clean_start="1" post_fail_delay="10" post_join_delay="30"/>
    <clusternodes>
        <clusternode name="10.102.10.51" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="ipsdb01.drac"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="10.102.10.28" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="ips08.drac"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_drac" ipaddr="10.102.10.128" login="root" name="ips08.drac" passwd="xxx"/>
        <fencedevice agent="fence_drac" ipaddr="10.102.10.151" login="root" name="ipsdb01.drac" passwd="xxx"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources>
            <ip address="10.209.170.55" monitor_link="1"/>
        </resources>
        <service autostart="1" exclusive="0" name="ips_database" recovery="relocate">
            <ip ref="10.209.170.55"/>
        </service>
    </rm>
</cluster>
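For completeness, this is roughly how I would push the updated config (config_version="7") to both nodes and verify it, assuming the standard RHEL5 ccs_tool/cman_tool commands (output omitted):

[root@ipsdb01 ~]# ccs_tool update /etc/cluster/cluster.conf   # propagate the updated cluster.conf
[root@ipsdb01 ~]# cman_tool version                           # should report config version 7 on both nodes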
The service was running on 10.102.10.28. I did a 'powerdown' via the
DRAC interface, but the service is not taken over by the second node.
clustat on the remaining node gave an interesting output:
[root@ipsdb01 ~]# clustat
Cluster Status for ips_database @ Thu May 28 09:31:30 2009
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 10.102.10.51                               1 Online, Local, rgmanager
 10.102.10.28                               2 Offline

 Service Name                  Owner (Last)                  State
 ------- ----                  ----- ------                  -----
 service:ips_database          10.102.10.28                  started
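To cross-check what clustat shows against cman and the fence domain, I would look at something like this (just a sketch with the standard cman/groupd tools; the exact output will of course differ):

[root@ipsdb01 ~]# cman_tool nodes    # membership as cman sees it
[root@ipsdb01 ~]# group_tool ls      # fence/dlm group state; look for a group stuck in a wait state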
The service is 'started' but the Owner (10.102.10.28) is offline.
These are the last lines from /var/log/messages
May 28 09:27:03 ipsdb01 kernel: dlm: closing connection to node 2
May 28 09:27:03 ipsdb01 openais[5295]: [CLM ] Members Joined:
May 28 09:27:03 ipsdb01 fenced[5315]: 10.102.10.28 not a cluster member
after 0 sec post_fail_delay
May 28 09:27:03 ipsdb01 openais[5295]: [SYNC ] This node is within the
primary component and will provide service.
May 28 09:27:03 ipsdb01 openais[5295]: [TOTEM] entering OPERATIONAL state.
May 28 09:27:03 ipsdb01 openais[5295]: [CLM ] got nodejoin message
10.102.10.51
May 28 09:27:03 ipsdb01 openais[5295]: [CPG ] got joinlist message from
node 1
The remaining system recognizes the failure, but doesn't start any
takeover action.
Does anyone have an idea what could cause such a problem?
Marco Nietz wrote:
> Tiago Cruz wrote:
>> Did you have:
>>
>> <cman two_node="1" expected_votes="1"/>
>>
>> ?
>>
>
> yes, I have this in my config.
>
>
>