[Linux-cluster] Service Recovery Failure
Rahul Borate
Rahul.Borate at sailpoint.com
Thu Jun 30 05:57:43 UTC 2011
Hi all,
I just ran a test that failed miserably. I have two nodes, node-1 and node-2.
The GFS file system /gfs is mounted on node-1.
Two HA services are running on node-1. If I unplug the cables on node-1, those
two services should transfer to node-2, but node-2 does not take over the
services.
However, if I do a proper shutdown/reboot of node-1, the two services transfer
to node-2 without any problem.
Please help!
clustat on node-2 before unplugging node-1's cable:
[root at Node-2 ~]# clustat
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 Node-1                          1 Online, rgmanager
 Node-2                          2 Online, Local, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:nfs                  Node-1                       started
 service:ESS_HA               Node-1                       started
clustat on node-2 after unplugging node-1's cable:
[root at Node-2 ~]# clustat
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 Node-1                          1 Offline
 Node-2                          2 Online, Local, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:nfs                  Node-1                       started
 service:ESS_HA               Node-1                       started
/etc/cluster/cluster.conf:
[root at Node-2 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="54" name="idm_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="120"/>
        <clusternodes>
                <clusternode name="Node-1" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="Node-2" nodeid="2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="nfs" ordered="0" restricted="1">
                                <failoverdomainnode name="Node-1" priority="1"/>
                                <failoverdomainnode name="Node-2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <clusterfs device="/dev/vg00/mygfs" force_unmount="0" fsid="59408" fstype="gfs" mountpoint="/gfs" name="gfs" options=""/>
                        <ip address="10.128.107.229" monitor_link="1"/>
                        <script file="/gfs/ess_clus/HA/clusTest.sh" name="ESS_HA_test"/>
                        <script file="/gfs/clusTest.sh" name="Clus_Test"/>
                </resources>
                <service autostart="1" name="nfs">
                        <clusterfs ref="gfs"/>
                        <ip ref="10.128.107.229"/>
                </service>
                <service autostart="1" domain="nfs" name="ESS_HA" recovery="restart">
                        <script ref="ESS_HA_test"/>
                        <clusterfs ref="gfs"/>
                        <ip ref="10.128.107.229"/>
                </service>
        </rm>
</cluster>
[root at Node-2 ~]#
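
One thing I notice in the config above: the <fencedevices/> section is empty
and each node's <fence/> block contains no method. For comparison, this is
roughly what a populated fencing section might look like (a hypothetical
fence_ipmilan sketch; the device names, IP addresses, and credentials are
placeholders, not from my setup):

```xml
<!-- hypothetical fence devices; ipaddr/login/passwd are placeholders -->
<fencedevices>
        <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.128.107.250" login="admin" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.128.107.251" login="admin" passwd="secret"/>
</fencedevices>
```

and each clusternode would then reference its device, e.g.:

```xml
<clusternode name="Node-1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="ipmi-node1"/>
                </method>
        </fence>
</clusternode>
```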
Node-2: tail -f /var/log/messages
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 29 18:20:49 vm-idm02 fenced[1706]: vm-idm01 not a cluster member after 0 sec post_fail_delay
Jun 29 18:20:49 vm-idm02 kernel: dlm: closing connection to node 1
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] New Configuration:
Jun 29 18:20:49 vm-idm02 fenced[1706]: fencing node "vm-idm01"
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ]         r(0) ip(10.128.107.224)
Jun 29 18:20:49 vm-idm02 fenced[1706]: fence "vm-idm01" failed
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] Members Left:
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ]         r(0) ip(10.128.107.223)
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] Members Joined:
Jun 29 18:20:49 vm-idm02 openais[1690]: [SYNC ] This node is within the primary component and will provide service.
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] CLM CONFIGURATION CHANGE
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] New Configuration:
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ]         r(0) ip(10.128.107.224)
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] Members Left:
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] Members Joined:
Jun 29 18:20:49 vm-idm02 openais[1690]: [SYNC ] This node is within the primary component and will provide service.
Jun 29 18:20:49 vm-idm02 openais[1690]: [TOTEM] entering OPERATIONAL state.
Jun 29 18:20:49 vm-idm02 openais[1690]: [CLM  ] got nodejoin message 10.128.107.224
Jun 29 18:20:49 vm-idm02 openais[1690]: [CPG  ] got joinlist message from node 2
Jun 29 18:20:54 vm-idm02 fenced[1706]: fencing node "Node-1"
Jun 29 18:20:54 vm-idm02 fenced[1706]: fence "Node-1" failed
Jun 29 18:20:59 vm-idm02 fenced[1706]: fencing node "Node-1"
Regards,
Rahul