[Linux-cluster] Service not relocated after successful fence of its owner
Maykel Moya
moya at latertulia.org
Sun Apr 26 06:19:47 UTC 2009
On Fri, 24-04-2009 at 13:41 -0400, Lon Hohberger wrote:
> On Wed, 2009-04-22 at 01:40 -0400, Maykel Moya wrote:
> > I still can't get my service automatically relocated after
> > _successfully_ fencing its owner node.
> >
> > I have a 4-node cluster n{1,2,3,4} and 4 services s{1,2,3,4}. My fence
> > device uses 'off' as its action, so a successful fence means the node
> > is off.
> >
> > Say s4 is running on n4 and I do an 'ip link set eth0 down' on n4. n4
> > gets successfully fenced, but s4 is never relocated to one of the other
> > available nodes, which means s4 is not available.
> >
> > Find attached the cluster.conf.
>
> Conf looks okay; what do the logs say? Any other errors? It looks like
> things should be working correctly.
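For list readers without the attachment: the fencing side of my
cluster.conf is shaped roughly like the sketch below. The agent, port,
address, and credentials are illustrative placeholders, not my real
values; only one node is shown.

----
<!-- illustrative sketch of one clusternode's fence block -->
<clusternode name="e1b04" nodeid="4" votes="1">
  <fence>
    <method name="1">
      <!-- action="off": a successful fence leaves the node powered off -->
      <device name="pdu1" action="off" port="4"/>
    </method>
  </fence>
</clusternode>

<!-- the matching fence device definition (placeholder values) -->
<fencedevices>
  <fencedevice agent="fence_apc" name="pdu1"
               ipaddr="192.168.1.100" login="apc" passwd="apc"/>
</fencedevices>
----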
The relevant part of the logs:
----
Apr 26 02:08:29 e1b01 kernel: [345031.041719] dlm: closing connection to node 4
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <info> State change: e1b04 DOWN
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 clurgmgrd[3880]: <debug> Membership Change Event
Apr 26 02:08:29 e1b01 fenced[3850]: e1b04 not a cluster member after 0 sec post_fail_delay
Apr 26 02:08:29 e1b01 fenced[3850]: fencing node "e1b04"
Apr 26 02:08:40 e1b01 fenced[3850]: can't get node number for node �ҋ#010Pҋ#010#020
Apr 26 02:08:40 e1b01 fenced[3850]: fence "e1b04" success
----
> 'cman_tool services' and 'cman_tool nodes' output would be helpful,
> too.
It's a bit odd: clustat says that node e1b04 is offline, but service4 is
still reported as owned by e1b04 and started.
e1b01:/var/log# clustat
Cluster Status for cinfomed @ Sun Apr 26 02:09:07 2009
Member Status: Quorate
Member Name                ID   Status
------ ----                ---- ------
e1b01                         1 Online, Local, rgmanager
e1b02                         2 Online, rgmanager
e1b03                         3 Online, rgmanager
e1b04                         4 Offline

Service Name               Owner (Last)               State
------- ----               ----- ------               -----
service:vmail1_svc         e1b01                      started
service:vmail2_svc         e1b04                      started
service:vmail3_svc         e1b03                      started
service:vmail4_svc         e1b04                      started
e1b01:/var/log# cman_tool services
type             level name       id       state
fence            0     default    00010001 none
[1 2 3]
dlm              1     rgmanager  00010004 none
[1 2 3]
e1b01:/var/log# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M   1404   2009-04-22 02:17:09  e1b01
   2   M   1432   2009-04-22 02:51:31  e1b02
   3   M   1412   2009-04-22 02:17:11  e1b03
   4   X   1408                        e1b04
Forgot to mention:
e1b01:/var/log# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 5.0.1 (lenny)
Release: 5.0.1
Codename: lenny
e1b01:/var/log# cman_tool -V
cman_tool 2.03.09 (built Nov 3 2008 18:22:25)
Copyright (C) Red Hat, Inc. 2004-2008 All rights reserved.
e1b01:/var/log# uname -r
2.6.26-2-686
----
This is the only thing keeping me from deploying. I have tried fencing
with 'reboot' and with 'off', and setting the service recovery policy to
'relocate', and nothing solves it. If a node goes down, its service is
not migrated after the fence.
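For completeness, the service stanzas I have been testing are shaped
roughly like this; the failover domain name and the child IP resource
are illustrative, and recovery="relocate" is the variant mentioned
above:

----
<rm>
  <failoverdomains>
    <!-- unordered, unrestricted: any remaining member may take the service -->
    <failoverdomain name="vmail_dom" ordered="0" restricted="0">
      <failoverdomainnode name="e1b01" priority="1"/>
      <failoverdomainnode name="e1b02" priority="1"/>
      <failoverdomainnode name="e1b03" priority="1"/>
      <failoverdomainnode name="e1b04" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <!-- recovery="relocate": after a failure, start on another domain member -->
  <service name="vmail4_svc" domain="vmail_dom" autostart="1"
           recovery="relocate">
    <ip address="192.168.1.44" monitor_link="1"/>
  </service>
</rm>
----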
Regards,
maykel