[Linux-cluster] Clustat shows wrong service status
Agnieszka Kukałowicz
qqlka at nask.pl
Thu Feb 28 08:39:03 UTC 2008
> > Clustat shows something like that:
> >
> > Member Name        ID   Status
> > ------ ----        ---- ------
> > w2.local           1    Offline
> > w1.local           2    Online, Local, rgmanager
>
> One of two things is likely:
>
> (a) The node has not been fenced yet, or
I've checked fencing. I have two fencing methods configured: an APC power
switch and manual fencing. This is from my log file (manual fencing):
Feb 28 07:50:21 w1 kernel: dlm: connecting to 1
Feb 28 07:50:23 w1 openais[11475]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 28 07:50:28 w1 openais[11475]: [CLM ] CLM CONFIGURATION CHANGE
Feb 28 07:50:28 w1 openais[11475]: [CLM ] New Configuration:
Feb 28 07:50:28 w1 kernel: dlm: closing connection to node 1
Feb 28 07:50:28 w1 fenced[11494]: w2.local not a cluster member after 0 sec post_fail_delay
Feb 28 07:50:28 w1 openais[11475]: [CLM ] r(0) ip(10.0.200.1)
Feb 28 07:50:28 w1 fenced[11494]: fencing node "w2.local"
Feb 28 07:50:28 w1 openais[11475]: [CLM ] Members Left:
Feb 28 07:50:28 w1 openais[11475]: [CLM ] r(0) ip(10.0.200.2)
Feb 28 07:50:28 w1 fence_manual: Node w2.local.polska.pl needs to be reset before recovery can procede. Waiting for w2.local to rejoin the cluster ....
I rebooted the node manually and ran "fence_ack_manual". The log then
shows:
Feb 28 07:52:01 w1 fenced[11494]: fence "w2.local.polska.pl" success
But clustat still shows the wrong status.
[root@w1 ~]# group_tool dump fence
1204181252 start default 151 members 1 2
1204181252 do_recovery stop 136 start 151 finish 136
1204181252 finish default 151
1204181428 stop default
1204181428 start default 166 members 2
1204181428 do_recovery stop 151 start 166 finish 151
1204181428 add node 1 to list 1
1204181428 node "w2.local" not a cman member, cn 1
1204181428 node "w2.local" has not been fenced
1204181428 fencing node w2.local.polska.pl
1204181526 finish default 166
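
To line these dump entries up with the syslog lines, it can help to convert
the leading number to a readable time. The following is only a minimal
sketch: it assumes the first field of each "group_tool dump fence" line is a
Unix epoch timestamp in seconds, and the helper name annotate() is made up
for illustration. Under that assumption 1204181428 is 06:50:28 UTC, which
would agree with the 07:50:28 syslog entries if the host clock is one hour
ahead of UTC.

```python
from datetime import datetime, timezone

# Lines copied from the "group_tool dump fence" output quoted above.
DUMP = """\
1204181428 stop default
1204181428 start default 166 members 2
1204181428 node "w2.local" has not been fenced
1204181428 fencing node w2.local.polska.pl
1204181526 finish default 166
"""


def annotate(dump: str) -> list[str]:
    """Replace the leading number of each line with a UTC time.

    Assumption: the leading number is a Unix epoch timestamp in seconds.
    """
    out = []
    for line in dump.splitlines():
        stamp, _, rest = line.partition(" ")
        if not stamp.isdigit():
            out.append(line)  # leave lines without a numeric prefix untouched
            continue
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        out.append(f"{when:%Y-%m-%d %H:%M:%S} UTC  {rest}")
    return out


if __name__ == "__main__":
    print("\n".join(annotate(DUMP)))
```

Read this way, the gap between "fencing node" and "finish default 166" is
98 seconds, which is roughly the time between the fence request and the
"fence ... success" message in the log.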
[root@w1 ~]# cman_tool nodes -f
Node  Sts   Inc   Joined               Name
   1   X    464                        w2.local
       Last fenced:   2008-02-28 07:52:01 by manual_fence_w2
   2   M    420   2008-02-27 11:57:10  w1.local
       Last fenced:   2008-02-27 12:23:17 by apc_power_switch
I did the same with the APC power switch and still have the problem.
> (b) you did the manual override trick and hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=435189
The steps to reproduce the bug were:
1. start cman, clvmd, gfs, rgmanager on w1.local and w2.local
2. power off w2.local
3. on w1.local run "clustat"
And in my case cman_tool nodes does not report:
" Last fenced 2008-02-27 15:24:16 by override"
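
The check for an "override" entry can also be scripted. This is a rough
sketch only: it works on captured "cman_tool nodes -f" text, it assumes the
layout seen in the output quoted earlier (a node line starting with the node
ID, optionally followed by a "Last fenced" line), and the names parse_nodes()
and SAMPLE are made up for illustration.

```python
import re

# Text mirroring the "cman_tool nodes -f" output quoted earlier in this mail.
SAMPLE = """\
Node Sts Inc Joined Name
1 X 464 w2.local
Last fenced: 2008-02-28 07:52:01 by manual_fence_w2
2 M 420 2008-02-27 11:57:10 w1.local
Last fenced: 2008-02-27 12:23:17 by apc_power_switch
"""

# Matches "Last fenced" with or without a colon, then date, time, agent.
FENCED_RE = re.compile(r"Last fenced:?\s+(\S+ \S+) by (\S+)")


def parse_nodes(output: str) -> dict:
    """Map node name -> status and last-fence details (assumed layout)."""
    nodes = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].isdigit():
            # Node line: ID first, status second, node name last.
            current = fields[-1]
            nodes[current] = {
                "status": fields[1],
                "fenced_at": None,
                "fenced_by": None,
            }
            continue
        match = FENCED_RE.search(line)
        if match and current is not None:
            nodes[current]["fenced_at"] = match.group(1)
            nodes[current]["fenced_by"] = match.group(2)
    return nodes


if __name__ == "__main__":
    for name, info in parse_nodes(SAMPLE).items():
        flag = " (override)" if info["fenced_by"] == "override" else ""
        print(f"{name}: status={info['status']} fenced_by={info['fenced_by']}{flag}")
```

With the output from this cluster, neither node is reported as fenced by
"override": w2.local shows manual_fence_w2 and w1.local shows
apc_power_switch.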
Agnieszka Kukalowicz