[Linux-cluster] Clustat shows wrong service status

Agnieszka Kukałowicz qqlka at nask.pl
Thu Feb 28 08:39:03 UTC 2008



> > Clustat shows something like that:
> >
> > Member Name                        ID   Status
> > ------ ----                        ---- ------
> > w2.local                           1    Offline
> > w1.local                           2    Online, Local, rgmanager
> 
> One of two things is likely:
> 
> (a) The node has not been fenced yet, or


I've checked fencing. I have two fencing methods: an APC power switch
and manual fencing. This is from my log file (manual fencing):

Feb 28 07:50:21 w1 kernel: dlm: connecting to 1
Feb 28 07:50:23 w1 openais[11475]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 28 07:50:28 w1 openais[11475]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 28 07:50:28 w1 openais[11475]: [CLM  ] New Configuration:
Feb 28 07:50:28 w1 kernel: dlm: closing connection to node 1
Feb 28 07:50:28 w1 fenced[11494]: w2.local not a cluster member after 0 sec post_fail_delay
Feb 28 07:50:28 w1 openais[11475]: [CLM  ]      r(0) ip(10.0.200.1)
Feb 28 07:50:28 w1 fenced[11494]: fencing node "w2.local"
Feb 28 07:50:28 w1 openais[11475]: [CLM  ] Members Left:
Feb 28 07:50:28 w1 openais[11475]: [CLM  ]      r(0) ip(10.0.200.2)
Feb 28 07:50:28 w1 fence_manual: Node w2.local.polska.pl needs to be reset before recovery can procede.  Waiting for w2.local to rejoin the cluster ....

I then rebooted the node manually and ran "fence_ack_manual". The log
shows:

Feb 28 07:52:01 w1 fenced[11494]: fence "w2.local.polska.pl" success

But clustat still shows the wrong status.

[root@w1 ~]# group_tool dump fence
1204181252 start default 151 members 1 2
1204181252 do_recovery stop 136 start 151 finish 136
1204181252 finish default 151
1204181428 stop default
1204181428 start default 166 members 2
1204181428 do_recovery stop 151 start 166 finish 151
1204181428 add node 1 to list 1
1204181428 node "w2.local" not a cman member, cn 1
1204181428 node "w2.local" has not been fenced
1204181428 fencing node w2.local.polska.pl
1204181526 finish default 166
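
The first column of the dump is a Unix epoch timestamp, which makes it
hard to compare with the syslog lines above. A minimal Python sketch for
converting it, assuming the host clock runs at UTC+1 (this offset is only
inferred from the syslog times and is not stated anywhere; `annotate` is
a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

# Assumption: host local time is UTC+1, inferred from the syslog excerpt.
LOCAL = timezone(timedelta(hours=1))


def annotate(line, tz=LOCAL):
    """Replace the leading epoch timestamp of a dump line with HH:MM:SS."""
    stamp, _, rest = line.partition(" ")
    when = datetime.fromtimestamp(int(stamp), tz)
    return f"{when:%H:%M:%S} {rest}"


# Two lines copied from the dump above.
dump = """\
1204181428 fencing node w2.local.polska.pl
1204181526 finish default 166"""

for line in dump.splitlines():
    print(annotate(line))
```

With that offset, the "fencing node" entry falls at 07:50:28 and the
"finish" entry at 07:52:06, a few seconds after the "fence ... success"
line in syslog.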

[root@w1 ~]# cman_tool nodes -f
Node  Sts   Inc   Joined               Name
   1   X    464                        w2.local
       Last fenced:   2008-02-28 07:52:01 by manual_fence_w2
   2   M    420   2008-02-27 11:57:10  w1.local
       Last fenced:   2008-02-27 12:23:17 by apc_power_switch
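
For checking this repeatedly, the output can be reduced to one record per
node. A rough Python sketch, assuming the layout is exactly as in the
sample above (other cman versions may print it differently; `parse_nodes`
and the dictionary keys are hypothetical names):

```python
import re

# Assumption: line layout matches the 'cman_tool nodes -f' sample above.
NODE = re.compile(
    r"^\s*(\d+)\s+(\S)\s+(\d+)\s+"
    r"(?:(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)\s+)?(\S+)\s*$"
)
FENCED = re.compile(r"^\s*Last fenced:\s+(\S+ \S+) by (\S+)\s*$")


def parse_nodes(text):
    """Return one dict per node, with its last fencing record if present."""
    nodes = []
    for line in text.splitlines():
        m = NODE.match(line)
        if m:
            node_id, sts, _inc, joined, name = m.groups()
            nodes.append({"id": int(node_id), "sts": sts, "name": name,
                          "joined": joined, "fenced": None, "agent": None})
            continue
        m = FENCED.match(line)
        if m and nodes:
            # A "Last fenced" line belongs to the node printed just before it.
            nodes[-1]["fenced"], nodes[-1]["agent"] = m.groups()
    return nodes


sample = """\
Node  Sts   Inc   Joined               Name
   1   X    464                        w2.local
       Last fenced:   2008-02-28 07:52:01 by manual_fence_w2
   2   M    420   2008-02-27 11:57:10  w1.local
       Last fenced:   2008-02-27 12:23:17 by apc_power_switch
"""

for node in parse_nodes(sample):
    print(node["name"], node["sts"], node["fenced"], node["agent"])
```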

I did the same with the APC power switch and still have the problem.


> (b) you did the manual override trick and hit this bug:
>     https://bugzilla.redhat.com/show_bug.cgi?id=435189

The steps to reproduce the bug were:
1. start cman, clvmd, gfs, rgmanager on w1.local and w2.local
2. power off w2.local
3. on w1.local do "clustat"

And I don't see cman_tool nodes reporting anything like:
" Last fenced   2008-02-27 15:24:16 by override"

Agnieszka Kukalowicz



