[Linux-cluster] Possible cman init script race condition

Fri Sep 28 14:27:30 UTC 2007

On Fri, Sep 28, 2007 at 11:12:40AM +0200, Borgström Jonas wrote:
> Anyone with an idea why a "sleep 30" is needed for fenced to be able to
> join the fence group properly?
> 
> Even though this workaround appears to work, it would be nice to have a
> more solid solution, since as things stand I will need to remember to
> patch the init script every time it's updated.

We never got to the bottom of what the problem is AFAIK.

> > > 1190645954 client 3: dump    <--- Before killing prod-db1
> > > 1190645985 stop default
> > > 1190645985 start default 3 members 2
> > > 1190645985 do_recovery stop 2 start 3 finish 1
> > > 1190645985 finish default 3
> > > 1190646008 client 3: dump    <--- After killing prod-db1
> > 
> > Node 1 isn't fenced here because it never completed joining the fence 
> > group above.

This is the problem we need to debug.  Here's what I suggested earlier for
doing that:

"A 'group_tool -v' here should show the state of the fence group still in
transition.  Could you run that, plus a 'group_tool dump' at this point,
in addition to the 'dump fence' you have.  And please run those commands
on both nodes."
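A minimal sketch for capturing those three outputs on one node, so the captures from both nodes can be compared side by side. It assumes Python 3.7+ and that `group_tool` is on `PATH`; the helper names (`output_path`, `collect`) and the file-naming scheme are hypothetical, not part of the cluster tools.

```python
import socket
import subprocess
import sys
import time

# The three captures requested above. Assumes group_tool is on PATH.
COMMANDS = [
    ["group_tool", "-v"],
    ["group_tool", "dump"],
    ["group_tool", "dump", "fence"],
]


def output_path(host, when):
    """Hypothetical per-node file name so the two nodes' captures don't collide."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(when))
    return "fence-debug-%s-%s.txt" % (host, stamp)


def collect(commands=COMMANDS):
    """Run each command in turn; return one report with a header per command."""
    parts = []
    for cmd in commands:
        header = "===== %s =====\n" % " ".join(cmd)
        try:
            # Merge stderr into stdout so error messages land in the report too.
            result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True)
            parts.append(header + result.stdout)
            if result.returncode != 0:
                parts.append("(exit status %d)\n" % result.returncode)
        except FileNotFoundError:
            # Binary missing: record that instead of aborting the whole capture.
            parts.append(header + "(command not found)\n")
    return "".join(parts)


def main():
    path = output_path(socket.gethostname(), time.time())
    with open(path, "w") as f:
        f.write(collect())
    print("wrote", path)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run it on each node at the same point, i.e. while the fence group is still in transition, then compare the two files. Timestamps in the file name are UTC so captures from nodes in different time zones line up.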

Dave