[Linux-cluster] Possible cman init script race condition
teigland at redhat.com
Fri Sep 28 14:27:30 UTC 2007
On Fri, Sep 28, 2007 at 11:12:40AM +0200, Borgström Jonas wrote:
> Anyone with an idea why a "sleep 30" is needed for fenced to be able to
> join the fence group properly?
> Even though this workaround appears to work it would be nice to have a
> more solid solution. Since now I will need to remember to patch the init
> script every time it's updated.
We never got to the bottom of what the problem is AFAIK.
> > > 1190645954 client 3: dump <--- Before killing prod-db1
> > > 1190645985 stop default
> > > 1190645985 start default 3 members 2
> > > 1190645985 do_recovery stop 2 start 3 finish 1
> > > 1190645985 finish default 3
> > > 1190646008 client 3: dump <--- After killing prod-db1
> > Node 1 isn't fenced here because it never completed joining the fence
> > group above.
This is the problem we need to debug. Here's what I suggested before:
"A 'group_tool -v' here should show the state of the fence group still in
transition. Could you run that, plus a 'group_tool dump' at this point,
in addition to the 'dump fence' you have. And please run those commands
on both nodes."
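
A minimal sketch for collecting that output in one go, assuming group_tool
is on the PATH and the script is run as root on each node. The function
name, the GROUP_TOOL variable and the output file names are made up for
illustration. Only the three group_tool invocations come from the request
above, with 'dump fence' taken to mean 'group_tool dump fence'.

```shell
#!/bin/sh
# Sketch: capture the group state on the local node while the fence
# group still appears to be stuck in transition.
# GROUP_TOOL and the file names below are assumptions for illustration.

GROUP_TOOL="${GROUP_TOOL:-group_tool}"

collect_group_state() {
    outdir="$1"
    mkdir -p "$outdir" || return 1

    # Group list with verbose state, to see whether the fence group
    # is still mid-transition.
    $GROUP_TOOL -v > "$outdir/group_tool-v.txt" 2>&1

    # groupd debug buffer.
    $GROUP_TOOL dump > "$outdir/group_tool-dump.txt" 2>&1

    # fenced debug buffer (the 'dump fence' output already posted).
    $GROUP_TOOL dump fence > "$outdir/group_tool-dump-fence.txt" 2>&1
}
```

For example, after sourcing the file, something like
collect_group_state "/tmp/group-state-$(hostname)" on each node would leave
three files per node that can be attached to a reply.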