[Linux-cluster] qdiskd master election and loss of quorum
Lon H. Hohberger
lhh at redhat.com
Wed Nov 11 17:06:31 UTC 2009
On Wed, 2009-11-11 at 11:49 -0500, Lon H. Hohberger wrote:
> On Thu, 2009-11-05 at 15:28 +0100, Gianluca Cecchi wrote:
> > Nov 5 12:52:53 mork clurgmgrd: <notice> Member 2 shutting down
> > Nov 5 12:52:57 mork qdiskd: <info> Node 2 shutdown
> > Nov 5 12:55:41 mork openais: [TOTEM] The token was lost in the
> > OPERATIONAL state.
> That's very interesting. It looks like what happened to cause the
> state change failures was the huge lag time between when rgmanager sent
> its "good bye kiss" and the time openais noticed the node was offline.
> The timeout was large enough that rgmanager gave up.
> This isn't actually the quorum disk master election problem at all...
> It's also very strange.
> - rgmanager should have known this was unnecessary. The other node said
> it was going away.
> - cman probably should have caused a transition sooner, I think (??)
So... rgmanager treats a node which sends the 'EXITING' message as
offline. It makes no sense why it would do this and subsequently fail
to update the cluster state.
    logt_print(LOG_NOTICE, "Member %d shutting down\n",
    node_event_q(0, msg_hdr->gh_arg1, 0, 1);
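To make the expected behavior concrete, here is a minimal, self-contained sketch of what "treat a node that sends EXITING as offline" could look like. This is not rgmanager's actual code: member_up, queue_node_event(), member_join() and handle_exiting() are hypothetical stand-ins invented for illustration, and the fixed-size tables are an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES  16   /* hypothetical limits for this sketch */
#define MAX_EVENTS 32

/* Hypothetical stand-ins for a membership table and a node event queue. */
struct node_event {
    int  nodeid;
    bool online;   /* new state of the node */
    bool clean;    /* true if the node announced its own shutdown */
};

static bool member_up[MAX_NODES];
static struct node_event events[MAX_EVENTS];
static size_t n_events;

/* Queue a membership change for later processing; false if the queue is full. */
static bool queue_node_event(int nodeid, bool online, bool clean)
{
    if (n_events >= MAX_EVENTS)
        return false;
    events[n_events++] = (struct node_event){ nodeid, online, clean };
    return true;
}

/* Record that a node is a running member. */
static void member_join(int nodeid)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    member_up[nodeid] = true;
}

/*
 * Handle an EXITING notice from a peer: mark it offline immediately and
 * queue a clean node-down event, without waiting for the membership layer
 * to notice that the node is gone. Returns true if a change was recorded.
 */
static bool handle_exiting(int nodeid)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    if (!member_up[nodeid])
        return false;               /* already offline, nothing to do */
    member_up[nodeid] = false;
    return queue_node_event(nodeid, false, true);
}
```

In this model, calling member_join(2) followed by handle_exiting(2) leaves node 2 marked offline with one clean node-down event queued, and a repeated EXITING for the same node is ignored. The point is that the state update depends only on the goodbye message, not on how long the token timeout takes to expire.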
You said in your previous mail that mindy shut down cleanly -- so I'm