[Linux-cluster] qdiskd master election and loss of quorum
Lon H. Hohberger
lhh at redhat.com
Wed Nov 11 17:06:31 UTC 2009
On Wed, 2009-11-11 at 11:49 -0500, Lon H. Hohberger wrote:
> On Thu, 2009-11-05 at 15:28 +0100, Gianluca Cecchi wrote:
>
> > Nov 5 12:52:53 mork clurgmgrd[2633]: <notice> Member 2 shutting down
> > Nov 5 12:52:57 mork qdiskd[2214]: <info> Node 2 shutdown
>
> > Nov 5 12:55:41 mork openais[2185]: [TOTEM] The token was lost in the OPERATIONAL state.
>
> That's very interesting. It looks like what caused the
> state change failures was the huge lag between when rgmanager sent
> its "goodbye kiss" and when openais noticed the node was offline.
> The lag was long enough that rgmanager gave up.
>
> This isn't actually the quorum disk master election problem at all...
> It's also very strange.
>
> - rgmanager should have known this was unnecessary. The other node said
> it was going away.
> - cman probably should have caused a transition sooner, I think (??)
So... rgmanager treats a node that sends the 'EXITING' message as
offline. It makes no sense that it would do this and then fail
to update the cluster state.
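    case RG_EXITING:
        /* Ignore EXITING from a node already considered offline. */
        if (!member_online(msg_hdr->gh_arg1))
            break;

        logt_print(LOG_NOTICE, "Member %d shutting down\n",
                   msg_hdr->gh_arg1);
        /* Mark the sender offline immediately... */
        member_set_state(msg_hdr->gh_arg1, 0);
        /* ...and queue a node event so rgmanager reacts to the departure. */
        node_event_q(0, msg_hdr->gh_arg1, 0, 1);
        break;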
You said in your previous mail that mindy shut down cleanly -- so I'm
really stumped...
-- Lon