[Linux-cluster] Node is randomly fenced

Christine Caulfield ccaulfie at redhat.com
Tue Jun 17 12:41:07 UTC 2014


On 12/06/14 20:06, Digimer wrote:
> Hrm, I'm not really sure that I am able to interpret this without making
> guesses. I'm cc'ing one of the devs (who I hope will poke the right
> person if he's not able to help at the moment). Let's see what he has to
> say.
>
> I am curious now, too. :)
>
> On 12/06/14 03:02 PM, Schaefer, Micah wrote:
>> Node4 was fenced again. I was able to get some debug logs (below), with a
>> new message:
>>
>> "Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL
>> state."
>>
>>
>> Rest of corosync logs
>>
>> http://pastebin.com/iYFbkbhb
>>
>>
>> Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
>> Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>> flushing membership messages.
>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>> flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
>> flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
>> flushing membership messages.
>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
>> flushing membership messages.


I'm concerned that the pause messages are repeating like that. It looks
like it might be a bug that has since been fixed. What version of
corosync do you have?
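
To put numbers on the repetition, here is a minimal Python sketch that pulls the reported pause durations out of a corosync log and counts how often each value appears. It is not part of corosync or any cluster tool. The function name `pause_durations` and the `sample` text (copied from a few of the log lines quoted above, unwrapped) are only illustrative. The pattern matches just the "Process pause detected for N ms" fragment, so it assumes that fragment is not split across lines.

```python
import re
from collections import Counter

# Matches the duration reported in corosync's "Process pause detected" lines.
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def pause_durations(log_text):
    """Return every reported pause duration (in ms), in order of appearance."""
    return [int(ms) for ms in PAUSE_RE.findall(log_text)]


# A few lines taken from the log quoted in this thread (unwrapped).
sample = """\
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms, flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms, flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms, flushing membership messages.
"""

durations = pause_durations(sample)
counts = Counter(durations)

# Print each distinct duration and how many times it was logged.
for ms, n in sorted(counts.items()):
    print(f"{ms} ms reported {n} time(s)")
```

To run it against a real log, read the file's contents into a string and pass that to `pause_durations` in place of `sample`.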

Chrissie



