[Linux-cluster] Node is randomly fenced

Schaefer, Micah Micah.Schaefer at jhuapl.edu
Tue Jun 17 14:27:29 UTC 2014


I am running Red Hat Enterprise Linux 6.4 with the HA/load balancing
packages from the install DVD.


-bash-4.1$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.4 (Santiago)

-bash-4.1$ corosync -v
Corosync Cluster Engine, version '1.4.1'
Copyright (c) 2006-2009 Red Hat, Inc.
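
In case it helps with reading the debug output quoted below, here is a rough
Python sketch for summarizing the "Process pause detected for N ms" lines.
It is not part of corosync or any Red Hat tooling: the function name
summarize_pauses and the regular expression are assumptions based only on
the message format that appears in these logs.

```python
import re

# Assumed message format, taken from the log lines quoted in this thread.
PAUSE_RE = re.compile(r"Process pause detected for (\d+) ms")


def summarize_pauses(lines):
    """Summarize pause durations found in an iterable of log lines.

    Returns None when no pause message is present, otherwise a dict with
    the number of messages, the shortest and longest pause in milliseconds,
    and the sorted list of distinct durations.
    """
    durations = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    if not durations:
        return None
    return {
        "count": len(durations),
        "min_ms": min(durations),
        "max_ms": max(durations),
        "distinct_ms": sorted(set(durations)),
    }


if __name__ == "__main__":
    # A few lines copied from the quoted log, including the mail quote prefix.
    sample = [
        ">>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,",
        ">>> flushing membership messages.",
        ">>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,",
        ">>> flushing membership messages.",
    ]
    print(summarize_pauses(sample))
```

Feeding it the full log file (for example via open(path) on wherever the
corosync log is kept on the node) should show quickly whether the messages
report one long pause over and over or several separate pauses.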


On 6/17/14, 8:41 AM, "Christine Caulfield" <ccaulfie at redhat.com> wrote:

>On 12/06/14 20:06, Digimer wrote:
>> Hrm, I'm not really sure that I am able to interpret this without making
>> guesses. I'm cc'ing one of the devs (who I hope will poke the right
>> person if he's not able to help at the moment). Let's see what he has to
>> say.
>>
>> I am curious now, too. :)
>>
>> On 12/06/14 03:02 PM, Schaefer, Micah wrote:
>>> Node4 was fenced again. I was able to get some debug logs (below), with a
>>> new message:
>>>
>>> "Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL
>>> state."
>>>
>>>
>>> Rest of corosync logs
>>>
>>> http://pastebin.com/iYFbkbhb
>>>
>>>
>>> Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
>>> Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the
>>> membership and a new membership was formed.
>>> Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
>>> flushing membership messages.
>>> Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
>>> flushing membership messages.
>
>
>I'm concerned that the pause messages are repeating like that; it looks
>like it might be a bug that has since been fixed. What version of corosync
>do you have?
>
>Chrissie
>
>-- 
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster




