[Linux-cluster] "openais[XXXX] [TOTEM] Retransmit List: XXXXX" in /var/log/messages

Bernard Chew bernardchew at gmail.com
Tue Apr 20 04:14:08 UTC 2010


> On Fri, Apr 9, 2010 at 4:51 PM, Bernard Chew <bernardchew at gmail.com> wrote:
>> On Thu, Apr 8, 2010 at 12:58 AM, Steven Dake <sdake at redhat.com> wrote:
>>> On Wed, 2010-04-07 at 18:52 +0800, Bernard Chew wrote:
>>> Hi all,
>>>
>>> I noticed "openais[XXXX] [TOTEM] Retransmit List: XXXXX" repeated
>>> every few hours in /var/log/messages. What does the message mean and
>>> is it normal? Will this cause fencing to take place eventually?
>>>
>> This means your network environment dropped packets and totem is
>> recovering them.  This is normal operation, and in future versions such
>> as corosync no notification is printed when recovery takes place.
>>
>> There is a bug, however, fixed in revision 2122: if the last packet
>> in the order is lost and no new packets arrive after it, the
>> processor will enter a "failed to receive" state and trigger fencing.
>>
>> Regards
>> -steve
>>> Thank you in advance.
>>>
>>> Regards,
>>> Bernard Chew
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> Thank you for the reply, Steve!
>
> The cluster was running fine until last week, when 3 nodes restarted
> suddenly. I suspect fencing took place, since all 3 servers restarted
> at the same time, but I couldn't find any fence-related entries in the
> log. I am guessing we hit the bug you mentioned? Would the log show
> that fencing had taken place if that bug was the cause?
>
> Also, I occasionally see the message "kernel: clustat[28328]: segfault at
> 0000000000000024 rip 0000003b31c75bc0 rsp 00007fff955cb098 error 4";
> is this related to the TOTEM message, or does it indicate
> another problem?
>
> Regards,
> Bernard Chew
>

Hi Steve,

Could you point me to the bug you mentioned (the one fixed in revision 2122)?

Thank you.

Regards,
Bernard
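
P.S. For anyone else tracking how often the retransmit message shows up, here is a minimal sketch that counts "[TOTEM] Retransmit List" lines per host and hour. It assumes the usual syslog layout ("Apr  7 10:15:32 node1 openais[1234]: [TOTEM] Retransmit List: ..."); the function name and the sample layout are my own illustration, not anything shipped with openais, so adjust the pattern if your log format differs.

```python
import re
from collections import Counter

# Assumed syslog layout (an assumption, not taken from openais itself):
#   "Apr  7 10:15:32 node1 openais[1234]: [TOTEM] Retransmit List: 1a 1b"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+openais\[\d+\]:?\s+\[TOTEM\]\s+Retransmit List:"
)

def count_retransmits_per_hour(lines):
    """Count Retransmit List messages, keyed by (host, month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            key = (m["host"], m["month"], int(m["day"]), int(m["hour"]))
            counts[key] += 1
    return counts
```

Passing an open file handle for /var/log/messages (or any iterable of lines) returns a Counter; a sudden jump in the hourly count shortly before the restarts would at least suggest the network was dropping more packets than usual at that time.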



