[EnMasse] Disconnect from time to time
Gordon Sim
gsim at redhat.com
Thu Mar 28 15:12:11 UTC 2019
On 28/03/2019 12:27 pm, Bob Claerhout wrote:
> Hi Gordon,
>
> I'm sorry for the delay.
> You can find the extra logging in a wetransfer: https://we.tl/t-ych9qXsceF
Thanks! The logs show the broker specifies an idle timeout of 2.5
seconds. The protocol trace does not contain timestamps (enabling
tracing in the router log as opposed to PN_TRACE_FRM would give the
timestamps), so it is hard to be completely certain, but it does look
like the router is indeed not sending heartbeats as expected at the
point the connection is ended (based on the closest log entries that do
have timestamps, and on the heartbeat from broker to router).
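Once a trace with timestamps is available, checking this could be
automated. The following is only a rough sketch, assuming the send times
of the router's frames on one connection have already been parsed out of
the log into datetime objects. The function name find_idle_gaps and the
sample timestamps are made up for illustration and are not taken from
the logs.

```python
from datetime import datetime, timedelta


def find_idle_gaps(frame_times, idle_timeout_s):
    """Return (previous, current, gap_seconds) for every pair of
    consecutive frames that are further apart than idle_timeout_s."""
    limit = timedelta(seconds=idle_timeout_s)
    times = sorted(frame_times)
    gaps = []
    for prev, curr in zip(times, times[1:]):
        if curr - prev > limit:
            gaps.append((prev, curr, (curr - prev).total_seconds()))
    return gaps


if __name__ == "__main__":
    # Illustrative timestamps only; not real data from the router log.
    start = datetime(2019, 3, 27, 3, 40, 0)
    sent = [start + timedelta(seconds=s) for s in (0, 1, 2, 6)]
    for prev, curr, gap in find_idle_gaps(sent, idle_timeout_s=2.5):
        print(f"no frame sent between {prev} and {curr} ({gap:.1f}s)")
```

Any gap reported for the router's outgoing frames would be a point at
which the broker could legitimately consider the connection idle.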
The connection prior to closing seems to have been active for nearly 5
days (2019-03-22 11:14 to 2019-03-27 03:40) without issue. There is only
one other occurrence of a broker timing out a connection to the router.
It is for a different broker, and the connection there lasted from
2019-03-22 11:14 to 2019-03-23 17:20 (again the evidence suggests the
broker was within its rights to close the connection).
My suggestion would be to increase that idle-timeout a little (I believe
there is an upcoming fix to enmasse to allow this to be done in config),
but also for clients to be able to handle the detach that a broker
disconnection may cause.
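As a rough illustration of the client side, a sender or receiver could
wrap its work in a loop that re-attaches with a growing delay. This is a
library-neutral sketch: LinkDetached, backoff_delays and
run_with_reattach are hypothetical names, and LinkDetached stands in for
whatever error or event the client library actually reports on detach.

```python
import time


class LinkDetached(Exception):
    """Hypothetical stand-in for the client library's detach error."""


def backoff_delays(base_s=0.5, factor=2.0, cap_s=30.0):
    """Yield exponentially growing delays, capped at cap_s seconds."""
    delay = base_s
    while True:
        yield min(delay, cap_s)
        delay *= factor


def run_with_reattach(attach_and_run, max_attempts=5, sleep=time.sleep):
    """Call attach_and_run(); if it raises LinkDetached, wait and retry.

    Gives up and re-raises after max_attempts failed attempts.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return attach_and_run()
        except LinkDetached:
            if attempt == max_attempts:
                raise
            sleep(next(delays))
```

Here attach_and_run would be the application's own function that
creates the link and processes messages until it finishes or the link
is detached.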
I have no obvious explanation for *why* the router did not respond in
time. Of the 44 connections for which traces are logged in a 10-second
period around the time of the last timeout, only two were doing anything
other than heartbeats. Of the rest, 29 sent at least one heartbeat.