[EnMasse] Disconnect from time to time

Gordon Sim gsim at redhat.com
Thu Mar 28 15:12:11 UTC 2019


On 28/03/2019 12:27 pm, Bob Claerhout wrote:
> Hi Gordon,
> 
> I'm sorry for the delay.
> You can find the extra logging in a wetransfer: https://we.tl/t-ych9qXsceF

Thanks! The logs show the broker specifies an idle timeout of 2.5 
seconds. The protocol trace does not contain timestamps (enabling 
tracing in the router log, as opposed to PN_TRACE_FRM, would give the 
timestamps), so it is hard to be completely certain. However, it does 
look like the router is indeed not sending heartbeats as expected at 
the point the connection is ended (based on the closest log entries 
that do have timestamps, and on the heartbeat from broker to router).
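
If a timestamped trace were available, the check described above could be 
sketched roughly as follows. This is only an illustration: it assumes the 
send times of frames from router to broker have already been extracted 
into a list of seconds for one connection, and the function name and 
input format are hypothetical, not something the router provides.

```python
def find_heartbeat_gaps(send_times, idle_timeout):
    """Return (previous, next, gap) for each silence longer than idle_timeout.

    send_times: times in seconds at which the router sent any frame
                (heartbeat or otherwise) on one connection. Hypothetical
                input, assumed to be extracted from a timestamped log.
    idle_timeout: the timeout advertised by the peer, in seconds.
    """
    ordered = sorted(send_times)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap > idle_timeout:
            gaps.append((previous, current, gap))
    return gaps


if __name__ == "__main__":
    # Made-up sample times, with the 2.5 second timeout from the logs.
    for previous, current, gap in find_heartbeat_gaps([0.0, 1.0, 2.0, 5.0, 6.0], 2.5):
        print(f"silent for {gap:.1f}s between t={previous} and t={current}")
```

Any gap reported here would be a period in which the broker saw nothing 
from the router for longer than the timeout it advertised.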

The connection prior to closing seems to have been active for nearly 5 
days (2019-03-22 11:14 to 2019-03-27 03:40) without issue. There is only 
one other occurrence of a broker timing out a connection to the router. 
It is for a different broker, and the connection there lasted from 
2019-03-22 11:14 to 2019-03-23 17:20 (again the evidence suggests the 
broker was within its rights to close the connection).

My suggestion would be to increase that idle-timeout a little (I believe 
there is an upcoming fix to enmasse to allow this to be done in config), 
but also for clients to be able to handle the detach that a broker 
disconnection may cause.
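
As a rough sketch of what handling the detach could look like on the 
client side: retry the attach with an increasing delay rather than 
treating the detach as fatal. Everything here is an assumption for 
illustration. LinkDetached and attach_and_run are hypothetical 
placeholders for whatever error and attach logic your client library 
actually exposes, and the delay values are arbitrary.

```python
import time


class LinkDetached(Exception):
    """Hypothetical error standing in for a detach or dropped connection."""


def backoff_delays(base_delay, max_delay, attempts):
    """Exponentially growing delays, capped at max_delay."""
    return [min(max_delay, base_delay * 2 ** i) for i in range(attempts)]


def run_with_reattach(attach_and_run, attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call attach_and_run, re-attaching after a detach.

    attach_and_run: hypothetical callable that attaches the link and does
                    the work; it raises LinkDetached when the link is lost.
    The last failure is re-raised once all attempts are used up.
    """
    delays = backoff_delays(base_delay, max_delay, attempts)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return attach_and_run()
        except LinkDetached:
            if attempt == attempts:
                raise
            sleep(delay)  # wait before trying to attach again
```

Most client libraries have their own reconnect options, which would 
normally be preferable to a hand-rolled loop like this one.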

I have no obvious explanation for *why* the router did not respond in 
time. Of the 44 connections for which traces are logged in a 10 second 
period around the time of the last timeout, only two were doing anything 
other than heartbeats. Of the rest, 29 sent at least one heartbeat.








