[Linux-cluster] Add option SO_LINGER to dlm sctp socket when the other endpoint is down.

David Teigland teigland at redhat.com
Tue Nov 19 16:51:44 UTC 2013


On Tue, Nov 19, 2013 at 03:21:41PM +0100, Lars Marowsky-Bree wrote:
> The goal here is that we know the other endpoint is down (we received a
> node down event and have completed fencing at that stage). Hence,
> SO_LINGER to speed up the shutdown of the socket seems appropriate.

Should your patch do the same for TCP?
Is this problem especially prevalent with SCTP?

With the patch, how much more likely would it be for data from a previous
connection to interfere with a new connection?  (I had this problem some
years ago, and added some safeguards to deal with it, but I don't think
they are perfect.  There are cases where a very short time separates
connections being closed and new connections being created.)

> (We may actually only want to set SO_LINGER for the node down event
> case, not generally. On receiving node down, set SO_LINGER as described
> here. Otherwise, we may hit the corner cases in the first reference; but
> we're already exposed to that today.)

I'd suggest giving this a try.
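
For reference, a minimal userspace sketch of the zero-timeout SO_LINGER
setting under discussion. This is not the dlm kernel code. The helper names
abort_on_close() and read_linger() are hypothetical, and only the standard
setsockopt()/getsockopt() calls are assumed. With l_onoff set and l_linger
zero, close() aborts the connection (RST for TCP, ABORT for SCTP) instead of
shutting it down gracefully, so any unsent data is discarded.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: arm the socket so that a later close() aborts
 * the connection immediately instead of lingering.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
int abort_on_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;   /* enable SO_LINGER */
    lg.l_linger = 0;  /* zero timeout: abort on close, drop unsent data */

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/*
 * Hypothetical helper: read back the current SO_LINGER setting,
 * e.g. to check that the option was applied.
 */
int read_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```

Following the suggestion quoted above, a caller would invoke
abort_on_close() only for the connection to a node that is known to be down
and fenced, and leave the default graceful close in place otherwise.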

> I really would love to know how we can avoid it. We have a few customers
> who can reproduce this.

Then perhaps this happens in more realistic and unavoidable cases than the
'echo b > /proc/sysrq-trigger' example.

Dave



