[Linux-cluster] Add option SO_LINGER to dlm sctp socket when the other endpoint is down.

Thu Nov 28 16:50:09 UTC 2013

On 2013-11-19T11:51:44, David Teigland <teigland at redhat.com> wrote:

> With the patch, how much more likely would it be for data from a previous
> connection to interfere with a new connection?  (I had this problem some
> years ago, and added some safeguards to deal with it, but I don't think
> they are perfect.  There are cases where a very short time separates
> connections being closed and new connections being created.)

We have not seen this during testing. We now have positive confirmation,
not just from our own tests but also from customers testing this on
multiple nodes.

And I still don't see how this could happen - we close the socket once
the other node has been fenced or stopped. Short of a false-positive
fence, we shouldn't see what you describe, right?

Setting SO_LINGER just before close really doesn't make a big
difference, since we'd always want to set it. We close the connection
because we don't want to talk to the other side any more, hence we might
as well discard anything that is still in the queue.
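For readers less familiar with the option: the pattern above (enable
SO_LINGER with a zero linger time, then close) can be sketched in
userspace C as follows. This is a minimal illustration on a plain POSIX
socket, not the in-kernel dlm patch itself, and the helper name
close_discarding_queue is hypothetical. With l_onoff = 1 and
l_linger = 0, close() does not try to flush the send queue; on Linux,
TCP resets the connection and SCTP sends an ABORT instead of a graceful
shutdown.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close fd so that data still queued for sending
 * is discarded instead of being transmitted after close().
 * SO_LINGER with l_onoff = 1 and l_linger = 0 requests an abortive close. */
int close_discarding_queue(int fd)
{
    struct linger lin = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin)) != 0) {
        close(fd);  /* still release the descriptor on failure */
        return -1;
    }
    return close(fd);
}
```

Setting the option immediately before close(), rather than at socket
creation, has the same effect on that final close.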

> Then perhaps this happens in more realistic and unavoidable cases than the
> 'echo b > /proc/sysrq-trigger' example.

The actual case at the customer was a simple node crash/hard reboot,
which we simulated this way.

I'm quite interested in driving this discussion forward. Anything more
we can provide?

Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde