[Linux-cluster] [RFC] dlm: keep listening connection alive with sctp mode

Lidong Zhong lzhong at suse.com
Thu Jun 12 06:42:58 UTC 2014


Currently, when a node closes a connection, it sends a user-initiated
ABORT instead of shutting down gracefully (ece35848c184). Unfortunately,
this can also close the listening connection, so the node fails to
rejoin the cluster.

I set up a two-node cluster to test this. While the cluster is working
normally, the connections look like this:
clt-n2-sles12b7-2:~ # netstat -apn|grep sctp
sctp   147.2.208.197:21064  LISTEN      -
sctp   0   4 0.0.82.72:62887   147.2.208.197:21064  ESTABLISHED -

If I reboot the other node or stop dlm on it, all the connections are
lost:
clt-n2-sles12b7-2:~ # netstat -apn | grep sctp
clt-n2-sles12b7-2:~ #

So if the other node tries to rejoin the cluster, the following messages
flood the log, because there is no longer a listening port:

dlm: Trying to connect to 192.168.3.4
dlm: Can't start SCTP association - retrying
dlm: Retry sending 64 bytes to node id 318951621
dlm: Retrying SCTP association init for node 318951621
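
For illustration only, the guard added below can be modelled in user
space roughly as follows. This is a sketch, not the kernel code: struct
connection is reduced to a single field, lookup_con() and
retry_failed_send() are hypothetical stand-ins for nodeid2con() and
retry_failed_sctp_send(), and it assumes that nodeid 0 identifies the
listening connection, as the log message in the patch implies.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's struct connection. */
struct connection {
	int nodeid;
};

/* Hypothetical table; the entry with nodeid 0 models the listening
 * connection (an assumption of this sketch). */
static struct connection con_table[] = { { 0 }, { 1 }, { 2 } };

/* Simplified stand-in for nodeid2con(): lookup only, no allocation. */
static struct connection *lookup_con(int nodeid)
{
	size_t i;

	for (i = 0; i < sizeof(con_table) / sizeof(con_table[0]); i++)
		if (con_table[i].nodeid == nodeid)
			return &con_table[i];
	return NULL;
}

/* Returns 1 if a retry would be queued, 0 if the retry is skipped. */
static int retry_failed_send(int nodeid, int len)
{
	struct connection *con;

	printf("Retry sending %d bytes to node id %d\n", len, nodeid);

	/* The guard from the patch: never resend via the listener. */
	if (!nodeid) {
		printf("Shouldn't resend data via listening connection.\n");
		return 0;
	}

	con = lookup_con(nodeid);
	if (!con)
		return 0;	/* unknown peer, nothing to retry */

	return 1;		/* the real code would re-init the association */
}
```

With this model, retry_failed_send(0, 64) is skipped, while a call with
a known peer id such as 2 proceeds to the retry path.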

Signed-off-by: Lidong Zhong <lzhong at suse.com>
---
 fs/dlm/lowcomms.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 1e5b453..d08e079 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -617,6 +617,11 @@ static void retry_failed_sctp_send(struct connection *recv_con,
 	int nodeid = sn_send_failed->ssf_info.sinfo_ppid;
 
 	log_print("Retry sending %d bytes to node id %d", len, nodeid);
+
+	if (!nodeid) {
+		log_print("Shouldn't resend data via listening connection.");
+		return;
+	}
 
 	con = nodeid2con(nodeid, 0);
 	if (!con) {
-- 
1.8.1.4



