[Linux-cluster] [RFC] dlm: keep listening connection alive with sctp mode
Lidong Zhong
lzhong at suse.com
Thu Jun 12 06:42:58 UTC 2014
Currently, when a node closes a connection, it sends a user-initiated
ABORT instead of shutting down gracefully (commit ece35848c184).
Unfortunately, this can also close the listening connection, and as a
result nodes fail to rejoin the cluster.
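The ABORT behavior described above can be illustrated with a minimal userspace sketch, assuming the standard sockets API. Per RFC 6458 (SO_LINGER), closing an SCTP socket that has SO_LINGER enabled with a zero linger time is equivalent to the ABORT primitive, while a normal close() starts a graceful SHUTDOWN. This only illustrates the mechanism. It is not the dlm kernel code and does not claim to show what ece35848c184 itself changed. The helper names set_abort_on_close() and linger_is_abort() are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER with a zero timeout. On an SCTP
 * socket, a later close() then behaves like the ABORT primitive
 * (RFC 6458, SO_LINGER) instead of a graceful SHUTDOWN.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_abort_on_close(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: return 1 if fd is configured for an abortive
 * close, 0 if it is not, and -1 if the option cannot be read. */
static int linger_is_abort(int fd)
{
	struct linger lg;
	socklen_t len = sizeof(lg);

	if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
		return -1;
	return lg.l_onoff != 0 && lg.l_linger == 0;
}
```

Both helpers accept any socket descriptor, so they can be exercised on a TCP socket too; the ABORT-versus-SHUTDOWN distinction only matters when fd is an SCTP socket, for example one created with socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP).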
I set up a two-node cluster to test this. While the cluster is working
normally, the connections look like this:
clt-n2-sles12b7-2:~ # netstat -apn|grep sctp
sctp 147.2.208.197:21064 LISTEN -
sctp 0 4 0.0.82.72:62887 147.2.208.197:21064 ESTABLISHED -
If I then reboot the other node or stop dlm, all the connections are
lost:
clt-n2-sles12b7-2:~ # netstat -apn | grep sctp
clt-n2-sles12b7-2:~ #
So when the other node tries to rejoin the cluster, the following
messages flood the log, because there is no longer a listening port:
dlm: Trying to connect to 192.168.3.4
dlm: Can't start SCTP association - retrying
dlm: Retry sending 64 bytes to node id 318951621
dlm: Retrying SCTP association init for node 318951621
Signed-off-by: Lidong Zhong <lzhong at suse.com>
---
fs/dlm/lowcomms.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 1e5b453..d08e079 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -617,6 +617,11 @@ static void retry_failed_sctp_send(struct connection *recv_con,
 	int nodeid = sn_send_failed->ssf_info.sinfo_ppid;
 
 	log_print("Retry sending %d bytes to node id %d", len, nodeid);
+
+	if (!nodeid) {
+		log_print("Shouldn't resend data via listening connection.");
+		return;
+	}
 
 	con = nodeid2con(nodeid, 0);
 	if (!con) {
--
1.8.1.4
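The effect of the added guard can be modeled in a small userspace sketch. It assumes, as the patch's log message implies, that a failed send reporting nodeid 0 belongs to the listening connection, and it keeps that connection in slot 0 of a table. Everything here (struct model_con, cons[], MAX_NODES, model_retry_failed_send()) is hypothetical and only mirrors the control flow of retry_failed_sctp_send() after the change; it is not the dlm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: slot 0 stands for the listening connection,
 * the other slots for per-node connections. */
#define MAX_NODES 8

struct model_con {
	bool open;
	int retries;
};

static struct model_con cons[MAX_NODES];	/* zero-initialized */

/* Mirrors the added guard: a failed send that reports nodeid 0 is not
 * retried, so the listening connection is left alone.
 * Returns true if a retry was queued on a per-node connection. */
static bool model_retry_failed_send(int nodeid, int len)
{
	printf("Retry sending %d bytes to node id %d\n", len, nodeid);

	if (!nodeid) {
		printf("Shouldn't resend data via listening connection.\n");
		return false;
	}
	/* No known, open connection for this node: nothing to retry on. */
	if (nodeid < 0 || nodeid >= MAX_NODES || !cons[nodeid].open)
		return false;

	cons[nodeid].retries++;
	return true;
}
```

With the guard, a report for nodeid 0 returns before any connection lookup, so the listener's state is never touched; reports for real node ids still take the normal retry path.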