[Cluster-devel] [PATCHv4 dlm/next 00/20] fs: dlm: introduce dlm re-transmission layer
Alexander Aring
aahringo at redhat.com
Mon Jan 11 18:02:50 UTC 2021
Hi,
this is the final patch series to make dlm reliable when a re-connection
occurs. You can easily generate a couple of re-connections by running:
tcpkill -9 -i $IFACE port 21064
yourself to test these patches. At some point dlm will detect message
drops and re-transmit messages if necessary. The series introduces new dlm
protocol behaviour and increases the dlm protocol version. I tested it
with SCTP as well and tried to stay backwards compatible with dlm protocol
version 3.1. However, I do not recommend mixing these versions
in a setup, since dlm version 3.2 fixes long-standing issues.
- Alex
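As a rough illustration of how a receiver can detect message drops, here is a minimal userspace sketch. The changelog below mentions sequence numbers and DLM_ACK, but every name here (sketch_node, sketch_receive, the verdict enum) is hypothetical and the logic is only an assumption about the general technique, not the code in fs/dlm/midcomms.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node receive state, not the layout used by the series. */
struct sketch_node {
	uint32_t seq_next;	/* next sequence number we expect to receive */
};

enum sketch_verdict {
	SKETCH_DELIVER,		/* in order: hand to the upper layer and ack */
	SKETCH_DUPLICATE,	/* already seen: drop, the sender re-transmitted */
	SKETCH_GAP,		/* a message was lost: drop, wait for re-transmit */
};

/* Serial-number comparison so that wrap-around of the 32 bit counter works. */
static bool sketch_seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Classify an incoming sequence number against the expected one. */
static enum sketch_verdict sketch_receive(struct sketch_node *node, uint32_t seq)
{
	assert(node);

	if (seq == node->seq_next) {
		node->seq_next++;
		return SKETCH_DELIVER;
	}

	if (sketch_seq_before(seq, node->seq_next))
		return SKETCH_DUPLICATE;

	return SKETCH_GAP;
}
```

In this sketch the sender would keep unacknowledged messages queued and send them again after a timeout or a re-connect; the receiver only delivers the message it expects next, so duplicates from a re-transmission are harmless.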
changes since v4:
- add a big midcomms file header comment that explains the idea behind
the midcomms layer and how it works.
- add the close mutex lock to prevent running a close API call while
the connection is being terminated. However, when a close call occurs
it will terminate the current termination wait until the close
lock is released. If the node is removed from the nodes hash, the
lowcomms close call will occur anyway.
I added a define to insert some sleep to test this behaviour.
changes since v3:
- pad dlm messages to an 8 byte boundary size (more pads), because there
are uint64_t fields and we should be prepared for future 8 byte fields.
This makes them aligned to 4 and 2 bytes as well.
- change unaligned memory access handling. I will not fix it yet. It
seems nobody is using dlm on an architecture which cannot handle
unaligned memory access at all (panics). However, I added a note that
this is a known problem. There is a slight performance improvement
(depends on many things, e.g. if another message gets allocated after a
message with (len % 8) != 0 got allocated). However, I saw that such
cases rarely occur (for now only with some user space messages).
The receiving side is not the problem here, the sending side is,
and we run into unaligned memory access in dlm message fields there
as well. However, fixing the sending side will fix the receiving side and
more length checks can then be applied to drop invalid message
lengths.
- be sure to remove the node from the hash first in the close call
I am a little bit worried that the timer is running at exactly the time
of the midcomms/lowcomms close call and maybe begins to
re-transmit messages. I thought about stopping/starting the timer, but now
I ended up removing the node from the hash first and making sure that
no readers are left when calling lowcomms close. I think this should
be fine because we "should" not receive any dlm messages from this
node while close is running.
- add patch "fs: dlm: add per node receive flush"
As I was worried that the lowcomms close call flushes the receive
work on a socket close after we already removed the node from the hash,
I added functionality to flush the receive work right before we remove
the node. With this functionality we make sure we don't receive any
messages after we removed the node from the hash.
- add patch "fs: dlm: remove obsolete code and comment"
- add patch "fs: dlm: check for invalid namelen"
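The 8 byte padding from the first item of this list can be sketched in userspace as follows. This is only an illustration of the rounding arithmetic; SKETCH_MSG_ALIGN and sketch_padded_len() are hypothetical names and not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MSG_ALIGN 8u	/* hypothetical name; the series pads to 8 bytes */

/*
 * Round len up to the next multiple of 8. Because 8 is a power of two,
 * adding 7 and masking off the low three bits is enough.
 */
static size_t sketch_padded_len(size_t len)
{
	/* guard against overflow of the addition below */
	assert(len <= (size_t)-1 - (SKETCH_MSG_ALIGN - 1));

	return (len + SKETCH_MSG_ALIGN - 1) & ~(size_t)(SKETCH_MSG_ALIGN - 1);
}
```

A length that is a multiple of 8 is also a multiple of 4 and 2, which is why the padded messages end up aligned for the smaller field sizes too.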
changes since v2:
- add patch "fs: dlm: set connected bit after accept"
- add patch "fs: dlm: set subclass for othercon sock_mutex"
- change title "fs: dlm: public utils header utils" to
"fs: dlm: public header in out utility"
- squash "fs: dlm: add check for minimum allocation length" into
"fs: dlm: remove unaligned memory access handling"
- make the midcomms timeout a little bit longer, because I saw
sometimes it's not enough (I hope that was the reason)
- midcomms: fix version mismatch handling
- remove DLM_ACK in invalid sequence handling
- add additional length check in dlm_opts_check_msglen()
- use optlen to skip DLM_OPTS header
- add DLM_MSGLEN_IS_NOT_ALIGNED to check if msglen is properly
aligned before parsing
- change dlm_midcomms_close() to close first then cut queues,
because lowcomms close may flush some messages which
need to be dropped afterwards if seq doesn't fit.
- remove newline in "fs: dlm: add more midcomms hooks"
- many more changes which I did not keep track of.
- change defines handling for calculating max application buffer
size vs max allocation size
- run aspell on my commit msgs
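The receive side length checks mentioned in this list could look roughly like the following userspace sketch. The macro only mirrors the name DLM_MSGLEN_IS_NOT_ALIGNED from the list above; its definition here, the 16 byte header size and sketch_msglen_ok() are assumptions for illustration and may differ from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HDR_LEN 16u	/* hypothetical minimum header size */

/* assumed definition: a valid length is a multiple of 8 bytes */
#define SKETCH_MSGLEN_IS_NOT_ALIGNED(len) (((len) % 8) != 0)

static_assert(SKETCH_HDR_LEN % 8 == 0, "header must itself be 8 byte aligned");

/*
 * Decide whether a length taken from a message header is safe to parse:
 * it has to cover at least the header, be 8 byte aligned and fit into
 * the bytes that were actually received.
 */
static bool sketch_msglen_ok(uint16_t msglen, size_t buflen)
{
	if (msglen < SKETCH_HDR_LEN)
		return false;

	if (SKETCH_MSGLEN_IS_NOT_ALIGNED(msglen))
		return false;

	if (msglen > buflen)
		return false;

	return true;
}
```

Checking alignment before parsing means a message with a bogus length gets dropped early instead of causing an unaligned access further down.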
Alexander Aring (20):
fs: dlm: set connected bit after accept
fs: dlm: set subclass for othercon sock_mutex
fs: dlm: add errno handling to check callback
fs: dlm: add check if dlm is currently running
fs: dlm: change allocation limits
fs: dlm: public header in out utility
fs: dlm: use GFP_ZERO for page buffer
fs: dlm: simplify writequeue handling
fs: dlm: add more midcomms hooks
fs: dlm: make buffer handling per msg
fs: dlm: make new buffer handling softirq ready
fs: dlm: add functionality to re-transmit a message
fs: dlm: move out some hash functionality
fs: dlm: remove unaligned memory access handling
fs: dlm: add union in dlm header for lockspace id
fs: dlm: add per node receive flush
fs: dlm: add reliable connection if reconnect
fs: dlm: don't allow half transmitted messages
fs: dlm: remove obsolete code and comment
fs: dlm: check for invalid namelen
fs/dlm/config.c | 60 +-
fs/dlm/dlm_internal.h | 41 +-
fs/dlm/lock.c | 16 +-
fs/dlm/lockspace.c | 5 +-
fs/dlm/lowcomms.c | 288 +++++++---
fs/dlm/lowcomms.h | 27 +-
fs/dlm/member.c | 16 +
fs/dlm/member.h | 1 +
fs/dlm/midcomms.c | 1266 +++++++++++++++++++++++++++++++++++++++--
fs/dlm/midcomms.h | 10 +
fs/dlm/rcom.c | 61 +-
fs/dlm/recoverd.c | 3 +
fs/dlm/user.c | 3 +
fs/dlm/util.c | 10 +-
fs/dlm/util.h | 2 +
15 files changed, 1628 insertions(+), 181 deletions(-)
--
2.26.2