[Cluster-devel] [PATCHv2 dlm/next 0/8] fs: dlm: introduce dlm re-transmission layer

Alexander Ahring Oder Aring aahringo at redhat.com
Fri Mar 12 14:52:26 UTC 2021


Hi Paolo,

On Thu, Mar 11, 2021 at 4:09 AM Paolo Abeni <pabeni at redhat.com> wrote:
>
> Hello,
>
> Thank you for the new version.
>
> On Wed, 2021-03-10 at 14:17 -0500, Alexander Aring wrote:
> > this is the final patch series to make dlm reliable when re-connection
> > occurs. You can easily generate a couple of re-connections by running:
> >
> > tcpkill -9 -i $IFACE port 21064
> >
> > on your own to test these patches. At some point dlm will detect message
> > drops and will re-transmit messages if necessary. It introduces a new dlm
> > protocol behaviour and increases the dlm protocol version. I tested it
> > with SCTP as well and tried to be backwards compatible with dlm protocol
> > version 3.1. However, I don't recommend mixing these versions in a setup
> > at all, since dlm version 3.2 fixes long-term issues.
> >
> > - Alex
> >
> > changes since v2:
> >  - make timer handling pending only if messages are in flight; the sync
> >    isn't quite correct there but doesn't need to be precise
> >  - use before() from tcp to check if a seq is before another seq with
> >    respect to overflows
> >  - change srcu handling to hold srcu in all places where nodes are
> >    referenced - we should not get a disadvantage from holding that
> >    lock. We should also update lowcomms accordingly.
> >  - add some WARN_ON() to check that nothing is still going on in
> >    send/recv; otherwise it's likely an issue.
> >  - add more future work regarding fencing of nodes if a cluster
> >    manager timeout/bad seq happens
> >  - add a note about the missing length check of the tail payload
> >    (resource name length) regarding the receive buffer
> >  - remove an include which isn't necessary in recoverd.c
>
> I plan/hope to go through this iteration at the very end of this week
> or early next one.
>
> I just noticed that some email from you targeting netdev landed in my
> spam folder thanks to our corporate anti-spam filter. So I possibly
> lost some replies from you. If you already answered the following, I'm
> sorry I lost that but it's not my fault! Please kindly resend the
> message ;)
>

ok. I will answer it below.

> The relevant questions were/are:
>
> - is there git tree avail with all the series applied, to help with the
> review?

They are based on [0]. You can apply them, or I can upload them to
gitlab if you want.

> - DEFAULT_BUFFER_SIZE == LOWCOMMS_MAX_TX_BUFFER_LEN in current net-
> next, so looks like a change below is actually a no op ?!?

It's LOWCOMMS_MAX_TX_BUFFER_LEN, updated in patch 7/8:

#define LOWCOMMS_MAX_TX_BUFFER_LEN     (DEFAULT_BUFFER_SIZE - DLM_MIDCOMMS_OPT_LEN)

whereas DEFAULT_BUFFER_SIZE is the maximum possible dlm message size at
the socket layer level. It is mainly limited because we can get at most
a page-sized buffer allocation for the dlm application layer, and the
application layer cannot deal with any additional logic to handle more.
However, it is also an architecture thing, because it needs to be the
minimal page size across all architectures for compatibility between
them. I introduced a check which should report problems if an
architecture doesn't support it:

BUILD_BUG_ON(PAGE_SIZE < DEFAULT_BUFFER_SIZE);

LOWCOMMS_MAX_TX_BUFFER_LEN is the length after the encapsulation
header, i.e. the maximum size the dlm application layer can put into a
message at the application layer level.

The names are misleading; it's on my list to update them. Any
recommendations (MAX_DLM_SOCKET_BUFFER_SIZE/MAX_DLM_APP_BUFFER_SIZE)?
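
To make the relationship a bit more concrete, here is a minimal sketch
(not the actual dlm code; the numeric values and the helper name are
placeholders made up for illustration) of how the two limits fit
together:

#include <linux/build_bug.h>
#include <linux/mm.h>

/* placeholder values, for illustration only */
#define DEFAULT_BUFFER_SIZE        4096 /* max dlm message size at socket level */
#define DLM_MIDCOMMS_OPT_LEN       16   /* assumed encapsulation header length */
#define LOWCOMMS_MAX_TX_BUFFER_LEN (DEFAULT_BUFFER_SIZE - DLM_MIDCOMMS_OPT_LEN)

/* hypothetical helper: reject application payloads that cannot fit into
 * one page-sized socket buffer once the midcomms header is added
 */
static inline bool dlm_app_len_ok(int len)
{
        /* the whole message must fit into a single page on every arch */
        BUILD_BUG_ON(PAGE_SIZE < DEFAULT_BUFFER_SIZE);

        return len > 0 && len <= LOWCOMMS_MAX_TX_BUFFER_LEN;
}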

> - Could you please add more info WRT the reference to unaligned memory
> access in the code comments? Which field[s] is[are] subject to that?
>

None of these fields. Some of the DLM messages have a variable-sized
payload at the tail. It's a string value, and its length needs to be
8-byte aligned. The kernel violates this; however, there exists a
userspace API and the user defines this string value. One user is lvm,
which uses a unique identifier, the UUID of the volume, as this string
value, and that UUID string is 36 chars. This ends up in a message
boundary issue which can result in unaligned memory access on the
sending side (because of some internal buffer handling) and on the
receiving side. I tried to fix it by padding this string with zero
bytes, but the problem is more complicated, as David Teigland pointed
out, if we want to stay backwards compatible. I would like to schedule
a fix for this issue for the next major version bump of dlm.
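
As a quick illustration of the arithmetic (a standalone sketch, not dlm
code; the macro and the example name are only for demonstration): a
36-char resource name is not a multiple of 8, so without padding the
data following it starts at an unaligned offset:

#include <stdio.h>
#include <string.h>

#define DLM_ALIGN 8
/* round x up to the next multiple of a (a must be a power of two) */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

int main(void)
{
        /* example UUID-style resource name as lvm would pass it: 36 chars */
        const char *name = "6f1c36d5-0b0e-4c3a-9a9e-7b1d2f9a3c44";
        size_t raw = strlen(name);                /* 36, not 8-byte aligned */
        size_t padded = ALIGN_UP(raw, DLM_ALIGN); /* 40, next 8-byte boundary */

        printf("raw tail length:    %zu\n", raw);
        printf("padded tail length: %zu\n", padded);
        return 0;
}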

Thanks for your review.

- Alex

[0] https://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm.git/log/?h=next



