[Cluster-devel] [RFC PATCH dlm/next 15/16] fs: dlm: add reliable connection if reconnect

Alexander Ahring Oder Aring aahringo at redhat.com
Tue Nov 17 18:48:08 UTC 2020


Hi,

On Fri, Nov 13, 2020 at 5:58 PM Alexander Aring <aahringo at redhat.com> wrote:
>
> This patch introduce to make a tcp lowcomms connection reliable even if
> reconnects occurs. This is done by an application layer retransmission
> handling and sequence numbers in dlm protocols. There are three new dlm
> commands:
>
> DLM_OPTS:
>
..
> +                       /* we only alloc a new node at receiving for the above
> +                        * RCOM messages. It can be that the other side is
> +                        * already gone and we cannot ack FIN messages anymore,
> +                        * we ignore it until the other side runs into an
> +                        * timeout. FIN messages are application stateless and
> +                        * it's not imortant to be acked since it is the last
> +                        * message before disconnect.
> +                        *
> +                        * we don't print a warning in this case.
> +                        */
> +                       switch (p->opts.o_nextcmd) {
> +                       case DLM_ACK:
> +                               /* ignore ACK as well */
> +                               fallthrough;

This can't happen, DLM_ACK is never encapsulated by DLM_OPTS. I think
I once saw DLM_ACK messages arriving after the node was already
disconnected. I think we should print a warning in this case: the
warning fires when we are already disconnected and the message is not
one of the initial dlm messages that start a new connection. The first
message from an unknown node should always be one of those, and if it
is not, something weird is going on.

One of my last changes in this patchset was to change the hook for the
remove member function; that may have fixed the issue where I saw the
DLM_ACK but no node was "active".

I will remove this case.
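
Roughly, the receive path for a node we have no state for would then
behave like the sketch below. This is only an illustration, not the
patch code: the sketch_* names and enum values are made up, and I am
assuming here that any RCOM counts as an "initial" message and that FIN
arrives as the o_nextcmd of a DLM_OPTS message.

```c
/* Hypothetical command ids; the real ones live in the dlm headers. */
enum sketch_cmd {
	SKETCH_CMD_MSG,
	SKETCH_CMD_RCOM,
	SKETCH_CMD_OPTS,
	SKETCH_CMD_ACK,
	SKETCH_CMD_FIN,
};

enum sketch_verdict {
	SKETCH_NEW_NODE,	/* initial message, allocate node state */
	SKETCH_IGNORE,		/* dropped silently */
	SKETCH_WARN,		/* dropped, but log a warning */
};

/*
 * Classify a message received from a node without active state.
 * nextcmd is only looked at when cmd is SKETCH_CMD_OPTS.
 */
static enum sketch_verdict
sketch_classify_unknown_node(enum sketch_cmd cmd, enum sketch_cmd nextcmd)
{
	switch (cmd) {
	case SKETCH_CMD_RCOM:
		/* assumption: every RCOM may start a new connection */
		return SKETCH_NEW_NODE;
	case SKETCH_CMD_OPTS:
		/* FIN cannot be acked anymore, the peer runs into a timeout */
		if (nextcmd == SKETCH_CMD_FIN)
			return SKETCH_IGNORE;
		/* no special case for ACK: it is never inside OPTS */
		return SKETCH_WARN;
	default:
		/* includes a bare ACK arriving after disconnect */
		return SKETCH_WARN;
	}
}
```

With this, a bare ACK from an unknown node ends up in the warning
path instead of being dropped silently.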

- Alex



