[Cluster-devel] Why does dlm_lock function fails when downconvert a dlm lock?

Alexander Aring aahringo at redhat.com
Wed Aug 11 20:35:16 UTC 2021


On Wed, Aug 11, 2021 at 6:41 AM Gang He <GHe at suse.com> wrote:
> Hello List,
> I am using kernel 5.13.4 (some older kernels have the same problem).
> When node A had acquired a dlm (EX) lock and node B tried to get the same lock, node A got a BAST message.
> Node A then downconverted the dlm lock to NL, and the dlm_lock function failed with error -16.
> The failure does not always happen, but I do hit it in some cases.
> Why does the dlm_lock function fail when downconverting a dlm lock? Are there any documents that describe these error cases?
> If the code ignores the dlm_lock error returned on node A, node B will never get the dlm lock.
> How should we handle this situation? Should we call dlm_lock again to downconvert the dlm lock?

What is your dlm user? Is it kernel (e.g. gfs2/ocfs2/md) or user (libdlm)?

I believe you are running into case [0]. Can you provide the
corresponding log_debug() message? To get it, set "log_debug=1"
in your dlm.conf; the message will then be reported at KERN_DEBUG
level in your kernel log.
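
For example, a minimal dlm.conf fragment might look like the one below.
This is only a sketch: I am assuming the usual /etc/dlm/dlm.conf
location, and that any other settings you already have in that file
stay as they are.

```
# /etc/dlm/dlm.conf (path is an assumption; adjust for your distribution)
# Enable dlm kernel debug messages so log_debug() output reaches the kernel log.
log_debug=1
```

The messages are logged at KERN_DEBUG, so make sure your console or
syslog log level does not filter them out when you collect the log.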


- Alex

[0] https://elixir.bootlin.com/linux/v5.14-rc5/source/fs/dlm/lock.c#L2886
