[Cluster-devel] "->ls_in_recovery" not released
Menyhart Zoltan
Zoltan.Menyhart at bull.net
Wed Nov 24 16:13:40 UTC 2010
> I'd suggest getting it from cluster.git STABLE3 or RHEL6 branches instead.
Could you please indicate the exact URL?
I have got a concern about the robustness of the DLM.
The Linux rules say: one should not return to user mode while holding a lock.
This is because one cannot trust user mode programs to eventually
re-enter the kernel in order to release the lock.
For the very same reason (one should not trust the user mode programs),
I think the DLM kernel module is not sufficiently robust.
If you have a closer look, the situation of the "dlm_recoverd" kernel thread
is quite similar to waiting for a user mode program to trigger the release
of a lock.
I can agree: it does not return to user mode.
Yet it holds the lock and goes to sleep, in an uninterruptible way, waiting
for a user action: it trusts 100 % a user mode program that can be killed,
or can be swapped out with no room left to swap it back in, etc.
Instead, the DLM should always return in a few seconds, saying the caller
cannot be granted a given "dlm_lock" for a given reason.
E.g. ocfs2 is able to handle a refused lock request. It is up to the
caller to decide if s/he wants to wait more.
I think whatever the user land does, the DLM kernel module should give
a response to a "dlm_lock()" request within a short (for a human operator)
time.
Thanks for your response,
Zoltan Menyhart