[Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful
Eric Ren
zren at suse.com
Wed May 18 06:53:00 UTC 2016
Hi David,
Ken Gaillot got me with this question:
Since corosync/pcmk can recover from such a case, why can't DLM?
Please see the detailed discussion here:
[1] https://github.com/ClusterLabs/pacemaker/pull/839
Here are my thoughts, but I'm not sure, so please correct me if I'm wrong:
Say the time is T and the cluster has nodes A, B and C. We have a
lockspace named after $uuid for a shared disk volume, and a CPG for
lockspace $uuid. The $uuid CPG has members A, B and C when things are OK, but:
T: quorum lost; cluster partitions into 3 parts; lockspace $uuid cannot
perform any lockspace operations because cluster is not quorate;
T+1: quorum regained; the dlm_controld daemon CPG has not yet done its
merging/fencing work; so here are 2 questions:
Q1: what is a stateful merged node?
I've seen the comments in the code ;-) Does it mean a lockspace already
existed on the node before it sent its protocol message?
Q2: what if we add the stateful merged nodes to the dlm_controld daemon CPG
instead of fencing them?
If so, CPG $uuid, e.g. from the perspective of A, may now have only one
member, A itself. A can perform lockspace operations because the cluster is
quorate again (and we skipped fencing); B and C do likewise. Each node then
behaves as if it alone owns this volume, so corruption may happen?
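
To make the worry in Q2 concrete, here is a toy model of the scenario as I
describe it above. It is only a sketch under my own assumptions: the Node
class, its fields and the helper functions are hypothetical and are not
dlm_controld code or its real data structures. It assumes each node keeps a
stale, single-member view of CPG $uuid after quorum returns and that no
fencing or merge handling takes place.

```python
# Toy model only: every name here is hypothetical, not dlm_controld code.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Members this node currently sees in the lockspace CPG ($uuid).
    cpg_view: set = field(default_factory=set)
    quorate: bool = True

    def acts_as_sole_owner(self) -> bool:
        # Assumption: a quorate node that sees only itself in the CPG
        # will grant itself any lock on the shared volume.
        return self.quorate and self.cpg_view == {self.name}


def partition(nodes):
    # Time T: the cluster splits into single-node partitions and loses quorum.
    for n in nodes:
        n.cpg_view = {n.name}
        n.quorate = False


def regain_quorum_without_fencing(nodes):
    # Time T+1: quorum returns, but the stale CPG views are neither
    # merged nor cleaned up by fencing (the case asked about in Q2).
    for n in nodes:
        n.quorate = True


nodes = [Node(name, {"A", "B", "C"}) for name in "ABC"]

partition(nodes)
# While inquorate, nobody may perform lockspace operations.
assert not any(n.acts_as_sole_owner() for n in nodes)

regain_quorum_without_fencing(nodes)
owners = [n.name for n in nodes if n.acts_as_sole_owner()]
print(owners)  # ['A', 'B', 'C']: three independent "owners" of one volume
```

If the real daemon behaved like this model, all three nodes would write to
the volume without coordinating locks, which is the corruption I am asking
about.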
Thanks a lot,
Eric
On 05/17/2016 08:10 PM, Eric Ren wrote:
> Hi David,
> This is just a draft patch for you to review ;-) There's one issue I'm
> not sure about: where should we clear "stateful_merge_wait"?
>
> I also need to talk more with the pacemaker guys and need more time for testing.
> I will send you the formal patch once things are done ;-)