[Cluster-devel] [DLM PATCH] dlm_controld: outputs explicit info about stateful

Eric Ren zren at suse.com
Wed May 18 06:53:00 UTC 2016


Hi David,

Ken Gaillot asked me this question:
Since corosync/pcmk can recover from such a case, why can't DLM?
Please see the detailed discussion here:
        [1] https://github.com/ClusterLabs/pacemaker/pull/839

Here are my thoughts, but I'm not sure, so please correct me if I'm
wrong (CMIIW):
Setup at time T: the cluster has nodes A, B and C; there is a lockspace
named after $uuid for a shared disk volume, and a CPG for lockspace
$uuid. When things are OK, CPG $uuid has members A, B and C. But:

T: quorum is lost; the cluster partitions into 3 parts; lockspace $uuid
cannot perform any lockspace operations because the cluster is not
quorate;

T+1: quorum is regained; the dlm_controld daemon CPG has not yet done
its merging/fencing work. So here are 2 questions:
Q1: what is a stateful merged node?
I've seen the comments in the code ;-) Does it mean that the node
already had a lockspace before it sent its protocol message?

Q2: what if we add the stateful merged nodes to the dlm_controld daemon
CPG instead of fencing them?

If so, CPG $uuid, e.g. from the perspective of A, may now have only one
member: A itself. A can then perform lockspace operations because the
cluster is quorate again (and fencing was skipped); B and C do likewise.
Each node then behaves as if it alone owned this volume, so corruption
may happen?
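To make the Q2 concern concrete, here is a toy model of the scenario.
It is NOT dlm_controld code: the NodeView type, the sole_owners()
helper and the "act alone when quorate and no other CPG member is
visible" rule are all hypothetical simplifications chosen only to show
why skipping fencing could let every partition touch the volume at once.

```python
from dataclasses import dataclass

# Hypothetical model, not dlm_controld code: each node's view of the
# members of the lockspace CPG "$uuid".


@dataclass(frozen=True)
class NodeView:
    name: str           # node identifier: "A", "B" or "C"
    cpg_members: tuple  # members this node currently sees in CPG $uuid


def believes_sole_owner(view: NodeView, quorate: bool) -> bool:
    """Simplified rule (an assumption): a node performs lockspace
    operations only while the cluster is quorate, and acts as the only
    user of the volume when its CPG view contains no other node."""
    return quorate and view.cpg_members == (view.name,)


def sole_owners(views, quorate: bool):
    """Return the names of nodes that would access the volume as if alone."""
    return [v.name for v in views if believes_sole_owner(v, quorate)]


# Normal operation: every node sees A, B and C in CPG $uuid.
healthy = [NodeView(n, ("A", "B", "C")) for n in "ABC"]

# After the partition at T, each node sees only itself in CPG $uuid,
# and (per Q2) nobody is fenced when quorum comes back at T+1.
partitioned = [NodeView(n, (n,)) for n in "ABC"]

print(sole_owners(healthy, quorate=True))       # nobody acts alone
print(sole_owners(partitioned, quorate=False))  # T: blocked, not quorate
print(sole_owners(partitioned, quorate=True))   # T+1: A, B and C all act alone
```

Under these assumptions, the only state in which more than one node
acts alone is "quorate again, but partitioned CPG views never merged or
fenced", which is exactly the case Q2 asks about.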

Thanks a lot,
Eric

On 05/17/2016 08:10 PM, Eric Ren wrote:
> Hi David,
> This is just a draft patch for you to review ;-) There's one thing I'm
> not sure about: where should we clear "stateful_merge_wait"?
>
> And I need more discussion with the pacemaker folks and more time for testing.
> I will send you the formal patch once things are done ;-)
