[Cluster-devel] [PATCH] dlm_controld.pcmk: Fix membership change judging issue
Jiaju Zhang
jjzhang.linux at gmail.com
Fri May 14 11:28:59 UTC 2010
I'll test the patch later on, but I may not finish the testing today
because I'm having trouble accessing the hardware at the moment.
Thanks a lot ;-)
Jiaju
On Fri, May 14, 2010 at 6:15 PM, Andrew Beekhof <andrew at beekhof.net> wrote:
> On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong at novell.com> wrote:
>> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew at beekhof.net> wrote:
>>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>>
>>
>> Yes.
>>
>> For the record, the most minimal testcase I've managed for this
>> so far is as follows (substitute "/etc/init.d/corosync start" or
>> whatever for "rcopenais start" if you're not on something SUSE-based):
>>
>> 1) Configure corosync/openais on two nodes.
>> Do not start the cluster yet.
>>
>> 2) On one node:
>>
>> # rm /var/lib/heartbeat/crm/*
>> # rcopenais start
>> # while ! crm_mon -1 | grep -qi online; do \
>> echo -n "." ; sleep 5 ; done
>>
>> 3) Now that one node is online, configure Pacemaker:
>>
>> # cat <<CONF | crm configure
>> primitive dlm ocf:pacemaker:controld
>> primitive clvm ocf:lvm2:clvmd
>> group g dlm clvm
>> clone c g meta interleave="true"
>> property stonith-enabled="false"
>> property no-quorum-policy="ignore"
>> commit
>> CONF
>>
>> Watch "crm_mon -r" until that clone comes online.
>> Should only take a few seconds.
>>
>> 4) On the other node:
>>
>> # rm /var/lib/heartbeat/crm/*
>> # rcopenais start
>>
>> The first node will now wedge up spectacularly, and/or
>> dlm_recoverd and clvmd will be stuck in D state on both nodes.
>
> Presumably each thinks the other node isn't a member?
> Perhaps something like this will help:
>
> diff -r b59c27dc114a lib/ais/plugin.c
> --- a/lib/ais/plugin.c Wed May 12 10:51:56 2010 +0200
> +++ b/lib/ais/plugin.c Fri May 14 12:12:33 2010 +0200
> @@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
> ais_notice("Respawning failed child process: %s",
> pcmk_children[lpc].name);
> spawn_child(&(pcmk_children[lpc]));
> - } else {
> - send_cluster_id();
> }
> + send_cluster_id();
> }
> }
> sched_yield ();
> @@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
> }
> }
> }
> + send_cluster_id();
>
> return 0;
> }
>
>
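
A side note on the wait loop in step 2 of the quoted test case: as written,
it spins forever if the node never reports online, which is awkward when
reproducing a hang. Below is a minimal sketch of a bounded version, assuming
a POSIX shell. The helper names wait_for and node_online are hypothetical
and not part of Pacemaker; only the crm_mon/grep check comes from the test
case above.

```shell
# wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Re-run CMD every INTERVAL seconds until it succeeds (return 0) or until
# roughly TIMEOUT seconds of sleeping have accumulated (return 1).
# Hypothetical helper, shown for illustration only.
wait_for() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Same check as in step 2 of the test case: does crm_mon report "online"?
node_online() {
    crm_mon -1 | grep -qi online
}

# Usage on a cluster node (the 300 s limit is an arbitrary choice):
#   wait_for 300 5 node_online || echo "node did not come online" >&2
```

The elapsed counter only tracks time spent sleeping, not the run time of the
check itself, so the timeout is approximate.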