[Linux-cluster] Two-node cluster: Node attempts stateful merge after clean reboot

Digimer lists at alteeve.ca
Wed Sep 11 17:33:57 UTC 2013


On 11/09/13 11:50, Alan Brown wrote:
> On 11/09/13 13:37, Digimer wrote:
>
>> The problem is that, if you enable cman on boot, the fenced node will
>> try to join the cluster, fail to reach its peer after post_join_delay
>> (default 6 seconds, iirc) and fence its peer. That peer reboots, starts
>> cman, tries to connect, fences its peer...
>
> Qdisk is a good way of preventing this kind of problem.

If you have a SAN.

>> The easiest way to avoid this in 2-node clusters is to not let
>> cman/rgmanager start automatically.
>
> For some values of "easy"
>
> Your solution means every startup requires manual intervention.
>
> Qdisk will let the cluster come up/restart nodes without needing human
> help at startup.

The way I see it, and I've had the clusters in production for years in 
various locations, fencing happens extremely rarely. If a node gets 
fenced, *something* went wrong and I will want to investigate before I 
rejoin the node. So the fact that I have to manually start 
cman/rgmanager is a trivial cost.

Out of about 20 2-node clusters, I've had maybe three or four fence 
events in four years, and all of them were from failing equipment. So 
in all cases, not rejoining the cluster was safest anyway.
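
For anyone who wants the fence loop spelled out, here is a toy model in 
Python. It is only a sketch, not how fenced actually works: it assumes 
the link between the nodes stays broken, so any node that starts cman 
times out waiting for its peer and fences it. The function name, the 
node names and the max_boots cap are all made up for illustration.

```python
def simulate_fence_loop(autostart, max_boots=6):
    """Toy model of a two-node fence loop; returns the list of fence events.

    Assumption (hypothetical): the nodes can never reach each other, so a
    node that starts cman always fences its peer once post_join_delay expires.
    """
    # The story starts with one fence: node1 has just fenced node2.
    events = ["node1 fences node2"]
    fenced, survivor = "node2", "node1"

    for _ in range(max_boots):
        if not autostart:
            # cman is not started at boot, so the fenced node comes back up
            # but stays out of the cluster until an admin starts it by hand.
            break
        # cman starts at boot, the node cannot see its peer, and fences it.
        events.append(f"{fenced} boots, starts cman, times out, fences {survivor}")
        fenced, survivor = survivor, fenced

    return events


if __name__ == "__main__":
    for line in simulate_fence_loop(autostart=True, max_boots=4):
        print(line)
```

With autostart=False the list never grows past the first fence, which is 
the whole point of leaving cman/rgmanager off at boot. With autostart=True 
the nodes keep trading fences until the cap is hit; on real hardware 
nothing caps it.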

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



