[Linux-cluster] Starter Cluster / GFS

Thu Nov 11 05:48:10 UTC 2010

Digimer,

Again, the heuristic you gave does not pass the data centre operational sanity test.

First of all, in data centres everything is redundant, so you have two core switches. Of course, you could ping both of them and combine the results with some NAND logic. That is not important.
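A minimal sketch of the two-switch check, assuming each ping result is already available as a boolean (the function names are hypothetical and no real ping is sent): the heuristic passes unless both pings fail, which is a NAND of the two failure flags.

```python
def nand(a: bool, b: bool) -> bool:
    """Logical NAND: false only when both inputs are true."""
    return not (a and b)


def core_switch_heuristic(ping_a_ok: bool, ping_b_ok: bool) -> bool:
    """Hypothetical heuristic over two redundant core switches.

    Passes unless BOTH pings fail, i.e. NAND of the two failure flags.
    """
    return nand(not ping_a_ok, not ping_b_ok)


print(core_switch_heuristic(True, False))   # True: one switch still answers
print(core_switch_heuristic(False, False))  # False: both switches unreachable
```

As the text says, this refinement does not change the argument that follows: it only decides when the heuristic fails, not what the cluster does about it.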

The point is that, no matter what you do, your cluster cannot fix the network. So fencing nodes on a network failure is the last thing you want to do: you lose warm database caches, user sessions and incomplete transactions. Disk quorum times out in 10 seconds or so, while a typical network meltdown due to spanning tree recalculation lasts 40 seconds. If the proposed heuristic were applied to a 7-node cluster, the nodes would all murder each other and there would be nothing left. You would convert a localised, short-term network problem into a cluster-wide disaster.
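A simplified model of this failure mode, not qdiskd's actual algorithm. It assumes the meltdown also cuts the cluster interconnect, so every node ends up alone, and that a node whose ping heuristic has been failing for longer than the disk quorum timeout may not claim the quorum disk votes. The 10 s and 40 s figures come from the text; the 7-node, 6-qdisk-vote layout is the one in Digimer's example; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    nodes: int = 7
    node_votes: int = 1
    qdisk_votes: int = 6
    qdisk_timeout_s: float = 10.0  # "Disk quorum times out in 10 seconds or so."
    outage_s: float = 40.0         # spanning tree recalculation outage


def quorate_partitions(s: Scenario) -> int:
    """Count partitions that still hold quorum once the outage has run its course."""
    total = s.nodes * s.node_votes + s.qdisk_votes
    quorum = total // 2 + 1  # strict majority of all votes
    if s.outage_s <= s.qdisk_timeout_s:
        # The network recovers before any timeout expires: the cluster
        # stays whole and keeps all of its votes.
        return 1 if total >= quorum else 0
    # Assumption: every node is isolated and its heuristic has failed for
    # longer than the timeout, so none may claim the qdisk votes and each
    # isolated node holds only its own vote.
    return sum(1 for _ in range(s.nodes) if s.node_votes >= quorum)


print(quorate_partitions(Scenario()))              # 0: no survivor
print(quorate_partitions(Scenario(outage_s=5.0)))  # 1: cluster rides it out
```

Under these assumptions, any outage longer than the timeout leaves no quorate partition at all, which is the "nothing left" outcome described above.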

In fact, I have yet to see a heuristic that makes sense in the real world. I cannot think of one.

Regards,

Chris Jankowski

-----Original Message-----
From: Digimer [mailto:linux at alteeve.com] 
Sent: Thursday, 11 November 2010 15:30
To: linux clustering
Cc: Jankowski, Chris
Subject: Re: [Linux-cluster] Starter Cluster / GFS

On 10-11-10 10:29 PM, Jankowski, Chris wrote:
> Digimer,
> 
> 1.
> Digimer wrote:
>>>> Both partitions will try to fence the other, but the slower will lose and get fenced before it can fence.
> 
> Well, this is certainly not my experience with modern rack-mounted or blade servers, where you use iLO (on HP) or DRAC (on Dell).
> 
> What actually happens in two-node clusters is that both servers issue a fence request to the iLO or DRAC. Both requests get processed and *both* servers get powered off. Ouch!! Your 100% HA cluster becomes a 100% dead cluster.

That is somewhat frightening. My experience is limited to stock IPMI and Node Assassin. I've not seen a situation where both die. I'd strongly suggest that a bug be filed.
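A toy timing model that reconciles the two experiences (an assumption for illustration, not how any fence agent or management controller works): each node decides to fence its peer at some moment, and the power-off only takes effect a fixed latency later. "The slower will lose" holds only when the gap between the two decisions exceeds that latency; otherwise both requests are already in flight and both nodes go down. The function name and latency values are hypothetical.

```python
def fence_race(t_a: float, t_b: float, latency_s: float) -> set:
    """Toy two-node fence race; returns the set of nodes that end up powered off.

    t_a, t_b  -- times (seconds) at which node A and node B decide to fence the peer
    latency_s -- assumed delay between sending a power-off request to the peer's
                 management controller and the peer actually losing power
    """
    # The earlier node always gets its request out.
    later = "B" if t_a <= t_b else "A"
    gap = abs(t_a - t_b)
    # The later node can only send its own request if it is still powered
    # when it decides, i.e. before the earlier request takes effect.
    if gap < latency_s:
        return {"A", "B"}  # both requests in flight: both nodes go down
    return {later}         # the slower node is fenced before it can fence


print(fence_race(0.0, 0.5, latency_s=2.0))  # both nodes are fenced
print(fence_race(0.0, 5.0, latency_s=2.0))  # only B is fenced
```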

> 2.
> Your comment did not explain what role the quorum disk plays in the cluster. Also, are there any useful quorum disk heuristics that could be used in this case?
> 
> Thanks and regards,
> 
> Chris Jankowski

Ah, the idea is that, with the quorum disk (ignoring heuristics for the moment), if only one node is left alive, the quorum disk will contribute sufficient votes for quorum to be achieved. Of course, this depends on the node(s) having access to the qdisk still.

Now for heuristics. Consider this:

You have a 7-node cluster:
- Each node gets 1 vote.
- The qdisk gets 6 votes.
- Total votes are 13, so quorum is >= 7.
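The vote arithmetic in the list, written out. This assumes quorum is a strict majority of the total votes, which yields the 7 quoted for 13 votes; the constant and function names are ours. It also checks the earlier point that a lone node holding the quorum disk is quorate on its own.

```python
NODES = 7
NODE_VOTES = 1
QDISK_VOTES = 6


def quorum_threshold(total_votes: int) -> int:
    """Strict majority of the total votes."""
    return total_votes // 2 + 1


total = NODES * NODE_VOTES + QDISK_VOTES
quorum = quorum_threshold(total)
print(total, quorum)                       # 13 7
# One surviving node that holds the quorum disk votes is quorate on its own...
print(NODE_VOTES + QDISK_VOTES >= quorum)  # True
# ...while six nodes without the quorum disk votes are not.
print((NODES - 1) * NODE_VOTES >= quorum)  # False
```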

Your cluster partitions, say from a network failure. Six nodes are separated from a core switch, while one still happens to have access to a critical route (say, to the Internet). The heuristic test (i.e., pinging an external server) will pass for the one node and fail for the other six.

The one node with access to the critical route will be the one to get the votes of the quorum disk (1 + 6 = 7, quorum!) while the other six will get six votes (1 + 1 + 1 + 1 + 1 + 1 = 6, no quorum). The six nodes will lose and be fenced and will not be able to rejoin the cluster until they regain access to that critical route.
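A sketch of the partition outcome above, under a simplification that is ours rather than qdiskd's: a partition receives the quorum disk votes if any of its members passes the heuristic. Node names and the helper function are hypothetical.

```python
NODE_VOTES = 1
QDISK_VOTES = 6
QUORUM = 7  # strict majority of 13 total votes


def partition_votes(members, heuristic_ok):
    """Votes held by a partition: its nodes' votes plus, if earned, the qdisk votes."""
    votes = NODE_VOTES * len(members)
    # Simplification: the partition gets the quorum disk votes if any of
    # its members passes the heuristic (e.g. can ping the external server).
    if any(heuristic_ok[n] for n in members):
        votes += QDISK_VOTES
    return votes


nodes = [f"node{i}" for i in range(1, 8)]
# Only node1 still reaches the critical route.
heuristic_ok = {n: n == "node1" for n in nodes}
for part in (["node1"], nodes[1:]):
    v = partition_votes(part, heuristic_ok)
    print(part, v, "quorate" if v >= QUORUM else "no quorum, fenced")
```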

--
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org