[Linux-cluster] Starter Cluster / GFS
Chris.Jankowski at hp.com
Thu Nov 11 03:29:44 UTC 2010
>>>Both partitions will try to fence the other, but the slower will lose and get fenced before it can fence.
Well, that is certainly not my experience with modern rack-mounted or blade servers, where you fence through iLO (on HP) or DRAC (on Dell).
What actually happens in two-node clusters is that both servers issue the fence request against the other's iLO or DRAC. Both requests get processed and *both* servers are powered off. Ouch!! Your 100% HA cluster becomes a 100% dead cluster.
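The mutual-kill scenario above can be sketched as a toy simulation (this is illustrative only, not real fence-agent code; the node names and delay values are made up). It shows why the commonly used mitigation of giving one node a head start breaks the tie: whichever request fires first powers off the peer before the peer's request is sent.

```python
# Toy model of the two-node fence race: both nodes ask the other's
# iLO/DRAC to power it off. Equal delays model the simultaneous case.

def fence_race(delay_a, delay_b):
    """Return the set of nodes that end up powered off.

    The node with the shorter delay fires first and kills the other
    before that node's fence request goes out; equal delays mean both
    requests are processed and both nodes die.
    """
    if delay_a < delay_b:
        return {"B"}             # A fences B first; B never fires
    if delay_b < delay_a:
        return {"A"}             # B fences A first; A never fires
    return {"A", "B"}            # simultaneous requests: both die

print(sorted(fence_race(0, 0)))    # ['A', 'B'] -- the 100% dead cluster
print(sorted(fence_race(0, 15)))   # ['B'] -- a delay on B lets A win
```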
Your comment did not explain what role the quorum disk plays in the cluster, or whether there are any useful quorum-disk heuristics that could be applied in this case.
Thanks and regards,
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Digimer
Sent: Thursday, 11 November 2010 03:41
To: linux clustering
Subject: Re: [Linux-cluster] Starter Cluster / GFS
On 10-11-10 11:09 AM, Gordan Bobic wrote:
> Digimer wrote:
>> On 10-11-10 07:17 AM, Gordan Bobic wrote:
>>>>> If you want the FS mounted on all nodes at the same time then all
>>>>> those nodes must be a part of the cluster, and they have to be
>>>>> quorate (majority of nodes have to be up). You don't need a quorum
>>>>> block device, but it can be useful when you have only 2 nodes.
>>>> At term, I will have 7 to 10 nodes, but 2 at first for initial
>>>> setup and testing. OK, so if I have a 3-node cluster, for example,
>>>> I need at least 2 nodes up for the cluster, and thus the GFS, to be
>>>> usable? I cannot have a running GFS with only one node?
>>> In a 2-node cluster, you can have running GFS with just one node up.
>>> But in that case it is advisable to have a quorum block device on the SAN.
>>> With a 3-node cluster, you cannot have quorum with just 1 node, and
>>> thus you cannot have GFS running. It will block until quorum is
>>> re-established.
>> With a quorum disk, you can in fact be down to one node and still have
>> quorum. This is because the quorum disk should carry (nodes - 1) votes,
>> thus always giving the last node 50%+1 of the total votes even with all
>> other nodes being dead.
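The vote arithmetic behind that claim can be checked with a few lines. This is a hand-rolled illustration of cman-style quorum counting (the node counts are example values, and `quorate` is my own helper, not a real cluster API):

```python
# Quorum arithmetic with a quorum disk: a partition is quorate when
# its votes exceed half of the expected total.

def quorate(node_votes, qdisk_votes, total_votes):
    """Return True if this partition holds a strict majority of votes."""
    return (node_votes + qdisk_votes) > total_votes / 2

nodes = 3                 # each node contributes 1 vote
qdisk = nodes - 1         # quorum disk carries (nodes - 1) = 2 votes
total = nodes + qdisk     # expected total: 5 votes

# A single surviving node that still owns the quorum disk:
print(quorate(1, qdisk, total))   # True: 1 + 2 = 3 > 2.5
# A single surviving node that has lost the quorum disk:
print(quorate(1, 0, total))       # False: 1 <= 2.5
```

This is exactly the "50%+1" point above: the lone survivor plus the disk's votes always outvote all the dead nodes combined.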
> I've never tried testing that use-case extensively, but I suspect that
> it is only safe to do with SAN-side fencing. Otherwise two nodes could
> lose contact with each other and still both have access to the SAN and
> thus both be individually quorate.
Clustered storage *requires* fencing. Running without fencing is like driving tired: it's just a matter of time before something bad happens. That said, I should have been clearer in stating that fencing is a requirement.
Now, that said, the fencing doesn't need to happen on the SAN side, though that approach works fine as well.
The way it works is:
In normal operation, all nodes communicate via corosync. Corosync in turn manages the distributed locking and ensures that locks are ordered across all nodes (virtual synchrony).
As soon as communication fails with one or more nodes, no further locks are issued and all I/O is blocked until:
a) The node finally responds, or
b) A timeout is reached and corosync issues a fence against the incommunicado node(s).
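The two-step sequence above can be sketched as a minimal state check. All the names here (`handle_silent_node`, `TOKEN_TIMEOUT`, the `fence_fn` callback) are my own assumptions for illustration; this is not corosync's real API:

```python
# Sketch of the block-until-resolved logic: while a node is silent,
# I/O stays blocked; it unblocks only when the node returns (a) or
# when the timeout expires and a fence is confirmed (b).

TOKEN_TIMEOUT = 10.0  # seconds before a silent node is declared lost

def handle_silent_node(node, last_heard, now, fence_fn):
    """Return the cluster's I/O state for one silent node."""
    if now - last_heard < TOKEN_TIMEOUT:
        return "io_blocked"       # (a) still waiting for the node
    if fence_fn(node):            # (b) timeout reached: fence it
        return "recovered"        # fence confirmed, I/O may resume
    return "io_blocked"           # fence unconfirmed: stay blocked

# Before the timeout, I/O simply stays blocked:
print(handle_silent_node("node2", last_heard=0.0, now=5.0,
                         fence_fn=lambda n: True))    # io_blocked
# After the timeout, a successful fence lets recovery begin:
print(handle_silent_node("node2", last_heard=0.0, now=12.0,
                         fence_fn=lambda n: True))    # recovered
```

Note that a failed or unconfirmed fence leaves I/O blocked forever, which is exactly the behaviour described next: nothing proceeds until the fence agent reports success.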
Once a fence is issued, nothing proceeds until the fence agent returns a successful fence message to the fence daemon.
In the case of a split brain (nodes partition and are up but not talking to each other), both partitions will issue a fence against the other node(s). This is now a race, often described as an old-west style duel.
Both partitions will try to fence the other, but the slower one will lose and be fenced before it can fence.
With a successful fence, the surviving partition (which could be just one node) will reconfigure and then begin recovering the clustered file system (GFS2 in this case). Once recovery is complete, I/O unblocks and continues.
With SAN-side fencing, a fence takes the form of a logical disconnection from the storage network. This has no inherent mechanism for recovery, so the sysadmin has to recover the fenced node(s) manually. For this reason, it is not my preferred method.
With power fencing, by far the most common method, which can be implemented via IPMI, addressable PDUs, and so on, the fenced node is rebooted. The benefit of this method is that the node may well come back up healthy and rejoin the cluster automatically. Of course, if you prefer, you can have fenced nodes powered off and left off.
E-Mail: digimer at alteeve.com
Node Assassin: http://nodeassassin.org