[Linux-cluster] RHCS: Multi site cluster

Mon Apr 5 03:39:22 UTC 2010

Another comment:

You can certainly do it, but you may be surprised to find that the result is neither as resilient nor as highly available as you had hoped, due to the limitations of the cluster subsystems.

I'll give just two examples where you may hit unresolvable difficulties. The first is obvious; the second is much more subtle:

1. Fencing.

Assume that the link between sites A and B is severed and that, to keep the example simple, site A has retained quorum whereas site B has not. A node on site A will try to fence the nodes on site B, but it cannot complete the operation because the links are down. Fencing will therefore block, and you will have no access to the GFS2 filesystems on site A, because they are waiting for lock recovery, which can only commence once fencing has completed.
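
A minimal sketch of this scenario in Python may make the dependency chain easier to see. It is a toy model, not RHCS code: the names `Node`, `has_quorum`, `fence` and `gfs2_state`, the 3+2 node layout and the simple-majority rule are all assumptions made for illustration. The only point it shows is that a quorate site still cannot use GFS2 while a fence operation against the unreachable site is pending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    votes: int = 1


def has_quorum(visible_votes, total_votes):
    # Assumed rule: strictly more than half of all configured votes.
    return 2 * visible_votes > total_votes


def fence(target, reachable_sites):
    # Hypothetical fence agent: it has to reach the fence device that sits
    # next to the target node, so it fails when the target's site is cut off.
    return target.site in reachable_sites


def gfs2_state(nodes, reachable_sites):
    """State of GFS2 as seen from a partition that can reach `reachable_sites`."""
    total = sum(n.votes for n in nodes)
    visible = sum(n.votes for n in nodes if n.site in reachable_sites)
    if not has_quorum(visible, total):
        return "inquorate"
    # Every node that dropped out must be fenced before lock recovery starts.
    lost = [n for n in nodes if n.site not in reachable_sites]
    if all(fence(n, reachable_sites) for n in lost):
        return "usable"
    return "blocked on fencing"


# Assumed layout: three nodes on site A, two on site B.
NODES = [
    Node("a1", "A"), Node("a2", "A"), Node("a3", "A"),
    Node("b1", "B"), Node("b2", "B"),
]

if __name__ == "__main__":
    print("link up:         ", gfs2_state(NODES, {"A", "B"}))
    print("site A, link cut:", gfs2_state(NODES, {"A"}))
    print("site B, link cut:", gfs2_state(NODES, {"B"}))
```

In this model site A keeps quorum after the cut, yet its state is "blocked on fencing" because none of the site B nodes can be fenced.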

This problem is inherent to the whole philosophy of fencing. The fencing model assumes that the fencing node has full control over the whole environment, no matter what happens to that environment. IMHO, this works reasonably well for two computers under the desk with a power switch, but fails miserably over distance.

Please note that clusters other than RHEL Cluster Suite do not implement the fencing stance, i.e. they do not assume that a node is omnipotent. Their stance is much more conservative with regard to what a node can achieve within the environment. The difference looks minor, but the consequences are dramatic when working with stretched clusters.

There are many kludges you can apply to bend fencing to your will, but you really cannot change the basic design philosophy, I believe.

2. CLVM.

This is more subtle. Suppose you have a single logical volume built out of 4 physical volumes spread between the sites, and you built it as a mirrored volume. Imagine a fire in the data centre. There are scenarios in which CLVM will report the volume as still being OK, even though it has lost the left part of the first half and the right part of the second half. Then the last fibre melts and you have two sets, one in each site, but neither of them is consistent. Your GFS filesystem built on the volume will be corrupted on either site.
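
The failure sequence can be sketched as a small simulation. This is a toy model and does not use any LVM2 or CLVM interface: `Leg`, `MirroredSegment`, `site_consistent` and `simulate_fire` are hypothetical names, the volume is assumed to be two mirrored halves with one leg per site, and a per-leg generation counter stands in for "has seen the latest writes".

```python
from dataclasses import dataclass


@dataclass
class Leg:
    site: str
    online: bool = True
    generation: int = 0  # number of the last write this leg received


class MirroredSegment:
    """One half of the volume, mirrored across its legs (toy model)."""

    def __init__(self, legs):
        self.legs = legs

    def healthy(self):
        # The segment looks OK as long as any one leg is still reachable.
        return any(leg.online for leg in self.legs)

    def latest(self):
        return max(leg.generation for leg in self.legs)

    def write(self):
        live = [leg for leg in self.legs if leg.online]
        if not live:
            raise IOError("no leg left for this segment")
        new_generation = self.latest() + 1
        for leg in live:
            leg.generation = new_generation


def site_consistent(segments, site):
    # A site holds a usable copy only if, for every segment, its local leg
    # carries the most recent write.
    return all(
        any(leg.site == site and leg.generation == seg.latest() for leg in seg.legs)
        for seg in segments
    )


def simulate_fire():
    first = MirroredSegment([Leg("A"), Leg("B")])
    second = MirroredSegment([Leg("A"), Leg("B")])
    volume = [first, second]
    for seg in volume:
        seg.write()                 # both sites in sync
    first.legs[0].online = False    # lose the site A leg of the first half
    second.legs[1].online = False   # lose the site B leg of the second half
    reported_ok = all(seg.healthy() for seg in volume)
    for seg in volume:
        seg.write()                 # I/O continues on the surviving legs
    # The inter-site link now fails; each site only has its own legs.
    return reported_ok, site_consistent(volume, "A"), site_consistent(volume, "B")


if __name__ == "__main__":
    print(simulate_fire())
```

Under these assumptions the volume is reported as healthy after the two partial failures, yet after further writes neither site on its own holds a current copy of both halves.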

Again, these limitations are architectural:

1.
LVM2 and CLVM are missing an intermediate layer between the physical volume and the logical volume. This layer is called the plex layer in some UNIX LVMs, for example Veritas VxVM, and it handles the characteristics and state of an independent component of a volume.

2.
CLVM is missing any notion of where a physical volume is located. Once you have this notion, you can build a plex quorum in a location and guarantee that a plex in a site will be consistent. Note that this allows an LVM that has this feature to deal with multiple, non-simultaneous failures affecting multiple locations. Standard LVMs are not designed to do that. To the best of my knowledge, only the OpenVMS cluster has this feature. Perhaps the newest VxVM has it too; I have not worked with it in years. Does anybody know?

3.
There are other constraints in the management layer that IMHO result in uncertain recovery scenarios. Again, they mostly stem from the lack of a plex layer.
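
To illustrate what a location-aware plex layer could buy you, here is a rough sketch of one possible policy. It is my own simplification, not the VxVM or OpenVMS implementation: `Plex`, `SiteAwareVolume` and the rule "any component failure detaches the whole plex of that site" are assumptions made for illustration.

```python
class Plex:
    """A complete copy of the volume, tied to one site (hypothetical model)."""

    def __init__(self, site, n_components):
        self.site = site
        self.component_ok = [True] * n_components
        self.attached = True
        self.generation = 0  # number of the last write applied to this plex


class SiteAwareVolume:
    def __init__(self, plexes):
        self.plexes = {plex.site: plex for plex in plexes}

    def fail_component(self, site, index):
        plex = self.plexes[site]
        plex.component_ok[index] = False
        # Assumed policy: a plex is all or nothing, so one failed component
        # detaches the whole plex instead of leaving half of it in service.
        plex.attached = False

    def write(self):
        live = [p for p in self.plexes.values() if p.attached]
        if not live:
            # No complete copy is left: stop instead of mixing partial copies.
            raise IOError("no complete plex attached; I/O suspended")
        new_generation = max(p.generation for p in self.plexes.values()) + 1
        for plex in live:
            plex.generation = new_generation

    def consistent_sites(self):
        latest = max(p.generation for p in self.plexes.values())
        return sorted(
            p.site
            for p in self.plexes.values()
            if p.generation == latest and all(p.component_ok)
        )


if __name__ == "__main__":
    volume = SiteAwareVolume([Plex("A", 2), Plex("B", 2)])
    volume.write()
    volume.fail_component("A", 0)    # first failure, on site A
    volume.write()                   # only the site B plex takes the write
    print(volume.consistent_sites())
    volume.fail_component("B", 1)    # second, later failure on site B
    try:
        volume.write()
    except IOError as exc:
        print("stopped:", exc)
```

With this policy, as long as the volume accepts writes at least one site holds a complete and current copy; once that is no longer true the volume stops, rather than carrying on with the first half from one site and the second half from the other.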

--------------

This is why I believe that trying to do a stretched RHEL cluster is a bad idea, unless you do not care about its integrity and resiliency.

I always recommend (also for process reasons) separating local HA (automatic replacement of a failed, single, redundant component within a single site) from DR (a human-in-the-loop, push-button failover solution). Normally you need an independent HA cluster in each site.

A RHEL cluster can deliver the HA part, whereas disk replication in the storage, with an integrated push-button failover solution, can deliver the DR part. Note that, if properly implemented, both solutions can deliver crash-consistent recovery.

Regards,

Chris Jankowski

________________________________

	From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Paul Morgan
	Sent: Monday, 5 April 2010 11:08
	To: linux clustering
	Subject: Re: [Linux-cluster] RHCS: Multi site cluster

	You can do it, but you need to ensure low latency and design the stack to minimize the wide area replication. In other words, use fiber. Put the replication traffic on a separate segment if possible, and definitely in a separate layer from the application traffic.

	Also: double check with your Red Hat sales team on support.

			On Apr 4, 2010 12:11 PM, "Hoang, Alain" <Alain.Hoang at hp.com> wrote:

				Hello,

		With RHCS 5.4, is it possible to build a cluster across multiple sites?
		Does CLVM 5.4 allow SAN replication across 2 sites?

		Best Regards,
		Kiến Lâm Alain Hoang, 

		--
		Linux-cluster mailing list
		Linux-cluster at redhat.com
		https://www.redhat.com/mailman/listinfo/linux-cluster