[Linux-cluster] How do you HA your storage?

Jankowski, Chris Chris.Jankowski at hp.com
Mon May 30 11:14:37 UTC 2011

There is a school of thought among practitioners of Business Continuity that says:

	HA != DR

The two cover different domains and mixing the two concepts leads to horror stories.

Essentially, HA covers a single (small or large) component failure. If components can be duplicated and work in parallel (e.g. disks, paths, controllers), then the failure of one component may be transparent to the end users. If a component carries state (e.g. a server), then you replace the element and recover a stable state - hence an HA cluster.
The action taken is automatic, and the outcome can be guaranteed provided that only one component has failed.

DR covers multiple, and not necessarily simultaneous, component failures. They may result from large catastrophic events such as a fire in the data centre. As the extent of the damage is not known, a human must be in the loop to declare a disaster and initiate execution of a disaster recovery plan. Software has horrible problems distinguishing a hole in the ground left by a massive bomb blast from a puff of smoke from a little short circuit in a power supply (:-)). Humans do better here. The desired results of executing a disaster recovery plan can be achieved by very careful design for geographical separation, so that a disaster does not invalidate redundancy. The execution itself can be automated, but it is initiated by a human - a push-button solution.
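The HA/DR split above can be put as a small decision rule. What follows is only a minimal sketch of that rule of thumb, not a real monitoring tool; the names `Failure`, `Response` and `classify` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    NONE = "no action"
    AUTOMATIC_HA = "automatic HA failover"
    HUMAN_DECISION = "escalate to a human for a possible DR declaration"


@dataclass(frozen=True)
class Failure:
    component: str  # hypothetical label, e.g. "disk-3"
    site: str       # hypothetical site name, e.g. "dc-east"


def classify(failures: list[Failure]) -> Response:
    """One failed component is HA territory; several need a human decision."""
    if not failures:
        return Response.NONE
    if len(failures) == 1:
        # Single component failure: the outcome of automation is predictable.
        return Response.AUTOMATIC_HA
    # Multiple failures: the extent of damage is unknown, so do not automate.
    return Response.HUMAN_DECISION


single = classify([Failure("controller-a", "dc-east")])
several = classify([Failure("fibre-1", "dc-east"), Failure("disk-3", "dc-east")])
print(single.value)
print(several.value)
```

The point of the sketch is that the automation stops at the second failure: past that, the software only reports and a person decides.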

Typically, DR is layered on top of HA: e.g. HA clusters in each location protect against single component failures, and data replication from the active site to the DR site maintains complete state in a geographically distant location.

The typical cost ratios are 1=>4=>16 for single system => HA cluster => complete DR solution. That is why there are very few properly designed, built, tested and maintained DR solutions based on two HA clusters and replication.


I believe that you are trying to configure a stretched cluster that would provide some automatic DR capabilities.

The problem with stretched cluster solutions is that they do not normally take into consideration multiple, non-simultaneous component failures. I suggest that you think carefully about what happens in such a system, depending on which fibre melts first and which disk seizes up first in a fire. You will soon find out that the software lacks the notion of locally consistent groups. The only cluster that ever did that location stuff right was the DEC VMS cluster 25 years ago. Stretched VMS clusters did work correctly. The cost was horrendous though.
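To see why the order of failures matters, here is a deliberately tiny toy model. It assumes a host at the burning site A that mirrors every write to one disk at A and one disk at the surviving site B, with one write arriving after each failure event. The function and event names are hypothetical, and this is an illustration of the thought experiment, not a model of any particular cluster product.

```python
def surviving_copy_is_current(events):
    """Replay failures at site A; return True if site B holds every acknowledged write."""
    link_up = True     # inter-site fibre
    disk_a_ok = True   # disk at the burning site A
    acked_writes = 0   # writes the application was told had succeeded
    writes_on_b = 0    # writes that actually reached the disk at site B

    for event in events:
        if event == "link_fails":
            link_up = False
        elif event == "disk_a_fails":
            disk_a_ok = False
        else:
            raise ValueError(f"unknown event: {event}")

        # One application write arrives after each failure event.
        if not disk_a_ok and not link_up:
            continue  # no copy reachable: the write is refused, nothing acknowledged
        acked_writes += 1
        if link_up:
            writes_on_b += 1

    return writes_on_b == acked_writes


fibre_first = surviving_copy_is_current(["link_fails", "disk_a_fails"])
disk_first = surviving_copy_is_current(["disk_a_fails", "link_fails"])
print("fibre melts first, B current:", fibre_first)
print("disk seizes first, B current:", disk_first)
```

In both orderings site B ends up in the same observable situation (alone, with no contact to A), yet in one of them its copy is stale. Software at B cannot tell the two apart without extra information.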


You can also try to make storage somebody else's problem by using a storage array that enables you to build a geographically extended HA configuration. Believe it or not, there is one like that - the P4000 from HP (formerly from LeftHand Networks). Of course, you would still need to properly design and configure such an extended configuration, but it is a fully supported solution from the vendor. You can play with it by downloading an evaluation copy of the software - VSA (Virtual Storage Appliance) - from the HP site.


Chris Jankowski


-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Tom Lanyon
Sent: Monday, 30 May 2011 19:45
To: linux clustering
Subject: Re: [Linux-cluster] How do you HA your storage?

On 30/04/2011, at 6:31 PM, urgrue wrote:
> I'm struggling to find the best way to deal with SAN failover.
> By this I mean the common scenario where you have SAN-based mirroring.
> It's pretty easy with host-based mirroring (md, DRBD, LVM, etc) but how
> can you minimize the impact and manual effort to recover from losing a
> LUN, and needing to somehow get your system to realize the data is now
> on a different LUN (the now-active mirror)?


As others have mentioned, this may be a little off-topic for the list. However, I am replying in the hope of providing an answer to your original question.

In my experience the destination array of storage-based (i.e. array-to-array) replication is able to present the replication target LUN with the same ID (e.g. WWN) as that of the source LUN on the source array.

In this scenario, you would present the replicated LUN on the destination array to your server(s), and your multipathing software (i.e. device-mapper-multipath) would essentially see it as another path to the same device. You obviously need to ensure that the priority of these paths is such that no I/O operations will traverse them unless the paths to the source array have failed.

In the case of a failure on the source array, its paths will (hopefully!) be marked as failed, your multipath software will start queueing I/O, the destination array will detect the source array failure and switch its LUN presentation to read/write, and your multipathing software will resume I/O on the new paths.
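The sequence above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `Path` and `MultipathDevice` classes are invented here and do not reflect how device-mapper-multipath is actually implemented or configured.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Path:
    array: str             # "source" or "destination"
    priority: int          # higher value is preferred
    failed: bool = False
    writable: bool = True  # a replication target starts out read-only


class MultipathDevice:
    def __init__(self, paths):
        self.paths = paths
        self.queue = deque()  # I/O held while no usable path exists
        self.completed = []   # (io, array) pairs, in completion order

    def _best_path(self):
        usable = [p for p in self.paths if not p.failed and p.writable]
        return max(usable, key=lambda p: p.priority, default=None)

    def submit(self, io):
        self.queue.append(io)
        self.flush()

    def flush(self):
        path = self._best_path()
        while path is not None and self.queue:
            self.completed.append((self.queue.popleft(), path.array))


src = Path("source", priority=10)
dst = Path("destination", priority=1, writable=False)
dev = MultipathDevice([src, dst])

dev.submit("write-1")          # served by the source array
src.failed = True              # source array fails; its paths are marked failed
dev.submit("write-2")          # no usable path, so the I/O is queued
queued_during_outage = list(dev.queue)
dst.writable = True            # destination array switches its LUN to read/write
dev.flush()                    # queued I/O resumes on the new paths
print(dev.completed)
```

The lower priority on the destination path is what keeps I/O off the replica while the source is healthy. The queue is what lets the application ride through the gap between the source failing and the destination being promoted.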

There's a lot to consider here. Such live failover can often be asking for trouble, and given that the total failure rate of high-end storage equipment is quite low, I'd only implement it if absolutely required.

The above assumes synchronous replication between the arrays.

Hope this helps somewhat.


Linux-cluster mailing list
Linux-cluster at redhat.com