[Linux-cluster] How do you HA your storage?

Sat Apr 30 16:42:21 UTC 2011

I do have RAID, multipath over multiple fabrics, etc. What none of that 
protects you from is a major SAN failure or a datacenter outage, for 
example. These do happen, and if you've got more than a few datacenters 
and dozens of SAN filers, you know they happen far too often to do 
without a graceful, predictable recovery procedure.

So, like everyone else, you've got cluster nodes in each datacenter, 
all of them connected to the same SAN. Everything will recover quite 
nicely from just about every type of failure - except failure of the SAN 
itself. Your cluster nodes in your backup datacenter will not be happy 
to see the disks disappear. You can activate your backup filer(s) in 
seconds - all your hundreds of passive nodes now have functioning copies 
of the data and could/should be able to get back to work - but getting 
all of them to actually realize it can take hours of messy manual work.

I wouldn't think it'd be very difficult to handle this gracefully; all 
the basic functionality is already there in multipath and LVM. I think 
it would be a pretty big deal in the enterprise world to be able to 
transparently switch SANs like this. As far as I know, only z/OS can do 
this currently, and even then it's built around a very specific, 
complicated and expensive storage configuration. And there's a whole 
industry around "SAN virtualization" that exists just because of this 
kind of situation, and that would become obsolete overnight if the OS 
itself could handle it natively.
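
To make the idea a bit more concrete, here is a rough sketch in Python 
of the decision such a layer would have to make for each mirrored 
volume. It is not how multipath or LVM work today. All the names 
(MirrorPair, plan_failover, the WWIDs) are made up for illustration, 
and it assumes the backup filer has already been activated, so that a 
visible mirror LUN is actually writable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPair:
    """One array-replicated volume (hypothetical model, not a real API)."""
    volume: str        # logical name the application cares about
    primary_wwid: str  # LUN on the primary array
    mirror_wwid: str   # replicated copy on the backup array


def plan_failover(pairs, visible_wwids, current):
    """Return {volume: wwid or None}: the LUN each volume should use.

    pairs         -- iterable of MirrorPair
    visible_wwids -- set of WWIDs that still have at least one live path
    current       -- dict {volume: wwid} of what each volume uses now

    Assumption: a visible mirror LUN has already been promoted to
    read-write on the array side. Visibility alone cannot tell us that.
    """
    plan = {}
    for pair in pairs:
        in_use = current.get(pair.volume, pair.primary_wwid)
        if in_use in visible_wwids:
            # Still reachable: never switch sides without a reason.
            plan[pair.volume] = in_use
            continue
        # The LUN in use is gone; fall back to the other copy if we see it.
        candidates = [pair.primary_wwid, pair.mirror_wwid]
        plan[pair.volume] = next(
            (w for w in candidates if w != in_use and w in visible_wwids),
            None,  # neither copy reachable: nothing we can do from the host
        )
    return plan
```

The point is that the host only needs two things it doesn't have today: 
a table saying which LUNs are copies of each other, and something that 
re-points the volume when the LUN in use disappears. Everything else 
(path checking, activating volumes) is already there.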

On 30/4/11 16:29, Jankowski, Chris wrote:
> I am just wondering, why would you like to do it this way?
>
> If you have a SAN then by implication you have a storage array on the SAN.  This storage array will normally have the capability to give you highly available storage through RAID{1,5,6}. Moreover, any decent array will also provide redundancy in case of a failure of one of its controllers. Then a standard dual-fabric FC SAN configuration will give you multiple paths to the controllers of the array - normally at least 4 paths. What remains to be done on the servers is to configure device mapper multipath to fit your SAN configuration and the capabilities of the array. Most modern arrays these days are active-active and support ALUA extensions.
>
> Nothing specifically needs to be done in the cluster software.  This works the same way as for a single host.
>
> Are you trying to build a stretched cluster across multiple sites with a SAN array in each?
>
> Regards,
>
> Chris Jankowski
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of urgrue
> Sent: Saturday, 30 April 2011 19:01
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] How do you HA your storage?
>
> I'm struggling to find the best way to deal with SAN failover.
> By this I mean the common scenario where you have SAN-based mirroring.
> It's pretty easy with host-based mirroring (md, DRBD, LVM, etc) but how
> can you minimize the impact and manual effort to recover from losing a
> LUN, and needing to somehow get your system to realize the data is now
> on a different LUN (the now-active mirror)?
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster