[Linux-cluster] Strange requirements: no reboot of failed node, both shared and non-shared storage on SAN

Tue Apr 8 04:31:31 UTC 2008

Hi All

I have a strange set of requirements:

A two-node cluster:
Services running on the cluster nodes are not shared (i.e. not clustered).
The cluster exists only for two GFS file systems on a SAN.
The same storage system hosts non-GFS LUNs for individual use by the
cluster members.
The nodes run two applications: the critical app does NOT use GFS; the
non-critical app does.
The critical application uses SAN storage for its ext3 file systems.

The requirement is that a failure of the cluster must not interrupt the
critical application.
This means the failed node cannot be power cycled. The failed node must also
continue to have access to its non-GFS LUNs on the storage.

The storage consists of two HP EVAs, each with two controllers. There are two
Brocade FC switches.

Fencing is required for GFS.

The only solution I can think of is:
Present the GFS LUNs down one HBA only, and the ext3 LUNs down both HBAs.
Use SAN fencing to block the fenced host's access to the GFS LUNs by blocking
its access to the controller that is handling those LUNs.
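To make the proposed layout concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not a fencing agent: the HBA names (`hba0`, `hba1`), LUN names and helper functions are all hypothetical, and fencing is modelled simply as cutting the node's single GFS path. It shows the property the layout is meant to guarantee: after fencing, the node loses the GFS LUNs but still sees its ext3 LUNs, with no power cycle involved.

```python
"""Toy model of the proposed SAN-fencing layout (all names hypothetical)."""

# Assumed LUN presentation: each LUN maps to the HBAs it is presented down.
# GFS LUNs go down hba0 only; the critical app's ext3 LUNs go down both HBAs.
PRESENTATION: dict[str, frozenset[str]] = {
    "gfs_fs1": frozenset({"hba0"}),
    "gfs_fs2": frozenset({"hba0"}),
    "ext3_critical": frozenset({"hba0", "hba1"}),
}

GFS_HBA = "hba0"  # assumed: the single HBA path carrying the GFS LUNs


def reachable_luns(active_hbas: frozenset[str]) -> set[str]:
    """Return the LUNs a node can still see through its active HBA paths."""
    return {lun for lun, hbas in PRESENTATION.items() if hbas & active_hbas}


def san_fence(active_hbas: frozenset[str]) -> frozenset[str]:
    """Fence a node by cutting its GFS path only; the node keeps running."""
    return active_hbas - {GFS_HBA}


if __name__ == "__main__":
    healthy = frozenset({"hba0", "hba1"})
    fenced = san_fence(healthy)
    print(sorted(reachable_luns(healthy)))  # all three LUNs
    print(sorted(reachable_luns(fenced)))   # only the ext3 LUN
```

Note that the ext3 LUNs survive only because they are also presented down `hba1`; in this model the fenced node loses one of its two ext3 paths, so the critical app runs without path redundancy until the cluster is repaired.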

Repairing the cluster will be a manual operation that may involve a reboot.

Does this look workable?

Thanks