[Linux-cluster] Graceful recover after connectivity failure

gordan at bobich.net gordan at bobich.net
Tue Jan 15 15:03:42 UTC 2008


On Tue, 15 Jan 2008, Cliff Hones wrote:

> Fajar A. Nugraha wrote:
>> AFAIK the prerequisite for a cluster of any kind (be it RHEL or RAC) is that 
>> you have a failure-resistant network. This can be achieved for example by 
>> using dedicated heartbeat switches or cross-cables (in case of two nodes), 
>> plus ethernet bonding (in linux) for redundancy.
>> 
>> While I understand your requirement, I don't think an environment with 
>> (possibly) unreliable n/w is a good place for a cluster. Perhaps a simple 
>> thin client is more appropriate.
>
> We actually have two areas in which we wanted to use GFS - one is an
> office environment where the network, while not unreliable, is subject
> to occasional reconfiguration as machines/switches are moved.  The other
> is a datacentre environment where the infrastructure should be resilient.
>
> In both cases, our primary need for clustering is to enable GFS to be
> used.  Our local office setup could dispense with GFS/clustering and
> we could use other data sharing solutions such as NFS; however, we
> were planning to use a common solution so as to minimise maintenance
> costs and maximise our familiarity with the technology.

Are you saying that you are planning to run the cluster / GFS across more 
than one switch (or a redundant pair thereof)? That is quite unusual, as 
the performance would likely suffer.

If you require operation under unreliable conditions, you should probably 
look into using NFS over UDP, or, if outages need to be more transparent 
to the clients, Coda. Coda supports disconnected operation, and files get 
cached locally on the clients. Provided multiple writers don't end up 
clobbering each other's files often during disconnection (which usually 
leads to a requirement for manual conflict resolution), you may find that 
it is a better solution for you than clustering.

Remember that GFS, NFS/CIFS and Coda are designed for three distinctly 
different environments. If you are thinking about using GFS for connecting 
desktops to the shared storage just to use the same technology for 
everything, not only is that the wrong tool for the job, it will stop 
working as soon as half of the machines are switched off/disconnected, 
because the cluster loses quorum at that point.
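
To put numbers on that, here is a rough sketch of the majority rule. It 
assumes one vote per node and a threshold of (expected votes / 2) + 1, 
rounded down; the function names are made up for illustration and are not 
part of any cluster tool. A real setup may differ (e.g. weighted votes or 
a quorum disk).

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of expected votes.

    Assumption: one vote per node, threshold = floor(expected / 2) + 1.
    """
    return expected_votes // 2 + 1


def is_quorate(total_nodes: int, nodes_online: int) -> bool:
    """True if the nodes still online hold a strict majority of votes."""
    return nodes_online >= quorum_threshold(total_nodes)


if __name__ == "__main__":
    total = 10  # hypothetical office with ten desktops in the cluster
    for online in range(total, 3, -1):
        state = "quorate" if is_quorate(total, online) else "NOT quorate"
        print(f"{online}/{total} nodes online: {state}")
```

With ten desktops, the threshold is six votes, so the fifth machine being 
switched off in the evening is enough to block GFS for everybody else.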

Gordan



