[Linux-cluster] Failover after partial failure because of SAN?

Fajar A. Nugraha list at fajar.net
Fri Nov 4 13:25:58 UTC 2011


On Fri, Nov 4, 2011 at 6:11 PM, Jochen Schneider
<jochen.schneider at gmail.com> wrote:
>> > I'm not sure how much recovery can come out of a failover in case
>> > of a SAN failure, if it's not both network cards of the node which are
>> > defective or whatever.
>>
>> Exactly :)
>>
>> If no node can access the SAN, then it can't failover anywhere.
>
> If it is more likely that SAN access fails on the SAN side than on the
> node side, I guess that would mean it would be better to keep the
> application not needing the SAN running, i.e., not failing over. Or
> maybe failover should be tried once and then my service should go in
> the degraded mode described above? I'm not sure whether that is
> possible.

I recommend you just keep it simple: treat the two applications
differently. Don't put any dependency between them. Period.

That way when a node dies, they will be migrated to other nodes. If
the SAN dies, the one that doesn't need external disk will still work
just fine, while the one that needs it will be marked as dead (I
assume you have some kind of monitoring script for this already). Then
the dead one will either be restarted or moved to another node, and
if the SAN is not available there either, it will simply die.
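
In case it helps, here is a rough sketch of what such a monitoring
check could look like for the application that needs the SAN. This is
only an illustration under my own assumptions: the function names are
made up, the mount point is whatever path your SAN filesystem uses,
and it is not tied to any particular cluster manager.

```python
import os


def san_storage_ok(mount_point):
    """Return True if mount_point is a mounted, listable filesystem.

    Hypothetical helper: mount_point is assumed to be the directory
    where the SAN-backed filesystem is mounted.
    """
    try:
        # If the SAN filesystem is gone, the path is either missing or
        # just a plain directory on the local disk, not a mount point.
        if not os.path.ismount(mount_point):
            return False
        # Listing the directory forces real I/O; errors raise OSError.
        os.listdir(mount_point)
        return True
    except OSError:
        return False


def status_exit_code(mount_point):
    """Map the check to the usual convention: 0 = healthy, 1 = failed."""
    return 0 if san_storage_ok(mount_point) else 1
```

A status script would call status_exit_code() with the mount point and
exit with the result, so only the SAN-dependent service gets marked as
dead. Note that I/O on a dead SAN can hang rather than fail, so a real
check would probably also want a timeout around it.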

-- 
Fajar



