[Linux-cluster] Service Recovery Failure

Thu Jun 30 13:59:54 UTC 2011

On 06/30/2011 01:57 AM, Rahul Borate wrote:
> Hi all,
>
> I just performed a test which failed miserably. I have two nodes
> node-1 and node-2
>
> Global file system /gfs is on node-1.

You do not have fencing configured.

On a clean shutdown, the node withdraws from the cluster, so the other 
node knows that it is safe to take over services. When a node simply 
disappears, the survivor has no way of knowing what state the lost node 
is in. The survivor's only safe action is to block I/O and fence the 
lost node (to put it in a known state). I/O resumes after a successful 
fence, and only then. With no fencing configured, the fence can never 
succeed, so I/O stays blocked.

http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_2_Tutorial#Concept.3B_Fencing
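
To make the ordering concrete, here is a rough Python sketch of the
decision the surviving node makes. It is only an illustration of the
logic described above, not how the cluster software is implemented.
The names PeerExit, recover and fence_peer are hypothetical, and
placing the service takeover after I/O resumes is my assumption.

```python
from enum import Enum
from typing import Callable, List


class PeerExit(Enum):
    CLEAN_SHUTDOWN = "clean"   # peer withdrew from the cluster before leaving
    VANISHED = "vanished"      # peer stopped responding with no warning


def recover(peer_exit: PeerExit, fence_peer: Callable[[], bool]) -> List[str]:
    """Return the ordered actions the surviving node takes.

    fence_peer is a hypothetical stand-in for a real fence agent. It must
    return True only when the lost node is confirmed to be in a known state.
    """
    if peer_exit is PeerExit.CLEAN_SHUTDOWN:
        # The peer's state is known, so taking over is immediately safe.
        return ["take_over_services"]

    # The peer's state is unknown: block I/O first, then try to fence.
    actions = ["block_io", "fence_peer"]
    if fence_peer():
        # Only a successful fence allows I/O to resume.
        actions += ["resume_io", "take_over_services"]
    else:
        # No working fence device means the block is never lifted.
        actions.append("stay_blocked")
    return actions


if __name__ == "__main__":
    # A cluster without fencing behaves like a fence call that always fails.
    print(recover(PeerExit.VANISHED, lambda: False))
```

Passing a fence_peer that always returns False models a cluster with no
fencing configured: the survivor blocks I/O and never gets past that
point.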

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"I feel confined, only free to expand myself within boundaries."