Re: [Linux-cluster] Disconnecting eth cable, cluster hung-uP!

Leandro Dardini l.dardini at comune.prato.it
Wed Mar 22 08:46:30 UTC 2006


 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Alex 
> aka Magobin
> Sent: Wednesday, March 22, 2006 9:00
> To: linux-cluster at redhat.com
> Subject: [Linux-cluster] Disconnecting eth cable, cluster hung-uP!
> 
> Hi,
> The two-node cluster that I'm testing in the lab works fine, but 
> unfortunately I don't have any fence device for my test. It 
> can switch services from one node to the other without any 
> problem, and if I shut down a node all services go to the other 
> node... BUT today I tried simply disconnecting the Ethernet cable 
> from one node and I saw that both nodes hung... I can't use 
> clustat anymore...
> 
> In the log I can see that CMAN correctly removes the node from the 
> cluster (missed too many heartbeats) and at the same time I get a 
> "fenced: nodo1 not a cluster member after 0 sec post_fail_delay"
> 
> After that... only a lot of "fence "nodo1" failed"

This is your problem. Fencing is the most important feature of a cluster!

> 
> but in this case, simply removing the Ethernet cable... the other 
> node doesn't start the services... why?

When nodes access shared media, all writes must be coordinated to guarantee data integrity. If a node is not responding and the cluster cannot be sure that it is no longer writing to the shared media, the cluster pauses access to the device to avoid or minimize data corruption.

Fencing is the action the fence device takes against a "not responding node" to make sure it no longer has access to the shared media. Fencing can act on the power of the "not responding node", turning it off, or on the shared media, for example by blocking the port the FC card is connected to.
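
To make the reasoning above concrete, here is a small toy model in Python. It is only a sketch of the decision logic, not how CMAN or fenced are implemented: the class ToyCluster and the agent function power_off_agent are hypothetical names invented for this illustration. It shows why, with no fence device configured, the surviving node keeps access paused and never starts the services.

```python
from typing import Callable, List, Optional


def power_off_agent(node: str) -> bool:
    """Hypothetical fence agent: pretends to cut power to the node."""
    return True


class ToyCluster:
    """Toy model of the fencing decision only (not real cluster code)."""

    def __init__(self, fence_agent: Optional[Callable[[str], bool]] = None):
        self.fence_agent = fence_agent      # None means no fence device
        self.storage_paused = False
        self.log: List[str] = []

    def node_lost(self, node: str) -> bool:
        # The node missed too many heartbeats: pause shared storage access
        # until we are sure the node can no longer write to it.
        self.storage_paused = True
        self.log.append(f"{node} removed from cluster")
        if self.fence_agent is not None and self.fence_agent(node):
            # Fencing succeeded, so it is safe to resume access.
            self.storage_paused = False
            self.log.append(f"fence {node} success")
            return True
        # No agent, or the agent failed: access stays paused.
        self.log.append(f'fence "{node}" failed')
        return False

    def start_service(self, name: str) -> bool:
        # Services that need the shared media cannot start while paused.
        if self.storage_paused:
            return False
        self.log.append(f"service {name} started")
        return True
```

With ToyCluster() and no agent, node_lost("nodo1") fails and start_service() keeps returning False, which matches the hang you describe. With ToyCluster(power_off_agent) the lost node is fenced first and the service can then start on the surviving node.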

> 
> Plus... I made a script to shut down the services cleanly in 
> case of emergency, but in this case it hung during "Waiting 
> for services to stop:"
> 

This happens for a short period of time, but after a few seconds the service stops correctly. Are you running this script without a connection between the nodes? You cannot shut down a service when the cluster is not quorate.
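
As a rough illustration of what "quorate" means, the sketch below uses a plain majority rule in Python. The function names are made up for this example, and the simple-majority formula is an assumption about the general idea, not a statement of the exact calculation your cluster software performs. Two-node setups are often configured with a special exception to this rule, which the sketch does not model.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority (assumed rule)."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the nodes that can still see each other hold a majority."""
    return votes_present >= quorum_threshold(expected_votes)
```

Under this rule, a node that has lost contact with its only peer holds 1 of 2 expected votes, so is_quorate(1, 2) is False and cluster operations such as stopping a service are refused or blocked.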

Leandro

PS
Are you Italian? Feel free to write to me privately if you still have problems.


> 
> How can I resolve this problem....
> 
> Thanks in advance!
> Alex
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



