[Linux-cluster] Problems with cluster (fencing?)
Gary_Hunt at gallup.com
Wed Mar 18 20:47:25 UTC 2009
I was fighting a very similar issue today. I am not familiar with the fencing method you are using, but I would guess your fence device is not working properly. If a node fails and fencing doesn't succeed, all GFS activity is halted. If clustat shows both nodes and the quorum disk online but no rgmanager, try running fence_tool leave and then fence_tool join on both nodes. That worked for me today.
Starting one node with the other node down is failing because the starting node tries to fence all nodes that are not present before proceeding. I am testing clean_start="1" in cluster.conf, and it has worked well so far. I would definitely read what the fenced man page says about clean_start before using it. It does have some risks.
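For reference, here is a rough sketch of where that setting goes. clean_start is an attribute of the fence_daemon element in /etc/cluster/cluster.conf. The cluster name and config_version below are placeholders I made up, not values from either of our clusters, and the rest of the file is left out.

```xml
<!-- Fragment of /etc/cluster/cluster.conf. "example" and config_version="2"
     are placeholder values; use your own and bump config_version on each change. -->
<cluster name="example" config_version="2">
  <!-- clean_start="1" makes fenced skip startup fencing and assume that
       nodes which have not joined are in a clean state. -->
  <fence_daemon clean_start="1"/>
  <!-- clusternodes, fencedevices, quorumd and rm sections stay as they are -->
</cluster>
```

The risk is that fenced takes your word for it: if a node that is not in the cluster is actually still up and writing to the shared storage, nothing will fence it before GFS is mounted.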
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Mikko Partio
Sent: Wednesday, March 18, 2009 2:43 AM
To: linux clustering
Subject: [Linux-cluster] Problems with cluster (fencing?)
I have a two-node cluster with a quorum disk.
When I pull the power cord from one node, the other node freezes the shared GFS volumes and all activity stops, even though the cluster maintains quorum. When the powered-off node boots back up, I can see that "starting fencing" takes many minutes, and afterwards starting clvmd fails. That node therefore cannot mount the GFS disks, since the underlying LVM volumes are missing.
Also, if I shut down both nodes and start just one of them, the starting node still waits at the "starting fencing" step for many minutes, even though the cluster should be quorate (there's a quorum disk)!
The fencing method used is HP iLO 2. I don't remember seeing this on CentOS 5.1 (now running 5.2). Any clue what might cause this?