[Linux-cluster] Fencing woes

David Teigland teigland at redhat.com
Tue Aug 23 03:46:09 UTC 2005


On Mon, Aug 22, 2005 at 09:19:52PM +0200, Jan Bruvoll wrote:
> Dear list,
> 
> I am having problems with a node where I can't get it to rejoin the
> fence domain. It has been rebooted before, and it has so far
> automatically joined the fence domain so that it could pick up the
> rest of the depending services, but not this time. I upgraded the kernel
> and cluster/GFS suite (this is a Gentoo system) to
> gentoo-sources-2.6.12-r9 and cluster software v1.00.00.

Are the nodes running slightly different versions of the cluster software?
They must all be running the same version -- there was a change to the
cman message formats shortly before 1.00.00 was released.

> I guess the biggest problem is that I don't know what to actually do to
> unfence the node that has been shut out. Since I have set the cluster up
> to use manual fencing, I suppose the un-fence command to use is
> fence_ack_manual, however using that only produces a warning about a
> missing /tmp/fence_manual.fifo. Manually creating this fifo before
> running the command only removes the fifo -and- produces the warning.
> 
> This is what a cman_tool services emits:
> 
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           0   2 join      S-2,2,1
> []

Manual fencing is hard to use and get right; my first recommendation is
not to use it.  You only need to run fence_ack_manual when instructed to do
so by a message in /var/log/messages on one of the nodes.
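
If it helps, here is a rough sketch of how you might check a node's log for
such a message.  It assumes, and this is only an assumption, that the message
mentions fence_ack_manual by name; the helper name find_fence_ack_requests and
the sample lines are made up for illustration and are not real log output.

```python
import io


def find_fence_ack_requests(lines, marker="fence_ack_manual"):
    """Return the log lines that mention `marker`, without trailing newlines.

    `marker` is an assumption: we guess that the instructing message
    names the fence_ack_manual command.
    """
    return [line.rstrip("\n") for line in lines if marker in line]


if __name__ == "__main__":
    # On a real node you would pass open("/var/log/messages") instead
    # (reading it usually requires root). These sample lines are hypothetical.
    sample = io.StringIO(
        "hypothetical unrelated line\n"
        "hypothetical line asking you to run fence_ack_manual\n"
    )
    for hit in find_fence_ack_requests(sample):
        print(hit)
```

If this finds nothing on any node, there is probably no fencing operation
waiting for an acknowledgement, and fence_ack_manual has nothing to do.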

Dave



