[Linux-cluster] Fencing woes
David Teigland
teigland at redhat.com
Tue Aug 23 03:46:09 UTC 2005
On Mon, Aug 22, 2005 at 09:19:52PM +0200, Jan Bruvoll wrote:
> Dear list,
>
> I am having problems with a node where I can't get it to rejoin the
> fence domain. It has been rebooted before, and it has so far
> automatically joined the fence domain so that it could pick up the
> rest of the depending services, but not this time. I upgraded the kernel
> and cluster/GFS suite (this is a Gentoo system) to
> gentoo-sources-2.6.12-r9 and cluster software v1.00.00.
Are the nodes running slightly different versions of the cluster software?
They must all be running the same version -- there was a change to the
cman message formats shortly before 1.00.00 was released.
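A minimal sketch of that version check, assuming you have already collected a version string from each node by hand. The node names and the older version string below are hypothetical placeholders, not output from any real tool.

```python
from collections import Counter


def mismatched_nodes(versions):
    """Return the nodes whose version differs from the most common one.

    `versions` maps a node name to the version string that node reports.
    """
    if not versions:
        return {}
    # The version reported by the most nodes is treated as the reference.
    majority, _count = Counter(versions.values()).most_common(1)[0]
    return {node: ver for node, ver in versions.items() if ver != majority}


# Hypothetical data gathered manually from three nodes.
reported = {"node1": "1.00.00", "node2": "1.00.00", "node3": "0.9.9"}
print(mismatched_nodes(reported))
```

An empty result means every node reports the same version; anything else lists the nodes to upgrade or downgrade before they try to join.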
> I guess the biggest problem is that I don't know what to actually do to
> unfence the node that has been shut out. Since I have set the cluster up
> to use manual fencing, I suppose the un-fence command to use is
> fence_ack_manual, however using that only produces a warning about a
> missing /tmp/fence_manual.fifo. Manually creating this fifo before
> running the command only removes the fifo -and- produces the warning.
>
> This is what a cman_tool services emits:
>
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           0   2 join      S-2,2,1
> []
Manual fencing is hard to use and get right; my first recommendation is
not to use it. You only need to run fence_ack_manual when instructed to do
so by a message in /var/log/messages on some node.
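One rough way to look for such a message on a node is sketched below. It assumes the instruction line mentions fence_ack_manual by name; the exact wording of the log message is not given here, so the match is a plain substring test and the sample lines are made up.

```python
def fence_ack_requests(lines):
    """Return the log lines that mention fence_ack_manual.

    Assumption: the message asking for a manual acknowledgement names
    the fence_ack_manual command somewhere in its text.
    """
    return [line.rstrip("\n") for line in lines if "fence_ack_manual" in line]


def scan_log(path="/var/log/messages"):
    """Read a log file and return the lines that mention fence_ack_manual."""
    with open(path, errors="replace") as log:
        return fence_ack_requests(log)


# Made-up sample lines, only to show the filtering.
sample = [
    "placeholder line about something unrelated\n",
    "placeholder line telling the admin to run fence_ack_manual\n",
]
print(fence_ack_requests(sample))
```

Run scan_log() on each node in turn; if none of them reports a matching line, there is nothing to acknowledge and fence_ack_manual should not be run.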
Dave