[Linux-cluster] Node with failed service does not get fenced.
Jonas Helgi Palsson
jonas at linpro.no
Mon Jul 21 21:35:39 UTC 2008
Hi
I am running CentOS 5.2 with all current updates on the x86_64 platform.
I have set up a 2-node cluster with the following resources in one service:
* one shared MD device (the resource is a script that assembles and stops
the device and checks its status),
* one shared filesystem,
* one shared NFS startup script,
* one shared IP address.
They are started in that order.
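For reference, a minimal sketch of what such an MD resource script might look like. It is not the actual script from this setup: the device name /dev/md0, the member devices, and the MDADM override variable are assumptions made for illustration. The cluster manager calls a script resource with start, stop or status and uses the exit code as the result.

```shell
#!/bin/bash
# Sketch of a start/stop/status script for a shared MD device.
# All names below are hypothetical placeholders, not taken from the real cluster.
MDADM="${MDADM:-mdadm}"                           # overridable so the logic can be dry-run
MD_DEVICE="${MD_DEVICE:-/dev/md0}"                # assumed array name
MD_MEMBERS="${MD_MEMBERS:-/dev/sdb1 /dev/sdc1}"   # assumed member devices

md_status() {
    # mdadm --detail is expected to exit non-zero if the array is not active.
    "$MDADM" --detail "$MD_DEVICE" >/dev/null 2>&1
}

md_start() {
    # Nothing to do if the array is already assembled.
    md_status && return 0
    # MD_MEMBERS is deliberately left unquoted so it splits into one word per device.
    "$MDADM" --assemble "$MD_DEVICE" $MD_MEMBERS
}

md_stop() {
    # Treat an array that is already stopped as a successful stop.
    md_status || return 0
    "$MDADM" --stop "$MD_DEVICE"
}

# Dispatch on the first argument; the exit code of the chosen function
# becomes the exit code of the script.
case "${1:-}" in
    start)  md_start ;;
    stop)   md_stop ;;
    status) md_status ;;
    "")     : ;;   # no argument: only define the functions
    *)      echo "usage: $0 {start|stop|status}" >&2; false ;;
esac
```

Note that the stop action can only succeed once nothing is using the array any more, so a filesystem that is still mounted on it will make the stop fail as well.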
The cluster works normally, and I can move the service between the two nodes.
But I have observed one behavior that is not good. Once, when trying to move
the service from one node to the other, the cluster manager could not unmount
the filesystem.
Although "lsof | grep <mountpoint>" did not show anything, "umount -f
<mountpoint>" did not work. ("umount -l <mountpoint>" did the job)
But when the cluster manager failed to unmount the filesystem, it also failed
on the MD script, and the service went into "failed" status with a message that
"manual intervention is needed".
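The manual unmount sequence described above (forced unmount first, lazy unmount as a fallback) can be written out as a small sketch. The function name release_mount, the UMOUNT override variable and the mount point /mnt/shared are hypothetical; only the "umount -f" and "umount -l" calls come from what was actually tried.

```shell
#!/bin/bash
# Sketch: try a forced unmount, then fall back to a lazy unmount.
UMOUNT="${UMOUNT:-umount}"                 # overridable for dry runs
MOUNTPOINT="${MOUNTPOINT:-/mnt/shared}"    # hypothetical mount point

release_mount() {
    local mp="$1"
    # Forced unmount first.
    if "$UMOUNT" -f "$mp"; then
        return 0
    fi
    # Lazy unmount: detaches the mount point immediately and cleans up
    # the remaining references once the filesystem is no longer busy.
    "$UMOUNT" -l "$mp"
}

# Example call (commented out so the sketch has no side effects):
# release_mount "$MOUNTPOINT"
```

A lazy unmount only detaches the mount point; it does not by itself show that the filesystem is no longer in use, so it is probably not a safe basis for starting the service on the other node.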
Why does the node not get fenced?
Even after a "reboot -f" of that node, the service does not start until the
faulty node is back online.
Are there any magical settings one can put in cluster.conf to get the behavior
I want, namely that if a service does not stop cleanly, the node is fenced and
the service is started on another node?
regards
Jonas
--
Jonas Helgi Palsson