[Linux-cluster] Node with failed service does not get fenced.

Finnur Örn Guðmundsson - TM Software fog at t.is
Fri Jul 25 10:59:16 UTC 2008


Hi,

There is a flag you can use to force a reboot if unmount is not successful.

See: http://kbase.redhat.com/faq/FAQ_51_11753.shtm
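
As a rough sketch only: assuming the flag meant here is the "self_fence"
attribute of the rgmanager "fs" resource (the flag is not named above, so
please verify against the FAQ entry), the file system entry in cluster.conf
could look something like the fragment below. The resource name, device,
mount point and file system type are made up for illustration.

```xml
<!-- Hypothetical fs resource; name, device, mountpoint and fstype are placeholders. -->
<!-- force_unmount="1": try to kill processes holding the mount point before unmounting. -->
<!-- self_fence="1": reboot this node if the unmount still fails. -->
<fs name="shared_fs" device="/dev/md0" mountpoint="/export" fstype="ext3"
    force_unmount="1" self_fence="1"/>
```

With this set, a failed unmount should make the node reboot itself instead of
leaving the service in the "failed" state, so the service can be recovered on
the other node.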

Kær kveðja / Best regards,
Finnur Ö. Guðmundsson
MCP - RHCA - Linux+
System Engineer - System Operations
fog at t.is

TM Software - Skyggnir
Urðarhvarf 6, IS- 203 Kópavogur, Iceland
tel: + 354 545 3000-fax + 354 545 3001
www.t.is



-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Jonas Helgi Palsson
Sent: Mon 7/21/2008 21:35
To: 'linux clustering'
Subject: [Linux-cluster] Node with failed service does not get fenced.
 
Hi

Running CentOS 5.2, all current updates on x86_64 platform.

I have set up a two-node cluster with the following resources in one service:

* one shared MD device (the resource is a script that assembles and stops 
the device and checks its status),
* one shared filesystem,
* one shared NFS startup script,
* one shared ip.

These are started in that order.

And the cluster works normally; I can move the service between the two nodes.

But I have observed one behavior that is not good. Once when trying to move 
the service from one node to another, the clustermanager could not "umount" 
the filesystem. 
Although "lsof | grep <mountpoint>" did not show anything, "umount -f 
<mountpoint>" did not work. ("umount -l <mountpoint>" did the job)

But when the clustermanager failed on that, it also failed on the MD script 
and went into "failed" status, with a message that "manual intervention is 
needed". 

Why does the node not get fenced down?

Upon "reboot -f" the service does not start until the faulty node is back 
online.

Are there any magical things one can put in cluster.conf to get the behavior I 
want? That if a service does not want to stop cleanly, fence the node and 
start the service on another node?

regards
Jonas
-- 
Jonas Helgi Palsson

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
