[Linux-cluster] force fencing

Juan Ramon Martin Blanco robejrm at gmail.com
Mon Jul 6 08:22:23 UTC 2009


On Mon, Jul 6, 2009 at 10:08 AM, Armanet Stephane <armanets at ill.fr> wrote:

> Hello list
>
> I'm trying to set up a 3-node cluster with 2 failover domains for an HA
> mail solution.
> I want 1 node active for the IMAP server in the Imap failover domain, 1
> node active for SMTP in the Smtp failover domain, and the 3rd node in
> both failover domains as a backup.
>
> I run CentOS 5.3.
> My fence device is a WTI power switch.
>
> My cluster.conf is attached.
>
> My SMTP service is composed of:
>        1 IP
>        1 amavisd script
>        1 postfix script
>        2 NFS mount for postfix and amavis
>
> If I manually kill the postfix master process (to simulate a crash), my
> node is not fenced and the logs say:
>
> Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
> /etc/init.d/postfix status
> Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
> status of /etc/init.d/postfix failed (returned 3)
> Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> status on script
> "postfix" returned 1 (generic error)
> Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> Stopping service
> service:Postfix
> Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
> /etc/init.d/amavisd stop
> Jul  6 10:00:40 centos-smtp1 kernel: do_vfs_lock: VFS is out of sync
> with lock manager!
> Jul  6 10:00:40 centos-smtp1 last message repeated 8 times
> Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Executing
> /etc/init.d/postfix stop
> Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
> stop of /etc/init.d/postfix failed (returned 1)
> Jul  6 10:00:41 centos-smtp1 clurgmgrd[4228]: <notice> stop on script
> "postfix" returned 1 (generic error)
> Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Removing IPv4
> address 195.83.126.201/24 from bond0
> Jul  6 10:00:41 centos-smtp1 avahi-daemon[3552]: Withdrawing address
> record for 195.83.126.201 on bond0.
> Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
> /var/lib/amavis
> Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
> /var/spool/postfix
> Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <crit> #12: RG
> service:Postfix failed to stop; intervention required
> Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <notice> Service
> service:Postfix is failed
> Jul  6 10:00:52 centos-smtp1 ntpd[3322]: synchronized to 195.83.126.119,
> stratum 1
>
> Clustat said:
>
> Cluster Status for cluster-test @ Mon Jul  6 10:02:39 2009
> Member Status: Quorate
>
>  Member Name                                                   ID   Status
>  ------ ----                                                   ---- ------
>  centos-imap1.ill.fr                                              1 Online, Local, rgmanager
>  centos-imap2.ill.fr                                              2 Online, rgmanager
>  centos-smtp1.ill.fr                                              3 Online, rgmanager
>  /dev/disk/by-id/scsi-360a98000567247514634507447594661-part1     0 Online, Quorum Disk
>
>  Service Name       Owner (Last)              State
>  ------- ----       ----- ------              -----
>  service:Imap       centos-imap2.ill.fr       started
>  service:Postfix    (centos-smtp1.ill.fr)     failed
>
>
>
>
> So I have to disable the Postfix service with:
>        clusvcadm -d Postfix
> and re-enable it with:
>        clusvcadm -e Postfix
>
>
>
> Could you explain why my original SMTP node is not fenced and why my
> service is not started on the 2nd node?
>
Nodes are fenced only when they lose communication with the other nodes,
not when a service fails.
You should check that the init scripts work correctly outside the cluster;
their return values are important. In your log the postfix "stop" returned 1,
which is why rgmanager marked the service as failed instead of recovering it.
I think it is failing because you killed postfix in a way that removed the
.pid file, and that made the init script fail.
BTW, you should configure the services with recovery="relocate" if you want
them to be started on a different node.
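
To illustrate what I mean by return values, here is a minimal sketch of how
an LSB-style init script is expected to behave: "status" returns 0 when the
daemon is running and 3 when it is not, and "stop" returns 0 even if the
daemon is already gone. The function names stop_service and status_service
and the pid-file handling are my own assumptions for illustration; this is
not the code of the real /etc/init.d/postfix.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from postfix or rgmanager.

# Stop the daemon named by a pid file; "already stopped" counts as success.
stop_service() {
    pidfile="$1"
    # No pid file: assume the daemon is already gone and report success,
    # so a "stop" issued after a crash does not return an error.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # Only signal the process if it is still alive.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill "$pid" || return 1
    fi
    rm -f "$pidfile"
    return 0
}

# LSB-style status: 0 = running, 1 = dead but pid file exists, 3 = not running.
status_service() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 3
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

To check the real script by hand, you could run "/etc/init.d/postfix stop;
echo $?" twice in a row on a node outside cluster control; if the second run
prints something other than 0, that would match the failed stop in your log.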

Greetings,
Juanra



> Is there a way to force the fencing ???
>
>
> --
> ARMANET Stephane
> Division Projet Technique
> Service Informatique
>  Groupe Infrastructure
>
> Institut Laue langevin
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>