[Linux-cluster] force fencing
Juan Ramon Martin Blanco
robejrm at gmail.com
Mon Jul 6 08:22:23 UTC 2009
On Mon, Jul 6, 2009 at 10:08 AM, Armanet Stephane <armanets at ill.fr> wrote:
> Hello list
>
> I'm trying to set up a 3-node cluster with 2 failover domains for an HA
> mail solution.
> I want 1 node active for the IMAP server in the Imap failover domain, 1
> node active for SMTP in the Smtp failover domain, and the 3rd node in
> both failover domains as a backup node.
>
> I run CentOS 5.3.
> My fence device is a WTI power switch.
>
> My cluster.conf is attached.
>
> My SMTP service is composed of:
> 1 IP
> 1 amavisd script
> 1 postfix script
> 2 NFS mounts for postfix and amavis
>
> If I manually kill the postfix master process (to simulate a crash), my
> node is not fenced and the logs say:
>
> Jul 6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/postfix status
> Jul 6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix: status of /etc/init.d/postfix failed (returned 3)
> Jul 6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> status on script "postfix" returned 1 (generic error)
> Jul 6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> Stopping service service:Postfix
> Jul 6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/amavisd stop
> Jul 6 10:00:40 centos-smtp1 kernel: do_vfs_lock: VFS is out of sync with lock manager!
> Jul 6 10:00:40 centos-smtp1 last message repeated 8 times
> Jul 6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/postfix stop
> Jul 6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix: stop of /etc/init.d/postfix failed (returned 1)
> Jul 6 10:00:41 centos-smtp1 clurgmgrd[4228]: <notice> stop on script "postfix" returned 1 (generic error)
> Jul 6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Removing IPv4 address 195.83.126.201/24 from bond0
> Jul 6 10:00:41 centos-smtp1 avahi-daemon[3552]: Withdrawing address record for 195.83.126.201 on bond0.
> Jul 6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting /var/lib/amavis
> Jul 6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting /var/spool/postfix
> Jul 6 10:00:51 centos-smtp1 clurgmgrd[4228]: <crit> #12: RG service:Postfix failed to stop; intervention required
> Jul 6 10:00:51 centos-smtp1 clurgmgrd[4228]: <notice> Service service:Postfix is failed
> Jul 6 10:00:52 centos-smtp1 ntpd[3322]: synchronized to 195.83.126.119, stratum 1
>
> clustat says:
>
> Cluster Status for cluster-test @ Mon Jul 6 10:02:39 2009
> Member Status: Quorate
>
>  Member Name                                                     ID   Status
>  ------ ----                                                     ---- ------
>  centos-imap1.ill.fr                                             1    Online, Local, rgmanager
>  centos-imap2.ill.fr                                             2    Online, rgmanager
>  centos-smtp1.ill.fr                                             3    Online, rgmanager
>  /dev/disk/by-id/scsi-360a98000567247514634507447594661-part1    0    Online, Quorum Disk
>
>  Service Name        Owner (Last)              State
>  ------- ----        ----- ------              -----
>  service:Imap        centos-imap2.ill.fr       started
>  service:Postfix     (centos-smtp1.ill.fr)     failed
>
> So I have to disable the Postfix service with:
> clusvcadm -d Postfix
> and re-enable it with:
> clusvcadm -e Postfix
>
>
>
> Could you explain why my original SMTP node is not fenced and why my
> service is not started on the 2nd node?
>
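Nodes are fenced only when they lose communication with the other nodes,
not when a service fails.
You should check that the init scripts work correctly outside the cluster;
their return values are important. I think in your case it is failing
because you killed postfix in a way that deleted the .pid file, and that made
the init script fail: your log shows "stop of /etc/init.d/postfix failed
(returned 1)". When a stop fails, rgmanager cannot be sure the resources
were released, so it marks the service as failed and waits for manual
intervention instead of starting it on another node.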
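To illustrate the point about return values, here is a rough sketch of a
helper that trusts "status" rather than the exit code of "stop". The
function name stop_is_clean is made up for this example, and it assumes the
init script follows the LSB status codes (0 = running, 1/2/3 = not running).
It is not part of rgmanager, and I have not tested it against your postfix
script.

```shell
#!/bin/sh
# stop_is_clean INIT_SCRIPT   (hypothetical helper, not an rgmanager tool)
# Runs "stop", then asks "status" whether the daemon is really gone.
# Returns 0 if the service is not running afterwards, 1 otherwise.
stop_is_clean() {
    init="$1"
    # The exit code of "stop" is deliberately ignored: some init scripts
    # return non-zero when there was nothing left to kill.
    "$init" stop >/dev/null 2>&1
    "$init" status >/dev/null 2>&1
    case $? in
        1|2|3) return 0 ;;  # LSB: dead or not running, so the stop succeeded
        *)     return 1 ;;  # 0 = still running, anything else = unknown
    esac
}
```

A small wrapper script used as the cluster script resource could call this
in its "stop" branch, for example: stop_is_clean /etc/init.d/postfix. You can
see what the plain init script does by running "/etc/init.d/postfix stop;
echo $?" on a node where postfix is already stopped.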
BTW you should configure the service with recovery="relocate" if you want it
to be started on a different node.
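As a rough illustration, the service definition in cluster.conf might look
something like the fragment below. I have not looked at your attachment, so
the domain name "smtp-domain" is a placeholder and the comment stands in for
your existing resources.

```xml
<rm>
  <!-- "smtp-domain" is a placeholder; use your own failover domain name -->
  <service name="Postfix" domain="smtp-domain" autostart="1" recovery="relocate">
    <!-- existing ip, netfs and script resources stay here unchanged -->
  </service>
</rm>
```

The other values for recovery are "restart" (try again on the same node
first) and "disable".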
Greetings,
Juanra
> Is there a way to force the fencing?
>
>
> --
> ARMANET Stephane
> Division Projet Technique
> Service Informatique
> Groupe Infrastructure
>
> Institut Laue langevin