[Linux-cluster] restart or relocate?

Wed Nov 29 18:39:35 UTC 2006

Carlo Mandelli wrote:
> Hi all,
>
> I'm trying to test a two-node cluster (RHCS U4) with Apache and one
> monitored IP on eth1 (VIP 192.168.0.3); the heartbeat is on eth0.
>
> When I unplug the cable (eth1) on the active node, I get these errors:
>
> Nov 29 17:03:54 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd status
> Nov 29 17:04:24 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd status
> Nov 29 17:04:25 node1 kernel: tg3: eth1: Link is down.
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> No link on eth1...
> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> status on ip "192.168.0.3" returned 1 (generic error)
> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> Stopping service http
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd stop
> Nov 29 17:04:44 node1 httpd: httpd shutdown succeeded
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Removing IPv4 address 192.168.0.3 from eth1
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http is recovering
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Recovering failed service http
> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd start
> Nov 29 17:04:54 node1 httpd: httpd startup succeeded
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http started
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> 192.168.0.3 is not configured
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> status on ip "192.168.0.3" returned 1 (generic error)
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Stopping service http
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd stop
> Nov 29 17:05:04 node1 httpd: httpd shutdown succeeded
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Service http is recovering
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Recovering failed service http
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not detected
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing /etc/init.d/httpd start
> Nov 29 17:05:04 node1 httpd: httpd startup succeeded
> <...>
>
> and it restarts the service continuously.
>
> It fails over only if I change the recovery mode in cluster.conf:
>
> <service autostart="1" name="http" recovery="relocate">
>
> Is there any way to set a maximum number of restart attempts before the service is relocated?
>
> Thanks
> Carlo
>   
Hi Carlo,

You're probably the victim of the init-script-not-returning-zero issue.  
See:
http://sources.redhat.com/cluster/faq.html#rgm_wontrestart
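One quick way to see whether an init script is affected is to look at its exit codes directly. The Python below is a minimal sketch, assuming the convention rgmanager relies on is the LSB one: "status" exits 0 when the service is running and 3 when it is not, and "stop" exits 0 even when the service is already stopped. The helper names (describe_status, stop_is_idempotent, check_init_script) are hypothetical and not part of Red Hat Cluster Suite. Note that the check really stops the service, so run it as root on a test node only.

```python
import subprocess

# Meanings of "status" exit codes per the LSB init script convention.
LSB_STATUS_CODES = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def describe_status(code: int) -> str:
    """Map an init script's 'status' exit code to its LSB meaning."""
    return LSB_STATUS_CODES.get(code, "reserved or application-specific code")


def stop_is_idempotent(first_rc: int, second_rc: int) -> bool:
    """LSB expects 'stop' to exit 0 both when it stops a running service
    and when the service is already stopped."""
    return first_rc == 0 and second_rc == 0


def run_action(script: str, action: str) -> int:
    """Run '<script> <action>' and return its exit code (output discarded)."""
    return subprocess.run([script, action], capture_output=True).returncode


def check_init_script(script: str) -> None:
    """Hypothetical check: stop twice, then query status, and report."""
    # The second 'stop' exercises the "already stopped" path that a
    # cluster manager hits during recovery.
    first = run_action(script, "stop")
    second = run_action(script, "stop")
    status = run_action(script, "status")
    verdict = "OK" if stop_is_idempotent(first, second) else "NOT LSB-compliant"
    print(f"stop twice: rc={first}, rc={second} -> {verdict}")
    print(f"status after stop: rc={status} ({describe_status(status)})")
```

Usage on a test node: call `check_init_script("/etc/init.d/httpd")`. A nonzero exit from the second "stop", or a "status" exit code of 0 after stopping, points at a script that does not follow the convention.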

Regards,

Bob Peterson
Red Hat Cluster Suite