[Linux-cluster] restart or relocate?

Mon Dec 4 19:28:46 UTC 2006

On Wed, 2006-11-29 at 18:36 +0100, Carlo Mandelli wrote:
> Hi all,
> 
> I'm trying to test a two-node cluster (RHCS U4) with Apache and one
> monitored IP on eth1 (VIP 192.168.0.3); the heartbeat is on eth0.
> 
> When I unplug the cable (eth1) on the active node, I get these errors:
> 
> Nov 29 17:03:54 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd status
> Nov 29 17:04:24 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd status
> Nov 29 17:04:25 node1 kernel: tg3: eth1: Link is down.
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not
> detected
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <warning> No link on eth1...
> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> status on ip
> "192.168.0.3" returned 1 (generic error)
> Nov 29 17:04:44 node1 clurgmgrd[4368]: <notice> Stopping service http
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd stop
> Nov 29 17:04:44 node1 httpd: httpd shutdown succeeded
> Nov 29 17:04:44 node1 clurgmgrd: [4368]: <info> Removing IPv4 address
> 192.168.0.3 from eth1
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http is recovering
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Recovering failed
> service http
> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not
> detected
> Nov 29 17:04:54 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd start
> Nov 29 17:04:54 node1 httpd: httpd startup succeeded
> Nov 29 17:04:54 node1 clurgmgrd[4368]: <notice> Service http started
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> 192.168.0.3 is not
> configured
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> status on ip
> "192.168.0.3" returned 1 (generic error)
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Stopping service http
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd stop
> Nov 29 17:05:04 node1 httpd: httpd shutdown succeeded
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Service http is recovering
> Nov 29 17:05:04 node1 clurgmgrd[4368]: <notice> Recovering failed
> service http
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <warning> Link for eth1: Not
> detected
> Nov 29 17:05:04 node1 clurgmgrd: [4368]: <info> Executing
> /etc/init.d/httpd start
> Nov 29 17:05:04 node1 httpd: httpd startup succeeded
> <...>
> 
> and it restarts the service continuously.
> 
> It performs failover only if I modify the recovery mode in cluster.conf:
> 
> <service autostart="1" name="http" recovery="relocate">
> 
> Is there any way to set a maximum number of retries before relocating
> the service?

No...

However, the ip script should fail to start in the first place if it
detects that the link is not available, so the behavior you saw was not
correct.

Even with 'recovery="restart"', rgmanager should have relocated the
service after the first restart attempt failed.
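
For reference, a rough sketch of the relocate workaround as it might
appear in cluster.conf. Only the <service> line comes from the quoted
message; the child resources, the script name "httpd" and the
monitor_link setting are assumptions inferred from the log output, not
copied from the actual configuration:

```xml
<!-- Sketch only: child resources are assumed, not taken from the poster's cluster.conf -->
<service autostart="1" name="http" recovery="relocate">
    <!-- monitor_link="1" is assumed here because the logs show link checks on eth1 -->
    <ip address="192.168.0.3" monitor_link="1"/>
    <!-- the resource name "httpd" is a hypothetical label -->
    <script name="httpd" file="/etc/init.d/httpd"/>
</service>
```

With recovery="relocate", a failed status check moves the service to
the other node instead of trying a local restart first.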

-- Lon