[Linux-cluster] Not restarting "max_restart" times before relocating failed service
Parvez Shaikh
parvez.h.shaikh at gmail.com
Tue Oct 30 08:54:22 UTC 2012
Hi experts,
I have defined a service in cluster.conf as follows:
<service autostart="0" domain="mydomain" exclusive="0"
         max_restarts="5" name="mgmt" recovery="restart">
    <script ref="myHaAgent"/>
    <ip ref="192.168.51.51"/>
</service>
I set max_restarts="5", hoping that if the cluster fails to start the
service 5 times, it will then relocate the service to another cluster node
in the failover domain.
To check this, I brought down the NIC hosting the service's floating IP and
got the following logs:
Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
*Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
*Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
*Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
But from the log it appears that the cluster tried to restart the service
only ONCE before relocating it.
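For a longer log, the number of failed start attempts before the relocation
can be counted with a small script. The following Python sketch is only an
illustration: the function name is made up, and the two message patterns are
copied from the log excerpt in this post, not taken from any documented
rgmanager log format.

```python
import re


def failed_starts_before_relocation(log_text: str, service: str) -> int:
    """Count '#68: Failed to start' messages seen before the first '#71' relocation."""
    # Collapse all whitespace so messages wrapped across lines still match.
    text = " ".join(log_text.split())
    name = re.escape(service)

    # Only look at the part of the log that precedes the first relocation.
    relocation = re.search(rf"#71: Relocating failed service {name}", text)
    if relocation is not None:
        text = text[:relocation.start()]

    return len(re.findall(rf"#68: Failed to start {name};", text))


# Two messages copied from the log in this post, wrapped as the mail client left them.
SAMPLE = """\
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start
service:mgmt; return value: 1
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed
service service:mgmt
"""

if __name__ == "__main__":
    print(failed_starts_before_relocation(SAMPLE, "service:mgmt"))  # prints 1
```

For the excerpt above this reports a single failed start before the
relocation message.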
I was expecting the cluster to retry starting the service five times on the
same node before relocating it.
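To make that expectation concrete, here is a minimal Python sketch of the
behaviour I assumed. It models only my understanding of max_restarts, not
what rgmanager actually does, and the names recover and try_start are
hypothetical.

```python
from typing import Callable


def recover(try_start: Callable[[], bool], max_restarts: int) -> str:
    """Model of the behaviour I expected (NOT rgmanager's actual logic).

    try_start is a hypothetical callback that returns True when the
    service starts successfully on the current node.
    """
    for attempt in range(1, max_restarts + 1):
        if try_start():
            return f"restarted locally on attempt {attempt}"
    # Every local attempt failed, so hand the service to another node.
    return "relocate"


if __name__ == "__main__":
    # With the NIC down, every start of the ip resource fails.
    print(recover(lambda: False, max_restarts=5))  # prints relocate
```

Under this model the log would show five failed start attempts on the same
node before the relocation message, rather than one.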
Can anybody correct my understanding?
Thanks,
Parvez