[Linux-cluster] killproc annoyance

Wed Apr 18 23:41:03 UTC 2007

I just started working with RH4's clustering services and have run
into a bit of a "deadlock" problem. Has anyone else seen or fixed
this?

1) Start with a working config, add httpd as a clustered service,
and everything works. It fails over to the other machines fine.

2) Break the Apache config (for example, by adding a virtual IP that
doesn't exist on the system). Even though configtest passes, the
config is broken.

3) Restart Apache, unaware that the config is bad, while the
clustering service is running. Apache doesn't come back up. Okay,
fine: go fix the problem and tell clustering to restart the
service.

Here is where things get annoying.

4) Clustering now reports the service as failed, so it runs
"service httpd stop". killproc in /etc/init.d/functions returns 1
because httpd wasn't running in the first place. The clustering
software treats that as a failed stop and leaves the service in the
failed state. I can't get httpd up without the virtual IPs that are
associated with the service, so I can never get killproc to return 0
when stopping the service. Shouldn't killproc return 0 if none of
the httpd daemons are still running?
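For comparison, the LSB init-script convention treats running "stop" on a service that is already stopped as a success (exit status 0). A minimal sketch of a stop helper that follows that convention is below. It assumes a pidfile-based daemon; `safe_stop` is a hypothetical name, it is not the stock killproc from /etc/init.d/functions, and it only sends SIGTERM without waiting for the process to exit.

```shell
#!/bin/sh
# Hypothetical stop helper, NOT the stock killproc from /etc/init.d/functions.
# LSB convention: stopping an already-stopped service is success (exit 0).
safe_stop() {
    pidfile="$1"
    # No pidfile at all: nothing is running, so report success.
    if [ ! -f "$pidfile" ]; then
        return 0
    fi
    pid=$(cat "$pidfile")
    # Empty or stale pidfile: the process is already gone, so clean up
    # and report success instead of failing the stop.
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi
    # Process is alive: send SIGTERM (does not wait for it to exit).
    # Only a failure to signal a live process counts as a failed stop.
    kill -TERM "$pid" || return 1
    rm -f "$pidfile"
    return 0
}
```

In an init script's stop action this would be called as, e.g., `safe_stop /var/run/httpd.pid` (the pidfile path is an assumption). With this behaviour, a cluster manager asking to stop a service that never came up would see 0 and could go on to start it again.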

For now, I guess I'll manually add aliases for the IPs, get httpd up
and running, disable the service, remove the aliases, and then
re-enable the service. That's a lot of steps to go through during a
production crisis.

Does anyone have an opinion on killproc return codes?

Thanks,
Tarun