[Linux-cluster] killproc annoyance

Tarun Reddy treddy at rallydev.com
Thu Apr 19 17:37:26 UTC 2007


On Apr 19, 2007, at 6:31 AM, Scott McClanahan wrote:

> According to the link at the bottom, a stop action on an already
> stopped service should return success, just as a start on an already
> started service should be treated as success. So I have rewritten
> most of my init scripts that are managed by rgmanager (clurgmgrd) to
> be spec compliant. Unfortunately, how do you really know whether the
> service is stopped? How can you be certain the pid file got written,
> or is even correct (race conditions)? Anyway, I have written init
> scripts for Apache, Tomcat, and IBM MQ, to name a few, and have tested
> the hell out of them, so let me know if you want a working example.
>
> http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.html
>
> On Wed, 2007-04-18 at 19:41 -0400, Tarun Reddy wrote:
>> I just started working with RHEL4's cluster services and have run
>> into a bit of a "deadlock" problem; I'm trying to find out whether
>> anyone else has seen or fixed it.
>>
>> 1) Start off with a working config, add httpd as a clustered service,
>> and everything is great. It fails over to the other machines fine.
>>
>> 2) Break the Apache config (for example, by adding a virtual IP that
>> doesn't exist on the system). Even though configtest passes, the
>> config is broken.
>>
>> 3) Restart Apache, not knowing the config is bad, while the cluster
>> service is running. Apache doesn't come back up. Okay, fine: go fix
>> the problem and try to tell the cluster to restart the service.
>>
>> Here is where things get annoying.
>> 4) Now the cluster reports the service as failed, so it attempts
>> "service httpd stop", and killproc in /etc/init.d/functions returns
>> 1 because httpd wasn't running. This makes the cluster software fail
>> the stop, which leaves the service in the failed state. I can't start
>> httpd without the virtual IPs that are associated with the service,
>> so I can never get killproc to return 0 when stopping it. Shouldn't
>> killproc return 0 if none of the httpd daemons are still running?
>>
>> For now, I guess I'll force some aliases for the IPs, get httpd up
>> and running, disable the service, remove the aliases, and then enable
>> the service. That's a lot to do in the middle of a production crisis.
>>
>> Anyone have an opinion on killproc return codes?

Scott,

Thank you very much for the link! Good to know I wasn't crazy. Some
more testing shows that on RHEL5, running /etc/init.d/httpd stop when
Apache is already stopped does the right thing and returns 0 (RETVAL
is 0), while RHEL4 is "broken" in this respect.

I think I'll look at where the two versions differ and possibly
backport the change.
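
For comparison, here is a minimal sketch of a stop action that follows the LSB rule Scott quotes (stopping an already-stopped service is success), written as a plain POSIX sh function. It assumes the daemon records its PID in a pidfile. The name `lsb_stop` and its arguments are hypothetical; this is not RHEL's killproc and not the RHEL5 change described here.

```shell
#!/bin/sh
# HYPOTHETICAL helper, not part of RHEL's /etc/init.d/functions.
# Usage: lsb_stop PIDFILE [TIMEOUT_SECONDS]
# Exit status: 0 if the daemon is not running afterwards (including
# when it was never running), 1 if it could not be stopped.
lsb_stop() {
    pidfile=$1
    timeout=${2:-10}

    # No pidfile, or an empty one: treat the service as already stopped.
    [ -s "$pidfile" ] || return 0
    read -r pid < "$pidfile"

    # Garbage in the pidfile: treat the service as stopped and clean up.
    case $pid in
        ''|*[!0-9]*) rm -f "$pidfile"; return 0 ;;
    esac
    # Refuse PID 0 and 1: "kill 0" would signal our own process group.
    if [ "$pid" -le 1 ]; then
        rm -f "$pidfile"
        return 0
    fi

    # Stale pidfile: no process with that PID exists any more.
    # Caveat: kill -0 only proves that *some* process has this PID;
    # after PID reuse it may not be the daemon at all.
    if ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi

    # Ask the daemon to exit, then poll once per second.
    kill -s TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pidfile"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done

    # Still alive after the timeout: force it and check one last time.
    kill -s KILL "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    rm -f "$pidfile"
    return 0
}
```

Because the "never started" and "stale pidfile" cases both return 0, a cluster manager calling a stop action built this way would presumably see a successful stop for a dead httpd instead of the failure described in step 4. The PID-reuse caveat in the comments is exactly the race Scott raises; a pidfile alone cannot fully rule it out.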

Thanks,
Tarun
