[Linux-cluster] killproc annoyance

Thu Apr 19 17:54:21 UTC 2007

On Apr 19, 2007, at 11:37 AM, Tarun Reddy wrote:

>
> On Apr 19, 2007, at 6:31 AM, Scott McClanahan wrote:
>
>> According to the link at the bottom, a stop argument to an already
>> stopped service should return a success just like a start on an
>> already started service should be interpreted as a success.  So I have
>> rewritten most of my init scripts which are managed by clurgmgrd to be
>> spec compliant.  Unfortunately, how do you really know if the service
>> is stopped?  How can you be certain the pid file got written or is even
>> correct (race conditions)?  Anyway, I have written init scripts for
>> apache, tomcat, and IBM MQ to name a few and have tested the hell out
>> of them so let me know if you want a working example.
>>
>> http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.html
>>
>> On Wed, 2007-04-18 at 19:41 -0400, Tarun Reddy wrote:
>>> So just started working with RH4's clustering services and have run
>>> into a bit of a "deadlock" problem that I'm trying to see if anyone
>>> else has seen/fixed.
>>>
>>> 1) Start off with working config, add httpd as a clustered service,
>>> and everything is great. Fails over to other machines great.
>>>
>>> 2) Mess up the apache config (like adding a virtual IP that doesn't
>>> exist on the system). Even though configtest works, we have a broken
>>> config.
>>>
>>> 3) So you restart apache without knowing the config is bad, while the
>>> clustering service is running. Apache doesn't come back up. Okay,
>>> cool, well go fix the problem and try to tell clustering to restart
>>> the service.
>>>
>>> Here is where things get annoying.
>>> 4) Now clustering says the service is failed. So it attempts to
>>> "service httpd stop", for which killproc in /etc/init.d/functions
>>> returns a 1 since httpd wasn't running before. This causes the
>>> clustering software to fail the stop, and hence leave the service in
>>> a failed state. I can't get httpd up without the virtual IPs that are
>>> associated to the service, so I can't get killproc to ever return a 0
>>> when stopping the service. Shouldn't killproc return a 0 if none of
>>> the httpd daemons are still running?
>>>
>>> I guess for now, I'll try and force some aliases for the IPs, get
>>> httpd up and running, disable the service, remove the aliases, and
>>> then enable the service. Lots of stuff to do if I was in a crisis
>>> mode in production.
>>>
>>> Anyone have an opinion on killproc return codes?
>
> Scott,
>
> Thank you very much for the link! I thought I wasn't crazy. So some
> more testing shows that on RHEL5, /etc/init.d/httpd stop when apache
> is stopped does the right thing and has RETVAL of 0, while RHEL4 is
> "broken" in this respect.
>
> I think I'll look at where the differences are and possibly integrate
> the change back.
>
> Thanks,
> Tarun
>

For future reference, Red Hat clearly saw that the old behavior was a
violation of LSB and changed the following in killproc between RHEL4
and RHEL5:

         else
             failure $"$base shutdown"
             RC=1
         fi

changed to:

         else
             if [ -n "${LSB:-}" -a -n "$killlevel" ]; then
                 RC=7 # Program is not running
             else
                 failure $"$base shutdown"
                 RC=0
             fi
         fi

I think I will change the RHEL4 version to return RC=0 and hope nothing
else breaks :-)
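
For anyone who would rather not patch /etc/init.d/functions, here is a
minimal sketch of the same idea as a standalone function: a stop that
reports success whenever the daemon is not running afterwards. The name
lsb_stop, the pid-file argument, and the five-second grace period are all
my own assumptions for illustration; none of this comes from the Red Hat
scripts, and it does not address the pid-file race conditions Scott
mentioned.

```shell
#!/bin/sh
# lsb_stop is a hypothetical helper, not part of /etc/init.d/functions.
# It returns 0 whenever the daemon is not running once the call finishes,
# including the case where it was never running in the first place.
# Assumes the caller may signal the daemon (init scripts normally run as root).
lsb_stop() {
    pidfile=$1
    pid=

    # No pid file at all: treat the service as already stopped.
    [ -r "$pidfile" ] || return 0

    # Take the first line of the pid file; ignore a missing trailing newline.
    read -r pid < "$pidfile" || true

    # Anything that is not a plain positive number counts as "no pid".
    case "$pid" in
        ''|*[!0-9]*) pid=0 ;;
    esac

    # Invalid or stale pid: nothing to kill, so clean up and succeed.
    if [ "$pid" -le 0 ] || ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi

    # The process exists: ask it to exit and wait up to five seconds.
    kill -TERM "$pid" 2>/dev/null
    for attempt in 1 2 3 4 5; do
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pidfile"
            return 0
        fi
        sleep 1
    done

    # Still alive after the grace period: a genuine stop failure.
    return 1
}
```

In the stop) branch of an init script this could be called as something
like "lsb_stop /var/run/httpd.pid" (that path is an assumption, adjust it
to your setup), with the exit status used as RETVAL so that the cluster
software sees a 0 for a service that is already down.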

Tarun