[Linux-cluster] service restart problem

Tue Feb 14 18:58:36 UTC 2006

On Mon, 2006-02-13 at 16:40 -0800, Marc Lewis wrote:

> Since the service is already dead, the startup scripts return an error when
> trying to stop the service.  clurgmgrd fails the service and the service is
> now down.

The script should return "0" in stop-after-stop/stop-after-dead cases,
not "1".  It's a bug in /etc/init.d/functions of RHEL4's initscripts
package.
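
As a rough sketch of what "return 0 on stop-after-dead" means in practice
(this is not the actual initscripts code; the function name stop_service
and the pidfile handling are my own assumptions for illustration), a stop
routine along these lines reports success when there is nothing left to
stop:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: stop the process named in a
# pidfile, treating "already stopped" as success (exit status 0).
stop_service() {
    pidfile="$1"

    # No pidfile: the service is not running, so there is nothing to stop.
    [ -f "$pidfile" ] || return 0

    pid=$(cat "$pidfile" 2>/dev/null)

    # Stale pidfile (empty, or the process is gone or cannot be signalled):
    # clean up and report success.
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        return 0
    fi

    # Process is alive: ask it to terminate. Fail only if the signal
    # cannot be delivered.
    kill -TERM "$pid" 2>/dev/null || return 1
    rm -f "$pidfile"
    return 0
}
```

With something like this, calling stop twice in a row, or calling it after
the daemon has crashed, both return 0, so the resource manager can go on
to restart the service instead of marking it failed.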

> The only way I've found around this is to force the "stop" to return 0 no
> matter what.  This way clurgmgrd will believe it has succeeded in shutting
> down the service and will restart it.

Please see this bugzilla:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104

There's a patch for /etc/init.d/functions inside it.

> > Anyone have any ideas on how to solve either of these two problems?  I've
> > been waiting to deploy the cluster we've put together until I could resolve
> > these two issues, but have run out of things to try.
> 
> I'm still seeing clurgmgrd die periodically for no reason, though.  I may
> have to write another script to monitor it as well and run that out of cron
> every so often.  That doesn't seem like a very good solution, though, since
> it does restart all of the services that are running on that node.

The correct solution is to fix the problem in clurgmgrd itself.  That has
been done in CVS, and the fix will be in an erratum shortly.

-- Lon