[Linux-cluster] Init scripts and cluster suite

Wed Aug 30 07:22:41 UTC 2006

Hi,
I am in the same boat; see my post "[Linux-cluster] Init script files for
services".

> 
> On Mon, 2006-08-28 at 10:27 +0200, Jos Vos wrote:
> > Hi,
> >
> > Init scripts usually return a non-zero return code when they try to
> > stop a service that isn't running anymore.
> 
> According to the LSB, init scripts are supposed to return 0 in
> stop-after-stop situations.

I can't see that this is true, because I tried the following:

[root@rhel4 ~]# /etc/init.d/httpd stop
Stopping httpd:                                            [  OK  ]
[root@rhel4 ~]# echo $?
0
[root@rhel4 ~]# /etc/init.d/httpd stop
Stopping httpd:                                            [FAILED]
[root@rhel4 ~]# echo $?
1

[root@rhel4 ~]# /etc/init.d/squid stop
Stopping squid: ................                           [  OK  ]
[root@rhel4 ~]# /etc/init.d/squid stop
Stopping squid:                                            [FAILED]
[root@rhel4 ~]# echo $?
1

[root@rhel4 ~]# /etc/init.d/postfix stop
Shutting down postfix:                                     [  OK  ]
[root@rhel4 ~]# /etc/init.d/postfix stop
Shutting down postfix:                                     [FAILED]
[root@rhel4 ~]# echo $?
1
[root@dsi-node1 ~]# /etc/init.d/postfix status
master is stopped
[root@dsi-node1 ~]# echo $?
3
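
The same check can be scripted. Below is only a rough sketch, not anything
shipped with RHEL: check_stop_twice is a made-up helper name, and it really
does stop the service, so it should only be tried on a test box.

```shell
#!/bin/sh
# Hypothetical helper: stop a service twice and report the exit code of
# the second stop. An LSB-compliant init script should return 0 there.
check_stop_twice() {
    script="$1"
    # First stop; the result is ignored on purpose.
    "$script" stop >/dev/null 2>&1 || true
    # Second stop; this is the stop-after-stop case being tested.
    rc=0
    "$script" stop >/dev/null 2>&1 || rc=$?
    echo "$script: second stop returned $rc"
    return "$rc"
}
```

For example, "check_stop_twice /etc/init.d/postfix" would print the exit code
of the second stop and return it to the caller.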

> > When a cluster service has failed for some reason, the cluster suite
> > requires you to first disable a service, before enabling it again.
> > Disabling a service will try to stop the service, which will fail,
> > and thus the service can't be disabled (and also not enabled again).
> 
> Disabling (e.g. failed->disabled) should always work, even if a portion
> of the 'stop' phase returns nonzero.  It's really the only way to get a
> service out of the failed state - so the assumption is that you have
> cleaned up (or will clean up) the service before you try to enable it
> again.
> 

This has not always worked for me:

clurgmgrd: [2109]: <info> Executing /etc/init.d/rc.trend status
clurgmgrd: [2109]: <info> Executing /etc/init.d/postfix status
clurgmgrd[2109]: <notice> status on script "postfix_script" returned 3
(function not implemented)
clurgmgrd[2109]: <notice> Stopping service MAIL
clurgmgrd: [2109]: <info> Executing /etc/init.d/rc.trend stop
clurgmgrd: [2109]: <info> Executing /etc/init.d/postfix stop
postfix: postfix stop failed
clurgmgrd[2109]: <notice> stop on script "postfix_script" returned 1
(generic error)
clurgmgrd[2109]: <crit> #12: RG MAIL failed to stop; intervention required
clurgmgrd[2109]: <notice> Service MAIL is failed

# clusvcadm -d MAIL

clurgmgrd[2109]: <notice> Stopping service MAIL
clurgmgrd: [2109]: <info> Executing /etc/init.d/rc.trend stop
Aug 30 12:23:52 dsi-node1 clurgmgrd: [2109]: <info> Executing
/etc/init.d/postfix stop
postfix: postfix stop failed
clurgmgrd[2109]: <notice> stop on script "postfix_script" returned 1
(generic error)
clurgmgrd[2109]: <crit> #12: RG MAIL failed to stop; intervention required
clurgmgrd[2109]: <notice> Service MAIL is failed

Do I have to write the scripts myself?
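
For what it is worth, here is a rough sketch of the kind of wrapper I have in
mind, assuming the init script's "status" action returns 3 when the service is
not running (as postfix does above). The names INIT_SCRIPT, lsb_stop and
wrapper_main are made up for illustration; none of this is part of rgmanager.

```shell
#!/bin/sh
# Hypothetical wrapper around a stock init script, so that "stop" on an
# already-stopped service returns 0 as the LSB expects.
# INIT_SCRIPT is an assumption; point it at the real init script.
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/postfix}"

# Stop the service, but report success if it is already stopped.
lsb_stop() {
    rc=0
    "$INIT_SCRIPT" status >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 3 ]; then
        # LSB status code 3 means "program is not running": nothing to do.
        return 0
    fi
    "$INIT_SCRIPT" stop
}

# Intercept only "stop"; pass every other action through unchanged.
wrapper_main() {
    case "$1" in
        stop) lsb_stop ;;
        *)    "$INIT_SCRIPT" "$@" ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    wrapper_main "$@"
    exit $?
fi
```

Saved under some path of your choosing, it could be referenced from the
cluster's script resource in place of /etc/init.d/postfix. A failing stop of a
service that is still running is passed through as a failure.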

- Hirantha

> If this is not working, please file a bugzilla -- failed->disable should
> work (maybe it should throw better warnings).
> 
> > The workaround is to either manually start the service and then
> > disable it (bad idea for a cluster service) or to write all
> > cluster service scripts yourself, even if you just need to
> > control a standard service like httpd.
> 
> Well, for httpd, Marek Grac just wrote an agent which plugs in to
> rgmanager. ^^  On a more serious note, here's the bugzilla which talks
> about the problem you're seeing:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151104
> 
> > Is the latter the recommended solution for this problem?
> 
> :( Yes.  For now.
> 
> The patch included in the above bugzilla should fix the problem for most
> Red Hat (CentOS, etc.) installations, but will not be shipped in any
> updates of RHEL4 because of the fact that users / administrators might
> be erroneously relying on the "stop after stop returning failure"
> "feature" (even though it is not LSB compliant).
> 
> I'm fairly certain that RHEL5 and later releases will have the problem
> corrected (I'm pretty sure FC5 already has it fixed).
> 
> -- Lon