[Linux-cluster] service restart problem

Tue Feb 14 18:47:00 UTC 2006

On Mon, 2006-02-13 at 11:56 -0800, Marc Lewis wrote:

> I'm seeing similar issues here.  The script entry doesn't seem to do
> anything when checking status.
> 
> For example, we have a MySQL service defined with an IP address, a shared
> SAN partition, and the /etc/init.d/mysqld script. 
> 
> The service starts up and shuts down fine when done manually via clusvcadm,
> but if I kill the mysql daemon with the script or manually, the clurgmgrd
> doesn't seem to care.  It just runs its status check, which does report it
> as "stopped" without ever restarting the service.

This should work.  The mysqld script needs to return an error (i.e., a
non-zero exit status) when it is run with the 'status' command and the
service is not running.
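
As a rough sketch of what that means (this is not the actual mysqld init
script; the function name service_status and the pid file path are
assumptions for illustration), a status check along these lines returns 0
only while the daemon is alive and a non-zero code otherwise.  The
non-zero values follow the LSB convention for 'status' (1 = dead but pid
file exists, 3 = not running):

```shell
#!/bin/sh
# Hypothetical status helper for an init script.
# The default pid file path is an assumption; adjust it for your system.
PIDFILE="${PIDFILE:-/var/run/mysqld/mysqld.pid}"

service_status() {
    # No pid file: report the daemon as not running (LSB status code 3).
    if [ ! -f "$PIDFILE" ]; then
        echo "mysqld is stopped"
        return 3
    fi

    pid=$(cat "$PIDFILE")

    # kill -0 sends no signal; it only tests whether the process exists.
    # (Run as root, as init scripts normally are, so the check is not
    # refused for a process owned by another user.)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "mysqld (pid $pid) is running"
        return 0
    fi

    # Pid file is present but the process is gone (LSB status code 1).
    echo "mysqld dead but pid file exists"
    return 1
}
```

In the script's 'status' branch you would call service_status and exit
with its return value, so that the caller sees the non-zero code.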

> Also, I've seen clurgmgrd die without logging anything anywhere.  

That was probably a segfault, which was fixed in CVS about two weeks
ago.  It was the cause of several problems, actually.

> I'll just
> check the cluster and it won't be running.  All of the services stay
> running, but the manager is dead.  Restarting it is problematic since it
> will restart each of the services causing a brief interruption.

> Anyone have any ideas on how to solve either of these two problems?  I've
> been waiting to deploy the cluster we've put together until I could resolve
> these two issues, but have run out of things to try.

If it's not the above, could you provide more information?  For example,
is the service being reported as "stopped" in "clustat" output?

In the meantime, I will look at the script resource and see if there is
anything obvious.

-- Lon