[Linux-cluster] Expected behaviour when service fails to stop

Mon Aug 8 22:14:25 UTC 2011

Chris Alexander <chris.alexander at kusiri.com> wrote:
> I was wondering what the expected behaviour of the cluster would be when a
> service cannot be shut down safely. For example, if you request a service
> group to be relocated to another node in the cluster, if one of the services
> in that group fails to stop (causing a timeout?), what would the result be?
> I should imagine that the service would be marked as Failed, is this the
> case? I have been unable to find this particular scenario documented anywhere.

This may be the documentation you're looking for:
  https://fedorahosted.org/cluster/wiki/ServiceOperationalBehaviors

Under "Service States", the "failed" state is documented as:
  failed - The service is presumed dead. This state occurs whenever a
  resource's stop operation fails. Administrator must verify that there
  are no allocated resources (mounted file systems, etc.) prior to
  issuing a disable request. The only action which can take place from
  this state is disable.

So your intuition is correct: the service is marked as "failed" if the
stop fails. However, I'm not sure what you mean by "causing a
timeout". What counts as a stop failure is up to the resource agent
script (located in /usr/share/cluster) for the resource being stopped.
If the "stop" operation of that script returns a non-zero exit code,
the stop is considered to have failed.
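
To make the exit-code rule concrete, here is a rough sketch of how an
agent's "stop" might end up returning non-zero. It is not taken from any
agent shipped in /usr/share/cluster: RES_DIR and the function names are
made up for illustration, and a directory that cannot be removed while it
still has contents stands in for a resource that is still in use.

```shell
#!/bin/sh
# Illustrative skeleton only; RES_DIR and all function names are hypothetical.
RES_DIR="${RES_DIR:-/tmp/example-resource}"

agent_start() {
    mkdir -p "$RES_DIR"
}

agent_stop() {
    # Already stopped: report success so a repeated stop is harmless.
    [ -d "$RES_DIR" ] || return 0
    # rmdir refuses to remove a non-empty directory, which stands in for a
    # resource that is still in use. Its non-zero status is passed back.
    rmdir "$RES_DIR" 2>/dev/null
}

agent_main() {
    case "$1" in
        start)  agent_start ;;
        stop)   agent_stop ;;
        status) [ -d "$RES_DIR" ] ;;
        *)      echo "usage: start|stop|status" >&2; return 2 ;;
    esac
}

# A real agent would finish with something like:  agent_main "$1"; exit $?
```

In this sketch, "stop" on a directory that still holds files returns
non-zero, which is the condition that leads to the "failed" state quoted
above; once the directory is empty, the same call returns 0.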
  -- Cos