[Linux-cluster] $OCF_ERR_CONFIGURED - recovers service on another cluster node

Wed Feb 8 14:36:26 UTC 2012

On 01/27/2012 04:03 AM, Parvez Shaikh wrote:
> Hi guys,
>
> I am using Red Hat Cluster Suite which comes with RHEL 5.5 -
>
> cman_tool version
>  >>6.2.0 config xxx
>
> Now I have a script resource in which I return $OCF_ERR_CONFIGURED in
> case of a fatal, unrecoverable error, hoping that my service would not
> start on another cluster node.
>
> But I see that the cluster relocates it to another cluster node and
> attempts to start it.
>
> I referred to the error code documentation at
> http://www.linux-ha.org/doc/dev-guides/_return_codes.html
>
> Is there any return code which makes RHCS give up on recovering the service?
>

The resource must fail during the 'stop' phase if you do not want
rgmanager to try to recover it.  There is no 'start' phase error
condition that tells rgmanager to give up.
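
As a rough illustration of that approach, here is a minimal sketch of a
script resource that remembers a fatal start error and then fails the
next 'stop' once.  Everything named here is an assumption made for the
example: the marker path, the FATAL_MARKER and APP_CHECK variables, and
the do_* function names are hypothetical, and the real start/stop logic
is left as comments.

```shell
#!/bin/sh
# Hypothetical script resource (start/stop/status).
# FATAL_MARKER, APP_CHECK and the /var/run/myapp.fatal path are
# illustrative assumptions, not names used by rgmanager.

FATAL_MARKER="${FATAL_MARKER:-/var/run/myapp.fatal}"
# Command that exits 0 when the application is able to start.
APP_CHECK="${APP_CHECK:-true}"

do_start() {
    if ! $APP_CHECK; then
        # Unrecoverable problem: leave a marker so that the 'stop'
        # that follows this failed start can fail as well.
        : > "$FATAL_MARKER"
        return 1
    fi
    rm -f "$FATAL_MARKER"
    # ... start the real application here ...
    return 0
}

do_stop() {
    # ... stop the real application here ...
    if [ -e "$FATAL_MARKER" ]; then
        # Fail this stop once, then clear the marker so that a later
        # stop can succeed.
        rm -f "$FATAL_MARKER"
        return 1
    fi
    return 0
}

do_status() {
    # ... replace with a real health check ...
    return 0
}

main() {
    case "$1" in
        start)  do_start ;;
        stop)   do_stop ;;
        status) do_status ;;
        *)      echo "usage: $0 {start|stop|status}" >&2; return 2 ;;
    esac
}

# Dispatch only when called with an action, so the file can be sourced.
if [ $# -gt 0 ]; then
    main "$@"
    exit $?
fi
```

Keep in mind that a service whose stop fails generally ends up needing
administrator attention before it can be enabled again, so this is only
worth doing for errors that really are fatal everywhere.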

The history:  If you don't have a program installed or configured on 
host1 but try to enable a service there, it will obviously fail to start 
(rightfully so).  However, host2 may have the configuration.  So, 
rgmanager will then stop the service and try to start it on host2.  In 
fact, it will systematically try every host in the cluster until:

   - the service starts successfully

   - no more hosts are available (e.g. restricted failover domain,
     exclusive services, or simply all hosts were tried).  At this
     point, the service is placed in the 'stopped' state in
     the hopes that the next host to come online will be able to
     start the service

   - a failure during 'stop' occurs.  Most errors during the stop
     phase will abort the enable request (except
     'OCF_NOT_INSTALLED' when a <script> is missing)
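
The three exit conditions above can be sketched as a toy model.  This is
not rgmanager code, only an illustration of the behaviour described:
start_on and stop_on are hypothetical stand-ins for running the resource
on a host, driven here by two assumed variables, START_OK and
STOP_FAILS, which hold space-separated host names.

```shell
#!/bin/sh
# Toy model of the enable loop; all names are illustrative assumptions.

START_OK="${START_OK:-}"      # hosts on which 'start' succeeds
STOP_FAILS="${STOP_FAILS:-}"  # hosts on which 'stop' fails

# in_list ITEM LIST...: succeed if ITEM is one of LIST.
in_list() {
    item=$1; shift
    for x in "$@"; do
        [ "$x" = "$item" ] && return 0
    done
    return 1
}

start_on() { in_list "$1" $START_OK; }
stop_on()  { ! in_list "$1" $STOP_FAILS; }

# try_enable HOST...: try each host in order until one condition is met.
try_enable() {
    for host in "$@"; do
        if start_on "$host"; then
            echo "started on $host"
            return 0
        fi
        # Start failed: stop the service on this host before moving on.
        if ! stop_on "$host"; then
            echo "aborted: stop failed on $host"
            return 1
        fi
    done
    echo "stopped: no more hosts to try"
    return 1
}
```

For example, with START_OK="host2" and STOP_FAILS="host1", calling
try_enable host1 host2 gives up at host1 and never reaches host2, even
though host2 could have started the service.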

-- Lon