[Linux-cluster] Monitoring services/customize failure criteria

Jeff Stoner jstoner at opsource.net
Mon Sep 15 23:29:49 UTC 2008

> -----Original Message-----
> It is also the detail of status/monitor which implementers get most
> frequently wrong. "But it's either running or not!" ... which is clearly
> not true, or at least such a case couldn't protect against certain
> failure modes. (Such as multiple-active on several nodes, which is
> likely to be _also_ failed.)

Ok. I think I understand where the confusion lies.

LSB is strictly for init scripts.
OCF is strictly for a cluster-managed resource.

They are similar but have significant differences. For example, LSB
scripts are required to implement a 'status' action, while OCF scripts
are required to implement a 'monitor' action. This difference alone
means that, technically, you can't interchange LSB and OCF scripts
unless they implement both actions (in some fashion).
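As a minimal sketch of what "implement both" could look like — the
daemon name and pidfile path here are placeholders, not anything from
Cluster Services — a script can answer the LSB 'status' action and the
OCF 'monitor' action side by side, using each standard's exit codes:

```shell
#!/bin/sh
# Hypothetical pidfile for an illustrative daemon -- adjust for your service.
PIDFILE="${PIDFILE:-/var/run/mydaemon.pid}"

# True (exit 0) if the pid recorded in the pidfile is alive.
check_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# LSB 'status' convention: 0 = running, 3 = program is not running.
lsb_status() {
    if check_running; then return 0; else return 3; fi
}

# OCF 'monitor' convention: 0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING.
ocf_monitor() {
    if check_running; then return 0; else return 7; fi
}

case "${1:-}" in
  status)  lsb_status ;;
  monitor) ocf_monitor ;;
esac
```

Note the "not running" case alone already differs: LSB says exit 3,
OCF says exit 7, so one code path cannot satisfy both standards.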

I think this is the missing link in our conversation: the script
resource type in Cluster Services is an attempt to make an LSB-compliant
script behave like an OCF-compliant one. So, /usr/share/cluster/script.sh
expects the script you specify to behave like an LSB script, not an OCF
script. As such, the script resource type falls back to LSB conventions
and uses a binary interpretation of a resource's start/stop/status
actions: zero for success and non-zero for any failure. Other resource
types (file system, nfs, ip, mysql, samba, etc.) may implement the full
set of OCF RA API exit codes.
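To illustrate that binary interpretation — this is a sketch of the
convention, not the actual /usr/share/cluster/script.sh source, and
USER_SCRIPT/run_action are hypothetical names — any distinct non-zero
code the wrapped LSB script returns collapses into a generic failure:

```shell
#!/bin/sh
# USER_SCRIPT stands in for the user-supplied LSB script.
# run_action invokes it and applies the binary LSB-style interpretation:
# exit 0 means success, anything else is treated as a generic failure.
run_action() {
    "$USER_SCRIPT" "$1"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        return 0    # success
    else
        return 1    # 1, 2, 3, ... all collapse to "failed"
    fi
}

# Demo with a command that always succeeds.
USER_SCRIPT=/bin/true
run_action status && echo "status: success"
```

So an LSB exit 3 ("program is not running") and an exit 1 ("generic
error") look identical to the cluster manager, which is exactly the
loss of detail that full OCF exit codes avoid.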

Does this help?

Performance Engineer

OpSource, Inc.
"Your Success is Our Success"
