[Linux-cluster] HA agents (cluster scripts)
Ofer Inbar
cos at aaaaa.org
Wed Sep 15 16:37:41 UTC 2010
Cleber Rosa <crosa at redhat.com> wrote:
> If they're bash scripts, you might want to try:
>
> #bash -x <script> <arguments>.
This won't work. The interface between the resource manager and the
resource agent script is more than just "run the script". It includes:
- What environment does the script get when run by the RM?
  - This includes a lot of parameters passed from the RM to the script
- What metadata must the agent script provide to the RM?
  - ... and what will the RM do differently based on this metadata?
- When does the RM call the agent, and with which parameters?
- What actions will it take in response to exit codes?
- What happens to stdout and stderr? (answer below)
As a very basic hack, you can sort of simulate running a resource
agent by doing something like this in bash:
$ sudo OCF_RESKEY_name=... OCF_RESKEY_otherparam=... /usr/share/cluster/myagent status
rg_test is better, but even it won't help troubleshoot all of this.
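To make that hack a bit more useful, here is a rough sketch of a wrapper that runs the agent with the given OCF_RESKEY_* variables and shows the exit code, stdout and stderr separately. run_agent is a hypothetical helper name, not something shipped with RHCS, and it does not reproduce anything else the RM sets up for the agent.

```shell
#!/bin/bash
# Sketch only: run_agent is a hypothetical helper, not part of RHCS.
# Usage: run_agent <agent-path> <action> [NAME=value ...]
run_agent() {
    local agent=$1 action=$2
    shift 2
    local out err rc
    out=$(mktemp)
    err=$(mktemp)
    # Remaining arguments are NAME=value pairs, set only for the agent.
    env "$@" "$agent" "$action" >"$out" 2>"$err"
    rc=$?
    echo "exit code: $rc"
    echo "--- stdout ---"
    cat "$out"
    echo "--- stderr ---"
    cat "$err"
    rm -f "$out" "$err"
    return "$rc"
}
```

You would call it as root with the same arguments as the one-liner above, for example: run_agent /usr/share/cluster/myagent status OCF_RESKEY_name=... OCF_RESKEY_otherparam=...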
P.S. In RHCS at least, a resource agent's stdout and stderr are always
sent to the bit bucket, *except* that when calling meta-data, rgmanager
reads all of the agent's stdout as the metadata. One problem here is
that if you put ocf_log statements in your script, they *will* write to
stdout in addition to syslog; if any of your ocf_log calls are at the
top level of the script, you have to test that $1 isn't "meta-data" so
you don't write extra debugging or status output along with the XML.
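The $1 test described above could be wrapped up roughly like this. log_unless_metadata is a hypothetical helper name, and the ocf_log defined here is only a stand-in so the sketch runs outside a cluster node; in a real agent ocf_log comes from the cluster's shell function library and its output format will differ.

```shell
#!/bin/bash
# Stand-in for ocf_log so this sketch runs anywhere (assumption: the real
# one is already defined when the agent runs on a cluster node).
if ! declare -F ocf_log >/dev/null; then
    ocf_log() { echo "[$1] ${*:2}"; }
fi

# Hypothetical helper: log only when the action is not meta-data, so
# nothing extra lands on stdout next to the XML.
log_unless_metadata() {
    local action=$1
    shift
    [ "$action" = "meta-data" ] && return 0
    ocf_log "$@"
}

# At the top level of an agent, pass the action ($1) first:
#   log_unless_metadata "$1" debug "starting myagent"
```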
And here's another tricky and undocumented portion of the interface:
If your XML metadata doesn't validate (for example, if you write some
extra stuff to stdout when called with meta-data, such as ocf_log),
rgmanager will ignore your resource agent as invalid, and will ignore
its resources in your cluster.conf - which means that any service you
define that includes your custom resource will "successfully" start
without your custom resource, and rgmanager will treat that as okay!!
I think that behavior is stunningly awful and broken; a resource
group that includes a resource that failed to validate should fail.
At least in RHEL/CentOS 5.3, it doesn't log anything to indicate this
condition. I've heard that in 5.5, you do get a log message when the
metadata doesn't validate.
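Since rgmanager may not tell you about this, it can help to check the meta-data output by hand before deploying an agent. The following is a crude sketch, not a reproduction of rgmanager's own validation: check_metadata_output is a hypothetical helper that looks for stray text before or after the XML, and additionally runs xmllint for a well-formedness check if it happens to be installed.

```shell
#!/bin/bash
# Sketch only: check_metadata_output is a hypothetical helper. It does not
# perform the same validation as rgmanager; it only catches obvious junk.
check_metadata_output() {
    local agent=$1
    local out first last
    out=$("$agent" meta-data 2>/dev/null) || {
        echo "agent exited non-zero on meta-data"
        return 1
    }
    # First and last non-blank lines of what the agent wrote to stdout.
    first=$(printf '%s\n' "$out" | grep -v '^[[:space:]]*$' | head -n 1)
    last=$(printf '%s\n' "$out" | grep -v '^[[:space:]]*$' | tail -n 1)
    case $first in
        '<'*) ;;
        *) echo "stray output before XML: $first"; return 1 ;;
    esac
    case $last in
        *'>') ;;
        *) echo "stray output after XML: $last"; return 1 ;;
    esac
    # Optional well-formedness check, only if xmllint is available.
    if command -v xmllint >/dev/null 2>&1; then
        printf '%s\n' "$out" | xmllint --noout - || return 1
    fi
    echo "meta-data output looks clean"
}
```

For example: check_metadata_output /usr/share/cluster/myagent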
-- Cos