[Linux-cluster] logging from the resource agent script

Wed Sep 1 18:03:22 UTC 2010

I got the answers to the questions about ocf_log that I posted last
week, so I'm following up to the list in case anyone finds these in
the list archives and wonders what the solution was.  I wasn't able
to find good answers via Google before, so hopefully this email will
fix that :)

On Mon, Aug 23, 2010 at 08:38:16PM -0400, I wrote:
> Right now, the question that's vexing me is how to log custom messages
> from this resource agent script, to give the operator more information
> about what the cluster is doing (such as, for example, the exact
> commands that are run when starting and stopping the service, or what
> the real return code from the health check is, rather than just "did
> it fail?").

> 1. ocf_log statements I put at the top level of the script do log,
> but any that I put inside functions such as start() and stop() don't.
> Why don't my custom log messages appear in /var/log/messages when
> other messages at the same level (such as info or notice) from
> rgmanager do, and when the start() or stop() function is clearly being
> called?
> 
> 2. ocf_log seems to sometimes, or always, output to stdout, which
> means I have to take care *not* to let it run when meta-data is the
> argument, because it'd pollute the metadata XML.  But then how do I
> log anything from the times the script is run for metadata, if I want?
> 
> Should this work?  Is there another, better way of making resource
> agent scripts log custom messages?
> 
> And what happens to the resource agent script's stdout, anyway?

So, first of all, the resource agent script's stdout and stderr are
tied to /dev/null *except* when it's being called for meta-data. That
output is not logged anywhere.
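
Since that output goes nowhere, one hypothetical way to see it while
debugging is to have the agent append its own stderr to a file. The
function name enable_debug_stderr and the default log path are
illustrative only, and the call should come out again once the agent
works.

```shell
#!/bin/bash
# Hypothetical debugging aid for a resource agent. rgmanager sends the
# agent's stderr to /dev/null, so redirect it to a file we can read.
enable_debug_stderr() {
    local logfile="${1:-/tmp/ra-debug.log}"   # illustrative default path
    # Marker line so separate invocations are easy to tell apart.
    printf '=== %s pid %s ===\n' "$(date)" "$$" >>"$logfile"
    # From here on, everything written to stderr (including "command not
    # found" errors) is appended to $logfile instead of being discarded.
    exec 2>>"$logfile"
}

# Example use near the top of the agent, before any real work:
#   enable_debug_stderr /tmp/myagent-debug.log
```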

Secondly, the problem with ocf_log not logging was very simple, but
obscured by the fact that stderr was being thrown into the bit bucket.

ocf_log is a shell function which always outputs to stdout and also
calls a separate program called clulog to send the message to syslog.
It assumes clulog is in the PATH, which means the resource agent needs
/usr/sbin in its PATH, and that was missing from my script.  It was a
simple oversight, and it would have been obvious if I'd seen the
"clulog: command not found" errors.
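
A minimal sketch of the fix, assuming clulog is installed in /usr/sbin
as described above: make sure /usr/sbin is on PATH near the top of the
agent, before any ocf_log call. The warning check at the end is an
optional extra, not something ocf_log itself does.

```shell
#!/bin/bash
# Hypothetical excerpt from the top of a resource agent script.
# If the agent sets or inherits a PATH without /usr/sbin (where clulog
# lives), ocf_log's call to clulog fails, so add it when missing.
case ":${PATH}:" in
    *:/usr/sbin:*) ;;                      # already present, leave PATH alone
    *) PATH="${PATH:+$PATH:}/usr/sbin" ;;  # append (also handles an empty PATH)
esac
export PATH

# Optional sanity check: complain on stderr if clulog still can't be found.
if ! command -v clulog >/dev/null 2>&1; then
    echo "warning: clulog not found in PATH ($PATH)" >&2
fi
```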

One potential hitch is that ocf_log just passes its string argument to
clulog on the command line enclosed in double quotes, so you could
have shell quoting issues.  Quoting once (in your call to ocf_log in
the resource agent script) is not necessarily enough; there's going to
be a second level of shell interpolation, albeit a double-quoted one.
One failure would be if you start your string with a - character,
because then clulog will think it's another command-line switch.
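
One hypothetical way to dodge the leading-dash problem is to pass
messages through a small helper before handing them to ocf_log. The
name log_safe_msg and the "msg: " prefix are made up for illustration,
and the helper does not try to handle any other quoting issues.

```shell
#!/bin/bash
# Hypothetical helper: returns a message that is safe to pass to ocf_log.
# clulog would treat a message beginning with '-' as another switch, so
# such messages get a harmless prefix. Other quoting issues are not handled.
log_safe_msg() {
    local msg="$*"          # join all arguments with single spaces
    case "$msg" in
        -*) msg="msg: $msg" ;;
    esac
    printf '%s\n' "$msg"
}

# Example use inside start(), assuming ocf_log has been sourced:
#   ocf_log info "$(log_safe_msg "$cmd")"
```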

Note: My confusion about ocf_log "sometimes" sending to stdout was
caused by the fact that the resource agent's stdout was going to
/dev/null except when it was being called for meta-data.  ocf_log
always writes to stdout, and rgmanager was sometimes looking at it
and sometimes bitbucketing it.
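
Given that, a hypothetical wrapper can keep ocf_log from polluting the
meta-data XML while still logging to syslog, assuming clulog itself
writes nothing to stdout. agent_log and ACTION are illustrative names,
and the ocf_log stub is only there so the sketch runs outside a
cluster.

```shell
#!/bin/bash
# Stand-in for ocf_log so this sketch runs on its own; in a real agent
# ocf_log comes from sourcing rgmanager's ocf-shellfuncs instead.
if ! type ocf_log >/dev/null 2>&1; then
    ocf_log() { echo "$1: $2"; }
fi

ACTION="${1:-status}"   # action passed by rgmanager (start, stop, meta-data, ...)

# Hypothetical wrapper around ocf_log. During meta-data, stdout is the XML
# that rgmanager reads, so throw away ocf_log's stdout copy; the clulog
# call inside the real ocf_log still sends the message to syslog.
agent_log() {
    if [ "$ACTION" = "meta-data" ]; then
        ocf_log "$@" >/dev/null
    else
        ocf_log "$@"
    fi
}
```

Usage is the same as ocf_log, e.g. agent_log info "starting service".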

Finally, here is a very useful debugging tool that I wasn't aware of
when I first asked the question, and that makes it much easier to see
what's going on:

rg_test test /etc/cluster/cluster.conf [status|start|stop] service <service-name>

(run as root, or with sudo)

This runs your resource agent as rgmanager would, but shows you stdout
and stderr.
  -- Cos