[Linux-cluster] meta-data problem: rg_test shows the wrong value

Mon Aug 8 23:24:34 UTC 2011

I'm having a perplexing issue with a resource that uses a custom
resource agent I've written.  Here's the relevant cluster.conf section
(with some anonymization of names and IPs):

<rm log_level="6">
  <service autostart="1" name="customresource" recovery="relocate">
    <ip address="10.6.19.50" monitor_link="1">
      <customresource name="A" monitoringport="9100" status_timeout="10"/>
      <customresource name="B" monitoringport="9105" status_timeout="30"/>
    </ip>
  </service>
</rm>

In the resource agent script /usr/share/cluster/customresource.sh, the
status interval is calculated as status_timeout * 2 + 2.  So in this
case, customresource A should have an interval of 22 (10 * 2 + 2), and
B should have an interval of 62 (30 * 2 + 2).
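
As a rough sketch of that calculation (the function name
compute_interval is hypothetical and used only for illustration; the
real customresource.sh may structure this differently):

```shell
#!/bin/sh
# Sketch only: compute_interval is a hypothetical helper, not copied
# from the actual customresource.sh.
compute_interval() {
    # $1 is the status_timeout in seconds (assumed to be an integer)
    echo $(( $1 * 2 + 2 ))
}

compute_interval 10   # resource A: prints 22
compute_interval 30   # resource B: prints 62
```

These are the values I expect to see for A and B respectively.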

When I run the resource agent by hand, I get the right values:

| # export OCF_RESKEY_name="A"
| # export OCF_RESKEY_monitoringport="9100"
| # export OCF_RESKEY_status_timeout="10"
| # /usr/share/cluster/customresource.sh meta-data
[...]
|     <actions>
|         <action name="meta-data" timeout="5s"/>
|         <action name="methods" timeout="5s"/>
|         <action name="start" timeout="10s"/>
|         <action name="stop" timeout="30s"/>
|         <action name="status" interval="22s" timeout="10s"/>
|         <action name="monitor" interval="22s" timeout="10s"/>
|         <action name="verify" timeout="5s"/>
|     </actions>
| </resource-agent>

However, when I run rg_test against this same cluster.conf and agent
script, both resources show a status_interval of 40:

| $ sudo rg_test test /tmp/cluster.conf
[...]
|     myresource {
|       name = "A";
|       monitoringport = "9100";
|       status_timeout = "10";
|       status_interval = "40";
|     }
|     myresource {
|       name = "B"
|       monitoringport = "9105";
|       status_timeout = "30";
|       status_interval = "40";
|     }

Where is it getting this "40" value from?

Well, the funny thing is that the correct value *used* to be 40.

That was the default the resource agent set when status_timeout was
*not* specified in cluster.conf.  To test my new change, I made a copy
of cluster.conf in /tmp, added the new status_timeout values, and ran
rg_test on it.  But somehow, rg_test seems to be giving me a value that
comes from neither this run of the resource agent nor this cluster.conf.

Anyone know what's going on here?
  -- Cos