[Linux-cluster] Determining failed node on another node of cluster during failover

Parvez Shaikh parvez.h.shaikh at gmail.com
Thu Jan 13 04:34:03 UTC 2011


Hi,

I have been using the clustat command, "clustat -x -s servicename", to get
the following XML output:

<?xml version="1.0"?>
<clustat version="4.1.1">
  <groups>
    <group name="service:service_on_node1" state="112"
state_str="started" flags="0" flags_str="" owner="node1"
last_owner="none" restarts="0" last_transition="1294752663"
last_transition_str="Tue Jan 11 19:01:03 2011"/>
  </groups>
</clustat>

I was under the impression that the "last_owner" attribute in the above XML
would give me the name of the node where the service was last running, and I
was parsing this XML to obtain that information.
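For reference, the parsing step can be done with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not the agent I actually run: it assumes Python 3.7+ and that clustat is on the PATH, and the helper names parse_clustat_groups and query_service are made up for illustration. Because "-s" limits the output to one service, query_service simply takes the first <group> element.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_clustat_groups(xml_text):
    """Map each <group> name in "clustat -x" output to its attributes."""
    root = ET.fromstring(xml_text)
    # Each service appears as <group name="service:..." owner=... last_owner=...>
    return {group.get("name"): dict(group.attrib) for group in root.iter("group")}


def query_service(service):
    """Run "clustat -x -s <service>" and return that service's attributes.

    Assumes clustat is on PATH; returns None if no <group> was printed.
    """
    result = subprocess.run(
        ["clustat", "-x", "-s", service],
        capture_output=True, text=True, check=True,
    )
    groups = parse_clustat_groups(result.stdout)
    # With -s only the requested service is listed, so take the first entry.
    return next(iter(groups.values()), None)
```

Applied to the XML above, parse_clustat_groups gives "node1" for owner and "none" for last_owner under the key "service:service_on_node1".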

Note that last_owner does hold the previous node if you migrate or relocate
the service from one node to another using clusvcadm, Conga or
system-config-cluster. But if a node is shut down and the service fails over
to another node, last_owner is either 'none' or the same as the node to which
the service was relocated.
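Since last_owner cannot be trusted after a node shutdown, an agent should at least recognise the two bad cases ('none', or equal to the current owner) instead of reporting a wrong node. A minimal Python sketch, assuming the agent also keeps its own record of the owner it saw on an earlier run (for example by saving the owner attribute each time it polls clustat); that bookkeeping and the function name resolve_previous_owner are hypothetical, not part of RHCS.

```python
def resolve_previous_owner(owner, last_owner, remembered_owner=None):
    """Best guess at the node a service ran on before its current owner.

    last_owner from "clustat -x" is trusted only when it names a node other
    than the current owner; after a node shutdown it was seen to be "none"
    or equal to the current owner.  remembered_owner is a hypothetical value
    the agent itself saved on an earlier poll of clustat.
    Returns None when neither source identifies a different node.
    """
    for candidate in (last_owner, remembered_owner):
        if candidate and candidate not in ("none", owner):
            return candidate
    return None
```

The remembered value only helps if the standby node polled the service while it was still running elsewhere, so the polling interval bounds how stale it can be.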

Parsing /var/log/messages is easy but not an optimal solution: it requires
grepping the entire log file for the specific rgmanager (clurgmgrd) message
in which the failed node's name appears.
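If log parsing is the fallback, the cost of grepping the entire file can at least be bounded by reading only its tail. A minimal Python 3 sketch: it seeks to the last max_bytes bytes of the log and keeps lines that come from clurgmgrd and mention one of the given node names. The function name, the 1 MB default and the plain substring matching are assumptions for illustration; no particular wording of rgmanager's messages is assumed.

```python
import os


def recent_rgmanager_lines(path, node_names, max_bytes=1_000_000):
    """Return clurgmgrd lines from the tail of a log that mention a node name.

    Only the last max_bytes bytes of the file are read, so the cost does not
    grow with the size of the whole log.
    """
    with open(path, "rb") as log:
        size = log.seek(0, os.SEEK_END)  # seek() returns the new offset
        start = max(0, size - max_bytes)
        log.seek(start)
        text = log.read().decode("utf-8", errors="replace")
    lines = text.splitlines()
    if start > 0:
        lines = lines[1:]  # the first line may have been cut in the middle
    return [
        line
        for line in lines
        if "clurgmgrd" in line and any(name in line for name in node_names)
    ]
```

Pass only the names of the other nodes: syslog prefixes each line with the local host name, so including it would match every local line. Substring matching also lets "node1" match "node10", so a stricter pattern may be needed with such names.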



On Thu, Jan 13, 2011 at 4:35 AM, Kit Gerrits <kitgerrits at gmail.com> wrote:
>
> Hello,
>
> The clustering software itself monitors nodes and devices in use by cluster
> services, but it logs to /var/log/messages.
> A quick overview is presented by the 'clustat' command.
>
> Monitoring tools are freely available for any platform.
> Basic monitoring in Linux is available with Big Brother, Cacti, OpenNMS or
> Nagios (in order of increasing complexity).
> If you're bound to Windows, maybe try ServersCheck.
>
>
> Parsing logs can be trivial, once you know how.
> What do you want to know and when do you want to know it?
>
> Have you looked at 'clustat' and 'cman_tool'?
>
>
> Regards,
>
> Kit
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Parvez Shaikh
> Sent: Wednesday, 12 January 2011 11:01
> To: linux clustering
> Subject: Re: [Linux-cluster] Determining failed node on another node of
> cluster during failover
>
> Hi
>
> Is the monitoring package part of RHCS? What is the name of this component?
>
> Is there any other mechanism, not requiring parsing of /var/log/messages, to
> determine on the standby node which node has left the cluster before failover
> is complete?
>
> Thanks
>
> On Wed, Jan 12, 2011 at 2:58 PM, Kit Gerrits <kitgerrits at gmail.com> wrote:
>>
>> Hello,
>>
>> If you want to find out which cluster node has failed, you could
>> either check /var/log/messages and see which member has left the
>> cluster, or set up monitoring to check whether your servers are all
>> in good shape.
>>
>> If you are running a cluster, I would suggest also setting up monitoring.
>> The monitoring package can then notify you if any cluster member fails.
>>
>>
>> Regards,
>>
>> Kit
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Parvez Shaikh
>> Sent: Wednesday, 12 January 2011 7:04
>> To: linux clustering
>> Subject: [Linux-cluster] Determining failed node on another node of
>> cluster during failover
>>
>> Hi all,
>>
>> Taking this question from another thread, here is a challenge that I
>> am facing -
>>
>> The following is a simple cluster configuration:
>>
>> Node1, node2, node3 and node4 are part of the cluster. It is an
>> unrestricted, unordered failover domain in an active-active NxN
>> configuration.
>>
>> So node2 can take over services from node1, node3 or node4 when any of
>> these nodes (1, 3, 4) fails (e.g. power failure).
>>
>> In that event I want to find out which node has failed and had its
>> services failed over to node2. I was invoking "clustat -x -s servicename"
>> on node2 in my custom agent and parsing the "last_owner" attribute to
>> obtain the name of the node on which the service was previously running.
>>
>> This, however, doesn't seem to work if I shut down a node (but it works
>> if I migrate the service from one node to another using clusvcadm).
>>
>> Is there any way I can find out which node has failed during failover
>> of a service on a standby node? Is there any tool I might have missed,
>> or some command I can send to ccsd to get this information?
>>
>> Thanks
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
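Following up on the suggestion in the quoted reply to look at cman_tool: "cman_tool nodes" lists cluster members with a status column, which could answer "which node left" without parsing logs. The sketch below assumes a tabular layout of one header row, then rows with the numeric node id first, a status code second (M for member, anything else treated as not a member) and the node name last; please check this against the output of your version before relying on it. The function names are hypothetical.

```python
import subprocess


def parse_cman_nodes(text):
    """Map node name -> status code from "cman_tool nodes" output.

    Assumed layout: one header row, then rows whose first column is the
    numeric node id, second column the status and last column the name.
    """
    status = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            status[fields[-1]] = fields[1]
    return status


def nodes_not_in_cluster(text):
    """Names of nodes whose status is anything other than "M" (member)."""
    return sorted(name for name, sts in parse_cman_nodes(text).items() if sts != "M")


def current_cman_output():
    """Run "cman_tool nodes" locally; assumes cman_tool is on PATH."""
    return subprocess.run(
        ["cman_tool", "nodes"], capture_output=True, text=True, check=True
    ).stdout
```

If the layout assumption holds, calling nodes_not_in_cluster(current_cman_output()) on the standby node during failover should list the node(s) that have left.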



