[Linux-cluster] Determining failed node on another node of cluster during failover

Parvez Shaikh parvez.h.shaikh at gmail.com
Fri Jan 14 04:27:48 UTC 2011


Hi

Any idea how to get the name of a failed node using the available
cluster tools or commands? I have tried clustat, but it seems to
produce unexpected output.

I need to obtain this information on the target host/node to which the
service is relocating as part of failover.

Thanks in advance

Gratefully yours



On 1/13/11, Parvez Shaikh <parvez.h.shaikh at gmail.com> wrote:
> Hi,
>
> I have been using the clustat command ("clustat -x -s servicename") to
> get the following XML output -
>
> <?xml version="1.0"?>
> <clustat version="4.1.1">
>   <groups>
>     <group name="service:service_on_node1" state="112"
> state_str="started" flags="0" flags_str="" owner="node1"
> last_owner="none" restarts="0" last_transition="1294752663"
> last_transition_str="Tue Jan 11 19:01:03 2011"/>
>   </groups>
> </clustat>
>
> I was under the impression that the "last_owner" field in the XML
> above should give me the node name where the service was last running,
> so I was parsing this XML to obtain that information.
>
> Note that this holds true if you migrate or relocate a service from
> one node to another using clusvcadm, or from Conga or
> system-config-cluster, BUT if a node is shut down and the service
> relocates to another node, last_owner is either 'none' or the same as
> the current node to which the service has relocated.
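
As an illustration, the owner/last_owner attributes can be pulled out of the `clustat -x` XML with a short Python sketch. This is only a sketch against the sample output quoted above; the service name and the exact attribute set are taken from that sample.

```python
import xml.etree.ElementTree as ET

def service_owners(clustat_xml, service_name):
    """Return (owner, last_owner) for one service from `clustat -x` output,
    or None if the service is not present."""
    root = ET.fromstring(clustat_xml)
    for group in root.iter("group"):
        if group.get("name") == "service:" + service_name:
            return group.get("owner"), group.get("last_owner")
    return None

# Sample output as quoted above.
sample = """<?xml version="1.0"?>
<clustat version="4.1.1">
  <groups>
    <group name="service:service_on_node1" state="112"
           state_str="started" flags="0" flags_str="" owner="node1"
           last_owner="none" restarts="0" last_transition="1294752663"
           last_transition_str="Tue Jan 11 19:01:03 2011"/>
  </groups>
</clustat>"""

owner, last_owner = service_owners(sample, "service_on_node1")
print(owner, last_owner)  # node1 none
```

As the sample shows, after an unclean node failure last_owner may well be 'none', which is exactly the limitation described above.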
>
> Parsing /var/log/messages is easy but not an optimal solution; it
> requires grepping the entire log file for the specific clumgr messages
> in which the failed node's name appears.
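
If one does go the log-scanning route, the idea can be sketched as below. Note the membership-change message and the sample log lines here are hypothetical placeholders; the exact wording differs between cman/openais releases, so the pattern must be checked against the actual /var/log/messages on your nodes.

```python
import re

def last_departed_node(log_lines, pattern=r"node (\S+) left the cluster"):
    """Scan syslog lines and return the most recently departed node name,
    or None if no membership-change message matched.

    The default pattern is a placeholder; adjust it to match the real
    membership messages logged by your cluster software version."""
    node = None
    rx = re.compile(pattern)
    for line in log_lines:
        m = rx.search(line)
        if m:
            node = m.group(1)  # keep the latest match
    return node

# Hypothetical log excerpt, for illustration only.
sample = [
    "Jan 11 19:00:59 node2 openais[3041]: [CLM  ] node node3 left the cluster",
    "Jan 11 19:01:03 node2 clurgmgrd[3321]: <notice> Taking over service",
]
print(last_departed_node(sample))  # node3
```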
>
>
>
> On Thu, Jan 13, 2011 at 4:35 AM, Kit Gerrits <kitgerrits at gmail.com> wrote:
>>
>> Hello,
>>
>> The clustering software itself monitors the nodes and devices in use
>> by cluster services, but logs to /var/log/messages.
>> A quick overview is presented by the 'clustat' command.
>>
>> Monitoring tools are freely available for any platform.
>> Basic monitoring on Linux is available with Big Brother, Cacti,
>> OpenNMS or Nagios (in order of increasing complexity).
>> If you're bound to Windows, maybe try ServersCheck.
>>
>>
>> Parsing logs can be trivial, once you know how.
>> What do you want to know and when do you want to know it?
>>
>> Have you looked at 'clustat' and 'cman_tool'?
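
Of the two, `cman_tool nodes` is the more direct way to ask cluster membership: it prints a table in which the Sts column marks live members with 'M' and dead ones with 'X'. Assuming that column layout (the sample table below is illustrative, not captured from a real cluster), its output can be parsed like this:

```python
def dead_nodes(cman_tool_output):
    """Return the names of nodes whose Sts column is 'X' (dead)
    in the tabular output of `cman_tool nodes`."""
    dead = []
    for line in cman_tool_output.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "X":
            dead.append(fields[-1])  # node name is the last column
    return dead

# Illustrative output in the usual `cman_tool nodes` column layout.
sample = """Node  Sts   Inc   Joined               Name
   1   M    512   2011-01-11 18:00:00  node1
   2   X      0                        node2
   3   M    520   2011-01-11 18:00:05  node3"""

print(dead_nodes(sample))  # ['node2']
```

A standby node could run this from its failover agent to learn which member just dropped out, independently of clustat's last_owner field.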
>>
>>
>> Regards,
>>
>> Kit
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Parvez Shaikh
>> Sent: woensdag 12 januari 2011 11:01
>> To: linux clustering
>> Subject: Re: [Linux-cluster] Determining failed node on another node of
>> cluster during failover
>>
>> Hi
>>
>> Is the monitoring package part of RHCS? What is the name of this component?
>>
>> Is there any other mechanism, one which doesn't require parsing
>> log/messages, to determine which node has left the cluster on the
>> standby node before failover is complete?
>>
>> Thanks
>>
>> On Wed, Jan 12, 2011 at 2:58 PM, Kit Gerrits <kitgerrits at gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> If you want to find out which cluster node has failed, you could
>>> either check /var/log/messages to see which member has left the
>>> cluster, or set up monitoring to check that your servers are all in
>>> good shape.
>>>
>>> If you are running a cluster, I would suggest also setting up
>>> monitoring.
>>> The monitoring package can then notify you if any cluster member fails.
>>>
>>>
>>> Regards,
>>>
>>> Kit
>>>
>>> -----Original Message-----
>>> From: linux-cluster-bounces at redhat.com
>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Parvez Shaikh
>>> Sent: woensdag 12 januari 2011 7:04
>>> To: linux clustering
>>> Subject: [Linux-cluster] Determining failed node on another node of
>>> cluster during failover
>>>
>>> Hi all,
>>>
>>> Taking this question from another thread, here is a challenge that I
>>> am facing -
>>>
>>> Following is a simple cluster configuration -
>>>
>>> Node 1, node 2, node 3, and node 4 are part of the cluster, in an
>>> unrestricted, unordered failover domain with an active-active n x n
>>> configuration.
>>>
>>> So node 2 can take over services from node 1, node 3, or node 4 when
>>> any of those nodes fails (e.g. due to power failure).
>>>
>>> In that event I want to find out which node has failed over to
>>> node 2. I was invoking "clustat -x -s servicename" on node 2 in my
>>> custom agent and parsing the "last_owner" field to obtain the name of
>>> the node on which the service was previously running.
>>>
>>> This, however, doesn't seem to work if I shut down a node (though it
>>> does work if I migrate the service from one node to another using
>>> clusvcadm).
>>>
>>> Is there any way I can find out which node has failed during failover
>>> of a service to a standby node? Is there any tool I might have
>>> missed, or some command I can send to ccsd to get this information?
>>>
>>> Thanks
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster