[Linux-cluster] Re: cluster not doing failover

Sai Loganathan sail at serverengines.com
Wed Mar 21 01:37:50 UTC 2007


Hello,
Thanks for the info. I am now using manual fencing, but I get the following
messages whenever I trigger a failover.

Mar 12 17:25:50 node2 clurgmgrd[6088]: <info> State change: node1 DOWN
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_ip
Mar 12 17:25:52 node2 clurgmgrd: [6088]: <info> Adding IPv4 address 172.40.2.119 to eth2
Mar 12 17:25:52 node2 clurgmgrd[6088]: <notice> Starting stopped service iscsi_lun
Mar 12 17:25:53 node2 clurgmgrd[6088]: <notice> Service iscsi_lun started
Mar 12 17:25:54 node2 clurgmgrd[6088]: <notice> Service iscsi_ip started
Mar 12 17:26:24 node2 kernel: CMAN: removing node node1 from the cluster : Missed too many heartbeats
Mar 12 17:26:24 node2 fenced[6040]: node1 not a cluster member after 0 sec post_fail_delay
Mar 12 17:26:24 node2 fenced[6040]: fencing node "node1"
Mar 12 17:26:24 node2 fence_manual: Node node1 needs to be reset before recovery can procede.  Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)

I simply power down node1 to simulate a failover to node2. Unless I run
the command fence_ack_manual -n node1, the system does not move forward and
stays waiting in fencing. How can I fix this?
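
The wait described above is what the fence_manual message itself announces:
recovery pauses until node1 rejoins or someone acknowledges that it has been
reset. As a minimal sketch (the function names and the regular expression are
illustrative assumptions, not part of the cluster suite), the pending nodes
could be picked out of a syslog excerpt like this:

```python
import re

# Matches the fence_manual wait message quoted in the log above.
# The pattern is an illustration, not something shipped with the cluster suite.
PENDING = re.compile(r"fence_manual: Node (\S+) needs to be reset")


def pending_manual_fences(log_text):
    """Return the node names fence_manual reports as waiting for a reset."""
    # Mail clients and syslog viewers may wrap long messages,
    # so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    seen = []
    for node in PENDING.findall(flat):
        if node not in seen:
            seen.append(node)
    return seen


def ack_command(node):
    """Build the acknowledgement command named in the log message."""
    return ["fence_ack_manual", "-n", node]


sample = (
    'Mar 12 17:26:24 node2 fenced[6040]: fencing node "node1"\n'
    "Mar 12 17:26:24 node2 fence_manual: Node node1 needs to be reset before\n"
    "recovery can procede.\n"
)

for node in pending_manual_fences(sample):
    # Only print the command; running it is an operator decision.
    print(" ".join(ack_command(node)))
```

The sketch only prints the command rather than running it, because the
acknowledgement is meant to be given after someone has confirmed that the
node really has been reset.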

During shutdown, I get the following error messages and the system waits
there indefinitely.
Starting Killall: CMAN: sendmsg failed: -101
WARNING: dlm_emergency_shutdown
SM: 00000003 sm_stop: SG stilljoined
How can I fix this?

Thanks,
Sai Logan



-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of
linux-cluster-request at redhat.com
Sent: Saturday, March 10, 2007 9:00 AM
To: linux-cluster at redhat.com
Subject: Linux-cluster Digest, Vol 35, Issue 13

Message: 1
Date: Fri, 9 Mar 2007 19:53:40 -0600
From: Jonathan E Brassow <jbrassow at redhat.com>
Subject: Re: [Linux-cluster] cluster not doing failover
To: linux clustering <linux-cluster at redhat.com>
Message-ID: <40407159e8e6506b05d46c82d921d936 at redhat.com>
Content-Type: text/plain; charset="iso-8859-1"


On Mar 9, 2007, at 5:30 PM, Sai Loganathan wrote:
>             <fencedevices>
>                         <fencedevice agent="fence_ilo" hostname="admin"
> login="admin" name="node1_fence" passwd="admin"/>
>                         <fencedevice agent="fence_ilo" hostname="admin"
> login="admin" name="node2_fence" passwd="admin"/>
>             </fencedevices>

The above lines look odd to me.  The hostname for the fence device is 
"admin"?
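
A quick way to spot this kind of entry is to scan the fencedevices section
for suspicious values. The sketch below is only an illustration: the two
heuristics (hostname equal to login, and one hostname shared by several
devices) and the function name are assumptions, not checks the cluster
tools perform.

```python
import xml.etree.ElementTree as ET

# The fencedevices section quoted above, with the wrapped lines rejoined.
FRAGMENT = """
<fencedevices>
  <fencedevice agent="fence_ilo" hostname="admin" login="admin" name="node1_fence" passwd="admin"/>
  <fencedevice agent="fence_ilo" hostname="admin" login="admin" name="node2_fence" passwd="admin"/>
</fencedevices>
"""


def suspicious_fence_devices(xml_text):
    """Return warnings for fence devices whose hostname looks like a placeholder."""
    root = ET.fromstring(xml_text)
    first_owner = {}  # hostname -> name of the first device that used it
    warnings = []
    for dev in root.iter("fencedevice"):
        name = dev.get("name")
        host = dev.get("hostname")
        if host is None:
            continue
        # Heuristic 1 (assumption): a hostname identical to the login
        # is probably a copy-and-paste placeholder.
        if host == dev.get("login"):
            warnings.append(f"{name}: hostname equals login ({host!r})")
        # Heuristic 2 (assumption): per-node fence devices normally
        # point at different management interfaces.
        if host in first_owner:
            warnings.append(f"{name}: hostname {host!r} also used by {first_owner[host]}")
        else:
            first_owner[host] = name
    return warnings


for line in suspicious_fence_devices(FRAGMENT):
    print(line)
```

Both quoted devices trip the first heuristic, and node2_fence also trips the
second, which matches the observation that "admin" is unlikely to be the
address of either node's iLO interface.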

> Using the cluster IP address (172.40.2.119), I was able to NFS-mount
> the shared LUN from a third machine, and I started an infinite ls on
> that LUN.
> To simulate failover, I powered down node1, hoping to see the read
> I/O stop and then resume via node2. But I see the following error
> messages on node2.
> Mar  9 12:14:49 node2 fenced[7422]: fence "node1" failed
> Mar  9 12:14:54 node2 fenced[7422]: fencing node "node1"
> Mar  9 12:14:54 node2 fenced[7422]: agent "fence_ilo" reports: Can't call method "configure" on an undefined value at /sbin/fence_ilo line 169, <> line 4.
> Mar  9 12:14:54 node2 fenced[7422]: fence "node1" failed
> Mar  9 12:14:59 node2 fenced[7422]: fencing node "node1"
> Mar  9 12:14:59 node2 fenced[7422]: agent "fence_ilo" reports: Can't call method "configure" on an undefined value at /sbin/fence_ilo line 169, <> line 4.
>  
> It seems I have not set up fencing correctly.
> Can I set up the cluster without fencing to begin with?

Yes.  You can use manual fencing.  That should only be used for testing 
purposes though... it is not a supported configuration.


  brassow


