[Linux-cluster] fence ' node1' failed if etho down

ESGLinux esggrupos at gmail.com
Tue Feb 17 08:48:39 UTC 2009


Hi,

first, thank you very much for your answer,

 You are right, I have no fencing devices at all, and for one simple reason: I
don't have any!

I'm just testing with 2 Xen virtual machines running on the same host,
mounting an iSCSI disk from another host to simulate shared storage.

On the other hand, I don't think I understand the concept of fencing.

I tried to configure fencing devices with luci, but I don't know what to
select from the combo box of fencing devices (perhaps manual fencing,
although it's not recommended for production).

So, as I think this is a newbie and perhaps a silly question:

Can you give me a good reference about fencing to learn from, or an example
configuration with fence devices, to see how it should be done?

thanks again,

ESG


2009/2/17 spods at iinet.net.au <spods at iinet.net.au>

> A couple of things.
>
> You don't have any fencing devices defined in cluster.conf at all.  No
> power
> fencing, no I/O fencing, not even manual fencing.
>
> You need to define how each node of the cluster is to be fenced (forcibly
> removed
> from the cluster) for proper failover operations to occur.
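> As a sketch of what that looks like (this is hypothetical, not taken from
> the original cluster.conf: since both nodes here are Xen guests on one
> host, fence_xvm is one option, and the device name "xenfence" is made up):

```xml
<!-- Sketch only: fence_xvm fences a Xen guest from the host.
     The device name "xenfence" is an assumption, not from the original config. -->
<clusternode name="node1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="xenfence" domain="node1"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice agent="fence_xvm" name="xenfence"/>
</fencedevices>
```

> Each node gets a fence method referencing a device defined in
> <fencedevices>; the empty <fence/> and <fencedevices/> in the posted
> config mean fenced has no way to actually fence anything.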
>
> Secondly, if the only connection shared between the two nodes is the
> network cord
> you just disconnected, then of course nothing will happen - each node has
> just
> lost the only common connection between each other to control the faulty
> node
> (i.e. through fencing).
>
> There need to be more connections between the nodes of a cluster than just
> a network card.  This can be achieved with a second NIC, I/O fencing,
> centralised
> or individual power controls (I/O switches or IPMI).
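> As a rough sketch of per-node IPMI power fencing (all addresses and
> credentials below are placeholders, not from the original setup):

```xml
<!-- Sketch: fence_ipmilan power-fences a node through its BMC.
     IP addresses, login, and password here are placeholders. -->
<fencedevices>
        <fencedevice agent="fence_ipmilan" name="ipmi-node1" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
</fencedevices>
```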
>
> That way in the event that the network connection is the single point of
> failure
> between the two nodes, at least a node can be fenced if it's behaving
> improperly.
>
> Once the faulty node is fenced, the remaining nodes should at that point
> continue
> providing cluster services.
>
> Regards,
>
> Stewart
>
>
>
>
> On Mon Feb 16 16:29 , ESGLinux  sent:
>
> >Hello All,
> >
> >I have a cluster with two nodes running one service (mysql). The two nodes
> >use an iSCSI disk with GFS on it.
> >I haven't configured fencing at all.
> >
> >I have tested different failure situations, and these are my results:
> >
> >
> >If I halt node1, the service relocates to node2 - OK
> >If I kill the process on node1, the service relocates to node2 - OK
> >
> >but
> >
> >If I unplug the Ethernet cable or run ifdown eth0 on node1, the whole
> >cluster fails. The service doesn't relocate.
> >
> >In node2 I get the messages:
> >
> >Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188"
> >Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed
> >Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188"
> >
> >Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed
> >
> >again and again. Node2 never runs the service, and if I try to reboot
> >node1, the machine hangs waiting for the services to stop.
> >
> >
> >In this situation all I can do is switch off the power of node1 and
> >reboot node2. This is not acceptable at all.
> >
> >I think the problem is just with fencing, but I don't know how to apply it
> >to this situation (I have RTFM from the Red Hat site, but I haven't seen
> >how to apply it. :-( )
> >
> >
> >this is my cluster.conf file
> >
> ><cluster alias="MICLUSTER" config_version="62" name="MICLUSTER">
> >        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >        <clusternodes>
> >                <clusternode name="node1" nodeid="1" votes="1">
> >                        <fence/>
> >                </clusternode>
> >                <clusternode name="node2" nodeid="2" votes="1">
> >                        <fence/>
> >                </clusternode>
> >        </clusternodes>
> >        <cman expected_votes="1" two_node="1"/>
> >        <fencedevices/>
> >        <rm>
> >                <failoverdomains>
> >                        <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="0" restricted="1">
> >                                <failoverdomainnode name="node1" priority="1"/>
> >                                <failoverdomainnode name="node2" priority="1"/>
> >                        </failoverdomain>
> >                </failoverdomains>
> >                <resources/>
> >                <service domain="DOMINIOFAIL" exclusive="0" name="BBDD" recovery="restart">
> >                        <mysql config_file="/etc/my.cnf" listen_address="" mysql_options="" name="mydb" shutdown_wait="3"/>
> >                        <ip address="192.168.1.183" monitor_link="1"/>
> >                </service>
> >        </rm>
> ></cluster>
> >
> >Any idea? references?
> >
> >Thanks in advance
> >
> >
> >Greetings
> >
> >ESG
> >
> >
> >
> >
>
>
>
