[Linux-cluster] fencing problem
Marcos David
marcos.david at efacec.pt
Thu Dec 14 15:19:21 UTC 2006
Hello,
I still need help with this one ;)
Help, please!
Thanks.
Marcos David wrote:
> Hello,
> I'm experiencing some problems with cluster fencing.
> First, let's start with the specs:
>
> It's a two-node cluster (Sun X4100) running RHEL4 Update 4 and RHCS 4.
>
> Both machines have an ILOM device that acts as the first level of fencing.
> Then there is a second level of fencing that is performed by a UPS.
>
> My problem is the following:
> If I shut down one of the nodes (simulating a power failure), the other
> tries to fence the failed node. So far so good.
> The problem is that, since the ILOM in the failed node is offline, the
> second node keeps trying to fence via the ILOM device and never gives up!
>
> According to what I've read on the FAQ about fencing levels, if the
> first level fails it should go to the second level, and so on...
>
> But it never does this!
>
> Here is a copy of /var/log/messages:
>
> Dec 11 17:50:28 node_b kernel: CMAN: removing node node_a from the cluster : Missed too many heartbeats
> Dec 11 17:50:28 node_b fenced[3240]: node_a not a cluster member after 0 sec post_fail_delay
> Dec 11 17:50:28 node_b fenced[3240]: fencing node "node_a"
> Dec 11 17:52:47 node_b fenced[3240]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:172.18.56.17...ipmilan: Failed to connect after 30 seconds Failed
> Dec 11 17:52:47 node_b ccsd[9390]: process_get: Invalid connection descriptor received.
> Dec 11 17:52:47 node_b ccsd[9390]: Error while processing get: Invalid request descriptor
> Dec 11 17:52:47 node_b fenced[3240]: fence "node_a" failed
> Dec 11 17:52:52 node_b fenced[3240]: fencing node "node_a"
>
> The last 4 lines repeat forever....
>
> Here is a copy of the cluster.conf:
>
>
> <?xml version="1.0"?>
> <cluster config_version="19" name="SERVER-A">
>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="node-a" votes="1">
>       <fence>
>         <method name="1">
>           <device name="fence_node-a"/>
>         </method>
>         <method name="2">
>           <device name="UPS_node-a"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node-b" votes="1">
>       <fence>
>         <method name="1">
>           <device name="fence_node-b"/>
>         </method>
>         <method name="2">
>           <device name="UPS_node-b"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_ipmilan" auth="password"
>         ipaddr="172.18.57.17" login="root" name="fence_node-a"
>         passwd="changeme"/>
>     <fencedevice agent="fence_ipmilan" auth="password"
>         ipaddr="172.18.57.18" login="root" name="fence_node-b"
>         passwd="changeme"/>
>     <fencedevice agent="fence_apc" ipaddr="172.18.57.20"
>         login="power" name="UPS_node-a" passwd="power"/>
>     <fencedevice agent="fence_apc" ipaddr="172.18.57.21"
>         login="power" name="UPS_node-b" passwd="power"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="Cluster_0" ordered="1" restricted="0">
>         <failoverdomainnode name="node-a" priority="1"/>
>         <failoverdomainnode name="node-b" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>       <fs device="/dev/sdb1" force_fsck="1" force_unmount="1"
>           fsid="46144" fstype="ext3" mountpoint="/mnt/shared"
>           name="Storedge_Shared" options="" self_fence="1"/>
>       <ip address="172.18.57.16" monitor_link="1"/>
>       <ip address="172.18.57.11" monitor_link="1"/>
>       <ip address="172.18.57.14" monitor_link="1"/>
>     </resources>
>     <service autostart="1" domain="Cluster_0" name="postgresql">
>       <ip ref="172.18.57.16">
>         <fs ref="Storedge_Shared">
>           <script file="/etc/init.d/postgresql" name="PostgreSQL"/>
>         </fs>
>       </ip>
>     </service>
>     <service autostart="1" domain="Cluster_0" name="afs">
>       <ip ref="172.18.57.14">
>         <script file="/etc/init.d/afs" name="AFS"/>
>       </ip>
>     </service>
>   </rm>
> </cluster>
>
> I would like to know a way to solve this problem.... :-)
>
> Thanks in advance,
>
> Marcos David
>
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
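For anyone comparing their own setup against the quoted cluster.conf, here is a minimal sketch (not part of the original post, and not an RHCS tool) that parses a cluster.conf-shaped file with Python's standard library, lists each node's fence methods in the order they appear, and reports any <device> name that has no matching <fencedevice>. The function names and the trimmed SAMPLE_CONF are hypothetical, made up for illustration. It only checks that the file is well-formed and that the names line up; it says nothing about why fenced does or does not move on to the second method.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample shaped like the cluster.conf quoted above.
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="19" name="SERVER-A">
  <clusternodes>
    <clusternode name="node-a" votes="1">
      <fence>
        <method name="1"><device name="fence_node-a"/></method>
        <method name="2"><device name="UPS_node-a"/></method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="172.18.57.17" name="fence_node-a"/>
    <fencedevice agent="fence_apc" ipaddr="172.18.57.20" name="UPS_node-a"/>
  </fencedevices>
</cluster>
"""


def fence_plan(conf_text):
    """Map each node name to its (method, device, agent) steps in file order.

    agent is None when the device name has no matching <fencedevice>.
    """
    # ET.fromstring raises ET.ParseError if the XML is not well-formed,
    # for example when a tag is left unclosed.
    root = ET.fromstring(conf_text)
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    plan = {}
    for node in root.iter("clusternode"):
        steps = []
        for method in node.findall("./fence/method"):
            for dev in method.findall("device"):
                name = dev.get("name")
                steps.append((method.get("name"), name, agents.get(name)))
        plan[node.get("name")] = steps
    return plan


def undefined_devices(plan):
    """Return the sorted device names that no <fencedevice> defines."""
    return sorted({dev for steps in plan.values()
                   for (_method, dev, agent) in steps if agent is None})


if __name__ == "__main__":
    plan = fence_plan(SAMPLE_CONF)
    for node, steps in plan.items():
        for method, dev, agent in steps:
            print(node, "method", method, "->", dev, "agent:", agent)
    print("undefined devices:", undefined_devices(plan))
```

To run it against a real file, read the file's text and pass it to fence_plan() instead of SAMPLE_CONF. Note that the quoted log refers to node_a and 172.18.56.17 while the quoted cluster.conf uses node-a and 172.18.57.17; this sketch looks only at the XML, so a difference like that has to be checked by hand.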