[Linux-cluster] can't communicate with fenced -1
Gian Paolo Buono
gpbuono at gmail.com
Wed Jun 25 08:55:49 UTC 2008
Hi,
if I try to restart cman on yoda2:
[root@yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
Stopping fencing... done
Stopping cman... done
Stopping ccsd... done
Unmounting configfs... done
[ OK ]
Starting cluster:
Enabling workaround for Xend bridged networking... done
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... failed
[FAILED]
[root@yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message 172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message 172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG ] got joinlist message from node 2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090
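For anyone who wants to pull the "cman killed by node" message out of a longer /var/log/messages, here is a minimal sketch. The function name find_cman_kills and the regular expression are my own assumptions, written against the message wording in the excerpt above; other cman/openais versions may phrase the message differently.

```python
import re

# Pattern modelled on the openais [CMAN ] message in the log excerpt above.
# The wording is an assumption taken from that excerpt, not from cman's source.
KILL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"openais\[\d+\]:\s\[CMAN\s*\]\scman killed by node (?P<by>\d+)\s"
    r"because (?P<reason>.+)$"
)


def find_cman_kills(lines):
    """Return (timestamp, host, killing node id, reason) for each kill message."""
    hits = []
    for line in lines:
        m = KILL_RE.match(line.strip())
        if m:
            hits.append((m["stamp"], m["host"], int(m["by"]), m["reason"]))
    return hits


# Two lines copied from the excerpt above (the wrapped line is joined again).
sample_log = [
    "Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 "
    "because we were killed by cman_tool or other application",
    "Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate",
]

for hit in find_cman_kills(sample_log):
    print(hit)
```

To run it against a real log, pass an open file object instead of sample_log; the function only needs an iterable of lines.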
There are 3 Xen domUs running on this server, so I can't reboot yoda2 :(
Best regards, and sorry for my English :)
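The two points raised in this thread about the cluster.conf quoted below (short node names versus FQDNs, and empty <fence/> blocks) can be checked mechanically. This is only a rough sketch: summarize_nodes is a hypothetical helper name, the "has a dot" test is a crude heuristic for "looks like an FQDN", and the sample string is a trimmed copy of my config.

```python
import xml.etree.ElementTree as ET


def summarize_nodes(conf_xml):
    """Summarize every <clusternode>: name, nodeid, FQDN-looking name, fencing."""
    root = ET.fromstring(conf_xml)
    summary = []
    for node in root.findall("./clusternodes/clusternode"):
        name = node.get("name", "")
        summary.append({
            "name": name,
            "nodeid": node.get("nodeid"),
            # Heuristic only: a dot in the name is treated as "looks like an FQDN".
            "looks_fqdn": "." in name,
            # An empty <fence/> element has no <method> children.
            "has_fence_method": node.find("./fence/method") is not None,
        })
    return summary


# Trimmed copy of the cluster.conf from this thread (node entries unchanged).
SAMPLE = """<?xml version="1.0"?>
<cluster alias="yoda-cl" config_version="2" name="yoda-cl">
  <clusternodes>
    <clusternode name="yoda2" nodeid="1" votes="1"><fence/></clusternode>
    <clusternode name="yoda1" nodeid="2" votes="1"><fence/></clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
</cluster>"""

for entry in summarize_nodes(SAMPLE):
    print(entry)
```

For the real file, read /etc/cluster/cluster.conf and pass its contents to summarize_nodes. The sketch does not check /etc/hosts or DNS; it only looks at what is written in the config.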
2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>
>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>
>>>
>>>
>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have two RHEL5.1 boxes installed, sharing a
>>>> single iSCSI EMC2 SAN, without fence devices. The system is configured
>>>> as a high-availability system for Xen guests.
>>>>
>>>> One of the most frequently recurring problems is fence_tool related.
>>>>
>>>> # service cman start
>>>> Starting cluster:
>>>> Loading modules... done
>>>> Mounting configfs... done
>>>> Starting ccsd... done
>>>> Starting cman... done
>>>> Starting daemons... done
>>>> Starting fencing... fence_tool: can't communicate with fenced -1
>>>>
>>>> # fenced -D
>>>> 1204556546 cman_init error 0 111
>>>>
>>>> # clustat
>>>> CMAN is not running.
>>>>
>>>> # cman_tool join
>>>>
>>>> # clustat
>>>> msg_open: Connection refused
>>>>
>>>> Member Status: Quorate
>>>> Member Name ID Status
>>>>
>>>> ------ ---- ---- ------
>>>> yoda1 1 Online, Local
>>>> yoda2 2 Offline
>>>>
>>>> Sometimes this problem is resolved if the two machines are rebooted at
>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>> that the two systems will be rebooted at the same time for every problem
>>>> we face. This is my config file:
>>>>
>>>> ###################################cluster.conf####################################
>>>> <?xml version="1.0"?>
>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>   <clusternodes>
>>>>     <clusternode name="yoda2" nodeid="1" votes="1">
>>>>       <fence/>
>>>>     </clusternode>
>>>>     <clusternode name="yoda1" nodeid="2" votes="1">
>>>>       <fence/>
>>>>     </clusternode>
>>>>   </clusternodes>
>>>>   <cman expected_votes="1" two_node="1"/>
>>>>   <rm>
>>>>     <failoverdomains/>
>>>>     <resources/>
>>>>   </rm>
>>>>   <fencedevices/>
>>>> </cluster>
>>>> ###################################cluster.conf####################################
>>>> Regards.
>>>>
>>> Hi,
>>>
>>> I configured a two-node cluster with no fence device on RHEL5.1.
>>> The cluster started and stopped with no issues. The only difference that
>>> I see is that I have used the FQDN in my cluster.conf,
>>>
>>> i.e., <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
>>>
>>> Check whether your /etc/hosts has the FQDN in it.
>>>
>>> Thanks
>>> Gowrishankar Rajaiyan
>>>
>>>
>>>
>>
>
> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
>> Hi,
>> the problem with my cluster is that it starts up well, but after two days the
>> problem that I described appears, and it gets solved if
>> the two machines are rebooted at the same time.
>>
>> Thanks
>> Gian Paolo
>>
>
>
> Hi Gian
>
> Could you please attach the logs?
>
> Thanks
> Gowrishankar Rajaiyan
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>