[Linux-cluster] can't communicate with fenced -1

Wed Jun 25 08:55:49 UTC 2008

Hi,
if I try to restart cman on yoda2:
[root@yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed

                                                           [FAILED]
[root@yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from node 2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090

There are 3 Xen domUs on this server and I can't reboot yoda2 :(

Best regards, and sorry for my English :)
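
A note on the `cman_init error 111` line in the log above: on Linux, errno 111 is ECONNREFUSED, which would be consistent with the `msg_open: Connection refused` that clustat prints in the quoted message below. The following is a minimal Python sketch, not part of the cluster tools, for pulling that number out of such lines and decoding it on the local platform. The function names `cman_init_errno` and `describe` are made up for illustration, and treating the trailing number as an errno value is an assumption.

```python
import errno
import os
import re

# Matches lines such as "... gfs_controld[18455]: cman_init error 111"
# and the two-value form printed by "fenced -D": "cman_init error 0 111".
# Assumption: the last number on the line is an errno value.
CMAN_INIT_RE = re.compile(r"cman_init error (?:\S+ )?(\d+)\s*$")


def cman_init_errno(line):
    """Return the trailing number of a cman_init error line, or None."""
    match = CMAN_INIT_RE.search(line)
    return int(match.group(1)) if match else None


def describe(code):
    """Format an errno value with its symbolic name and message.

    The mapping is platform dependent, so run this on a Linux host
    to interpret numbers taken from Linux logs.
    """
    name = errno.errorcode.get(code, "unknown")
    return f"{code} {name}: {os.strerror(code)}"


if __name__ == "__main__":
    samples = [
        "Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111",
        "1204556546 cman_init error 0 111",
    ]
    for line in samples:
        code = cman_init_errno(line)
        if code is not None:
            print(describe(code))
```

If the number does decode to a refused connection, that would suggest the daemons could not reach cman's socket at that moment, which fits the "cman killed by node 1" line logged one second earlier.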

2008/6/25 GS R <gsrlinux at gmail.com>:

>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>
>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have two RHEL 5.1 boxes sharing a single iSCSI EMC2 SAN, without
>>>> fence devices. The system is configured as a high-availability
>>>> system for Xen guests.
>>>>
>>>> One of the most frequently recurring problems is related to fence_tool.
>>>>
>>>>   # service cman start
>>>>   Starting cluster:
>>>>      Loading modules... done
>>>>      Mounting configfs... done
>>>>      Starting ccsd... done
>>>>      Starting cman... done
>>>>      Starting daemons... done
>>>>      Starting fencing... fence_tool: can't communicate with fenced -1
>>>>
>>>>  # fenced -D
>>>>   1204556546 cman_init error 0 111
>>>>
>>>>   # clustat
>>>>   CMAN is not running.
>>>>
>>>>   # cman_tool join
>>>>
>>>>   # clustat
>>>>   msg_open: Connection refused
>>>>
>>>>   Member Status: Quorate
>>>>
>>>>     Member Name                        ID   Status
>>>>     ------ ----                        ---- ------
>>>>     yoda1                                 1 Online, Local
>>>>     yoda2                                 2 Offline
>>>>
>>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>> two systems will be rebooted at the same time for every problem we
>>>> face. This is my config file:
>>>>
>>>> ###################################cluster.conf####################################
>>>> <?xml version="1.0"?>
>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>         <clusternodes>
>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>                         <fence/>
>>>>                 </clusternode>
>>>>         </clusternodes>
>>>>         <cman expected_votes="1" two_node="1"/>
>>>>         <rm>
>>>>                 <failoverdomains/>
>>>>                 <resources/>
>>>>         </rm>
>>>>         <fencedevices/>
>>>> </cluster>
>>>> ###################################cluster.conf####################################
>>>> Regards.
>>>>
>>> Hi
>>>
>>> I configured a two-node cluster with no fence device on RHEL 5.1.
>>> The cluster started and stopped with no issues. The only difference that
>>> I see is that I have used the FQDN in my cluster.conf,
>>>
>>> i.e., <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
>>>
>>> Check whether your /etc/hosts has the FQDN in it.
>>>
>>> Thanks
>>> Gowrishankar Rajaiyan
>
> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>
>> Hi,
>> the problem with my cluster is that it starts up well, but after two days
>> the problem that I described appears, and it gets solved if the two
>> machines are rebooted at the same time.
>>
>> Thanks
>> Gian Paolo
>>
>
>
> Hi Gian
>
> Could you please attach the logs?
>
> Thanks
> Gowrishankar Rajaiyan