[Linux-cluster] can't communicate with fenced -1
Gian Paolo Buono
gpbuono at gmail.com
Wed Jun 25 09:24:04 UTC 2008
Hi,
another problem: the clurgmgrd process won't die:
[root at yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:
...but nothing happens.
[root at yoda2 ~]# ps -ef | grep clurgmgrd
root 6620 1 55 Jun03 ? 12-02:06:46 clurgmgrd
[root at yoda2 ~]# kill -9 6620
[root at yoda2 ~]# ps -ef | grep clurgmgrd
and the clvmd process is in the same state:
[root at yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2
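[Editor's note: the "dead but subsys locked" status usually means the daemon exited or was killed while its init-script lock file survived; RHEL init scripts track state via /var/lock/subsys/<name>. A minimal sketch of that mechanism, run against a scratch directory rather than the real /var/lock/subsys (the path convention is standard RHEL practice, not something stated in the thread):]

```shell
# Sketch: how "dead but subsys locked" arises. RHEL init scripts touch
# /var/lock/subsys/<name> on start and remove it on stop; if the daemon
# dies without the stop script running, the lock file remains and the
# status check reports exactly this message. Demonstrated against a
# scratch directory so nothing real is touched.
subsys=$(mktemp -d)
touch "$subsys/clvmd"                    # leftover lock from a dead daemon

if [ -e "$subsys/clvmd" ] && ! pgrep -x clvmd >/dev/null 2>&1; then
    echo "clvmd dead but subsys locked"  # the condition 'status' reports
    rm -f "$subsys/clvmd"                # clearing it resets the status
fi

rmdir "$subsys"
```

Clearing the stale lock lets `service clvmd start` run again without a reboot, though it does not fix whatever killed the daemon in the first place.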
Please help ... I don't want to reboot yoda2 ...
bye
On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com>
wrote:
> Hi,
> if I try to restart on yoda2 cman
> [root at yoda2 ~]# /etc/init.d/cman restart
> Stopping cluster:
> Stopping fencing... done
> Stopping cman... done
> Stopping ccsd... done
> Unmounting configfs... done
> [ OK ]
> Starting cluster:
> Enabling workaround for Xend bridged networking... done
> Loading modules... done
> Mounting configfs... done
> Starting ccsd... done
> Starting cman... done
> Starting daemons... done
> Starting fencing... failed
>
> [FAILED]
> [root at yoda2 ~]# tail -f /var/log/messages
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] Members Joined:
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] r(0) ip(172.20.0.174)
> Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
> primary component and will provide service.
> Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message
> 172.20.0.174
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM ] got nodejoin message
> 172.20.0.175
> Jun 25 10:50:42 yoda2 openais[18429]: [CPG ] got joinlist message from
> node 2
> Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because
> we were killed by cman_tool or other application
> Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
> Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
> Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32
> ]:55090
>
>
> on this server there are 3 Xen domUs and I can't reboot yoda2 :( ..
>
> best regards.. and sorry for my English :)
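[Editor's note: the "cman_init error 111" in the log above is just errno 111, ECONNREFUSED: gfs_controld, ccsd and fenced cannot connect to cman's local socket because cman was killed by the other node. A quick, generic errno lookup to confirm the meaning (not specific to cman):]

```shell
# errno 111 on Linux is ECONNREFUSED ("Connection refused"), which is
# why gfs_controld and fenced report "cman_init error 111" once cman has
# been killed: nothing is listening on cman's socket any more.
python3 -c 'import errno, os; print(errno.errorcode[111], "-", os.strerror(111))'
```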
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>>>
>>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have two RHEL5.1 boxes installed sharing a single iSCSI EMC SAN,
>>>>> without fence devices. The system is configured as a high-availability
>>>>> system for Xen guests.
>>>>>
>>>>> One of the most frequently recurring problems is fence_tool related.
>>>>>
>>>>> # service cman start
>>>>> Starting cluster:
>>>>> Loading modules... done
>>>>> Mounting configfs... done
>>>>> Starting ccsd... done
>>>>> Starting cman... done
>>>>> Starting daemons... done
>>>>> Starting fencing... fence_tool: can't communicate with fenced -1
>>>>>
>>>>> # fenced -D
>>>>> 1204556546 cman_init error 0 111
>>>>>
>>>>> # clustat
>>>>> CMAN is not running.
>>>>>
>>>>> # cman_tool join
>>>>>
>>>>> # clustat
>>>>> msg_open: Connection refused
>>>>>
>>>>> Member Status: Quorate
>>>>> Member Name ID Status
>>>>>
>>>>> ------ ---- ---- ------
>>>>> yoda1 1 Online, Local
>>>>> yoda2 2 Offline
>>>>>
>>>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>>> that the two systems will be rebooted at the same time for every problem
>>>>> we face. This is my config file:
>>>>>
>>>>> ###################################cluster.conf####################################
>>>>>
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>> <clusternodes>
>>>>> <clusternode name="yoda2" nodeid="1" votes="1">
>>>>> <fence/>
>>>>> </clusternode>
>>>>> <clusternode name="yoda1" nodeid="2" votes="1">
>>>>> <fence/>
>>>>> </clusternode>
>>>>> </clusternodes>
>>>>> <cman expected_votes="1" two_node="1"/>
>>>>> <rm>
>>>>> <failoverdomains/>
>>>>> <resources/>
>>>>> </rm>
>>>>> <fencedevices/>
>>>>> </cluster>
>>>>> ###################################cluster.conf####################################
>>>>> Regards.
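[Editor's note: the cluster.conf above declares two_node="1" but leaves <fence/> and <fencedevices/> empty, so fenced has no agent to run when a node must be fenced. A hypothetical sketch of what a filled-in stanza could look like, in the same format as the posted file; the device name "human" is illustrative, and fence_manual is a testing/last-resort agent, not a production solution (a real agent such as fence_ipmilan should be used instead):]

```xml
<!-- Hypothetical sketch only: same structure as the posted cluster.conf,
     with a fence method and device filled in for one node. -->
<clusternode name="yoda2" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="human" nodename="yoda2"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice agent="fence_manual" name="human"/>
</fencedevices>
```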
>>>>>
>>>> Hi,
>>>>
>>>> I configured a two-node cluster with no fence device on RHEL5.1.
>>>> The cluster started and stopped with no issues. The only difference
>>>> I see is that I used the FQDN in my cluster.conf,
>>>>
>>>> i.e., <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
>>>>
>>>> Check whether your /etc/hosts has the FQDN in it.
>>>>
>>>> Thanks
>>>> Gowrishankar Rajaiyan
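[Editor's note: building on that suggestion, a quick way to check that every node name used in cluster.conf resolves locally. cman binds to the address that the node's own name resolves to, so a missing or inconsistent /etc/hosts entry is a common cause of join failures. The loop and the names "yoda1"/"yoda2" are illustrative, taken from the thread; substitute the names from your own cluster.conf:]

```shell
# Verify that each cluster node name has a working name-service entry
# (/etc/hosts or DNS) on this host. Run on both nodes and compare the
# results; the addresses must match what the other node expects.
for n in yoda1 yoda2; do
    if getent hosts "$n" >/dev/null; then
        getent hosts "$n"
    else
        echo "no /etc/hosts or DNS entry for $n"
    fi
done
```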
>>>>
>>>
>>
>> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>> the problem with my cluster is that it starts up well, but after two days
>>> the problem I described above appears, and it only gets solved if the two
>>> machines are rebooted at the same time.
>>>
>>> Thanks
>>> Gian Paolo
>>>
>>
>>
>> Hi Gian
>>
>> Could you please attach the logs?
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>