[Linux-cluster] Re: Starting up two of three nodes that compose a cluster

carlopmart carlopmart at gmail.com
Fri Sep 21 17:21:05 UTC 2007


David Teigland wrote:
> On Fri, Sep 21, 2007 at 06:36:04PM +0200, carlopmart wrote:
>> David Teigland wrote:
>>> On Fri, Sep 21, 2007 at 06:15:37PM +0200, carlopmart wrote:
>>>>>> [root@thranduil ~]# fence_ack_manual -n elrond.hpulabs.org
>>>>>>
>>>>>> Warning:  If the node "elrond.hpulabs.org" has not been manually fenced
>>>>>> (i.e. power cycled or disconnected from shared storage devices)
>>>>>> the GFS file system may become corrupted and all its data
>>>>>> unrecoverable!  Please verify that the node shown above has
>>>>>> been reset or disconnected from storage.
>>>>>>
>>>>>> Are you certain you want to continue? [yN] y
>>>>>> can't open /tmp/fence_manual.fifo: No such file or directory
>>>>> That looks like the old RHEL4/cluster-1.0 version of fence_ack_manual...
>>>> And is there a solution?
>>> You need to make sure the RHEL4/cluster-1.0 binaries are removed from the
>>> nodes and the new RHEL5/cluster-2.0/openais binaries are installed.  If
>>> you're getting this far, it may only be some fencing binaries that are
>>> incorrect, so first just remove fence_manual and fence_ack_manual and make
>>> sure you have the new fence_ack_manual installed (it's now a bash script).
>>> fence_manual no longer exists in RHEL5/cluster-2.0 code since
>>> fence_ack_manual talks directly with fenced.
>>>
>>> Dave
>>>
>>>
>> Sorry??? These three nodes are RHEL5 with the latest patches applied except 
>> kernel version 2.6.18-8.1.10.
>>
>> Version of cman is: cman-2.0.64-1.0.1.el5
>> Version of gfs-utils:
>> Version of rgmanager: rgmanager-2.0.24-1.el5
>>
>> And fence_manual does exist in this cluster suite:
>>
>> [root@haldir xen]# whereis fence_manual
>> fence_manual: /sbin/fence_manual /usr/share/man/man8/fence_manual.8.gz
>> [root@haldir xen]# rpm -qf /sbin/fence_manual
>> cman-2.0.64-1.0.1.el5
>> [root@smeagol xen]#
>>
>> And fence_ack_manual is not a bash script, it is a binary:
>>
>> [root@haldir xen]# whereis fence_ack_manual
>> fence_ack_manual: /sbin/fence_ack_manual /usr/share/man/man8/fence_ack_manual.8.gz
>> [root@haldir xen]# cd /sbin
>> [root@haldir sbin]# file fence_ack_manual
>> fence_ack_manual: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
>> [root@haldir sbin]#
>>
>> Do I need to install the RHEL 5.1 beta to do this? If so, I have a very 
>> big problem ....
> 
> Looks like I was wrong about what got into RHEL5, it's a real pity the new
> stuff didn't make it.  Looking back at your cluster.conf file it seems
> that you're using fence_gnbd for that node, so my next guess is that
> fence_gnbd isn't found or isn't working.
> 
> I can't find a way to override a failing fence operation in the RHEL5
> code, so that probably means you'll have to get fence_gnbd working.
> 
> Or, another somewhat dangerous option is to disable startup fencing
> altogether by adding this to cluster.conf:
>   <fence_daemon clean_start="1"/>
> 
> Dave
> 
> 
Thanks Dave, but I have tried clean_start without luck ... The error is the 
same. fence_gnbd works OK, at least when all three nodes are up 
(deagol.hpulabs.org is a VMware virtual machine hosted on an ESX cluster).
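
For reference, the two points above (whether the clean_start setting is 
really present in the file the node reads, and whether the fence agent can be 
found at all) could be double-checked with something like the sketch below. 
It is only a sketch under assumptions: the function names are made up for 
illustration, /etc/cluster/cluster.conf is assumed as the default path, and 
the grep only matches when the fence_daemon tag and its clean_start attribute 
sit on a single line.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not part of the
# cluster suite.

# Succeeds if CONF contains a <fence_daemon ...> tag carrying
# clean_start="1" on a single line. The default path is an assumption.
has_clean_start() {
    local conf="${1:-/etc/cluster/cluster.conf}"
    grep -Eq '<fence_daemon[^>]*clean_start="1"' "$conf"
}

# Succeeds if the named fence agent can be found on PATH.
agent_on_path() {
    command -v "$1" >/dev/null 2>&1
}
```

Typical use would be `has_clean_start || echo "clean_start not set"` and 
`agent_on_path fence_gnbd || echo "fence_gnbd not found"`. Neither check 
says anything about whether fencing itself succeeds; they only rule out the 
two simplest causes.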

Well, I will try setting up a cron job to change cluster.conf at 00:00 on 
Monday ... I think this is the only option ....
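
A rough sketch of the mechanical part of such a cron job follows. The actual 
change to cluster.conf is not specified here, so the sketch covers only the 
config_version bump that any change needs. The function name, the wrapper 
script path and the crontab line are all hypothetical, and the ccs_tool and 
cman_tool commands in the comments are quoted from memory and should be 
checked against the man pages on the nodes.

```shell
#!/bin/bash
# Sketch only: increment config_version="N" in a cluster.conf-style file.
# The function name and the assumption that config_version appears once,
# as a plain integer, are illustrative.

bump_config_version() {
    local conf="$1"
    local cur new
    # Extract the current integer from config_version="N".
    cur=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$cur" ] || { echo "no config_version found in $conf" >&2; return 1; }
    new=$((cur + 1))
    # Keep a backup, then rewrite the attribute in place (GNU sed -i).
    cp -p "$conf" "$conf.bak" || return 1
    sed -i "s/config_version=\"$cur\"/config_version=\"$new\"/" "$conf" || return 1
    # Print the new version so a caller can pass it on.
    echo "$new"
}

# Hypothetical use from a wrapper script run by cron (not executed here):
#   new=$(bump_config_version /etc/cluster/cluster.conf) &&
#     ccs_tool update /etc/cluster/cluster.conf &&
#     cman_tool version -r "$new"
#
# Hypothetical crontab entry for 00:00 every Monday:
#   0 0 * * 1 /usr/local/sbin/update-cluster-conf.sh
```

The function leaves the previous file as cluster.conf.bak, so a bad edit can 
be rolled back by hand before the configuration is propagated.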

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



