[Linux-cluster] Re: Starting up two of three nodes that compose a cluster

carlopmart carlopmart at gmail.com
Fri Sep 21 16:08:29 UTC 2007


David Teigland wrote:
> On Fri, Sep 21, 2007 at 05:50:09PM +0200, carlopmart wrote:
>> David Teigland wrote:
>>> On Fri, Sep 21, 2007 at 05:29:22PM +0200, carlopmart wrote:
>>>> [root@thranduil log]# mount -t gfs /dev/xvdc1 /data
>>>> /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -22
>>>> /sbin/mount.gfs: error mounting lockproto lock_dlm
>>> This has already been changed to report a descriptive error message,
>>>  "node not a member of the default fence domain"
>>>
>>> as is shown in the debug log from gfs_controld below, and I suspect
>>> appears in your /var/log/messages.
>>>
>>>> 1190388485 mount: not in default fence domain
>>>> 1190388485 datavol01 do_mount: rv -22
>>>> [root@thranduil log]# group_tool -v; group_tool dump gfs
>>>> type             level name     id       state node id local_done
>>>> fence            0     default  00010001 JOIN_START_WAIT 1 100010001 0
>>>> [1]
>>> This shows it's not in the fence domain yet.  The reason appears to be
>>> that it's trying to fence someone.  Again, look in /var/log/messages to
>>> find out more information about what needs to be fenced, or why fencing
>>> isn't working.
>>>
>>> Dave
>>>
>>>
>> Correct, Dave. The error is:
>>
>> Sep 21 16:51:25 thranduil fenced[1081]: fencing node "elrond.hpulabs.org"
>> Sep 21 16:51:25 thranduil fenced[1081]: fence "elrond.hpulabs.org" failed
>>
>>  And that is expected: "elrond.hpulabs.org" is the node that I can't start
>> up (its hardware is under maintenance until Monday). I need to start all the
>> other cluster services on thranduil and haldir. Is that possible?
> 
> Two options:
> 
> 1. Remove that node from cluster.conf so it's not fenced every time the
> cluster starts up.
> 
> 2. Manually override/ack the fencing operation every time it happens with:
> fence_ack_manual -n elrond.hpulabs.org.  This will allow things to
> continue.
> 
> Dave
> 
> 
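  Before retrying the mount after either option, it helps to confirm whether
the node has actually finished joining the fence domain. What follows is only
a minimal sketch: fence_state is a hypothetical helper name, and it assumes
that "group_tool -v" prints its columns in the order shown in the output
quoted above (type, level, name, id, state, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `group_tool -v` output on stdin and print the
# "state" column of the fence group.
# Assumption: column order is "type level name id state ..." as in the
# output quoted earlier in this thread.
fence_state() {
    awk '$1 == "fence" { print $5; found = 1 }
         END { if (!found) exit 1 }'
}

# Usage on a cluster node (not run here):
#   group_tool -v | fence_state
# The function returns non-zero when no fence group line is present.
```

  A value such as JOIN_START_WAIT, as in the quoted output, means the join has
not completed yet, so a GFS mount would still be refused.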
  The first option isn't possible, because I can't restore cluster.conf
on the other two nodes when elrond comes back up.

  The second option gives me this error:

  [root@thranduil ~]# clustat
Member Status: Quorate

   Member Name                        ID   Status
   ------ ----                        ---- ------
   thranduil.hpulabs.org                 1 Online, Local, rgmanager
   haldir.hpulabs.org                    2 Online, rgmanager
   elrond.hpulabs.org                    3 Offline

   Service Name         Owner (Last)                   State
   ------- ----         ----- ------                   -----
   service:rsync-svc    (none)                         stopped
   service:wwwsoft-svc  (none)                         stopped
   service:proxy-svc    (thranduil.hpulabs.org)        stopped
   service:mail-svc     (none)                         stopped
[root@thranduil ~]# fence_ack_manual -n elrond.hpulabs.org

Warning:  If the node "elrond.hpulabs.org" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.

Are you certain you want to continue? [yN] y
can't open /tmp/fence_manual.fifo: No such file or directory
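
  The error indicates that fence_ack_manual tries to open
/tmp/fence_manual.fifo. A small guard, sketched below, checks for that FIFO
before acknowledging, so the override only runs when there is something to
acknowledge. The wrapper name ack_if_pending is hypothetical, and the idea
that the FIFO is present only while a manual fence operation is waiting is my
assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: call fence_ack_manual only if the FIFO it needs exists.
# The default path comes from the error message above. Assumption: the FIFO
# is present only while a manual fence operation is waiting for an ack.
ack_if_pending() {
    node=$1
    fifo=${2:-/tmp/fence_manual.fifo}
    if [ -p "$fifo" ]; then
        fence_ack_manual -n "$node"
    else
        # Wording of this message is illustrative only.
        echo "no FIFO at $fifo; nothing to acknowledge for $node" >&2
        return 1
    fi
}

# Usage on a cluster node (not run here):
#   ack_if_pending elrond.hpulabs.org
```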

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



