[Linux-cluster] Re: How to configure a cluster to remain up in the event of node failure

Christine Caulfield ccaulfie at redhat.com
Wed Aug 13 10:09:13 UTC 2008


Brett Cave wrote:
> I think I found a problem with the way it starts up...  See just below
> the startup output for more info...
> 
> On Tue, Aug 12, 2008 at 4:59 PM, Brett Cave <brettcave at gmail.com> wrote:
>> With a 3-node GFS1 cluster, if I hard reset 1 node, it hangs on
>> startup, although the cluster seems to return to normal.
>> Nodes: node2, node3, node4
>> Each node has 1 vote, and a qdisk has 2 votes.
>>
>> If I reset node3, gfs on node2 and node4 is blocked while node3
>> restarts. First question: is there a config that will allow the
>> cluster to continue operating while 1 node is down? My quorum is 3 and
>> total votes is 4 while node3 is restarting, but my gfs mountpoints are
>> inaccessible until my cman services start up on node3.
>>
>> Secondly, when node3 restarts, it hangs when trying to remount gfs file systems.
>> Starting cman
>> Mounting configfs...done
>> Starting ccsd...done
>> Starting cman...done
>> Starting daemons...done
>> Starting fencing...done
>>                   OK
>> qdiskd        OK
>>
>> "Mounting other file systems..." OK
>>
>> Mounting GFS filesystems: GFS 0.1.1-7.el5 installed
>> Trying to join cluster "lock_dlm","jemdevcluster:cache1"
>> dlm: Using TCP for communications
>> dlm: connecting to 2
>> dlm: got connection to 2
>> dlm: connecting to 2
>> dlm: got connection from 4
> 
> Could this be the problem?

Yes, that's bad! You should only get one "connecting to" message per 
node. If you're getting two, it looks like the connection is being closed 
by the remote node for some reason. Are there any messages on node 2 
that might give a clue as to what's happening?
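
If it helps to check a longer log for the same symptom, here is a rough 
Python sketch. It is not part of any cluster tooling: the function name 
and the regular expression are assumptions based purely on the 
"dlm: connecting to <nodeid>" lines quoted above, so adjust them if your 
log format differs. It reports any node id that shows up in more than one 
"connecting to" message.

```python
import re
from collections import Counter

# Assumed message format, taken from the console output quoted above:
#   "dlm: connecting to <nodeid>"
CONNECTING = re.compile(r"dlm: connecting to (\d+)")


def repeated_connects(lines):
    """Return {nodeid: count} for nodes with more than one 'connecting to' message."""
    counts = Counter()
    for line in lines:
        match = CONNECTING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    # Keep only the nodes that were connected to more than once.
    return {node: n for node, n in counts.items() if n > 1}


if __name__ == "__main__":
    # The dlm lines from the startup output quoted in this thread.
    sample = [
        "dlm: Using TCP for communications",
        "dlm: connecting to 2",
        "dlm: got connection to 2",
        "dlm: connecting to 2",
        "dlm: got connection from 4",
    ]
    print(repeated_connects(sample))  # {2: 2}
```

A non-empty result only tells you which node to look at; the reason the 
connection was closed still has to come from that node's own logs.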


Chrissie



