[Linux-cluster] Unclean shutdown/restart procedure

Patrick Caulfield pcaulfie at redhat.com
Fri Sep 10 09:40:39 UTC 2004


On Wed, Sep 08, 2004 at 05:01:12PM -0400, Fredric Isaman wrote:
> I believe this is a bug, but it may be I misunderstand how to cleanly
> bring up/down a whole cluster.
> 
> I have a 3-node cluster, using today's CVS.  I can bring it up fine, mount
> a gfs filesystem over iscsi on each node, then shut it down fine, using
> the procedures from useage.txt.  However, if I then start up again without
> rebooting, when I get to clvmd -d, it will increase the active subsystem
> count, and fail with:
> 
>   Unable to create lockspace for CLVM
>   Can't initialise cluster interface
> 
> and the kernel log shows:
> 
>   dlm: Can't bind to port 21064
>   dlm: cannot start lowcomms -98
> 
> At this point the node that ran clvmd has an active subsystem count of
> one.  Shutting down the cluster (using cman_tool leave force) and
> restarting it does not change this. Trying to run clvmd in this state
> causes the machine to immediately hang with no messages to the log or
> console.

I think I've fixed the refcounting bug now. The "Can't bind" error is a
nuisance: -98 is EADDRINUSE, i.e. port 21064 is still considered in use. If
you shut the DLM down then, depending on the state of the sockets, you have to
wait until all the other nodes have also shut down their connections to this
node before bringing it back up again.
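
As a rough illustration of that wait, here is a minimal sketch in Python that
polls until a TCP port can be bound again. It is not part of the DLM or of
cman_tool. The helper names (port_is_free, wait_for_port), the timeout and the
polling interval are made up for the example, and it assumes that a user-space
bind attempt on the same port reflects the socket state the kernel DLM sees,
which may not hold in every case.

```python
import errno
import socket
import time

DLM_PORT = 21064  # port number taken from the kernel log quoted above


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # No SO_REUSEADDR on purpose: lingering connections on the
            # port should make this bind fail, as they do for the DLM.
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:  # errno 98 on Linux
                return False
            raise  # any other error is unexpected, so do not hide it
        return True


def wait_for_port(port, host="0.0.0.0", timeout=120.0, interval=1.0):
    """Poll until the port can be bound; return False if timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port(DLM_PORT):
        print("port %d can be bound again" % DLM_PORT)
    else:
        print("port %d is still in use" % DLM_PORT)
```

Running something like this on the node before restarting clvmd would only
tell you that the port looks free from user space; the other nodes still have
to close their connections to this node for that to happen.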

patrick