[Linux-cluster] a couple of questions regarding clusters

Fri Jul 13 13:35:08 UTC 2007

On Sat, 2007-06-23 at 17:23 -0500, Brent Sachnoff wrote:
> I have a 3 node cluster running redhat 4 with gfs.  What is the proper
> way to have a node leave the cluster for maintenance and then rejoin
> after maintenance is completed?  From the docs, I have read that I
> need to unmount gfs and then stop all the services in the following
> order: rgmanager, gfs, clvmd, fenced.  I can then issue a cman_tool
> leave (remove) request.
>
> I have also noticed that if I lose ip connectivity to a certain node I
> lose gfs connectivity with the other two nodes.  I thought that I
> would only need 2 votes to continue connectivity. 
>
> Thanks for the help!

Hi Brent,

I've been up to my eyeballs in alligators recently, and somehow your
email slipped past me.  Sorry about that.
I normally respond to emails within a day.
I don't know if anyone has answered your question yet (it's always
better to ask on the linux-cluster public mailing list where you
can get input from hundreds of people rather than just me) but I'll
try.

Your summary of how to remove a node for maintenance sounds right:
unmount gfs, stop rgmanager, gfs, clvmd and fenced in that order,
then issue cman_tool leave remove.
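
For illustration only, the sequence above could be wrapped in a small
shell function like the one below.  This is a rough sketch, not a tested
procedure: the function names, the DRY_RUN switch and the /mnt/gfs
default mount point are made up for the example, and it assumes the
services are controlled through the usual "service" init scripts.  By
default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: take one node out of the cluster for maintenance.
# DRY_RUN=1 (the default) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

leave_for_maintenance() {
    mount_point="${1:-/mnt/gfs}"   # hypothetical mount point

    # Unmount the gfs file system on this node first.
    run umount "$mount_point" || return 1

    # Stop the services in the order given in this thread.
    for svc in rgmanager gfs clvmd fenced; do
        run service "$svc" stop || return 1
    done

    # Finally ask cman to let this node leave the cluster.
    run cman_tool leave remove
}
```

Calling "leave_for_maintenance /mnt/gfs" prints the six commands in
order; setting DRY_RUN=0 would actually execute them, stopping at the
first one that fails.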

If you unmount the gfs file system from one of the nodes,
the other nodes should still have normal access to it.
If a node leaves the cluster under abnormal conditions, for example
it crashes or has its power switched off, the other nodes will
freeze their activity on the gfs file system until they know the
node has been fenced properly.  That usually means an "ACK" from
the fence agent is passed to the other nodes.  If you're using
manual fencing, they'll hang until someone acknowledges the fence
by hand (the fence_ack_manual command), forever if nobody does.
That's done on purpose to ensure the file system's integrity.
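
On the votes question, the freeze described above is separate from
quorum.  As a back-of-the-envelope sketch, assuming one vote per node
and the usual majority rule (expected votes / 2 + 1, using integer
division), two of three nodes still have quorum; they simply wait for
the fence to complete before touching gfs again.  The function names
here are invented for the example.

```shell
#!/bin/sh
# Sketch: majority-quorum arithmetic, assuming one vote per node.

# quorum_for EXPECTED_VOTES: votes needed for the cluster to stay quorate.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum LIVE_VOTES EXPECTED_VOTES: succeeds if the live votes
# reach the majority threshold.
has_quorum() {
    [ "$1" -ge "$(quorum_for "$2")" ]
}
```

For a 3 node cluster, "quorum_for 3" gives 2, so "has_quorum 2 3"
succeeds and "has_quorum 1 3" fails.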

I've not seen a problem with that area.  If you're having a problem
with it, you can open up a bugzilla record against the appropriate
version of software and we'll dig into it, but you'll probably
need to provide instructions on how to recreate it, since we
haven't seen the problem here.

Regards,

Bob Peterson
Red Hat Cluster Suite