[Linux-cluster] GFS volume locks during cluster node join/leave

Martijn Storck martijn.storck at gmail.com
Fri Mar 18 08:50:53 UTC 2011


Hello again,

We have a 3-node RHCS cluster with a shared GFS volume that's performing
quite well after some tuning, so I couldn't be happier.

However, whenever a node leaves the cluster (whether cleanly via a reboot or
after being fenced), our GFS volume is unusable for at least 30 seconds. Even
an 'ls' on the volume blocks during this period. Meanwhile the only entry in
/var/log/messages on the other nodes is that one node is leaving the cluster;
nothing else is logged. After 30 seconds the cluster starts reconfiguring.

When I fence a node the same thing happens. It takes about 30 seconds before
the other nodes try to reclaim the journal of the lost node, which in itself
takes over a minute. Once the missing node rejoins after a reboot, the GFS
volume is again unavailable for a long period.
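
To put exact numbers on these stalls, something like the following could be
left running on a surviving node while another node leaves or rejoins. It is
only a minimal sketch: the function names are made up for illustration, the
mount point is whatever the caller passes in, and all it does is time a
directory listing, which is the same operation as the blocked 'ls'.

```python
import os
import time


def time_listing(path):
    """Return the seconds taken to list `path`.

    The call blocks for as long as the filesystem blocks, so a frozen
    volume shows up as one very long measurement.
    """
    start = time.monotonic()
    os.listdir(path)
    return time.monotonic() - start


def find_stalls(samples, threshold=1.0):
    """Keep the (timestamp, seconds) samples that took `threshold` or longer."""
    return [(ts, secs) for ts, secs in samples if secs >= threshold]


def watch(path, interval=1.0, threshold=1.0):
    """Probe `path` until interrupted, printing one line per slow listing."""
    while True:
        stamp = time.strftime("%H:%M:%S")
        elapsed = time_listing(path)
        for ts, secs in find_stalls([(stamp, elapsed)], threshold):
            print(f"{ts} listing {path} took {secs:.1f}s")
        time.sleep(interval)
```

Calling watch() with the GFS mount point (for example a hypothetical
"/mnt/gfs") and comparing the printed timestamps with /var/log/messages
should show how long the volume is frozen before and during journal recovery.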

Is this expected behaviour? Is there anything we can do to reduce these
delays? We run 10 VMs on our active nodes, and it's a shame to have them all
lock up just because we're rebooting a passive node :)

Thanks!

Cheers,
Martijn Storck