[Linux-cluster] Restarting GFS2 without reboot

Tue Nov 26 15:44:39 UTC 2013

On 26/11/13 07:43, Vladimir Melnik wrote:
> On Tue, Nov 26, 2013 at 12:34:35PM +0000, Steven Whitehouse wrote:
>>> I have to admit that fencing hasn't been enabled in this cluster, 90% of
>>> jobs on these 2 nodes are working with other storage that is accessible
>>> by NFS. So it wouldn't be okay to reboot a node due to any problems with
>>> GFS2.
>> In which case, get fencing configured first. Otherwise the first time
>> there is a problem, you risk data corruption. There is a very good
>> reason that fencing is required. It sounds like your overall config
>> needs a bit of a rethink,
> 
> Yes, I'm going to move GFS2 on separate cluster which will have fencing,
> because I understand there's a huge risk to corrupt all the data.
> 
> But are there any suggestions on how to remount GFS2 now?
> 
> Thank you!

It's not just a data corruption risk.

As I understand the mechanics (and Steven would know better):

A node fails, and a peer calls fenced.
fenced informs DLM, and DLM blocks.
fenced loops until the fence succeeds.
fenced informs DLM, locks held by the now-fenced node are reaped, and
cleanup/recovery begins.

Without fencing, it will enter that loop and not recover, leaving your
cluster blocked.
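
To make the sequence concrete, here is a toy model of it in Python. This
is only a sketch of the steps as I described them, not real fenced or DLM
code. The names recover_after_failure, no_fencing and working_fencing are
made up for illustration, and max_attempts exists only so the example
terminates; as described above, the real loop keeps going until the fence
succeeds.

```python
from typing import Callable, List


def recover_after_failure(fence_node: Callable[[], bool],
                          max_attempts: int) -> List[str]:
    """Toy model of the fence/DLM sequence; not real fenced or DLM code.

    fence_node is a hypothetical callback that returns True when the
    failed node was fenced successfully.
    """
    events = ["node failed, peer calls fenced", "DLM blocked"]
    for attempt in range(1, max_attempts + 1):
        if fence_node():
            events.append(f"fence succeeded on attempt {attempt}")
            events.append("DLM informed, locks reaped, recovery begins")
            return events
        events.append(f"fence attempt {attempt} failed")
    # Assumption: the real loop has no attempt limit. We stop here only
    # so the sketch terminates; the cluster would simply stay blocked.
    events.append("still blocked")
    return events


def no_fencing() -> bool:
    """Stands in for a cluster with no working fence method."""
    return False


def working_fencing() -> bool:
    """Stands in for a cluster whose fence method succeeds."""
    return True


if __name__ == "__main__":
    for name, method in (("with fencing", working_fencing),
                         ("without fencing", no_fencing)):
        print(name)
        for event in recover_after_failure(method, max_attempts=3):
            print("  " + event)
```

With working_fencing the last event is the recovery step. With no_fencing
every attempt fails and the model never gets past the blocked state,
which is the situation a cluster without fencing ends up in.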

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?