[Linux-cluster] GFS on 3-node cluster corrupted after full network outage

Jayson Vantuyl jvantuyl at engineyard.com
Tue Dec 12 08:30:51 UTC 2006


To my knowledge, yes and no.

GFS will continue to run but will be unable to do any locking.

The resulting behavior is interesting.  Specifically, if a node
already holds a lock on something in the GFS, it will continue to
modify it.  It will continue to journal its changes too (since it
locks a journal upon mounting).  It will not touch anything new,
because it won't be able to acquire locks on new parts of the GFS.
Any calls waiting on such a lock will hang.
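
To make that concrete, here is a rough toy model in Python.  It is not
GFS code: ToyNode, LockUnavailable and the resource names are made up
for illustration, and where a real GFS call would simply hang, the
sketch raises an exception so that it can actually be run.

```python
class LockUnavailable(Exception):
    """Stands in for a call that would hang on a real GFS node."""


class ToyNode:
    """Hypothetical model of one GFS node; not the real implementation."""

    def __init__(self, name, held_locks):
        self.name = name
        self.held = set(held_locks)        # locks granted before the outage
        self.journal = []                  # the node's own journal, locked at mount
        self.lock_manager_reachable = True

    def acquire(self, resource):
        # A new lock needs the lock manager; during the outage this fails.
        if not self.lock_manager_reachable:
            raise LockUnavailable(resource)
        self.held.add(resource)

    def write(self, resource, data):
        # Writes under an already-held lock proceed and are journaled.
        if resource not in self.held:
            self.acquire(resource)
        self.journal.append((resource, data))
        return True


node = ToyNode("node1", held_locks={"inode-17"})
node.lock_manager_reachable = False        # simulate the network outage

node.write("inode-17", "new data")         # still works: lock already held
try:
    node.write("inode-99", "new data")     # needs a new lock: blocked
    blocked = False
except LockUnavailable:
    blocked = True
```

After this runs, the node has journaled exactly one change (the one
under the lock it already held) and the second write never happened.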

So, the GFS may still be modified during the outage, but it will not
be corrupted.  In fact, whenever a quorate subset of nodes eventually
forms, due to fencing or an administrator intervening somehow, it
should first fence the other nodes.  After fencing, the remaining
quorate nodes (at least the ones mounting the GFS) will scan the
journals on the GFS for uncommitted transactions and commit them (it
may roll them back if appropriate, not sure about the details here).
So, assuming working fencing, this can't ever corrupt the GFS even
though modifications to the filesystem continue in the meantime.
Despite the complexity, I believe this is actually very good behavior.
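
The recovery order described above can be sketched roughly like this
in Python.  Again, this is only an illustration under assumptions: the
function names are hypothetical, quorum is taken to be a simple
majority with one vote per node, and I assume the journals that get
scanned are those of the fenced nodes.  The real cluster software does
considerably more.

```python
def is_quorate(members, total_nodes):
    # Assumption: one vote per node, quorum is a strict majority.
    return len(members) >= total_nodes // 2 + 1


def recover(all_nodes, members, journals):
    """Return the ordered recovery steps, or None without quorum.

    journals maps a node name to its list of uncommitted transactions.
    """
    if not is_quorate(members, len(all_nodes)):
        return None                        # no quorum: nothing may proceed

    outsiders = sorted(set(all_nodes) - set(members))
    steps = []
    # Step 1: fence every node outside the quorate subset.
    for node in outsiders:
        steps.append(("fence", node))
    # Step 2: only after fencing, scan the fenced nodes' journals.
    for node in outsiders:
        pending = journals.get(node, [])
        steps.append(("replay", node, len(pending)))
    return steps


nodes = ["a", "b", "c"]
journals = {"c": [("inode-17", "new data")]}

plan = recover(nodes, members=["a", "b"], journals=journals)
stuck = recover(nodes, members=["a"], journals=journals)
```

The point of the ordering is that a journal is only touched once its
owner has been fenced, so the owner can no longer be writing to it.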

-- 
Jayson Vantuyl
Systems Architect
Engine Yard
jvantuyl at engineyard.com

