[Linux-cluster] Running GFS without fencing and maybe locking ;-)

Thu Mar 23 09:42:00 UTC 2006

David Teigland wrote:
> If you guarantee that your ro mounts will never be remounted to rw, then
> it should be safe to not fence them when they fail (since they'll never
> under any circumstance make any writes to the fs).
> 
> The problem is not related to fencing, but comes when the node mounting rw
> fails.  There are only ro mounts left and none of them can recover the
> journal of the failed node because they can't write.  What these ro mounts
> do next is the important part, and there appears to be a shortcoming in
> the current code that I just noticed.  It looks like the ro mounts will
> continue reading the fs normally without the journal of the failed rw node
> ever being replayed.  They'll likely come across some inconsistent part of
> the fs and panic/withdraw.  It shouldn't be difficult to test this.

OK, understood.

For this test I set up a 3-node cluster:

adnux2 and adnux3 have the GFS mounted read-only
adnux4 has the GFS mounted with write access.

For the test I copied files from /usr to the GFS and powered off the
writing host (adnux4) while the copy was running.

Both remaining nodes noticed the failure and adnux2 fenced adnux4. But
as you said, the journal could not be replayed:

/var/log/messages (adnux2):
---
Mar 23 10:02:45 adnux2 kernel: CMAN: removing node adnux4 from the cluster : Missed too many heartbeats
Mar 23 10:02:46 adnux2 fenced[9559]: adnux4 not a cluster member after 0 sec post_fail_delay
Mar 23 10:02:46 adnux2 fenced[9559]: fencing node "adnux4"
Mar 23 10:02:46 adnux2 fenced[9559]: fence "adnux4" success
...
Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Trying to acquire journal lock...
Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Looking at journal...
Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Can't replay: read-only FS
Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Failed
Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Trying to acquire journal lock...
Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Looking at journal...
Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Done
...
---
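
To see at a glance which journal was recovered and which was not, the
GFS lines of such an excerpt can be grouped by jid. This is only a rough
sketch in Python, assuming each syslog message sits on a single line
(mail wrapping undone). The name journal_messages and the regular
expression are my own and not part of any GFS tool; the sample lines
are the ones from adnux2 above.

```python
import re

# Assumed message layout, taken from the adnux2 excerpt:
#   "... kernel: GFS: fsid=<cluster>:<fs>.<n>: jid=<number>: <message>"
GFS_LINE = re.compile(r"kernel: GFS: fsid=(?P<fsid>\S+): jid=(?P<jid>\d+): (?P<msg>.+)$")


def journal_messages(lines):
    """Group GFS recovery messages by journal id, in order of appearance."""
    by_jid = {}
    for line in lines:
        match = GFS_LINE.search(line.strip())
        if match:
            by_jid.setdefault(int(match.group("jid")), []).append(match.group("msg"))
    return by_jid


# GFS lines from /var/log/messages on adnux2, one message per line.
SAMPLE = [
    "Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Trying to acquire journal lock...",
    "Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Looking at journal...",
    "Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Can't replay: read-only FS",
    "Mar 23 10:02:53 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=2: Failed",
    "Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Trying to acquire journal lock...",
    "Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Looking at journal...",
    "Mar 23 10:02:55 adnux2 kernel: GFS: fsid=adnuxCluster1:adnux.0: jid=1: Done",
]

if __name__ == "__main__":
    for jid, msgs in sorted(journal_messages(SAMPLE).items()):
        # The last message per journal is the outcome ("Done" or "Failed").
        print(f"jid={jid}: {msgs[-1]}")
```

For the excerpt above this reports jid=1 as Done and jid=2 as Failed,
with "Can't replay: read-only FS" as the message just before the
failure.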

So even if I add two hosts with write access to the cluster, how can I
make sure that, if one of them fails, the other node with write access
replays the journal (and not the node which does the fencing and has
the filesystem mounted read-only)?

OK, back to the tests. Neither adnux2 nor adnux3 was able to access the
filesystem after adnux4 failed:

adnux3 data # ls -l /home/data/gfs
ls: /home/data/gfs: Input/output error

None of the following steps helped:
- adnux4 rejoining the cluster
- adnux4 mounting the GFS with write access
- unmounting the filesystem on adnux4 and repairing the GFS with gfs_fsck

Access only became possible again after mounting and remounting the
filesystem on one of the nodes.
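
A check like the ls above can also be scripted on the read-only nodes.
The following is only a sketch in Python: probe_mount is a hypothetical
helper name, the path is the mount point from my ls output, and the
function only reports what a directory listing returns. It says nothing
about the state of the journals.

```python
import errno
import os


def probe_mount(path):
    """Classify a directory listing as 'ok', 'io-error' or 'other-error: <NAME>'."""
    try:
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Same condition as "ls: ...: Input/output error".
            return "io-error"
        return "other-error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # Mount point taken from the ls output on adnux3.
    print(probe_mount("/home/data/gfs"))
```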

To wrap this up: I think we need these spectators plus at least two
nodes which can be fenced (only GFS on the HBA, no database...).

Thanks again, Dave.

Arnd