[Linux-cluster] gfs2_jadd borked my cluster?

rhurst at bidmc.harvard.edu rhurst at bidmc.harvard.edu
Wed Oct 20 16:41:07 UTC 2010


Latest RHEL 5u5 with a four-node cluster:

cman-2.0.115-34.el5_5.3
gfs2-utils-0.1.62-20.el5
kernel-2.6.18-194.17.1.el5

Three nodes are blades; the fourth is a KVM guest.

I executed `gfs2_jadd -j1 /home` to add a fourth journal; it completed successfully with an old=3, new=4 message.  I checked on all three blades with `gfs2_tool journals /home` and they all reported four journals of size 128MB.

I joined the KVM guest to the cluster.  When I attempted to mount /home on it, the mount complained there were only three journals.  EH???  So I unmounted /home on a blade and mounted /home on the KVM guest -- this time it was allowed to mount.

Checking journals on all hosts again, they now report only 3.

I unmounted /home on the KVM guest and re-mounted it on the blade.  It, too, only reports 3 journals now.
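For anyone who wants to repeat the journal check on each node, here is a minimal sketch of how the count could be scripted. The helper name `count_journals` is made up for illustration, and it assumes that `gfs2_tool journals` prints one line per journal beginning with `journal<N>` (for example `journal0 - 128MB`); if the output on your version looks different, the pattern needs adjusting.

```shell
#!/bin/sh
# Hypothetical helper: count journal entries in text read from stdin.
# ASSUMPTION: each journal appears on its own line starting with
# "journal<N>", e.g. "journal0 - 128MB". Summary lines are ignored.
count_journals() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at zero when there are no matches (count is then 0).
    grep -c '^journal[0-9]' || true
}

# Intended use on a node where /home is mounted:
#   gfs2_tool journals /home | count_journals
```

Running this on every node before and after each mount/unmount should show at which step the count drops from 4 back to 3.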

I repeated the process, but the second time around I got a GFS2 filesystem withdrawal dump on the guest.  And now the DLM has that channel locked on all nodes with a LEAVE_STOP_WAIT status.  I tried fence_node against the guest; it re-booted the node fine, but now the DLM fence is locked with a FAIL_ALL_STOPPED status.

1) Can I clear this issue (obviously without re-booting)?

2) What could possibly have gone wrong with gfs2_jadd?



