[Linux-cluster] cluster not fencing after filesystem failure

Robert Jacobson Robert.C.Jacobson at nasa.gov
Wed Apr 29 12:48:37 UTC 2015


Hi,

I'm having a problem on CentOS 6.5 with a two-node cluster for HA NFS. 
Here's the cluster.conf:  http://pastebin.com/aVAuUDtc

The cluster nodes are VMware guests.  Occasionally the node providing
the NFS service loses access to its disk device (I'm working with
VMware on that).  When that happens, the kernel shuts down the XFS
filesystem:

Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: XFS (dm-10): metadata I/O error: block 0x170013a900 ("xlog_iodone") error 5 buf count 65536
Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: XFS (dm-10): xfs_do_force_shutdown(0x2) called from line 1062 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa027f131
Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: XFS (dm-10): Log I/O Error Detected.  Shutting down filesystem
Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: nfsd: non-standard errno: 5
Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: XFS (dm-10): Please umount the filesystem and rectify the problem(s)
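
For anyone who wants to spot this condition in their own logs, here is a
minimal sketch of a check for the forced-shutdown message.  It assumes
the kernel message format shown above; the function name
find_xfs_shutdowns is just something I made up for illustration, not
part of any cluster tool.

```python
import re

# Matches the kernel message logged when XFS forces a shutdown, e.g.
#   XFS (dm-10): xfs_do_force_shutdown(0x2) called from line ...
# \s+ also matches a newline, so log text that was wrapped by a mail
# client still matches.
SHUTDOWN_RE = re.compile(
    r"XFS \((?P<device>[^)]+)\):\s+"
    r"xfs_do_force_shutdown\((?P<flags>0x[0-9a-fA-F]+)\)"
)


def find_xfs_shutdowns(log_text):
    """Return a (device, flags) tuple for each forced XFS shutdown found."""
    return [
        (m.group("device"), m.group("flags"))
        for m in SHUTDOWN_RE.finditer(log_text)
    ]


# Sample taken from the log excerpt above (wrapped as in the mail).
sample = (
    "Apr 25 02:29:51 sdo-dds-nfsnode2 kernel: XFS (dm-10):\n"
    "xfs_do_force_shutdown(0x2) called from line 1062 of file\n"
    "fs/xfs/xfs_log.c.  Return address = 0xffffffffa027f131\n"
)
print(find_xfs_shutdowns(sample))  # [('dm-10', '0x2')]
```

This only reports that the filesystem was shut down; it says nothing
about whether rgmanager or fenced reacted to it.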

rgmanager noticed the filesystem problem (see the log at
http://pastebin.com/mPPBP2HY ) and marked the "HA_nfs" service as
failed.

What I'm confused about is why fencing does not take place in the
above scenario.  I'm guessing I have either a misunderstanding or a
misconfiguration.
At this point I'd like the other node to fence the failed one and take
over the service, or else the failed node to fence itself.

I've tested fencing from the command line and it works:
fence_vmware_soap --ip 192.168.50.9 --username ddsfence --password \
    secret -z --action reboot -U "423d288c-03ff-74bf-9a4f-bf661f8ed87b"

I'd appreciate any help with this.

Package versions, in case they matter:

rgmanager-3.0.12.1-19.el6.x86_64
cman-3.0.12.1-59.el6_5.2.x86_64

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Robert Jacobson               Robert.C.Jacobson at nasa.gov
Lead System Admin       Solar Dynamics Observatory (SDO)
Bldg 14, E222                             (301) 286-1591 



