[Linux-cluster] NFS Serving Issues

Colin Simpson Colin.Simpson at iongeo.com
Tue Aug 16 19:29:58 UTC 2011


Hi

I have two issues with clustered NFS services on RHEL 6.1. One is an
oddity and the other is a problem unmounting NFS-exported file systems.

First issue: if I define my NFS service as follows (cluster.conf fragment):

<resources>
  <ip address="10.10.50.41" monitor_link="1"/>
  <fs device="/dev/cluvg00/lv00home" force_fsck="1" force_unmount="1"
      mountpoint="/mnt/home" name="homefs" options="acl" quick_status="0"
      self_fence="0"/>
  <nfsexport name="exportclunfshome"/>
  <nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>
</resources>

<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsdhome"
         nfslock="1" recovery="relocate">
  <ip ref="10.10.50.41">
    <fs ref="homefs">
      <nfsexport ref="exportclunfshome">
        <nfsclient ref="nfsdhome"/>
      </nfsexport>
    </fs>
  </ip>
</service>

, then when the service is stopped, clients with the filesystem
NFS-mounted get "Stale NFS file handle" errors when accessing the mount
point. I.e. if I have a copy going when the service is disabled, I get:

cp: cannot stat `/home/wsmith/ww/cstst/./rhel-client-5.6-x86_64-dvd2.iso': Stale NFS file handle
cp: cannot stat `/home/wsmith/ww/cstst/./rhel-server-5.6-x86_64-dvd.iso': Stale NFS file handle
cp: cannot stat `/home/wsmith/ww/cstst/./rhel-server-6.0-i386-dvd.iso': Stale NFS file handle
cp: cannot stat `/home/wsmith/ww/cstst/./rhel-server-6.0-x86_64-dvd.iso': Stale NFS file handle
cp: cannot stat `/home/wsmith/ww/cstst/./rhel-workstation-6.0-i386-dvd.iso': Stale NFS file handle

The above cluster.conf layout, with the fs/nfsexport/nfsclient
resources nested inside the "ip ref", is as per the "Deploying Highly
Available NFS on Red Hat Enterprise Linux 6" document.

But if I don't nest the fs and nfsexport resources inside the ip, the
clients instead hang until the service comes back, i.e.:

<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsdhome"
         nfslock="1" recovery="relocate">
  <ip ref="10.10.50.41"/>
  <fs ref="homefs">
    <nfsexport ref="exportclunfshome">
      <nfsclient ref="nfsdhome"/>
    </nfsexport>
  </fs>
</service>

This seems the more sensible behaviour, as it is more predictable from
the clients' side (i.e. their processes hang until the NFS service
reappears). So in the case of my copy above, it just resumes when the
NFS service comes back. This is as per the NFS cookbook.

BTW, is it best practice to use one nfsexport resource per nfsclient,
or is a single nfsexport resource enough cluster-wide?
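
For illustration, what I mean by a single shared nfsexport is something
like the fragment below (just a sketch; the "datafs" and "nfsddata"
names and the second service are invented for the example):

<resources>
  <nfsexport name="exportclunfs"/>
  <nfsclient name="nfsdhome" options="rw" target="10.0.0.0/8"/>
  <nfsclient name="nfsddata" options="rw" target="10.0.0.0/8"/>
</resources>

<!-- both services reference the same nfsexport resource -->
<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsdhome"
         nfslock="1" recovery="relocate">
  <fs ref="homefs">
    <nfsexport ref="exportclunfs">
      <nfsclient ref="nfsdhome"/>
    </nfsexport>
  </fs>
</service>

<service autostart="0" domain="cluBnfb" exclusive="0" name="nfsddata"
         nfslock="1" recovery="relocate">
  <fs ref="datafs">
    <nfsexport ref="exportclunfs">
      <nfsclient ref="nfsddata"/>
    </nfsexport>
  </fs>
</service>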

Why is there a behaviour disparity? Which is correct?

Question 2: with either of the above configurations, I can't unmount
the exported file system when I stop the service (so I can't migrate
it), unless I halt the file server hosting the share or force-fence it.
I just get the old:

# umount /mnt/home
umount: /mnt/home: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
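
The checks I'm running are along these lines:

# look for processes holding the mount point open
lsof +D /mnt/home
fuser -mv /mnt/home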

Of course, nothing is shown by lsof or fuser. This is annoying for a
number of reasons. One is that I can't readily perform basic load
balancing by relocating NFS services to their correct nodes (as I can't
migrate a service without halting a node).
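
I.e. what I'd like to just work is a plain relocate ("nodeb" here
standing in for whichever member I'm targeting):

# relocate the NFS service to another cluster member
clusvcadm -r nfsdhome -m nodeb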

But more seriously, I can't easily shut the cluster down cleanly when
told to by a UPS on a power outage. The node can't shut down cleanly
while a resource is still held open (so it is liable to be fenced). If
I halt the node (the least bad option I can see), it will get fenced
and start booting back up again (which is what I'd want in pretty much
any circumstance except a power outage). Forcing the node to leave the
cluster before my halt also results in fencing and a restart. And
"umount -fl" doesn't free whatever is holding the filesystem.

Any tips on how to make this work more cleanly, or on how to free
whatever is stopping the NFS-exported filesystem from unmounting?

Thanks for any advice

Colin
