[Linux-cluster] RHEL 5.3 NFSv4 cluster

Tue Jun 30 16:40:33 UTC 2009

Hi,

Is there an up-to-date document detailing the configuration of an NFSv4
cluster service on a 2-node RHEL 5.3 Cluster Suite setup? Most of the
info I find is from 2006/2007 and states that these features are in a
state of flux and could change soon.

My current configuration is 2 nodes, RHEL 5.3 (kernel
2.6.18-128.1.14.el5PAE), SAN attached shared storage, with GFS2 file
systems.

I read the documents at:

http://wiki.linux-nfs.org/wiki/index.php/NFS_Recovery_and_Client_Migration

http://www.howtoforge.com/high_availability_nfs_drbd_heartbeat

I also read the NFS cluster cookbook and Red Hat's NFS cluster example.
The first two are fairly old, and the latter two seem fairly basic and
don't address certain issues, such as:

1.	Is it still recommended to configure /var/lib/nfs/v4recovery on
a shared file system between nodes?
2.	Do I need to set the "fsid=" parameter for every export in
/etc/exports and set it to a unique value? (I currently only have fsid
set for nfs root)
3.	Should I set all of the RPC services in /etc/sysconfig/nfs to
listen on a dedicated port?
4.	Can I leave the NFS service running on both nodes at the same
time and just fail over the IP address, or should I add the nfs service
script to the cluster config to start/stop it as part of the service?
5.	The NFS Recovery and Client Migration doc above mentions that
lock migration is not handled yet and that there needs to be a way to
release locks and leases during failover. Has this been addressed
somehow? Does stopping/starting the NFS service accomplish this?
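
As an illustration of what question 2 is about, here is a minimal Python
sketch that scans exports-style text and reports which exports have no
"fsid=" option and which fsid values are used more than once. The sample
paths, the wildcard client, and the function name check_fsids are
hypothetical and not taken from any real configuration. It assumes one
export per line with no backslash continuations or quoted paths, and it
does not settle whether unique values are actually required.

```python
import re
from collections import defaultdict

# Hypothetical sample in /etc/exports syntax; paths and clients are made up.
SAMPLE_EXPORTS = """\
/exports        *(rw,fsid=0,sync)
/exports/data   *(rw,sync)
/exports/home   *(rw,fsid=2,sync)
"""


def check_fsids(exports_text):
    """Return (missing, duplicates) for fsid= options in exports-style text.

    missing    -- list of export paths with no fsid= option at all
    duplicates -- dict mapping an fsid value to the paths that share it
    """
    missing = []
    seen = defaultdict(list)
    for raw in exports_text.splitlines():
        # Drop comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        path = line.split()[0]
        # Collect every fsid= value on the line (one per client spec).
        fsids = re.findall(r"fsid=([^,)\s]+)", line)
        if not fsids:
            missing.append(path)
        # Count a path once per distinct fsid value.
        for value in sorted(set(fsids)):
            seen[value].append(path)
    duplicates = {v: paths for v, paths in seen.items() if len(paths) > 1}
    return missing, duplicates


missing, duplicates = check_fsids(SAMPLE_EXPORTS)
print("exports without fsid:", missing)
print("fsid values used more than once:", duplicates)
```

To check a real file, the same function could be called on the contents
of /etc/exports on each node.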

Also, when mounting my NFS shares using the cluster's virtual IP address
or name, I get some errors in my NFS server's logs regarding timed out
callbacks:

Jun 25 15:00:12 node2 kernel: nfs4_cb: server <CLIENT1 IP ADDRESS> not responding, timed out

Jun 25 17:07:37 node2 kernel: nfs4_cb: server <CLIENT2 IP ADDRESS> not responding, timed out

If I mount the file system using the cluster node's static address/name,
these errors don't appear, but for obvious reasons, this is undesirable.

Thanks,

Eric
