[Linux-cluster] How to set up NFS HA service

Lon Hohberger lhh at redhat.com
Tue Apr 19 15:58:34 UTC 2005


On Tue, 2005-04-19 at 15:08 +0200, birger wrote:

> I thought I'd just mount my gfs file systems outside the resource 
> manager's control to have them present all the time and just use the 
> resource manager to move over the IP address and do the NFS magic. That 
> seems impossible, as I couldn't get any exports to happen when I defined 
> them in cluster.conf without a surrounding <fs>.

Known bug/feature:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151669

You can change this behavior if you want by adding a <child type=...>
entry to service.sh's "special" element in the XML meta-data.
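
For illustration only, the kind of change meant here is a new <child>
entry in service.sh's meta-data; the type name and start/stop levels
below are assumptions on my part, so check the meta-data your version
of rgmanager actually ships:

    <special tag="rgmanager">
        <!-- existing <attributes> and <child> entries omitted -->
        <!-- hypothetical addition: allow nfsexport resources directly
             under a <service>, without a surrounding <fs> -->
        <child type="nfsexport" start="1" stop="2"/>
    </special>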

> It almost works. I get mounts, and they get exported. But I have some 
> error messages in the log file and the exports take a loooong time. Only 
> 2 of the 3 exports defined seem to show up.
> 
> I'm also a bit puzzled about why the file systems don't get unmounted 
> when I disable all services.

They're GFS.  Add force_unmount="1" to the <fs> elements if you want
them to be umounted.  GFS is nice because you *don't* have to umount
it.  
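
For what it's worth, here's a rough sketch of a service with
force_unmount set (the service name, device path, export/client names,
and address below are made up, and your resource nesting may differ):

    <service name="nfssvc">
        <ip address="10.0.0.50"/>
        <fs name="gfs1" mountpoint="/mnt/gfs1" device="/dev/vg0/gfs1"
            fstype="gfs" force_unmount="1">
            <nfsexport name="gfs1 exports">
                <nfsclient name="nis-hosts" target="@nis-hosts"
                           options="rw"/>
            </nfsexport>
        </fs>
    </service>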

FYI, NFS services on traditional file systems don't cleanly stop right
now due to an EBUSY during umount from the kernel.  Someone's looking
into it on the NFS side (apparently, not all the refs are getting cleared
if a node has an NFS mount ref and we unexport the FS, or something).


> As for file locking:
> I copied /etc/init.d/nfslock to /etc/init.d/nfslock-svc and made some 
> changes.
> First, I added a little code to enable nfslock to read a variable 
> STATD_STATEDIR for the -p option from the config file in /etc/sysconfig. 
> I think this should get propagated back into upcoming Fedora releases if 
> someone who knows how would take the time to do it... I then changed 
> nfslock-svc to read a different config file (/etc/sysconfig/nfs-svc) and 
> to do 'service nfslock stop' at the top of the start section and 
> 'service nfslock start' at the bottom of the stop section.
> This enables me to have statd running as e.g. 'server1' on the cluster 
> node until it takes over the NFS service. At takeover, statd gets 
> restarted with its state directory on a cluster file system (so it can 
> take over lock info belonging to the service) and with its name set to 
> the NFS service's IP address. Does this sound reasonable?

Sort of.  There's a lot needed to make NFS locks fail over properly, and
some of it has to be done in the kernel.  From my recollection of working
on this, the list looks something like the following (possibly incomplete):

rpc.statd:

- Make rpc.statd monitor who had locks (SM_MONITOR requests from lockd)
based on the inbound IP address on which the lock request was received.
This isn't easy, because the only client of rpc.statd is lockd on the
local node.  Also, you'll need to have this list either replicated or on
a shared partition which can be moved about with the rest of the
service.

- Make rpc.statd able to notify clients that took locks via a
specific IP address (I think my patch or one just like it eventually
made it into nfs-utils, so this might be done).

lockd:

- Add a method to tell lockd to set a grace period for lock recovery.
This should be done only for a specific IP or device, not for
everything -- otherwise, every time an NFS service was started or
failed over, every NFS client that wanted a lock would have to wait
for the grace period, which sucks.

- Provide a way for lockd to tell rpc.statd what IP address the lock
request came in on.


> Apr 19 14:42:58 server1 clurgmgrd[7498]: <notice> Service nfssvc started
> Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
> Apr 19 14:43:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)
> Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts-ro" returned 1 (generic error)
> Apr 19 14:44:56 server1 clurgmgrd[7498]: <notice> status on nfsclient "nis-hosts" returned 1 (generic error)

Hmm, that's odd; it could be a bug in the status phase related to NIS
exports.  Does this only happen after a failover, or does it happen all
the time?

-- Lon



