[Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface

Tue Jun 13 03:17:50 UTC 2006

On Monday June 12, wcheng at redhat.com wrote:
> NFS v2/v3 active-active NLM lock failover has been an issue with our
> cluster suite. With current implementation, it (cluster suite) is trying
> to carry the workaround as much as it can with user mode scripts where,
> upon failover, on taken-over server, it:
> 
> 1. Tear down virtual IP.
> 2. Unexport the subject NFS export.
> 3. Signal lockd to drop the locks.
> 4. Un-mount filesystem if needed.
> 
...
> we would like to be able to selectively drop locks (only) associated
> with the requested exports without disrupting other NFS services.

There seems to be an unstated assumption here that there is one
virtual IP per exported filesystem.  Is that true?

Assuming it is and that I understand properly what you want to do....

I think that maybe the right thing to do is *not* drop the locks on a
particular filesystem, but to drop the locks made to a particular
virtual IP.

Then it would make a lot of sense to have one lockd thread per IP, and
signal the lockd in order to drop the locks.
True: that might be more code.  But if it is the right thing to do,
then it should be done that way.

On the other hand, I can see a value in removing all the locks for a
particular filesystem quite independent of failover requirements.
If I want to force-unmount a filesystem, I need to unexport it, and I
need to kill all the locks.  Currently you can only remove locks from
all filesystems, which might not be ideal.

I'm not at all keen on the NFSEXP_FOLOCK flag to exp_unexport, as that
is an interface that I would like to discard eventually.  The
preferred mechanism for unexporting filesystems is to flush the
appropriate 'cache', and allow it to be repopulated with whatever is
still valid via upcalls to mountd.

So:
 I think if we really want to "remove all NFS locks on a filesystem",
 we could probably tie it into umount - maybe have lockd register some
 callback which gets called just before s_op->umount_begin.
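
 To make the umount-hook idea concrete, here is a toy user-space model
 in Python. It is not kernel code: LockTable, Vfs, register_pre_umount
 and the export paths are all hypothetical names invented for this
 sketch, and the only point is the ordering (lockd's callback runs
 before the unmount proper, and touches one filesystem only).

```python
from collections import defaultdict


class LockTable:
    """Toy stand-in for lockd's lock state, keyed by filesystem (hypothetical)."""

    def __init__(self):
        self.locks = defaultdict(list)  # filesystem -> list of (client, file)

    def add(self, fs, client, path):
        self.locks[fs].append((client, path))

    def drop_fs(self, fs):
        """Drop every lock held on one filesystem; return how many were dropped."""
        return len(self.locks.pop(fs, []))


class Vfs:
    """Toy stand-in for the unmount path (hypothetical)."""

    def __init__(self):
        self.pre_umount = []

    def register_pre_umount(self, callback):
        self.pre_umount.append(callback)

    def umount(self, fs):
        # Callbacks run first, so no NLM lock pins the filesystem
        # by the time the real unmount work would start.
        for callback in self.pre_umount:
            callback(fs)


table = LockTable()
table.add("/export/a", "client1", "f1")
table.add("/export/a", "client2", "f2")
table.add("/export/b", "client1", "f3")

vfs = Vfs()
vfs.register_pre_umount(table.drop_fs)
vfs.umount("/export/a")
```

 Locks on /export/b are left alone, which is the difference from the
 current "drop everything" behaviour.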

 If we want to remove all locks that arrived on a particular
 interface, then we should arrange to do exactly that.  There are a
 number of different options here. 
  One is the multiple-lockd-threads idea.
  One is to register a callback when an interface is shut down.
  Another (possibly the best) is to arrange a new signal for lockd
  which says "Drop any locks which were sent to IP addresses that are
  no longer valid local addresses".
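
 The last option amounts to a filter over the lock list. A minimal
 sketch in Python, assuming each lock records the local address it
 arrived on (the "dst_ip" field, the drop_stale_locks name and the
 example addresses are all made up for illustration; how lockd would
 actually track this is not settled here):

```python
def drop_stale_locks(locks, local_addrs):
    """Split locks into (kept, dropped) according to whether the address
    each lock request arrived on is still a valid local address."""
    valid = set(local_addrs)
    kept = [lock for lock in locks if lock["dst_ip"] in valid]
    dropped = [lock for lock in locks if lock["dst_ip"] not in valid]
    return kept, dropped


# Two virtual IPs served by this node; 192.0.2.2 is about to fail over.
locks = [
    {"client": "c1", "dst_ip": "192.0.2.1", "file": "/export/a/f1"},
    {"client": "c2", "dst_ip": "192.0.2.2", "file": "/export/b/f2"},
    {"client": "c3", "dst_ip": "192.0.2.2", "file": "/export/b/f3"},
]

# After the virtual IP is torn down, only 192.0.2.1 remains local.
kept, dropped = drop_stale_locks(locks, ["192.0.2.1"])
```

 In this model the virtual IP is torn down first and the signal is sent
 afterwards; the locks to drop are then identified by address alone,
 without naming any export.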

So those are my thoughts.  Do any of them seem reasonable to you?

NeilBrown