[Linux-cachefs] Adventures in NFS re-exporting

Thu Dec 3 22:39:52 UTC 2020

> -----Original Message-----
> From: Trond Myklebust [mailto:trondmy at hammerspace.com]
> Sent: Thursday, December 3, 2020 2:14 PM
> To: bfields at fieldses.org
> Cc: linux-cachefs at redhat.com; ffilzlnx at mindspring.com; linux-
> nfs at vger.kernel.org; daire at dneg.com
> Subject: Re: Adventures in NFS re-exporting
> 
> On Thu, 2020-12-03 at 17:04 -0500, bfields at fieldses.org wrote:
> > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > On Thu, 2020-12-03 at 16:13 -0500, bfields at fieldses.org wrote:
> > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust
> > > > > > wrote:
> > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > I've been scratching my head over how to handle reboot of
> > > > > > > > a
> > > > > > > > re-
> > > > > > > > exporting server.  I think one way to fix it might be just
> > > > > > > > to allow the re- export server to pass along reclaims to
> > > > > > > > the original server as it receives them from its own
> > > > > > > > clients.  It might require some protocol tweaks, I'm not
> > > > > > > > sure.  I'll try to get my thoughts in order and propose
> > > > > > > > something.
> > > > > > > >
> > > > > > >
> > > > > > > It's more complicated than that. If the re-exporting server
> > > > > > > reboots, but the original server does not, then unless that
> > > > > > > re- exporting server persisted its lease and a full set of
> > > > > > > stateids somewhere, it will not be able to atomically
> > > > > > > reclaim delegation and lock state on the server on behalf of
> > > > > > > its clients.
> > > > > >
> > > > > > By sending reclaims to the original server, I mean literally
> > > > > > sending new open and lock requests with the RECLAIM bit set,
> > > > > > which would get brand new stateids.
> > > > > >
> > > > > > So, the original server would invalidate the existing client's
> > > > > > previous clientid and stateids--just as it normally would on
> > > > > > reboot--but it would optionally remember the underlying locks
> > > > > > held by the client and allow compatible lock reclaims.
> > > > > >
> > > > > > Rough attempt:
> > > > > >
> > > > > >
> > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_
> > > > > > re-expor
> > > > > > t_servers
> > > > > >
> > > > > > Think it would fly?
> > > > >
> > > > > So this would be a variant of courtesy locks that can be
> > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > OPEN/LOCK outside the grace period? The purpose being to allow
> > > > > reclaim without forcing the client to persist the original
> > > > > stateid?
> > > > >
> > > > > Hmm... That's doable, but how about the following alternative:
> > > > > Add
> > > > > a function
> > > > > that allows the client to request the full list of stateids that
> > > > > the server holds on its behalf?
> > > > >
> > > > > I've been wanting such a function for quite a while anyway in
> > > > > order to allow the client to detect state leaks (either due to
> > > > > soft timeouts, or due to reordered close/open operations).
> > > >
> > > > Oh, that sounds interesting. So basically the re-export server
> > > > would re-populate it's state from the original server rather than
> > > > relying on it's clients doing reclaims? Hmm, but how does the
> > > > re-export server rebuild its stateids? I guess it could make the
> > > > clients repopulate them with the same "give me a dump of all my
> > > > state", using the state details to match up with the old state and
> > > > replacing stateids. Or did you have something different in mind?
> > > >
> > >
> > > I was thinking that the re-export server could just use that list of
> > > stateids to figure out which locks can be reclaimed atomically, and
> > > which ones have been irredeemably lost. The assumption is that if
> > > you have a lock stateid or a delegation, then that means the clients
> > > can reclaim all the locks that were represented by that stateid.
> >
> > I'm confused about how the re-export server uses that list.  Are you
> > assuming it persisted its own list across its own crash/reboot?  I
> > guess that's what I was trying to avoid having to do.
> >
> No. The server just uses the stateids as part of a check for 'do I hold state for
> this file on this server?'. If the answer is 'yes' and the lock owners are sane, then
> we should be able to assume the full set of locks that lock owner held on that
> file are still valid.
> 
> BTW: if the lock owner is also returned by the server, then since the lock owner
> is an opaque value, it could, for instance, be used by the client to cache info on
> the server about which uid/gid owns these locks.

Let me see if I'm understanding your idea right...

Re-export server reboots within the extended lease period it's been given by the original server. I'm assuming it uses the same clientid? But would probably open new sessions. It requests the list of stateids. Hmm, how to make the owner information useful, nfs-ganesha doesn't pass on the actual client's owner but rather just passes the address of its record for that client owner. Maybe it will have to do something a bit different for this degree of re-export support...

Now the re-export server knows which original client lock owners are allowed to reclaim state. So it just acquires locks using the original stateid as the client reclaims (what happens if the client doesn't reclaim a lock? I suppose the re-export server could unlock all regions not explicitly locked once reclaim is complete). Since the re-export server is acquiring new locks using the original stateid it will just overlay the original lock with the new lock and write locks don't conflict since they are being acquired by the same lock owner. Actually the original server could even balk at a "reclaim" in this way that wasn't originally held... And the original server could "refresh" the locks, and discard any that aren't refreshed at the end of reclaim. That part assumes the original server is apprised that what is actually happening is a reclaim.

The re-export server can destroy any stateids that it doesn't receive reclaims for.

Hmm, I think if the re-export server is implemented as an HA cluster, it should establish a clientid on the original server for each virtual IP (assuming that's the unit of HA)  that exists. Then when virtual IPs are moved, the re-export server just goes through the above reclaim process for that clientid.

Frank