[Linux-cachefs] Adventures in NFS re-exporting

Frank Filz ffilzlnx at mindspring.com
Thu Dec 3 23:34:57 UTC 2020


> > > -----Original Message-----
> > > From: Trond Myklebust [mailto:trondmy at hammerspace.com]
> > > Sent: Thursday, December 3, 2020 2:14 PM
> > > To: bfields at fieldses.org
> > > Cc: linux-cachefs at redhat.com; ffilzlnx at mindspring.com;
> > > linux-nfs at vger.kernel.org; daire at dneg.com
> > > Subject: Re: Adventures in NFS re-exporting
> > >
> > > On Thu, 2020-12-03 at 17:04 -0500, bfields at fieldses.org wrote:
> > > > On Thu, Dec 03, 2020 at 09:57:41PM +0000, Trond Myklebust wrote:
> > > > > On Thu, 2020-12-03 at 13:45 -0800, Frank Filz wrote:
> > > > > > > On Thu, 2020-12-03 at 16:13 -0500, bfields at fieldses.org
> > > > > > > wrote:
> > > > > > > > On Thu, Dec 03, 2020 at 08:27:39PM +0000, Trond Myklebust
> > > > > > > > wrote:
> > > > > > > > > On Thu, 2020-12-03 at 13:51 -0500, bfields wrote:
> > > > > > > > > > > I've been scratching my head over how to handle reboot
> > > > > > > > > > > of a re-exporting server.  I think one way to fix it
> > > > > > > > > > > might be just to allow the re-export server to pass
> > > > > > > > > > > along reclaims to the original server as it receives
> > > > > > > > > > > them from its own clients.  It might require some
> > > > > > > > > > > protocol tweaks, I'm not sure.  I'll try to get my
> > > > > > > > > > > thoughts in order and propose something.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > It's more complicated than that. If the re-exporting
> > > > > > > > > > server reboots, but the original server does not, then
> > > > > > > > > > unless that re-exporting server persisted its lease and
> > > > > > > > > > a full set of stateids somewhere, it will not be able to
> > > > > > > > > > atomically reclaim delegation and lock state on the
> > > > > > > > > > server on behalf of its clients.
> > > > > > > >
> > > > > > > > By sending reclaims to the original server, I mean
> > > > > > > > literally sending new open and lock requests with the
> > > > > > > > RECLAIM bit set, which would get brand new stateids.
> > > > > > > >
> > > > > > > > So, the original server would invalidate the existing
> > > > > > > > client's previous clientid and stateids--just as it
> > > > > > > > normally would on reboot--but it would optionally remember
> > > > > > > > the underlying locks held by the client and allow
> > > > > > > > compatible lock reclaims.
> > > > > > > >
> > > > > > > > Rough attempt:
> > > > > > > >
> > > > > > > >
> > > > > > > > > https://wiki.linux-nfs.org/wiki/index.php/Reboot_recovery_for_re-export_servers
> > > > > > > >
> > > > > > > > Think it would fly?
> > > > > > >
> > > > > > > So this would be a variant of courtesy locks that can be
> > > > > > > reclaimed by the client using the reboot reclaim variant of
> > > > > > > OPEN/LOCK outside the grace period? The purpose being to
> > > > > > > allow reclaim without forcing the client to persist the
> > > > > > > original stateid?
> > > > > > >
> > > > > > > > Hmm... That's doable, but how about the following
> > > > > > > > alternative: Add a function that allows the client to
> > > > > > > > request the full list of stateids that the server holds on
> > > > > > > > its behalf?
> > > > > > >
> > > > > > > I've been wanting such a function for quite a while anyway
> > > > > > > in order to allow the client to detect state leaks (either
> > > > > > > due to soft timeouts, or due to reordered close/open
> > > > > > > operations).
> > > > > >
> > > > > > Oh, that sounds interesting. So basically the re-export server
> > > > > > > would re-populate its state from the original server rather
> > > > > > > than relying on its clients doing reclaims? Hmm, but how does
> > > > > > the re-export server rebuild its stateids? I guess it could
> > > > > > make the clients repopulate them with the same "give me a dump
> > > > > > of all my state", using the state details to match up with the
> > > > > > old state and replacing stateids. Or did you have something
> > > > > > different in mind?
> > > > > >
> > > > >
> > > > > I was thinking that the re-export server could just use that
> > > > > list of stateids to figure out which locks can be reclaimed
> > > > > atomically, and which ones have been irredeemably lost. The
> > > > > assumption is that if you have a lock stateid or a delegation,
> > > > > then that means the clients can reclaim all the locks that were
> > > > > represented by that stateid.
> > > >
> > > > > I'm confused about how the re-export server uses that list.  Are
> > > > > you assuming it persisted its own list across its own
> > > > > crash/reboot? I guess that's what I was trying to avoid having
> > > > > to do.
> > > >
> > > No. The server just uses the stateids as part of a check for 'do I
> > > hold state for this file on this server?'. If the answer is 'yes'
> > > and the lock owners are sane, then we should be able to assume the
> > > full set of locks that lock owner held on that file are still valid.
> > >
> > > BTW: if the lock owner is also returned by the server, then since
> > > the lock owner is an opaque value, it could, for instance, be used
> > > by the client to cache info on the server about which uid/gid owns
> > > these locks.
> >
> > Let me see if I'm understanding your idea right...
> >
> > Re-export server reboots within the extended lease period it's been
> > given by the original server. I'm assuming it uses the same clientid?
> 
> Yes. It would have to use the same clientid.
> 
> > But would probably open new sessions. It requests the list of
> > stateids. Hmm, how to make the owner information useful, nfs-ganesha
> > doesn't pass on the actual client's owner but rather just passes the
> > address of its record for that client owner. Maybe it will have to do
> > something a bit different for this degree of re-export support...
> >
> > Now the re-export server knows which original client lock owners are
> > allowed to reclaim state. So it just acquires locks using the original
> > stateid as the client reclaims (what happens if the client doesn't
> > reclaim a lock? I suppose the re-export server could unlock all
> > regions not explicitly locked once reclaim is complete). Since the
> > re-export server is acquiring new locks using the original stateid it
> > will just overlay the original lock with the new lock and write locks
> > don't conflict since they are being acquired by the same lock owner.
> > Actually the original server could even balk at a "reclaim" in this
> > way that wasn't originally held... And the original server could
> > "refresh" the locks, and discard any that aren't refreshed at the end
> > of reclaim. That part assumes the original server is apprised that
> > what is actually happening is a reclaim.
> >
> > The re-export server can destroy any stateids that it doesn't receive
> > reclaims for.
> 
> Right. That's in essence what I'm suggesting. There are corner cases to be
> considered: e.g. "what happens if the re-export server crashes after unlocking
> on the server, but before passing the LOCKU reply on to the client", however I
> think it should be possible to figure out strategies for those cases.

That's no different from a regular NFS server crashing before it responds to an unlock. The client likely doesn't reclaim locks it was attempting to drop at the time of the server crash. So this is one place where we would definitely have abandoned locks on the original server, IF the unlock never made it to the original server. But we're already talking about strategies to clean up abandoned locks.
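
To make the cleanup idea concrete, here is a rough sketch of how a re-export server might split the state list it gets back from the original server once its own clients have finished reclaiming. Everything here is hypothetical: there is no such state-list operation in the protocol today, and the `StateEntry` fields and the `reconcile` helper are names made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateEntry:
    """One entry from the (hypothetical) state list of the original server."""
    stateid: str      # opaque stateid reported by the original server
    owner: str        # opaque lock owner
    filehandle: str   # file the state applies to


def reconcile(server_state, reclaimed):
    """Split the original server's state into entries to keep and to release.

    server_state: iterable of StateEntry from the assumed state-list call.
    reclaimed: set of (owner, filehandle) pairs that clients reclaimed
               during the re-export server's own grace period.
    """
    keep, release = [], []
    for entry in server_state:
        if (entry.owner, entry.filehandle) in reclaimed:
            keep.append(entry)
        else:
            # Nobody reclaimed this state, so treat it as abandoned.
            release.append(entry)
    return keep, release
```

Entries in `release` are the abandoned ones, including a lock whose unlock never reached the original server before the crash.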

I won't be surprised if we find a more tricky corner case, but my gut feel is every corner case will have a relatively simple solution.

Another consideration is how to handle the size of the state list. Ideally we would have some way to break it up that is less clunky than readdir. At least the state list can be assumed to be static while it is being fetched; even a regular client that is just interested in the list could pause state activity until the list is retrieved.
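
As a rough sketch of what breaking the list up could look like, assuming the list really is static during the fetch: a plain offset cookie would be enough, with no readdir-style verifier. The `fetch_page` callback and `make_static_server` stand-in below are hypothetical and only illustrate the loop, not any existing operation.

```python
def fetch_state_list(fetch_page, page_size=2):
    """Collect the whole state list by asking for one chunk at a time.

    fetch_page(cookie, count) is an assumed callback returning
    (entries, next_cookie), where next_cookie is None after the last chunk.
    """
    entries = []
    cookie = 0
    while cookie is not None:
        page, cookie = fetch_page(cookie, page_size)
        entries.extend(page)
    return entries


def make_static_server(snapshot):
    """Stand-in for the original server, serving a frozen list of state."""
    def fetch_page(cookie, count):
        page = snapshot[cookie:cookie + count]
        next_cookie = cookie + len(page)
        # Signal the end of the list with None instead of an EOF flag.
        return page, (next_cookie if next_cookie < len(snapshot) else None)
    return fetch_page
```

If state could change mid-fetch, an offset cookie would no longer be safe, which is where the readdir-like clunkiness would come back.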

Frank
