[NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Wendy Cheng
wcheng at redhat.com
Thu Apr 26 04:35:13 UTC 2007
Neil Brown wrote:
>On Monday April 23, wcheng at redhat.com wrote:
>
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>> We started the discussion using the network interface (to drop
>>the locks) but found it wouldn't work well on local filesystems such as
>>ext3. There is really no control over which local (server side) interface
>>NFS clients will use (though it shouldn't be hard to implement one). When
>>the fail-over server starts to remove the locks, it needs a way to find
>>*all* of the locks associated with the will-be-moved partition. This is
>>to allow umount to succeed. The server IP address alone can't guarantee
>>that. That was the reason we switched to fsid. Also remember this is NFS
>>v2/v3 - clients have no knowledge of server migration.
>>
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem. They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>
>
A convincing argument... unfortunately, this happens to be a case where
we need to protect the server from clients' misbehavior. For a local
filesystem (ext3), if any file reference count is non-zero (i.e. some
clients are still holding locks), the filesystem can't be unmounted. We
would have to fail the failover to avoid data corruption.
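To make that failure mode concrete, here is a minimal user-space sketch,
not code from the patch set (the mount point and messages are made up for
illustration), of why the failover agent has to treat a busy umount as
fatal:

/* Sketch only: a failover agent must treat a busy umount as a hard
 * failure, because leftover NLM locks keep the ext3 filesystem pinned
 * and moving the storage anyway would risk data corruption. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>

int main(void)
{
	const char *mnt = "/export/ha1";	/* hypothetical mount point */

	/* ...the locks on this filesystem should have been dropped first... */

	if (umount2(mnt, 0) != 0) {
		if (errno == EBUSY)
			fprintf(stderr, "%s is still busy (locks not dropped?), "
				"failing the failover\n", mnt);
		else
			perror("umount2");
		exit(EXIT_FAILURE);	/* abort the relocation rather than force it */
	}
	return 0;
}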
>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid. Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>
>
This debate has been (so far) tolerable and helpful - so I'm not going
to comment on this paragraph :) ... But I have to remind people that my
first proposal was adding new flags to the exportfs command (say
"exportfs -ud" to unexport and drop locks, and "exportfs -g" to re-export
and start the grace period). Then we moved to "echo the network address
into procfs", and later switched to the "fsid" approach. A very long
journey ...
>
>
>>> The reply to SM_MON (currently completely ignored by all versions
>>> of Linux) has an extra value which indicates how many more seconds
>>> of grace period there are to go. This can be stuffed into res_stat,
>>> maybe.
>>> Places where we currently check 'nlmsvc_grace_period' get moved to
>>> *after* the nlmsvc_retrieve_args call, and the grace_period value
>>> is extracted from host->nsm.
>>>
>>>
>>>
>>>
>>OK with me, but I don't see the advantage though?
>>
>>
>
>So we can have a different grace period for each different 'host'.
>
>
IMHO, having a grace period for each client (host) is overkill.
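Just to make sure we are picturing the same thing, here is a rough
user-space model of the suggestion as I read it (the structure and
function names are invented, purely illustrative; the real change would
live in lockd/statd):

/* Rough model of the suggestion: each monitored peer carries its own
 * grace-period expiry, reported back via the SM_MON reply, instead of
 * one global nlmsvc_grace_period. */
#include <stdio.h>
#include <time.h>

struct nsm_peer {
	char	name[64];
	time_t	grace_expiry;	/* absolute time this peer's grace period ends */
};

/* What an SM_MON reply could report: seconds of grace remaining. */
static long sm_mon_grace_left(const struct nsm_peer *peer)
{
	long left = (long)(peer->grace_expiry - time(NULL));
	return left > 0 ? left : 0;
}

/* Where lockd would check it: after the arguments (and hence the peer)
 * are known, rather than in one global test up front. */
static int in_grace(const struct nsm_peer *peer)
{
	return sm_mon_grace_left(peer) > 0;
}

int main(void)
{
	struct nsm_peer p = { "client-a", time(NULL) + 90 };

	printf("%s: %ld s of grace left, in_grace=%d\n",
	       p.name, sm_mon_grace_left(&p), in_grace(&p));
	return 0;
}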
> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem. We know they can
>only be held by clients that sent requests to a given set of IP
>addresses. Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd. The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in nfs/sm, read each of the lines, find those
>which relate to any of the local IP addresses that we want to move, and
>initiate the RPC callback described on that line. This will tell
>lockd to drop those locks. When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>
>
Bright idea! But it doesn't solve the issue of misbehaving clients that
come in from unwanted (server) interfaces, does it?
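For the record, a rough sketch of the scanning step described above (the
exact line format in those files is nfs-utils specific, so this only does
a substring match and prints what it would act on; a real tool would
issue the RPC callback recorded in each entry so lockd drops the locks):

/* Sketch only: walk statd's state directory and pick out the monitor
 * records that mention one of the "moving" local addresses. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define SM_DIR "/var/lib/nfs/sm"

int main(int argc, char **argv)
{
	const char *moving_ip = argc > 1 ? argv[1] : "10.0.0.1"; /* example address */
	DIR *dir = opendir(SM_DIR);
	struct dirent *de;
	char path[512], line[512];

	if (!dir) {
		perror(SM_DIR);
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s", SM_DIR, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		while (fgets(line, sizeof(line), f)) {
			if (strstr(line, moving_ip))
				printf("would notify lockd about %s (via %s)\n",
				       de->d_name, path);
		}
		fclose(f);
	}
	closedir(dir);
	return 0;
}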
>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve. Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configurations we can rule 'out of scope' and decide we don't have
>to deal with.
>
>
I'm working on the write-up now, but could the following serve the
purpose temporarily? What is still unclear from this thread of discussion?
http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html
-- Wendy