[NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Wendy Cheng
wcheng at redhat.com
Thu Apr 26 04:35:13 UTC 2007
Neil Brown wrote:
>On Monday April 23, wcheng at redhat.com wrote:
>
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>> We started the discussion using the network interface (to drop
>>the locks) but found it wouldn't work well on local filesystems such as
>>ext3. There is really no control over which local (server side) interface
>>NFS clients will use (though it shouldn't be hard to implement one). When
>>the fail-over server starts to remove the locks, it needs a way to find
>>*all* of the locks associated with the will-be-moved partition. This is
>>to allow umount to succeed. The server IP address alone can't guarantee
>>that. That was the reason we switched to fsid. Also remember this is NFS
>>v2/v3 - clients have no knowledge of server migration.
>>
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem. They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>
>
A convincing argument... unfortunately, this happens to be a case where
we need to protect the server from clients' misbehavior. For a local
filesystem (ext3), if any file reference count is non-zero (i.e. some
clients are still holding locks), the filesystem can't be unmounted. We
would have to fail the failover to avoid data corruption.
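To make that failure mode concrete, here is a minimal user-space sketch,
not code from the patch set (the mount point and messages are made up for
illustration), of why the failover agent has to treat a busy umount as
fatal:

/* Sketch only: a failover agent must treat a busy umount as a hard
 * failure, because leftover NLM locks keep the ext3 filesystem pinned
 * and moving the storage anyway would risk data corruption. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>

int main(void)
{
	const char *mnt = "/export/ha1";	/* hypothetical mount point */

	/* ...the locks on this filesystem should have been dropped first... */

	if (umount2(mnt, 0) != 0) {
		if (errno == EBUSY)
			fprintf(stderr, "%s is still busy (locks not dropped?), "
				"failing the failover\n", mnt);
		else
			perror("umount2");
		exit(EXIT_FAILURE);	/* abort the relocation rather than force it */
	}
	return 0;
}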
>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid. Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>
>
This debate has been (so far) tolerable and helpful - so I'm not going
to comment on this paragraph :) ... But I have to remind people that my
first proposal was adding new flags to the exportfs command (say
"exportfs -ud" to unexport and drop locks, and "exportfs -g" to re-export
and start the grace period). Then we moved to "echo the network address
into procfs", and later switched to the "fsid" approach. A very long
journey ...
>
>
>>> The reply to SM_MON (currently completely ignored by all versions
>>> of Linux) has an extra value which indicates how many more seconds
>>> of grace period there are to go. This can be stuffed into res_stat,
>>> maybe.
>>> Places where we currently check 'nlmsvc_grace_period' get moved to
>>> *after* the nlmsvc_retrieve_args call, and the grace_period value
>>> is extracted from host->nsm.
>>>
>>>
>>>
>>>
>>OK with me, but I don't see the advantage though?
>>
>>
>
>So we can have a different grace period for each different 'host'.
>
>
IMHO, having a grace period for each client (host) is overkill.
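Just to make sure we are picturing the same thing, here is a rough
user-space model of the suggestion as I read it (the structure and
function names are invented, purely illustrative; the real change would
live in lockd/statd):

/* Rough model of the suggestion: each monitored peer carries its own
 * grace-period expiry, reported back via the SM_MON reply, instead of
 * one global nlmsvc_grace_period. */
#include <stdio.h>
#include <time.h>

struct nsm_peer {
	char	name[64];
	time_t	grace_expiry;	/* absolute time this peer's grace period ends */
};

/* What an SM_MON reply could report: seconds of grace remaining. */
static long sm_mon_grace_left(const struct nsm_peer *peer)
{
	long left = (long)(peer->grace_expiry - time(NULL));
	return left > 0 ? left : 0;
}

/* Where lockd would check it: after the arguments (and hence the peer)
 * are known, rather than in one global test up front. */
static int in_grace(const struct nsm_peer *peer)
{
	return sm_mon_grace_left(peer) > 0;
}

int main(void)
{
	struct nsm_peer p = { "client-a", time(NULL) + 90 };

	printf("%s: %ld s of grace left, in_grace=%d\n",
	       p.name, sm_mon_grace_left(&p), in_grace(&p));
	return 0;
}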
> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem. We know they can
>only be held by clients that sent requests to a given set of IP
>addresses. Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd. The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in nfs/sm, read each of the lines, find those
>which relate to any of the local IP addresses that we want to move, and
>initiate the RPC callback described on that line. This will tell
>lockd to drop those locks. When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>
>
Bright idea! But it doesn't solve the issue of misbehaving clients that
come in from unwanted (server) interfaces, does it?
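For the record, a rough sketch of the scanning step described above (the
exact line format in those files is nfs-utils specific, so this only does
a substring match and prints what it would act on; a real tool would
issue the RPC callback recorded in each entry so lockd drops the locks):

/* Sketch only: walk statd's state directory and pick out the monitor
 * records that mention one of the "moving" local addresses. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define SM_DIR "/var/lib/nfs/sm"

int main(int argc, char **argv)
{
	const char *moving_ip = argc > 1 ? argv[1] : "10.0.0.1"; /* example address */
	DIR *dir = opendir(SM_DIR);
	struct dirent *de;
	char path[512], line[512];

	if (!dir) {
		perror(SM_DIR);
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		FILE *f;

		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s", SM_DIR, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		while (fgets(line, sizeof(line), f)) {
			if (strstr(line, moving_ip))
				printf("would notify lockd about %s (via %s)\n",
				       de->d_name, path);
		}
		fclose(f);
	}
	closedir(dir);
	return 0;
}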
>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve. Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configurations we can rule 'out of scope' and decide we don't have
>to deal with.
>
>
I'm working on the write-up now, but could the following serve the
purpose temporarily? What is still unclear from this thread of discussion?
http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html
-- Wendy