[Linux-cluster] RHEL/CentOS-6 HA NFS Configuration Question

Thu May 17 08:26:46 UTC 2012

On 05/16/2012 08:19 PM, Colin Simpson wrote:
> This is interesting.
> 
> We very often see the filesystems fail to umount on busy clustered NFS
> servers.

Yes, I am aware of the issue; I have been investigating it in detail
for the past couple of weeks.

> 
> What is the nature of the "real fix"?

First, the bz you mention below is unrelated to the unmount problem we
are discussing. Clustered nfsd locks are a slightly different story.

There are two issues here:

1) cluster users' expectations
2) nfsd internal design

(and note I am not blaming either cluster or nfsd here)

Generally cluster users expect to be able to do things like (fake meta
config):

<service1..
 <fs1..
  <nfsexport1..
   <nfsclient1..
    <ip1..
....
<service2
 <fs2..
  <nfsexport2..
   <nfsclient2..
    <ip2..

and to be able to move services between cluster nodes without problems.
Note that the fs used is irrelevant: it can be clustered or not.

Unfortunately, this setup clashes with the nfsd design.

When a service is shut down (whether by a stop or a relocation makes no
difference):

ip is removed
exportfs -u .....
(and that's where we hit the nfsd design limitation)
umount fs..
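
To make the ordering concrete, here is a rough Python sketch of the stop
sequence above. It is only an illustration, not what rgmanager actually
runs: the function name, the runner callback, the address, the device and
the paths are all made up for the example.

```python
from typing import Callable, List

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[List[str]], int]


def stop_service(vip_cidr: str, dev: str, client: str, export_path: str,
                 mountpoint: str, run: Runner) -> bool:
    """Run the stop steps in the order described; True if the fs was unmounted."""
    # 1) Remove the service IP so no new connections reach this node.
    run(["ip", "addr", "del", vip_cidr, "dev", dev])
    # 2) Unexport: per the text, this only stops new incoming requests.
    run(["exportfs", "-u", f"{client}:{export_path}"])
    # 3) Unmount: this is the step that fails while nfsd still holds the fs.
    return run(["umount", mountpoint]) == 0


def dry_run(argv: List[str]) -> int:
    """Print the command instead of executing it (always reports success)."""
    print(" ".join(argv))
    return 0


if __name__ == "__main__":
    # Hypothetical values, printed only; nothing is executed.
    stop_service("192.0.2.10/24", "eth0", "*", "/mnt/fs1", "/mnt/fs1", dry_run)
```

Passing the runner in keeps the sketch safe to try: with dry_run it only
prints the three commands in order.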

By design (though I can't say exactly why it is done this way without
speculating), nfsd will continue to serve open sessions via rpc.
exportfs -u will only stop new incoming requests.

If nfsd is serving a client, it will continue to hold a lock on the
filesystem (in kernel) that prevents the fs from being unmounted.

The only ways to effectively close the sessions are:

- drop the VIP and wait for the connections to time out (nfsd would then
  also drop the lock on the fs), but this is slow and how long it takes
  is not consistent

- restart nfsd.
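
As an illustration of the first option, waiting for the sessions to time
out amounts to retrying the umount until it succeeds or a deadline passes.
A minimal Python sketch follows; the function name, the runner callback
and the default timeout and interval are assumptions for the example, not
values taken from nfsd or rgmanager.

```python
import time
from typing import Callable, List

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[List[str]], int]


def wait_for_umount(mountpoint: str, run: Runner,
                    timeout_s: float = 60.0, interval_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Retry umount until it succeeds or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        # Succeeds only once nfsd has dropped its hold on the fs.
        if run(["umount", mountpoint]) == 0:
            return True
        # Give up once the (hypothetical) deadline has passed.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Since the time needed is not predictable, any fixed timeout is a guess,
which is why this option is not a good fit for a cluster stop operation.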

The "real fix" here would be to wait for nfsd containers, which support
exactly this scenario: allowing unexport of a single fs, lock drops,
etc. This work is still in its very early stages upstream, so it is not
yet suitable for production.

The patch I am working on is basically a way to handle the clash as
well as possible.

A new nfsrestart="" option will be added to both fs and clusterfs. If
the filesystem cannot be unmounted and force_unmount is set, it will
perform an extremely fast restart of nfslock and nfsd.
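
The decision logic can be sketched roughly as below. This is a Python
illustration, not the patch itself: the function name and the runner
callback are made up, and the use of the "service nfslock restart" and
"service nfs restart" init scripts, as well as their order, is an
assumption about how the restart might be done on RHEL 6.

```python
from typing import Callable, List

# A runner takes an argv list and returns the command's exit status.
Runner = Callable[[List[str]], int]


def unmount_fs(mountpoint: str, force_unmount: bool, nfsrestart: bool,
               run: Runner) -> bool:
    """Try to unmount; optionally restart nfslock/nfsd once and retry."""
    if run(["umount", mountpoint]) == 0:
        return True
    # Per the description, the restart only happens when both options are set.
    if not (force_unmount and nfsrestart):
        return False
    # Assumed commands: the RHEL 6 init scripts "nfslock" and "nfs" (nfsd).
    run(["service", "nfslock", "restart"])
    run(["service", "nfs", "restart"])
    # With nfsd restarted, its hold on the fs is gone and umount can proceed.
    return run(["umount", mountpoint]) == 0
```

Without nfsrestart the function simply reports the failed unmount, which
is the behaviour that leads to the service failing today.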

We can argue that it is not the final solution, and I think we can agree
that it is more of a workaround, but:

1) it will allow service migration instead of service failure
2) it will match cluster users' expectations (allowing different exports
to live peacefully together).

The only negative impact that we have been able to identify so far (the
patch is still under heavy testing), besides having to add a config
option to enable it, is that there will be a small window in which all
clients connected to a given node, for all nfs services, will not be
served because nfsd is restarting.

So if you are migrating export1 and there are clients using export2,
export2 will also be affected for those few ms required to restart nfsd.
(assuming export1 and 2 are running on the same node of course).

Putting things in perspective for a cluster, I think that it is a lot
better to be able to unmount a fs and relocate services as necessary than
to have a service fail completely and maybe the node be fenced.

> 
> I like the idea of NFSD fully being in user space, so killing it would
> definitely free the fs.
> 
> Alan Brown (who's on this list) recently posted to a RH BZ that he was
> one of the people who moved it into kernel space for performance reasons
> in the past (that are no longer relevant):
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=580863#c9
> 
> , but I doubt this is the fix you have in mind.

No, that's a totally different issue.

> 
> Colin
> 
> On Tue, 2012-05-15 at 20:21 +0200, Fabio M. Di Nitto wrote:
>> This solves different issues at startup, relocation and recovery
>>
>> Also note that there is a known limitation in nfsd (both rhel5/6) that
>> could cause some problems in some conditions in your current
>> configuration. A permanent fix is being worked on atm.
>>
>> Without going into extreme detail, you might have 2 of those services
>> running on the same node, and attempting to relocate one of them can
>> fail because the fs cannot be unmounted. This is due to nfsd holding a
>> lock (at kernel level) on the FS. Changing the config to the suggested
>> one masks the problem pretty well, but more testing for a real fix is
>> in progress.
>>
>> Fabio
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster