[Linux-cluster] GFS2 + NFS crash BUG: Unable to handle kernel NULL pointer dereference

Steven Whitehouse swhiteho at redhat.com
Mon Jul 11 10:43:58 UTC 2011


Hi,

On Mon, 2011-07-11 at 09:30 +0100, Alan Brown wrote:
> On 08/07/11 22:09, J. Bruce Fields wrote:
> 
> > With default mount options, the Linux NFS client (like most NFS clients)
> > assumes that a file has at most one writer at a time.  (Applications that
> > need to do write-sharing over NFS need to use file locking.)
> 
> The problem is that file locking on V3 isn't passed back down to the 
> filesystem - hence the issues with nfs vs samba (or local disk 
> access(*)) on the same server.
> 
> (*) Local disk access includes anything running on other nodes in a 
> GFS/GFS2 environment. This precludes exporting the same GFS(2) 
> filesystem on multiple cluster nodes.
> 
Well, the locks are kind of passed down, but there is not enough info to
make it work correctly, hence we require the localflocks mount option to
prevent this information from being passed down at all.
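
A minimal sketch of what that looks like in practice (the device, volume
group and mount point below are made-up examples): localflocks keeps
flock/POSIX locks local to the node rather than handing them to the
cluster lock manager, which is why it is required when the filesystem is
exported over NFS.

    # /etc/fstab - device and mount point are examples only
    /dev/clustervg/gfs2lv  /mnt/gfs2  gfs2  defaults,localflocks  0 0

    # or, mounted by hand:
    mount -t gfs2 -o localflocks /dev/clustervg/gfs2lv /mnt/gfs2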

> 
> > The NFS protocol supports higher granularity timestamps.  The limitation
> > is the exported filesystem.  If you're using something other than
> > ext2/3, you're probably getting higher granularity.
> 
> GFS/GFS2 in this case...
> 
GFS supports second-resolution timestamps.
GFS2 supports nanosecond-resolution timestamps.
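
A quick way to check which of the two you are actually getting (the path
below is just an example) is to look at the fractional part of the mtime:

    # GNU stat; %y prints the mtime including nanoseconds.
    # On GFS the fractional part should always be .000000000,
    # on GFS2 it can carry real nanosecond values.
    stat --format='%y' /mnt/gfs2/somefile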

> >> can (and has)
> >> result in writes made by non-NFS processes causing NFS clients which have
> >> that file opened read/write to see "stale filehandle" errors due to the
> >> inode having changed when they weren't expecting it.
> >
> > Changing file data or attributes won't result in stale filehandle
> > errors.  (Bug reports welcome if you've seen otherwise.)
> 
> I'll have to try to reproduce the issue, but it's a race condition with a
> narrow window at the best of times.
> 
GFS2 doesn't do anything odd with filehandles, so they shouldn't be coming
up as stale unless the inode has been removed.

> > Stale
> > filehandle errors should only happen when a client attempts to use a
> > file which no longer exists on the server.  (E.g. if another client
> > deletes a file while your client has it open.)
> 
> It's possible this has happened. I have no idea what user batch scripts 
> are trying to do on the compute nodes, but in the case that was brought 
> to my attention the file was edited on one node while another had it open.
> 
That probably means the editor made a copy of it and then moved it back
over the top of the original file, thus unlinking the original inode.
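
That sequence is easy to demonstrate; a rough illustration with made-up
paths, showing that the saved file ends up on a different inode, which is
exactly what leaves an NFS client holding a stale handle:

    cd /mnt/gfs2
    echo one > data.txt
    stat --format='%i' data.txt      # note the inode number

    # typical "safe save" done by editors: write a temp file,
    # then rename it over the original
    echo two > data.txt.tmp
    mv data.txt.tmp data.txt

    stat --format='%i' data.txt      # a different inode; the original
                                     # has been unlinked, so any NFS
                                     # client still holding a handle
                                     # to it will see ESTALE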

> >  (This can also happen if
> > you rename a file across directories on a filesystem exported with the
> > subtree_check option.  The subtree_check option is deprecated, for that
> > reason.)
> 
> All our FSes are exported no_subtree_check and at the root of the FS.
> 
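
For completeness, an export along those lines would look roughly like the
following (path, client range and fsid are made-up examples; a fixed fsid
is commonly used so that filehandles stay consistent when the export moves
between cluster nodes):

    # /etc/exports - path and client range are examples only
    /mnt/gfs2  192.168.1.0/24(rw,no_subtree_check,fsid=1234)
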
> >> We (should) all know NFS was a kludge. What's surprising is how much
> >> kludge still remains in the current v2/3 code (which is surprisingly
> >> opaque and incredibly crufty; much of it dates from the early 1990s or
> >> earlier).
> >
> > Details welcome.
> 
> The non-parallelisation in exportfs (leading to race conditions), for
> starters. We had to wrap every call to it in /usr/share/cluster/nfsclient.sh
> with flock in order to have reliable service startups.
> 
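
Without having seen that script, the kind of serialisation being described
would presumably look something like the sketch below; the lock file path
is arbitrary and the export arguments are only an example:

    # take an exclusive lock for the duration of each exportfs call so
    # that concurrent service starts don't race inside exportfs
    flock /var/lock/exportfs.lock \
        exportfs -i -o rw,no_subtree_check 192.168.1.10:/mnt/gfs2
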
> There are a number of RH Bugzilla tickets revolving around NFS behaviour 
> which would be worth looking at.
> 
> >> As I said earlier, V4 is supposed to play a lot nicer
> >
> > V4 has a number of improvements, but what I've described above applies
> > across versions (modulo some technical details about timestamps vs.
> > change attributes).
> 
> Thanks for the input.
> 
> NFS has been a major pain point in our organisation for years. If you 
> have ideas for doing things better then I'm very interested.
> 
> Alan
> 
NFS and GFS2 is an area in which we are trying to gradually increase the
number of supportable use cases. It is also a rather complex area, so it
will take some time to do this.

Steve.
