[Linux-cluster] NFS on GFS architectural issues / problems
Tom Mornini
tmornini at engineyard.com
Mon Aug 21 15:44:57 UTC 2006
I get an error page saying that no document with ID 99 exists.
I'm about to set up a cluster that uses NFS with GFS, so I'd love to
read that document.
On Aug 21, 2006, at 12:42 AM, Riaan van Niekerk wrote:
> hi Bob and others
>
> I found the following GFS1/GFS2 design document on the Red Hat 108
> Developer Portal, which details, among other things, some of the
> issues with NFS on GFS:
> https://rpeterso.108.redhat.com/servlets/ProjectDocumentView?
> documentID=99
>
> (I see it was sent to this list over a year ago, but I never found
> it while searching through the archives. It has a lot of good
> information in it.)
>
> It has a disclaimer: "Some of the comments are no longer
> applicable due to design changes."
>
> My question to you, or anyone familiar with NFS on GFS or with GFS
> in general: which of the following are still valid issues for the
> current (6.1u4) version of GFS? If all or most of them still
> apply, I can use this as motivation for my customer to strongly
> consider moving off NFS on GFS. Removing NFS from our GFS cluster
> has been on the cards for quite a while, but has not gained
> momentum due to lack of information on the performance gains of
> such a move (very difficult to gauge) or on the architectural
> problems/limitations of NFS on GFS (for which the following
> extract is spot-on).
>
> Note - can you consider adding a link to this document from your FAQ?
>
> +++++++++
>
> o NFS Support
>
> A GFS filesystem can be exported through NFS to other nodes. There
> are a number of issues with NFS on top of a cluster filesystem,
> though.
>
> 1) Filehandle misses
>
> When an NFS request comes into the server, it's up to the
> filesystem (and a few Linux helper routines) to map the NFS
> filehandle to the correct inode. Doing that is easy if the inode
> is already in the node's cache. The tricky part is when the
> filesystem must read in the inode from the disk. There is nothing
> in the filehandle that anchors the inode into the filesystem (such
> as a glock on a directory that contains an entry pointing to the
> inode), so a lot more care has to be taken to make sure the block
> really contains a valid inode. (See the section on the proposed
> new RG formats.)
>
> It's also non-trivial to handle inode migration in GFS when an NFS
> server is running. There is no centralized data structure that can
> map filehandles into inodes (such a structure would be a
> scalability/performance bottleneck). It's difficult to find a
> representation of the inode that could be used to quickly find it
> even in the face of the inode changing blocks.
>
> Another problem is that filehandle requests can come in at random
> times for inodes that don't exist anymore or are in the process of
> being recreated. This can break optimizations based on ideas like
> "since this node is in the process of creating this inode, it is
> the only one who knows about its locks". GFS has suffered from
> these mis-optimizations in the past. From what I've seen, I
> believe OCFS2 currently has problems like this, too.
>
> 2) Readdir
>
> Linux has an interesting mechanism to handle readdir() requests.
> The VFS (or NFSD) passes the filesystem a request containing not
> only the directory and offset to be read, but a filldir function
> to call for each entry found. So, the filesystem doesn't directly
> fill in a buffer of entries, but calls an arbitrary routine that
> can either put the entries into a buffer or do some other type of
> processing on them. This is a powerful concept, but can be easily
> misused.
>
> I believe that NFSD's use of it is problematic at best. The
> filldir routine used by NFSD for the readdirplus NFS procedure
> calls back into the filesystem to do a lookup and stat() on the
> inode pointed to by the entry. This call is painful because of
> GFS' locking. gfs_readdir() must call filldir with the directory
> lock held so that it doesn't lose its place in the directory. The
> stat() that the filldir routine does causes the inode's lock to be
> acquired. Because concurrent inode locks must always be acquired
> in ascending numerical order and the filldir routine forces an
> ordering that might be something other than that, there is a
> deadlock potential.
>
> GFS detects when NFSD calls its readdir and switches to a routine
> that avoids calling the filldir routine with the lock held. It's
> not as efficient, but it avoids the deadlock. It'd be nice if
> there was a better way to do the detection, though. (The code
> currently looks at the program's name.)
>
> 3) FCNTL locking
>
> There are a huge number of issues with acquiring and failing over
> fcntl()-style locks when there are multiple GFS heads exporting
> NFS. GFS pretty much ignores them right now. A good place to
> start would be to change NFSD so it actually passes fcntl calls
> down into the filesystem.
>
> 4) NFSv4
>
> NFSv4 requires all sorts of changes to GFS in order for them to
> work together. Op locks being one I can remember at the moment.
> I think I've repressed my memories of the others.
>
> ++++++++
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster