[Linux-cluster] GFS2 + NFS crash BUG: Unable to handle kernel NULL pointer deference

Alan Brown ajb2 at mssl.ucl.ac.uk
Fri Jul 8 17:36:53 UTC 2011


On Fri, 8 Jul 2011, Colin Simpson wrote:

> That's not ideal either when Samba isn't too happy working over NFS, and
> that is not recommended by the Samba people as being a sensible config.

I know, but there's a real (and demonstrable) risk of data corruption for
NFS vs _anything_ if NFS clients and local processes (or clients of other
services such as a Samba server) happen to grab the same file for writing
at the same time.

Apart from that, the 1-second granularity of NFS timestamps can result
(and has resulted) in writes made by non-NFS processes causing NFS clients
which have that file opened read/write to see "stale filehandle" errors,
because the inode changed when they weren't expecting it.
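
To illustrate just the granularity point (this is not NFS code, and it
does not reproduce the stale filehandle error itself), here is a minimal
Python sketch. The helper names mtime_seconds, looks_unchanged and demo
are made up for the example, and the timestamps are set by hand with
os.utime so the result doesn't depend on timing. It assumes a checker that
compares modification times truncated to whole seconds, and shows that
such a checker misses a second write landing within the same second.

```python
import os
import tempfile

NS_PER_SEC = 1_000_000_000


def mtime_seconds(path):
    """Return the file's mtime truncated to whole seconds."""
    return os.stat(path).st_mtime_ns // NS_PER_SEC


def looks_unchanged(path, cached_mtime):
    """Hypothetical change check that only compares whole-second mtimes."""
    return mtime_seconds(path) == cached_mtime


def demo():
    # Create a scratch file with some initial content.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"first")
        path = f.name
    try:
        base = 1_300_000_000 * NS_PER_SEC  # arbitrary fixed second

        # First write "happens" 0.1s into that second.
        os.utime(path, ns=(base, base + 100_000_000))
        cached = mtime_seconds(path)

        # Second write changes the content 0.9s into the same second.
        with open(path, "wb") as f:
            f.write(b"second")
        os.utime(path, ns=(base, base + 900_000_000))

        # The whole-second comparison cannot tell the two writes apart.
        return looks_unchanged(path, cached)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("change missed:", demo())
```

The content differs after the second write, but the truncated timestamp
doesn't, so anything relying on it alone would treat the file as unchanged.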

We (should) all know NFS was a kludge. What's surprising is how much
kludge still remains in the current v2/3 code (which is surprisingly opaque
and incredibly crufty; much of it dates from the early 1990s or earlier).

As I said earlier, v4 is supposed to play a lot nicer but I haven't tested
it - as far as I know it's not supported on GFS systems anyway (that was
the RH official line when I tried to get it working last time).

I'd love to get v4 running properly in an active/active/active setup from
multiple GFS-mounted fileservers to the clients. If anyone knows how to
reliably do it on EL5.6 systems then I'm open to trying again as I believe
that this would solve a number of issues being seen locally (including
various crash bugs).

On the other hand, v2/3 aren't going away anytime soon and some effort
really needs to be put into making them work properly.

On the gripping hand, I'd also like to see viable alternatives to NFS when
it comes to feeding 100+ desktop clients.

Making them mount the filesystems using GFS might sound like an
alternative until you consider what happens if any of them crash/reboot
during the day. Batch processes can wait all day, but users with frozen
desktops get irate - quickly.
