NFS is going crazy, and taking me with it

Chris St. Pierre stpierre at NebrWesleyan.edu
Thu Jun 8 19:31:20 UTC 2006


I have a RHEL 4 NFS server that shares out three volumes, all
read-only.  One goes to another Linux box, and the other two go to a
Solaris 9 machine.  One of the volumes mounted on the Solaris box is
having bewildering problems.

Every night, two processes run on the server that cause these
problems.  The first is:

/sbin/quotacheck -fguma

The second is AIDE, a Tripwire replacement.  When either of these
processes runs, semi-random files semi-disappear from the client.  The
files are always in the same directories, but different ones disappear
on different days.  The symptoms are always the same: running 'ls'
will show the files, but running 'ls -lAF' (or anything that requires
running stat() on them) fails with "File not found."  Opening them
also fails.  To solve this problem, I have to touch the file *on the
client*; of course, it gives an error that it can't create the file in
question, but after that, everything works.

The only common thread I can think of between quotacheck and AIDE is
that both stat a very large number of files on the server.  That said,
AIDE is not configured to check any of the volumes that are shared via
NFS.  I also wrote a quick Perl script to recurse into a directory and
stat all the files in it, but that doesn't break the NFS shares,
either.
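
A minimal sketch of such a recurse-and-stat test, written here in
Python rather than Perl; it is not the original script, and the
function name stat_tree is hypothetical.  Besides stat'ing everything
under a directory, it collects names that the directory listing
returns but that lstat() rejects with ENOENT, which is the symptom
seen on the client.

```python
import errno
import os


def stat_tree(root):
    """Walk root, lstat() every entry, and return (ok_count, missing).

    missing holds paths that the directory listing returned but that
    lstat() rejected with ENOENT ("listed, but not found").
    """
    ok_count = 0
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                ok_count += 1
            except OSError as exc:
                if exc.errno == errno.ENOENT:
                    # Present in the listing, but stat says it is gone.
                    missing.append(path)
                else:
                    # Permission errors and the like are a different problem.
                    raise
    return ok_count, missing
```

Calling stat_tree("/www_misc") on the client after the nightly jobs
would return how many entries stat'ed cleanly and which paths show the
"listed but not found" behavior.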

I initially thought the problems were related to the firewall on my
server, so I turned it off.  (There is no firewall on the client.)
Based on suggestions from a fellow S.A., I tried adding actimeo=0 and
forcedirectio to the mount options on the client, but that didn't
solve anything.  My users are getting very antsy, to say the least.
Does anyone have any ideas?  (Aside from cosmic rays, I mean.)  Here's
my /etc/exports on 'huxley', the server:

/webdirs/univ job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
/webdirs/students students.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
/webdirs/faculty job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)

And on 'job', the client, the corresponding lines from /etc/vfstab:

huxley:/webdirs/univ    -       /www_misc       nfs     -       yes soft,bg,actimeo=0,forcedirectio
huxley:/webdirs/faculty -       /web/people     nfs     -       yes soft,bg

It bears repeating that only one of the volumes (/webdirs/univ,
mounted on /www_misc) is having problems; the other volume shared
between the two servers is just fine.  Other NFS mounts on the client
and shares from the server are similarly fine.  In fact, most of the
NFS share in question is fine -- it's just two directories that
consistently lose files whenever quotacheck or AIDE is run.

Any ideas?  I'm up against a brick wall on this one.  Thanks!

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University



