NFS is going crazy, and taking me with it

Chris St. Pierre stpierre at NebrWesleyan.edu
Fri Jun 9 15:23:49 UTC 2006


Nigel--

Thanks for your reply.  Wild speculation is about all I have right
now. :)

I believe actimeo=0 means "never cache"; also, I added this argument
*after* I first noticed the problem, so actimeo=0 almost certainly
isn't the (whole) problem.  Nonetheless, I may try removing actimeo=0
and leaving forcedirectio and see if that changes things any.
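
On the actimeo=0 question: both the Solaris mount_nfs(1M) and Linux nfs(5) man pages describe actimeo=n as shorthand for setting acregmin, acregmax, acdirmin and acdirmax all to n seconds, so actimeo=0 turns attribute caching off rather than caching forever. The small Python sketch below (expand_nfs_options is a hypothetical helper, not part of any NFS tool) only spells out that expansion for the option string from the vfstab entry on 'job'.

```python
# Hypothetical helper, for illustration only: expand an NFS mount-option
# string, treating actimeo=n as shorthand for the four attribute-cache
# timeouts (acregmin, acregmax, acdirmin, acdirmax), each set to n seconds.
def expand_nfs_options(opts: str) -> dict:
    """Parse a comma-separated NFS option string into a dict.

    Bare flags such as 'soft' or 'bg' map to True; 'actimeo=n' is replaced
    by the four timeouts it sets; any other 'key=value' is kept as a string.
    """
    result = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            result[key] = True  # flag with no value
        elif key == "actimeo":
            for name in ("acregmin", "acregmax", "acdirmin", "acdirmax"):
                result[name] = int(value)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    # Option string taken from the /www_misc line in /etc/vfstab.
    print(expand_nfs_options("soft,bg,actimeo=0,forcedirectio"))
    # prints {'soft': True, 'bg': True, 'acregmin': 0, 'acregmax': 0,
    #         'acdirmin': 0, 'acdirmax': 0, 'forcedirectio': True}
```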

Thanks for the suggestion.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

On Fri, 9 Jun 2006, Nigel Wade wrote:

> Chris St. Pierre wrote:
>> I have a RHEL 4 NFS server that shares out three volumes, all
>> read-only.  One goes to another Linux box, and the other two go to a
>> Solaris 9 machine.  One of the volumes mounted on the Solaris boxes is
>> having bewildering problems.
>> 
>> Every night, two processes run on the server that cause these
>> problems.  The first is:
>> 
>> /sbin/quotacheck -fguma
>> 
>> The second is AIDE, a Tripwire replacement.  When either of these
>> processes runs, semi-random files semi-disappear from the client.  The
>> files are always in the same directories, but different ones disappear
>> on different days.  The symptoms are always the same: running 'ls'
>> will show the files, but running 'ls -lAF' (or anything that requires
>> running stat() on them) fails with "File not found."  Opening them
>> also fails.  To solve this problem, I have to touch the file *on the
>> client*; of course, it gives an error that it can't create the file in
>> question, but after that, everything works.
>> 
>> The only common thread I can think of between quotacheck and AIDE is
>> that both stat a very large number of files on the server.  That said,
>> AIDE is not configured to check any of the volumes that are shared via
>> NFS.  I also wrote a quick Perl script to recurse into a directory and
>> stat all the files in it, but that doesn't break the NFS shares,
>> either.
>> 
>> I initially thought the problems were related to the firewall on my
>> server, so I turned it off.  (There is no firewall on the client.)
>> Based on suggestions from fellow SAs, I tried adding actimeo=0 and
>> forcedirectio to the mount options on the client, but that didn't
>> solve anything.  My users are getting very antsy, to say the least.
>> Does anyone have any ideas?  (Aside from cosmic rays, I mean.)  Here's
>> my /etc/exports on 'huxley', the server:
>> 
>> /webdirs/univ job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>> /webdirs/students students.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>> /webdirs/faculty job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>> 
>> And on 'job', the client, the corresponding lines from /etc/vfstab:
>> 
>> huxley:/webdirs/univ    -       /www_misc       nfs     -       yes soft,bg,actimeo=0,forcedirectio
>> huxley:/webdirs/faculty -       /web/people     nfs     -       yes soft,bg
>> 
>> It bears repeating that only one of the volumes (/webdirs/univ,
>> mounted on /www_misc) is having problems; the other volume shared
>> between the two servers is just fine.  Other NFS mounts on the client
>> and shares from the server are similarly fine.  In fact, most of the
>> NFS share in question is fine -- it's just two directories that
>> consistently lose files whenever quotacheck or AIDE is run.
>> 
>> Any ideas?  I'm up against a brick wall on this one.  Thanks!
>> 
>> Chris St. Pierre
>> Unix Systems Administrator
>> Nebraska Wesleyan University
>> 
>
> This is just wild speculation on my part...
>
> Could it be that the job you are running is placing such a heavy load on the
> server that NFS requests from the client are timing out? This in turn is being
> cached on the client, causing the resulting "File not found" errors? I notice
> you have actimeo=0, could this be the culprit - does that mean cache forever,
> or never cache? The man page isn't forthcoming on that.
>
> -- 
> Nigel Wade, System Administrator, Space Plasma Physics Group,
>            University of Leicester, Leicester, LE1 7RH, UK
> E-mail :    nmw at ion.le.ac.uk
> Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555
>
> -- 
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>
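
The thread doesn't include the Perl stat script and never identifies the cause. As a hedged illustration only, here is a hypothetical Python stand-in (find_unstatable is an invented name). It walks a directory tree, stat()s every entry once, and collects the names that readdir returns but that stat() reports as missing, which matches the symptom described in the original message, where 'ls' lists a file but 'ls -lAF' cannot stat it. Running it on the client against /www_misc after the nightly quotacheck or AIDE job is one way to record which files were affected. It is a diagnostic sketch, not a fix.

```python
import os
import sys


def find_unstatable(root: str) -> tuple:
    """Recursively stat() every entry under `root`.

    Returns (number of entries stat'ed successfully, sorted list of paths
    that appeared in a directory listing but stat() reported as not found).
    """
    ok, missing = 0, []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk gets these names from the directory listing (readdir),
        # i.e. roughly what a plain 'ls' shows.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # os.stat follows symlinks, so a dangling symlink is also
                # reported as missing here.
                os.stat(path)
                ok += 1
            except FileNotFoundError:
                missing.append(path)
    return ok, sorted(missing)


if __name__ == "__main__":
    count, missing = find_unstatable(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"stat() succeeded for {count} entries")
    for path in missing:
        print(f"listed but not stat-able: {path}")
```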



