NFS is going crazy, and taking me with it
Chris St. Pierre
stpierre at NebrWesleyan.edu
Fri Jun 9 15:23:49 UTC 2006
Nigel--
Thanks for your reply. Wild speculation is about all I have right
now. :)
I believe actimeo=0 means "never cache attributes": it sets the minimum
and maximum attribute-cache times for both regular files and directories
to zero. Also, I added this argument *after* I first noticed the problem,
so actimeo=0 almost certainly isn't the (whole) problem. Nonetheless, I
may try removing actimeo=0 while leaving forcedirectio in place, to see
if that changes anything.
Thanks for the suggestion.
Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
On Fri, 9 Jun 2006, Nigel Wade wrote:
> Chris St. Pierre wrote:
>> I have a RHEL 4 NFS server that shares out three volumes, all
>> read-only. One goes to another Linux box, and the other two go to a
>> Solaris 9 machine. One of the volumes mounted on the Solaris box is
>> having bewildering problems.
>>
>> Every night, two processes run on the server that cause these
>> problems. The first is:
>>
>> /sbin/quotacheck -fguma
>>
>> The second is AIDE, a Tripwire replacement. When either of these
>> processes runs, semi-random files semi-disappear from the client. The
>> files are always in the same directories, but different ones disappear
>> on different days. The symptoms are always the same: running 'ls'
>> will show the files, but running 'ls -lAF' (or anything that requires
>> running stat() on them) fails with "File not found." Opening them
>> also fails. To solve this problem, I have to touch the file *on the
>> client*; of course, it gives an error that it can't create the file in
>> question, but after that, everything works.
>>
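The symptom described above (names that a plain directory listing returns but that stat() cannot find) can be checked mechanically. Below is a minimal sketch, assuming a Python 3 interpreter is available on the client; the function name find_unstattable is hypothetical and not part of the original post. It lists a directory and reports every entry whose lstat() fails.

```python
import os
import sys


def find_unstattable(directory):
    """Return (name, error) pairs for entries that os.listdir() returns
    but os.lstat() cannot stat.

    os.listdir() reads only the directory entries, which is roughly all a
    plain 'ls' needs; os.lstat() fetches each entry's attributes, which is
    what 'ls -lAF' needs. A "No such file or directory" error here matches
    the "file not found" symptom.
    """
    problems = []
    for name in sorted(os.listdir(directory)):
        try:
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            problems.append((name, exc.strerror))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python3 find_unstattable.py DIR [DIR ...]
    for directory in sys.argv[1:]:
        for name, err in find_unstattable(directory):
            print(f"{os.path.join(directory, name)}: {err}")
```

Running it from cron on job against the two affected directories shortly after the nightly quotacheck and AIDE runs would record which files went missing, and when.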
>> The only common thread I can think of between quotacheck and AIDE is
>> that both stat a very large number of files on the server. That said,
>> AIDE is not configured to check any of the volumes that are shared via
>> NFS. I also wrote a quick Perl script to recurse into a directory and
>> stat all the files in it, but that doesn't break the NFS shares,
>> either.
>>
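The Perl script itself isn't included in the post. A rough Python 3 analogue, assuming its job was simply to walk a tree and stat() every entry (the function name stat_tree is hypothetical, and this is not the original script), might look like this:

```python
import os
import sys


def stat_tree(root):
    """Walk `root` recursively and lstat() every entry below it.

    Returns (count, errors): the number of entries stat'ed successfully and
    a list of (path, exception) pairs for entries that failed. The root
    directory itself is not counted.
    """
    count = 0
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                count += 1
            except OSError as exc:
                errors.append((path, exc))
    return count, errors


if __name__ == "__main__":
    # Hypothetical usage: python3 stat_tree.py /webdirs/univ
    for root in sys.argv[1:]:
        n, errs = stat_tree(root)
        print(f"{root}: {n} entries stat'ed, {len(errs)} errors")
        for path, exc in errs:
            print(f"  {path}: {exc}")
```

Pointed at /webdirs/univ on huxley, this generates the same kind of metadata-heavy load the post attributes to quotacheck and AIDE, though it does not reproduce everything those tools do (quotacheck, for example, also rebuilds quota files).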
>> I initially thought the problems were related to the firewall on my
>> server, so I turned it off. (There is no firewall on the client.)
>> Based on suggestions from fellow sysadmins, I tried adding actimeo=0 and
>> forcedirectio to the mount options on the client, but that didn't
>> solve anything. My users are getting very antsy, to say the least.
>> Does anyone have any ideas? (Aside from cosmic rays, I mean.) Here's
>> my /etc/exports on 'huxley', the server:
>>
>> /webdirs/univ job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>> /webdirs/students students.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>> /webdirs/faculty job.nebrwesleyan.edu(all_squash,anonuid=1080,anongid=1080,ro)
>>
>> And on 'job', the client, the corresponding lines from /etc/vfstab:
>>
>> huxley:/webdirs/univ - /www_misc nfs - yes soft,bg,actimeo=0,forcedirectio
>> huxley:/webdirs/faculty - /web/people nfs - yes soft,bg
>>
>> It bears repeating that only one of the volumes (/webdirs/univ,
>> mounted on /www_misc) is having problems; the other volume shared
>> between the two servers is just fine. Other NFS mounts on the client
>> and shares from the server are similarly fine. In fact, most of the
>> NFS share in question is fine -- it's just two directories that
>> consistently lose files whenever quotacheck or AIDE is run.
>>
>> Any ideas? I'm up against a brick wall on this one. Thanks!
>>
>> Chris St. Pierre
>> Unix Systems Administrator
>> Nebraska Wesleyan University
>>
>
> This is just wild speculation on my part...
>
> Could it be that the job you are running is placing such a heavy load on the
> server that NFS requests from the client are timing out? This in turn is being
> cached on the client, causing the resulting "File not found" errors? I notice
> you have actimeo=0, could this be the culprit - does that mean cache forever,
> or never cache? The man page isn't forthcoming on that.
>
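One way to test this hypothesis, sketched here in Python 3 under the assumption that an interpreter is available on the client (the function probe, the script name and the file path are hypothetical), is to stat a known file on /www_misc at regular intervals through the nightly window and log how long each call takes and whether it fails:

```python
import os
import sys
import time


def probe(path, samples, interval=1.0):
    """Yield (wall_clock_time, seconds_taken, error_or_None) for each of
    `samples` stat() calls on `path`, sleeping `interval` seconds between calls.
    """
    for i in range(samples):
        started = time.time()      # wall clock, for lining up with cron logs
        t0 = time.monotonic()      # monotonic clock, for measuring duration
        try:
            os.stat(path)
            error = None
        except OSError as exc:     # e.g. ENOENT or EIO from the NFS client
            error = exc
        yield (started, time.monotonic() - t0, error)
        if i < samples - 1:
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: python3 nfs_probe.py /www_misc/some/file 3600
    if len(sys.argv) == 3:
        for ts, secs, err in probe(sys.argv[1], int(sys.argv[2])):
            stamp = time.strftime("%H:%M:%S", time.localtime(ts))
            print(f"{stamp} {secs:.3f}s {err if err else 'ok'}", flush=True)
```

Slow or failing calls that line up with the quotacheck or AIDE runs would support the timeout theory; fast, clean results throughout would argue against it.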
> --
> Nigel Wade, System Administrator, Space Plasma Physics Group,
> University of Leicester, Leicester, LE1 7RH, UK
> E-mail : nmw at ion.le.ac.uk
> Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>