Solaris NFS clients go wonky when server went FC4->FC5
Will Partain
will.partain at verilab.com
Tue Sep 19 21:26:12 UTC 2006
"Chris Mohler" <cr33dog at gmail.com> writes:
[previous recap is at the end]
> Probably not related, but I see those errors on my NFS clients from
> time to time.
>
> It's usually when:
>
> A - The server is using 90%+ of the CPU
> B - the network traffic is very high.
Chris, thanks for your reply. That's definitely not it -- this is a
quiet-backwater network. Further developments and facts:
* I upgraded the SPARC box (client) to Solaris 10 6/06 (i.e. the
latest), in case the problem was old, crufty code there. No change.
* It is something _I'm_ doing that is triggering the problem; I have a
colleague who has been using a similar box for days without incident.
* I can make the problem happen always and instantly. My test case
happens to involve an NFS partition named /sysadm/.-ark-install-ALL
I can 'ls /sysadm/.-ark-install-ALL' and it mounts and works fine.
If I 'truss' the offending test case, it fails at the syscall...
open64("/our/.-ark-deploy/arkbase/share/ark/arkcmd", O_RDONLY)
I changed the mount from 'intr' to 'soft', so that I would get an
error message other than just "server not responding". (Useful
trick, no?)
In every case, I get...
NFS <op> failed for server foo: error 5 (RPC: Timed out)
... where <op> is usually getattr, but can be something else.
But running all the usual checks, like '/usr/bin/rpcinfo -t foo nfs',
shows everything to be a picture of happiness.
[The exact mount opts were:
read/write/nosetuid/nodevices/nodev/timeo=600/retrans=2/proto=tcp/vers=3/soft/xattr]
* Once it goes ga-ga over one mount from the server, it is ga-ga about
other mounts from the same server -- until it rights itself again.
* I _thought_ it might have something to do with running as root; but
no deal -- I can break it as me, too.
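For anyone wanting to read the mount options quoted above at a glance, here is a minimal sketch in Python that splits the slash-separated string into a dictionary. The helper name parse_mount_opts is made up for illustration. The only outside fact it leans on is that Solaris mount_nfs expresses timeo in tenths of a second, so timeo=600 is a 60-second timeout. It makes no claim about how long the client waits in total before a soft mount gives up.

```python
def parse_mount_opts(opts, sep="/"):
    """Split an option string into a dict (hypothetical helper, not a Solaris tool).

    Bare flags such as 'soft' map to True; key=value pairs keep their value,
    converted to int when the value is all digits.
    """
    parsed = {}
    for item in opts.split(sep):
        key, eq, value = item.partition("=")
        if not eq:
            parsed[key] = True          # bare flag, e.g. 'soft'
        elif value.isdigit():
            parsed[key] = int(value)    # numeric option, e.g. 'timeo=600'
        else:
            parsed[key] = value         # string option, e.g. 'proto=tcp'
    return parsed


# The option string exactly as reported in the post.
REPORTED = ("read/write/nosetuid/nodevices/nodev/timeo=600/retrans=2/"
            "proto=tcp/vers=3/soft/xattr")

opts = parse_mount_opts(REPORTED)

# timeo is given in tenths of a second on Solaris NFS mounts.
timeout_seconds = opts["timeo"] / 10

print("proto:", opts["proto"], "vers:", opts["vers"])
print("soft mount:", opts.get("soft", False))
print("timeo in seconds:", timeout_seconds)
print("retrans:", opts["retrans"])
```

The same helper also works on the comma-separated autofs options in the recap if called with sep=",".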
I count this as slight progress :-( Any other ideas?
Will
== recap ============================================================
For a long time (years), I have had sparc-solaris8 NFS clients
(well-patched) talking to a RH/Fedora NFS server, recently FC4
(x86_64). The mount options, dished out through autofs, were
(probably sub-optimally):
-rw,nosuid,nodev,sync,retry=5,rsize=16384,wsize=16384,intr
These were lightly-used clients; it worked; everybody happy.
I yum-upgraded the server to FC5 (current kernel, nfs-utils). It
works... most of the time, but the clients now often-but-not-always
wander off into...
NFS server foo not responding still trying
NFS server foo ok