Re: Solaris NFS clients go wonky when server went FC4->FC5

"Chris Mohler" <cr33dog gmail com> writes:

[previous recap is at the end]

> Probably not related, but I see those errors on my NFS clients from
> time to time.
> It's usually when:
> A - The server is using 90%+ of the CPU
> B - the network traffic is very high.

Chris, thanks for your reply.  That's definitely not it -- this is a
quiet-backwater network.  Further developments and facts:

* I upgraded the SPARC box (client) to Solaris 10 6/06 (i.e. the
  latest), in case the problem was old, crufty code there.  No change.

* It is something _I'm_ doing that is triggering the problem; I have a
  colleague who has been using a similar box for days without incident.

* I can make the problem happen always and instantly.  My test case
  happens to involve an NFS partition named /sysadm/.-ark-install-ALL

  I can 'ls /sysadm/.-ark-install-ALL' and it mounts and works fine.

  If I 'truss' the offending test case, it fails at the syscall...

   open64("/our/.-ark-deploy/arkbase/share/ark/arkcmd", O_RDONLY)

  I changed the mount from 'intr' to 'soft', so that I would get an
  error message other than just "server not responding".  (Useful
  trick, no?)

  In every case, I get...

   NFS <op> failed for server foo: error 5 (RPC: Timed out)

  ... where <op> is usually getattr, but can be something else.

  But running checks such as '/usr/bin/rpcinfo -t foo nfs' shows
  everything to be a picture of happiness.

  [The exact mount opts were:

* Once it goes ga-ga over one mount from the server, it is ga-ga about
  other mounts from the same server -- until it rights itself again.

* I _thought_ it might have something to do with running as root; but
  no deal -- I can bust it as me, too.
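
For anyone who wants to compare notes on which operations time out, here
is a minimal Python sketch that tallies the <op> field from lines of the
form quoted above.  It is only an illustration, not something used to
produce the observations in this message.  The function name
tally_failed_ops and the sample lines (including the syslog-style
prefix on one of them) are made up; the pattern assumes nothing beyond
the one message format shown above.

```python
import re
from collections import Counter

# Matches lines like:
#   NFS getattr failed for server foo: error 5 (RPC: Timed out)
# The pattern is based only on the message quoted in this thread.  Real
# log lines may carry a timestamp/host prefix, so we search, not match.
FAILED_RE = re.compile(
    r"NFS (?P<op>\S+) failed for server (?P<server>[^\s:]+): "
    r"error (?P<code>\d+) \((?P<reason>[^)]*)\)"
)


def tally_failed_ops(lines):
    """Count (server, op, reason) triples among NFS failure lines."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            key = (m.group("server"), m.group("op"), m.group("reason"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; substitute lines from the client's log.
    sample = [
        "NFS getattr failed for server foo: error 5 (RPC: Timed out)",
        "Jun  1 12:00:00 client nfs: NFS lookup failed for server foo: "
        "error 5 (RPC: Timed out)",
        "NFS server foo ok",
    ]
    for (server, op, reason), n in tally_failed_ops(sample).most_common():
        print(f"{server}: {op} failed {n} time(s) ({reason})")
```

Feeding it the client's log lines would show at a glance whether
getattr really dominates, or whether other operations fail about as
often.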

I count this as slight progress :-(  Any other ideas?


== recap ============================================================

For a long time (years), I have had sparc-solaris8 NFS clients
(well-patched) talking to a RH/Fedora NFS server, recently FC4
(x86_64).  The mount options, dished out through autofs, were
(probably sub-optimally):


These were lightly-used clients; it worked; everybody happy.

I yum-upgraded the server to FC5 (current kernel, nfs-utils).  It
works... most of the time, but the clients now often-but-not-always
wander off into...

  NFS server foo not responding still trying
  NFS server foo ok

