Why is my load ave so high now? [Now I know why!]

Rick Stevens ricks at nerd.com
Tue Jul 28 18:00:58 UTC 2009


Kevin J. Cummings wrote:
> On 07/27/2009 02:26 PM, Rick Stevens wrote:
>> You see a bunch of NFS-related things in a "D" state and you wonder why
>> it's slow?
> 
> Yes.  Mostly because the machine accessing the NFS mounts has been
> re-booted a couple of times.
> 
>> If you have processes in an I/O wait (a.k.a. "D") state, that'll bog
>> stuff down badly...especially if the NFS mounts are mounted "hard".
> 
> Well, tonight I rebooted the server with NFS turned off.  When it
> booted, I saw a load average between 1 and 2.  That's all.  When it
> re-booted, ivtv started back up, despite my blacklisting it and removing
> it from modprobe.conf.  However, ivtvfb did not get installed.
> I also noticed that BOINC started right up again.  With astropulse
> grabbing all the idle cpu time, my load average was still between 1 and 2.
> 
> So, I decided that NFS was my problem, but I'm still not sure why.
> 
> So, I tried a couple of things.  My laptop references a few directories
> on my server via NFS and autofs.
> 
> So, I started nfs again on the server (service nfs start)
> 
> Load average remains between 1 and 2.  So far so good.
> 
> From the laptop, I did a "cd /net/kjc386".  I can then do an ls and see
> all of the exported filesystems.  Continues to look good.
> 
> "ls home" lists the directories in the server's exported /home dir.
> nfs does the work, and disappears from the top -i that I have running.
> Great.
> 
> Next I do a "ls c:" to look at the old WINDOWS partition on my server.
> HANG!  I can't interrupt the ls with ^C nor ^Z.  I have to kill it from
> another process.  When I do, the hung nfs processes on the server stay
> hung.  After it collects all 8 allowed nfs processes, nothing more nfs
> works to the server, and the load average climbs roughly 1 per nfs
> process (I watched the load average increase with each new nfs process
> that appeared).
> 
> So, I guess my question is what's broken with NFS between my F11 laptop
> and the F10 server????

I could see where "ls c:" might be interpreted by the system as trying
to find an NFS machine called "c".  An NFS mount command is:

     mount -t nfs server:/sharename /mountpoint

Perhaps F11 is trying to invoke an automount of an NFS share from server
"c" to satisfy your "ls" command.  That'd be wild!
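
To make that guess concrete, here is a minimal sketch of how a
"server:/path" style string splits at the first colon.  It only
illustrates the idea.  I'm not claiming this is how the F11 NFS client
or autofs actually parses the name, and the function name split_spec is
made up for the example.

```shell
# Hypothetical helper: split a mount-style "server:/path" spec at the
# first colon.  Purely illustrative; NOT the real autofs/NFS parsing code.
split_spec() {
    spec=$1
    host=${spec%%:*}    # everything before the first colon
    path=${spec#*:}     # everything after the first colon
    printf 'host=%s path=%s\n' "$host" "$path"
}

split_spec "server:/sharename"
split_spec "c:"
```

Read that way, "c:" looks like a host named "c" with an empty path.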

I haven't tried this.  Perhaps you've found a very subtle bug in F11's
NFS client implementation.  Could you run wireshark or tcpdump and
watch for NFS traffic when you do that "ls c:" command?  If you see any,
then I'd file a bugzilla PDQ (pretty damned quick).
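
Something along these lines should do for the capture.  It's a sketch,
not a recipe: the function name, the default interface "any" and the
output file are my own assumptions, and the filter only covers the NFS
port (2049) and the portmapper (111), so traffic to mountd on some other
port would be missed.

```shell
# Hypothetical wrapper; run as root on the laptop, then do the "ls c:"
# in another terminal and stop the capture with ^C.
capture_nfs() {
    iface=${1:-any}                 # assumed default: all interfaces
    outfile=${2:-/tmp/nfs-ls.pcap}  # assumed output file
    # -n: no name resolution; -w: write raw packets for wireshark
    tcpdump -n -i "$iface" -w "$outfile" port 2049 or port 111
}
```

Call it as "capture_nfs" for all interfaces, or pass an interface name
and a file name of your own, then open the resulting file in wireshark.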

----------------------------------------------------------------------
- Rick Stevens, Systems Engineer                      ricks at nerd.com -
- AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
-                                                                    -
- "People tell me I look at the dark side.  That's not true.  I have -
-   the heart of a small boy......in a jar right here on my desk."   -
-                                                    -- Stephen King -
----------------------------------------------------------------------



