df hangs on down nfs server mounted with hard,intr, can't kill

Wade Hampton wade.hampton at nsc1.net
Mon Mar 8 19:58:44 UTC 2004


Ron Herardian wrote:

>"On a hard-mounted file system, NFS operations are retried until they are acknowledged by the server. A side effect of hard-mounting NFS file systems is that processes block (or "hang") in a high-priority disk wait state until their NFS RPC calls complete. 
>If an NFS server goes down, the clients using its file systems hang if they reference these file systems before the server recovers. Using -intr in conjunction with the -hard mount option allows users to interrupt system calls that are blocked waiting on a crashed server. The system call is interrupted when the process making the call receives a signal, usually sent by the user typing Ctrl-C or using the kill command.
>
Yep, in the man page too.  That would imply that the mount options
listed below, which include "hard,intr", would allow one to send a
signal (Ctrl-C, killall, or kill -9) and terminate the process.
However, with Fedora and the kernel listed below, I could not kill
the task.

>On a soft-mounted file system, an NFS RPC call returns a timeout error if it fails the number of times specified by the retrans option. You should not use the -soft option on any file system that is writeable, nor on any file system from which you load executables. NFS only guarantees the consistency of data after a server crash if the NFS file system was hard-mounted by the client."
>  
>
This is a very good point....  Thanks.

>[http://www.brandonhutchinson.com/nfs_timeouts.html]
>
>
>
>Wade Hampton wrote:
>  
>
>>I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
>>remote solaris server (hence choice of options):
>>
>>   rsize=32768,ro,hard,intr,tcp,nfsvers=3
>>
>>When the remote is down or disconnected, a "df" hangs (as expected),
>>but I can't kill it, even as root or with kill -9.  The docs for mount
>>indicate that the INTR option should allow killing apps that are
>>blocked on a file system mounted with HARD.
>>
>>I also coded a test program that calls statvfs(2), and it hangs in
>>the statvfs(2) call when run against a down NFS server.  It too
>>can't be interrupted or killed.
>>
>>My questions are:
>>
>>1)  Is there a safe and reliable means to check for a down NFS server?
>>     E.g., is showmount -e <server> safe enough?  It is interruptible,
>>     so one could wrap it with a timer; if it times out, the server
>>     would be considered down.
>>
>>2)  Is the non-interruptible operation (even with INTR option)
>>     a bug or feature?
>>
>>3)  Is there a simple kernel call, /proc entry, or similar that can
>>    be used for this purpose?
>>
>>4)  Is there a perl module to accomplish this?
>>
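The timer-wrapped showmount idea in question 1 could be sketched roughly as follows. This is only a minimal illustration, not a tested monitoring tool: the function names probe and nfs_server_responds and the 5-second default are made up for the example, and it assumes showmount is on the PATH. Note also that showmount -e talks to the server's mount daemon, so a reply is a proxy for "the server is reachable", not proof that NFS itself is serving requests.

```python
import subprocess


def probe(cmd, timeout_s=5.0):
    """Return True if cmd exits with status 0 within timeout_s seconds.

    On timeout the child is killed and False is returned. This relies on
    the child being killable, which holds for showmount but NOT for a
    process stuck in an uninterruptible NFS wait (e.g. a hung df).
    A missing binary raises FileNotFoundError instead of reporting "down".
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def nfs_server_responds(server, timeout_s=5.0):
    """Hypothetical helper: treat a timely 'showmount -e' reply as 'up'."""
    return probe(["showmount", "-e", server], timeout_s)
```

A caller would run something like nfs_server_responds("fileserver") before touching the mount point, and skip the df or statvfs call when it returns False ("fileserver" is a placeholder host name).
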
>>This would be very useful for network monitoring, e.g., when the
>>server goes down and stays down for >1 minute, generate an SNMP
>>trap and write to a log file.  It would help when you can't put an SNMP
>>agent on the server, but only on the client.  It is also useful for writing
>>a highly reliable client application.
>>
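The "down for more than one minute" rule described above could be tracked with a small state holder like the one below. It is a sketch under stated assumptions: the class name OutageTracker, its parameters, and the log message are invented for illustration. It only writes a log line; sending an SNMP trap is left out and would go where the warning is logged. The result of each periodic reachability check is fed to update().

```python
import logging
import time


class OutageTracker:
    """Report once when a server has been unreachable longer than a threshold."""

    def __init__(self, threshold_s=60.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock          # injectable so the logic can be tested
        self.down_since = None      # time of the first failed check in this outage
        self.alerted = False        # True once this outage has been reported

    def update(self, is_up):
        """Record one check result; return True exactly once per long outage."""
        now = self.clock()
        if is_up:
            # Server is back: forget the outage so the next one alerts again.
            self.down_since = None
            self.alerted = False
            return False
        if self.down_since is None:
            self.down_since = now
        if not self.alerted and now - self.down_since > self.threshold_s:
            self.alerted = True
            logging.warning("NFS server down for more than %.0f s",
                            self.threshold_s)
            return True
        return False
```

Returning True only once per outage keeps a monitor that polls every few seconds from flooding the log while the server stays down.
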
>>As I have no control over the remote system, when it went down,
>>I had to do a hard reboot of my Linux box to stop the hung apps.  This
>>is a Windows solution, not a Linux solution.
>>
>>Note, I found this when writing some scripts for MRTG to check
>>the disk utilization of partitions.  My df's hung so I didn't even get
>>the proper values for my local partitions.  After a few days, I had
>>LOTS of hung MRTG apps.
>>
>>Thanks
>>--
>>Wade Hampton
>>
>>--
>>fedora-list mailing list
>>fedora-list at redhat.com
>>To unsubscribe: http://www.redhat.com/mailman/listinfo/fedora-list
>>    
>>
>
>  
>