df hangs on down nfs server mounted with hard,intr, can't kill
Wade Hampton
wade.hampton at nsc1.net
Mon Mar 8 19:58:44 UTC 2004
Ron Herardian wrote:
>"On a hard-mounted file system, NFS operations are retried until they are acknowledged by the server. A side effect of hard-mounting NFS file systems is that processes block (or "hang") in a high-priority disk wait state until their NFS RPC calls complete.
>If an NFS server goes down, the clients using its file systems hang if they reference these file systems before the server recovers. Using -intr in conjunction with the -hard mount option allows users to interrupt system calls that are blocked waiting on a crashed server. The system call is interrupted when the process making the call receives a signal, usually sent by the user typing Ctrl-C or using the kill command.
>
Yep, in the man page too. That would imply that the mount options listed
below, which include "hard,intr", would allow one to send a signal
(Ctrl-C, killall, or kill -9) and terminate the process. However, with
Fedora and the kernel listed below, I could not kill the task.
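For question 1, one workaround that keeps the monitor itself from hanging is to run the blocking call in a child process and give up on it after a timeout. Below is a minimal, untested sketch in C, assuming a POSIX system. `probe_path()` is a hypothetical helper, not part of any library. The stuck child may still be unkillable, as described above. The point is only that the parent survives and can report the problem.

```c
/* Sketch (hypothetical helper, not a library API): probe a possibly
 * NFS-backed path without letting the caller hang. statvfs(2) runs in a
 * child process; the parent polls waitpid() with WNOHANG until a deadline.
 * If the child never finishes, the parent reports the mount as
 * unresponsive. The child may stay in uninterruptible sleep ("D" state)
 * until the server returns, but the monitoring process keeps running. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 if statvfs() succeeded, 1 if it failed with an error,
 * -1 if it did not finish within timeout_sec (or fork() failed). */
int probe_path(const char *path, int timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct statvfs sv;
        /* This is the call that can block indefinitely on a dead
         * hard-mounted NFS server. */
        _exit(statvfs(path, &sv) == 0 ? 0 : 1);
    }

    /* Poll every 100 ms until the child exits or the deadline passes. */
    struct timespec tick = { 0, 100L * 1000 * 1000 };
    long ticks_left = (long)timeout_sec * 10;
    while (ticks_left-- > 0) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Timed out: request termination but do not wait for it, because a
     * child blocked on the server may not act on even SIGKILL. */
    kill(pid, SIGKILL);
    return -1;
}
```

For example, `probe_path("/mnt/solaris", 5)` (the mount point is hypothetical) would return -1 if the server did not answer within five seconds, which a script could turn into an SNMP trap or a log entry. A timed-out child is left unreaped. If it later exits, it remains a zombie until the monitor reaps it, e.g. with a periodic `waitpid(-1, NULL, WNOHANG)`.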
>On a soft-mounted file system, an NFS RPC call returns a timeout error if it fails the number of times specified by the retrans option. You should not use the -soft option on any file system that is writeable, nor on any file system from which you load executables. NFS only guarantees the consistency of data after a server crash if the NFS file system was hard-mounted by the client."
>
>
This is a very good point. Thanks.
>[http://www.brandonhutchinson.com/nfs_timeouts.html]
>
>
>
>Wade Hampton wrote:
>
>
>>I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
>>remote Solaris server (hence the choice of options):
>>
>> rsize=32768,ro,hard,intr,tcp,nfsvers=3
>>
>>When the remote server is down or disconnected, a "df" hangs (as
>>expected), but I can't kill it, even as root or with kill -9. The docs
>>for mount indicate that the intr option should allow killing processes
>>that access file systems mounted with hard.
>>
>>I also coded a test program that calls statvfs(2), and it hangs in the
>>statvfs(2) call when run against a down NFS server. It, too,
>>can't be interrupted or killed.
>>
>>My questions are:
>>
>>1) Is there a safe and reliable means to check for a down NFS server?
>> (E.g., is showmount -e <server> safe enough? It is interruptible,
>> hence one could wrap it with a timer, and if it times out, the
>> server would be considered down.)
>>
>>2) Is the non-interruptible operation (even with the intr option)
>> a bug or a feature?
>>
>>3) Is there a simple kernel call, /proc entry, or similar that can
>> be used for this purpose?
>>
>>4) Is there a Perl module to accomplish this?
>>
>>This would be very useful for network monitoring, e.g., when the
>>server goes down and stays down for >1 minute, generate an SNMP
>>trap and write to a log file. It would be especially useful when you
>>can't put an SNMP agent on the server, only on the client. It is also
>>useful for writing a highly reliable client application.
>>
>>As I have no control over the remote system, when it went down,
>>I had to do a hard reboot of my Linux box to stop the hung apps. This
>>is a Windows solution, not a Linux solution.
>>
>>Note: I found this when writing some scripts for MRTG to check
>>the disk utilization of partitions. My df's hung, so I didn't even get
>>the proper values for my local partitions. After a few days, I had
>>LOTS of hung MRTG apps.
>>
>>Thanks
>>--
>>Wade Hampton
>>
>>--
>>fedora-list mailing list
>>fedora-list at redhat.com
>>To unsubscribe: http://www.redhat.com/mailman/listinfo/fedora-list
>>
>>
>
>
>
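For the MRTG case, where df hung even though only local partitions were of interest, another option is to skip NFS entries before ever calling statvfs(2). This is a minimal, untested sketch in C. It assumes Linux's /proc/mounts is available and that NFS mounts are listed with type `nfs` or `nfs4`. `is_remote_fs()` and `report_local_usage()` are hypothetical names, not an existing API.

```c
/* Sketch (hypothetical helpers): print usage for local file systems only,
 * skipping NFS mounts so a dead NFS server cannot stall the whole run.
 * Assumptions: Linux /proc/mounts is available, and remote mounts show up
 * with type "nfs" or "nfs4"; extend is_remote_fs() for other types. */
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Returns 1 if the mount type is treated as remote (network-backed). */
int is_remote_fs(const char *type)
{
    return strcmp(type, "nfs") == 0 || strcmp(type, "nfs4") == 0;
}

/* Writes "mountpoint percent-used" lines to out.
 * Returns the number of file systems reported, or -1 on error. */
int report_local_usage(FILE *out)
{
    FILE *mounts = setmntent("/proc/mounts", "r");
    if (mounts == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(mounts)) != NULL) {
        if (is_remote_fs(m->mnt_type))
            continue;              /* never statvfs() a remote mount here */

        struct statvfs sv;
        if (statvfs(m->mnt_dir, &sv) != 0 || sv.f_blocks == 0)
            continue;              /* skip errors and size-less pseudo fs */

        /* Percent used relative to blocks used plus blocks available to
         * unprivileged users, roughly the basis df uses for Use%. */
        unsigned long long used =
            (unsigned long long)(sv.f_blocks - sv.f_bfree);
        unsigned long long total = used + (unsigned long long)sv.f_bavail;
        double pct = total ? 100.0 * (double)used / (double)total : 0.0;

        fprintf(out, "%s %.1f%%\n", m->mnt_dir, pct);
        count++;
    }
    endmntent(mounts);
    return count;
}
```

This is similar in spirit to GNU df's `-l` (`--local`) option. A local file system mounted beneath an NFS mount point may still block during path lookup, so such layouts would need extra care.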