df hangs on down nfs server mounted with hard,intr, can't kill
Wade Hampton
wade.hampton at nsc1.net
Mon Mar 8 17:01:38 UTC 2004
I have a Fedora server with kernel 2.4.22-1-2163 SMP mounting a
remote Solaris server (hence the choice of options):
rsize=32768,ro,hard,intr,tcp,nfsvers=3
When the remote is down or disconnected, a "df" hangs (as expected),
but I can't kill it, even as root or with kill -9. The docs for mount
indicate that the intr option should allow processes blocked on a
hard mount to be interrupted or killed.
I also coded a test program that calls statvfs(2), and it hangs in
the statvfs(2) call when run against a down NFS server. It too
can't be interrupted or killed.
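For illustration, a minimal sketch of such a test, written in Python
rather than taken from the original program (which is not shown here).
The function name used_fraction and the default path "/" are
placeholders; point it at the NFS mount point to reproduce the hang.

```python
import os
import sys


def used_fraction(path):
    """Return the fraction of blocks in use on the filesystem holding path."""
    # os.statvfs wraps statvfs(); this is the call that blocks while a
    # hard-mounted NFS server is unreachable.
    st = os.statvfs(path)
    if st.f_blocks == 0:
        # Pseudo-filesystems can report zero blocks; avoid dividing by zero.
        return 0.0
    return 1.0 - st.f_bfree / st.f_blocks


if __name__ == "__main__":
    # Placeholder default; pass the mount point as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(f"{path}: {used_fraction(path):.1%} used")
```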
My questions are:
1) Is there a safe and reliable means to check for a down NFS server?
(E.g., is showmount -e <server> safe enough? It is interruptible,
so one could wrap it with a timer, and if it times out, treat the
server as down.)
2) Is the non-interruptible operation (even with the intr option)
a bug or a feature?
3) Is there a simple kernel call, /proc entry, or similar that can
be used for this purpose?
4) Is there a perl module to accomplish this?
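The timer-wrapped check from question 1 might look roughly like the
sketch below. It is in Python rather than Perl, and the function name
server_responds, the 10-second default, and the host name in the usage
note are assumptions for illustration only. It relies on the probe
command itself being interruptible, as showmount appears to be.

```python
import subprocess


def server_responds(cmd, timeout_s=10.0):
    """Return True if cmd exits with status 0 within timeout_s seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best-effort kill. We deliberately do not wait for the child
        # afterwards, so a child that cannot be killed does not block
        # the caller; it may linger until it finally exits.
        proc.kill()
        return False
```

For example, server_responds(["showmount", "-e", "nfs.example.com"])
would report False if the command fails or does not finish in time. A
False result only suggests the server is down; the command can also
fail for other reasons.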
This would be very useful for network monitoring, e.g., when the
server goes down and stays down for >1 minute, generate an SNMP
trap and write to a log file. It would help when you can't put an
SNMP agent on the server, only on the client. It is also useful for
writing a highly reliable client application.
As I have no control over the remote system, when it went down,
I had to do a hard reboot of my Linux box to stop the hung apps. This
is a Windows solution, not a Linux solution.
Note, I found this when writing some scripts for MRTG to check
the disk utilization of partitions. My df's hung so I didn't even get
the proper values for my local partitions. After a few days, I had
LOTS of hung MRTG apps.
Thanks
--
Wade Hampton