F12 NFS Failures

Todd Denniston Todd.Denniston at tsb.cranrdte.navy.mil
Tue Dec 1 16:00:47 UTC 2009


John Austin wrote, On Tue, 24 Nov 2009 12:21:58 +0000:
> On Mon, 2009-11-23 at 15:00 -0800, Rick Stevens wrote:
>> On 11/21/2009 10:41 AM, John Austin wrote:
>>> On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:
>>>> On Sat, 2009-11-21 at 10:09 +0000, John Austin wrote:
>>>>
>>>>> When copying a large file (2.7GB) from the server to the
>>>>> F12 m/c a complete freeze of the F12 machine occurs.
>>>>
>>>> I haven't seen freezes, but I have seen corruption when trying to copy
>>>> large files (e.g. like a DVD iso image) via NFS. In fact, this happened
>>>> to me when I was trying to install an F12 virtual machine on my F11 box
>>>> (so I could try it out before deciding whether or not to bite the bullet
>>>> and upgrade the host OS). I copied over the DVD iso image, then tried to
>>>> install a VM from it, and it failed the media test. Sure enough, it also
>>>> failed the sha256sum test. Copying the same DVD iso file via scp instead
>>>> worked fine. I do not trust NFS for large files.
>>>>
>>>> --Greg
>>>>
>>>>
>>> Hi Greg
>>>
>>> That's interesting and very worrying - surely it can't/shouldn't happen!
>>>
>>> I have been using NFS for years for all types/sizes of files and
>>> never had a problem until the last couple of months.
>>>
>>> 1.  The Centos/RHEL 5.3/5.4 kernel had a serious bug that has been fixed with the
>>> 	latest kernel update
>>>
>>> 2.  Now this F12 problem
>>>
>>> Surely a very large worldwide community uses NFS ?
>>>
>>> OK the F12 case could be my finger trouble or even a hardware problem
>>>
>>> I will install F12 on a second machine and test again (against the same server)
>> Can you verify that you run into the same issue if you run NFS over TCP
>> as opposed to NFS over UDP (it's an option in the mount command on the
>> client, use either "proto=tcp" or "proto=udp").
>>
>> By default, the system queries the server and selects a protocol based
>> on what's being asked of it.  See the "TRANSPORT METHODS" section of
>> "man nfs".
>> ----------------------------------------------------------------------
>> - Rick Stevens, Systems Engineer                      ricks at nerd.com -
>> - AIM/Skype: therps2        ICQ: 22643734            Yahoo: origrps2 -
>> -                                                                    -
>> -               The Theory of Rapitivity: E=MC Hammer                -
>> -                                  -- Glenn Marcus (via TopFive.com) -
>> ----------------------------------------------------------------------
> 
> 
> Hi Rick
> 
> Many thanks for the reply - you have found a work-around !!
> 
> Just tested my machine with UDP and TCP
> This was using md5sum for about 10GB over the NFS mount
> 
> 1. The default for F12/Centos5.4 appears to be TCP - which freezes
> 2. Forcing UDP gives NO errors for 10GB transfer
> 3. Forcing TCP gives a freeze
> 
> Having briefly read the man pages this is the opposite of what I would
> expect and of what you suggest !!
> 
> There must be a timing problem somewhere - 
> 
> Please see the other thread "Sky2 NIC Problem? - Was F12 NFS Failures"
> for other tests I have carried out
> 
> Regards
> 
> John
> 
> 
> 
> 

What are your other mount options?
Having seen the "Sky2 NIC Problem" message, your card/driver may be having issues, but some NFS 
options may help or hurt.

I am assuming that you only have 'hard' and not 'hard,intr' as options to the mount.
And for transferring large files over NFS, my experience says to stay away from 'soft' NFS mounts.
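
If it helps to answer the mount-options question, here is a minimal Python sketch that lists the 
options of every NFS mount from /proc/mounts-style text. The function name parse_nfs_mounts and the 
sample line are made up for illustration; the only assumption about the system is the usual 
"device mountpoint fstype options dump pass" field layout of /proc/mounts.

```python
def parse_nfs_mounts(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries.

    Expects text in /proc/mounts layout:
    device mountpoint fstype options dump pass
    Options without a value (e.g. 'hard') map to True.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip short lines and anything that is not an NFS client mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            options[key] = value if value else True
        result[fields[1]] = options
    return result


# Hypothetical sample input, not taken from any real machine.
SAMPLE = (
    "server:/export /mnt/data nfs rw,hard,proto=tcp,rsize=32768 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)

for mount_point, options in parse_nfs_mounts(SAMPLE).items():
    print(mount_point, "proto:", options.get("proto"), "hard:", "hard" in options)
```

On the real client you would pass in open("/proc/mounts").read() instead of SAMPLE and look at the 
proto, hard/soft and intr entries for the mount in question.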

It is interesting that TCP NFS locks the machine and fails to copy the very large file, while UDP 
succeeds in copying the same file with the same device/driver. BTW, when you say that UDP gave no 
errors, do you mean that from the user program perspective (cp, and then sha256sum) there were no 
errors, or that from both the user and syslog perspective there were no errors? I am wondering if 
you have found a place where the UDP code deals with a bad packet correctly and the TCP version has 
not seen enough (bad environment) testing. You wouldn't happen to have a serial cable around so you 
can capture where the kernel goes bonkers, would you? (Note: I have never used the serial console 
myself.)
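
For the user-program side of that question, the cp-then-sha256sum check can be scripted. This is 
only a rough Python sketch of the same idea; the names sha256_of and copies_match are mine, and it 
assumes both the original and the copy are readable as ordinary files. It says nothing about what 
syslog recorded, which still has to be checked separately.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks
    so that a multi-gigabyte file does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

Something like copies_match("/mnt/nfs/big.iso", "/tmp/big.iso") (both paths hypothetical) would then 
give a yes/no answer. Note that a file read back immediately after being written may be served from 
the client's cache rather than from the server, so a match right after the copy is weaker evidence 
than a match after a remount.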

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter



