[NFS] nfs write problems

Bob Kryger bobk at panix.com
Wed Oct 3 10:21:51 UTC 2007


Trond Myklebust wrote:
> On Tue, 2007-10-02 at 15:56 -0400, Bob Kryger wrote:
>   
>> So, I have a relatively new system on which I am seeing strange NFS 
>> behavior.
>>
>> In short I am getting seemingly random errors in files written via NFS.
>>     
[snip details]
>> Anyone ever seen anything like this before?
>> Suggest where I might look next?
>> Additional tests?
>>     
>
> Feel free to describe your test in a bit more detail. Without more
> information, we obviously can't rule out the existence of an NFS bug,
>   
I was trying to be thorough; I hope I succeeded.

Is there anything else that might be helpful? I certainly would not
jump to blaming a bug first, as I may very well have something
misconfigured, but I cannot seem to identify what that might be. I do
have about 8 other Linux NFS servers in production on different
hardware, SATA mostly, where I am not seeing any issues. I don't think
it's a hardware issue though, as I cannot reproduce the problem without
NFS. (Hmm, maybe if I NFS mount to the server itself. Would that prove
anything?)
> however usually whenever people describe this sort of problem it is
> because they have failed to understand the NFS caching model as
> described in
>
>     http://nfs.sourceforge.net/#faq_a8
>   
Excellent, thanks for the lead. I will test these items shortly.

After reading the FAQ, I'm not sure I see how the cache consistency
mechanisms apply to this problem. If I test the files after they are
closed, shouldn't the data be consistent and written completely to the
server? If there were a data write error, should I not see it somewhere?
If so, where? On the client? On the server? Would it be up to the client
program to catch it? I wonder if dd would see it. For the purpose of
testing, I have limited this server to serving only a single client at
a time, so there will be no other variables/systems interfering.
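
On the question of where a write error would surface: as far as I
understand it, a client program on NFS may only learn of a failed write
when it calls fsync() or close(), so a test writer has to check both.
Below is a minimal sketch in Python of such a writer. The name
write_and_sync and the chunk size are my own illustrative choices, not
part of any tool mentioned in this thread.

```python
import os

def write_and_sync(path, data, chunk_size=1 << 20):
    """Write data to path and force it out, letting any I/O error propagate.

    Hypothetical helper for testing: os.write, os.fsync and os.close all
    raise OSError on failure, so a deferred write error is not silently lost.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    try:
        view = memoryview(data)
        while written < len(view):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, view[written:written + chunk_size])
        # Ask the kernel to flush dirty pages; errors surface here as OSError.
        os.fsync(fd)
    finally:
        # close() can also report a deferred error; it is not swallowed.
        os.close(fd)
    return written
```

If this runs to completion without raising OSError and the file is still
wrong when read back, then the error was at least not reported to the
application through the usual system call return values.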

So to test this I read back the data of a newly written 256M file,
right from the client that wrote it, in this case mounted with the
nocto option. This should take the client cache into account. I compared
the results from the server side as well. It had errors, the same errors
in the same locations on both the client and the server. So this seems
to indicate that the issue is on the NFS client, not the server. (Hmm.)
But the same client does not have a problem with any other server. At
least, none has ever been reported. I'll verify that rigorously.
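
To pin down "the same errors in the same locations", a block-by-block
comparison against a known-good local copy of the source data can list
the offsets that differ. This is only a sketch of the idea; the function
name diff_offsets and the 4096-byte block size are assumptions chosen
for illustration.

```python
def diff_offsets(path_a, path_b, block_size=4096):
    """Return the starting offsets of blocks that differ between two files."""
    mismatches = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Also catches the case where one file is shorter.
                mismatches.append(offset)
            offset += block_size
    return mismatches
```

Running it once on the client and once on the server, each time against
the original source file, would show whether the damaged offsets really
match and whether they fall on any regular boundary.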

I am not familiar with the mechanism that NFS uses to verify data 
validity between the client and the server. I assume that there is some 
sort of checksum. Did I mention that this is NFSv3? At least I have not 
specified v4.
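
Whatever checking happens at the protocol or transport level, an
end-to-end check is easy to do by hand: hash the source file, then hash
the copy as seen from the client and from the server, and compare. A
small sketch follows; file_sha256 is a hypothetical helper name, and
SHA-256 is just my choice of digest.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a 256M file does not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```
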
> So please include a reproducible test for us.
>   
Easily reproducible on this system. Short of providing access to this
system, I'm not sure what more to do. Oh, wait, was that humor? Indicating
that I have provided significant detail? Dang, I've got to sharpen my 
international tongue-in-cheek detector.
> Cheers
>   Trond
>   
Cool name

thanks
Bob
>
>   



