[Linux-cachefs] NFS conversion to new netfs and fscache APIs

David Wysochanski dwysocha at redhat.com
Thu Dec 3 16:26:51 UTC 2020


On Wed, Dec 2, 2020 at 12:01 PM Daire Byrne <daire.byrne at gmail.com> wrote:
>
> David,
>
> First off, thanks for the work on this - we look forward to this landing.
>

Yeah, no problem - thank you for your interest and for testing it!

> I did some very quick tests of just the bandwidth using server class networking (40Gbit) and storage (NVMe).
>
> Comparing the old fscache with the new one, we saw a minimal degradation in reading back from the backing disk. But I am putting this down mostly to the more direct-IO style of access in the new version.
>
> This can be seen when the cache is being written as we no longer use the writeback cache. I'm assuming something similar happens on reads so that we don't use readahead?
>

Without getting into it too much, my guess is it's either the use of
direct IO or the 1GB limitation in cachefiles, but I'm not sure.  We
of course need to drill down into it, because it could be a lot of
things.
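
To help drill into that, below is roughly the sort of multi-threaded
sequential read test I'd run for both the cold case (cache being
written) and the warm case (cache being read back) and then compare.
It's only a sketch in Python - the mount point and file names are
made up - and between runs you'd want to drop the page cache so the
warm numbers actually come from cachefiles rather than RAM.

#!/usr/bin/env python3
# Sketch: aggregate sequential read throughput across N threads.
# The mount point and file layout are hypothetical; point it at a set
# of large files on the fsc-enabled NFS mount.
import threading
import time

MOUNT = "/mnt/nfs-fsc"                                 # hypothetical fsc mount
FILES = ["%s/file%d" % (MOUNT, i) for i in range(10)]  # one big file per thread
BLOCK = 1024 * 1024                                    # 1 MiB reads

def read_file(path, totals, idx):
    n = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(BLOCK)
            if not buf:
                break
            n += len(buf)
    totals[idx] = n

totals = [0] * len(FILES)
threads = [threading.Thread(target=read_file, args=(p, totals, i))
           for i, p in enumerate(FILES)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print("%.0f MB/s aggregate over %d threads" % (sum(totals) / elapsed / 1e6, len(FILES)))

Running that once against a cold cache and again once the cache is
populated should make it easier to see whether the gap really is in
the read path of the new code.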

> Anyway, the quick summary of performance using 10 threads of reads follows. I should mention that the NVMe has a physical limit of ~2,500MB/s writes & 5,000MB/s reads:
>
> iter fscache:
> uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> cached subsequent reads ~4,200MB/s (reading from nvme ext4)
> cached subsequent reads ~3,500MB/s (reading from nvme xfs)
>
> old fscache:
> uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> cached subsequent reads ~5,000MB/s (reading from nvme ext4)
> xfs crashes a lot ...
>
> I have not done a thorough analysis of CPU usage or perf top differences yet.
>
> Then I went on to test our rather unique NFS re-export workload, where we take this fscache-backed server and re-export the fsc mounts to many clients. At this point something odd appeared to be happening. The clients were loading software from the fscache-backed mounts but were often segfaulting at various points. This suggested they were getting corrupted data, or that memory mapping (binaries, libraries) was failing in some way. Perhaps some odd interaction between fscache and knfsd?
>
> I did a quick test of re-export without the fsc caching enabled on the server mounts (with the same 5.10-rc kernel) and I didn't get any errors. That's as far as I got before I got drawn away by other things. I hope to dig into it a little more next week. But I just thought I'd give some quick feedback of one potential difference I'm seeing compared to the previous version.
>

Hmmm, interesting.  So just to be clear, you ran my patches without
'fsc' on the mount and it was fine, but with 'fsc' on the mount there
was data corruption in this re-export use case?  I've not done any
tests with a re-export like that, but off the top of my head I'm not
sure why it would be a problem.  What NFS version(s) are you using?
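
On the corruption side, if you get a chance, one quick way to tell
whether bad data is coming through the fsc-backed mount itself (as
opposed to something in the knfsd re-export path) would be to
checksum the same tree through an fsc mount and through a plain mount
of the same export and compare.  Very rough sketch below; the two
mount paths are just placeholders.

#!/usr/bin/env python3
# Sketch: compare the SHA-256 of every file seen through the fsc-backed
# mount against the same file through a plain (no fsc) mount of the
# same export.  Both mount paths are placeholders.
import hashlib
import os
import sys

FSC_MOUNT = "/mnt/nfs-fsc"      # mounted with -o fsc
PLAIN_MOUNT = "/mnt/nfs-plain"  # same export, mounted without fsc

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

mismatches = 0
for root, _, files in os.walk(FSC_MOUNT):
    for name in files:
        fsc_path = os.path.join(root, name)
        rel = os.path.relpath(fsc_path, FSC_MOUNT)
        if sha256_of(fsc_path) != sha256_of(os.path.join(PLAIN_MOUNT, rel)):
            print("MISMATCH: %s" % rel)
            mismatches += 1

print("%d mismatching file(s)" % mismatches)
sys.exit(1 if mismatches else 0)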


> I also totally accept that this is a very niche workload (and hard to reproduce)... I should have more details on it next week.
>

Ok - thanks again Daire!



> Daire
>
> On Sat, Nov 21, 2020 at 1:50 PM David Wysochanski <dwysocha at redhat.com> wrote:
>>
>> I just posted patches to linux-nfs but neglected to CC this list.  For
>> any interested in patches which convert NFS to use the new netfs and
>> fscache APIs, please see the following series on linux-nfs:
>> [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs
>> https://marc.info/?l=linux-nfs&m=160596540022461&w=2
>>
>> Thanks.
>>
>> --
>> Linux-cachefs mailing list
>> Linux-cachefs at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cachefs
>>




