[Linux-cachefs] NFS conversion to new netfs and fscache APIs

David Wysochanski dwysocha at redhat.com
Fri Dec 4 19:09:59 UTC 2020


On Fri, Dec 4, 2020 at 1:03 PM Daire Byrne <daire.byrne at gmail.com> wrote:
>
> David,
>
> Okay, I spent a little more time on this today and I think we can forget about the re-export thing for a moment.
>
> I looked at what was happening, and the issue seemed to be that once multiple clients of the re-export server (which has the iter fscache and fsc-enabled mounts) were all reading the same files at the same time (for the first time), we often ended up with a missing sequential chunk of data in the cached file.
>
> The size and apparent size seemed to match the original file on the server, but md5sum and hexdump against the client-mounted file showed otherwise.
>
> So then I tried to replicate this scenario in the simplest way: just a single (fscache-iter) client with an fsc-enabled mountpoint, using multiple processes to read the same uncached file for the first time (no NFS re-exporting).
>
> * client1 mounts the NFS server without fsc
> * client2 mounts the NFS server with fsc (with fscache-iter).
>
> client1 # md5sum /mnt/server/file.1
> 9ca99335b6f75a300dc22e45a776440c
> client2 # cat /mnt/server/file.1
> client2 # md5sum /mnt/server/file.1
> 9ca99335b6f75a300dc22e45a776440c
>
> All good. The file was cached to disk and looks good. Now let's read an uncached file using multiple processes simultaneously:
>
> client1 # md5sum /mnt/server/file.2
> 9ca99335b6f75a300dc22e45a776440c
> client2 # for x in {1..10}; do (cat /mnt/server/file.2 > /dev/null &); done; wait
> client2 # md5sum /mnt/server/file.2
> 26dd67fbf206f734df30fdec72d71429
>
> The file is now different/corrupt. So in my re-export case it's simply that multiple knfsd processes read the same file into cache simultaneously for the first time; the corrupt copy then stays in the cache and gets served out to multiple NFS clients.
>

Hmmm, yeah that for sure shouldn't happen!
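
Just so we're testing the same thing, this is roughly the reproducer I'm
scripting up on my side (the server name, mount path and file size are my
local guesses, not taken from your setup):

  # fsc-enabled mount on client2, file.2 not yet in the cache
  client2 # mount -o vers=4.2,fsc server:/export /mnt/server
  client2 # for x in {1..10}; do (cat /mnt/server/file.2 > /dev/null &); done; wait
  client2 # md5sum /mnt/server/file.2
  # compare against a non-fsc client (or the server's local copy)
  client1 # md5sum /mnt/server/file.2

If the sums differ, something like this should show where the missing chunk
is (the "known good" path is just a placeholder):

  client2 # cmp -l /mnt/server/file.2 /path/to/known/good/copy | head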


> In this case the backing filesystem was ext4 and the NFS client mount options were fsc,vers=4.2 (vers=3 behaves the same). The NFS server is running RHEL7.4.
>

How big is '/mnt/server/file.2' and what is the NFS server kernel?
Also can you give me the mount options from /proc/mounts on 'client2'?
I'm not able to reproduce this yet but I'll keep trying.
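
Something along these lines should capture what I'm after (run on 'client2';
the paths are just taken from your example):

  client2 # nfsstat -m
  client2 # grep /mnt/server /proc/mounts
  client2 # stat -c '%s %b %B' /mnt/server/file.2
  server  # uname -r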




> Daire
>
> On Thu, Dec 3, 2020 at 4:27 PM David Wysochanski <dwysocha at redhat.com> wrote:
>>
>> On Wed, Dec 2, 2020 at 12:01 PM Daire Byrne <daire.byrne at gmail.com> wrote:
>> >
>> > David,
>> >
>> > First off, thanks for the work on this - we look forward to this landing.
>> >
>>
>> Yeah no problem - thank you for your interest and testing it!
>>
>> > I did some very quick tests of just the bandwidth using server-class networking (40Gbit) and storage (NVMe).
>> >
>> > Comparing the old fscache with the new one, we saw a minimal degradation in reading back from the backing disk. But I am putting this down to the more direct I/O style of access in the new version.
>> >
>> > This can be seen when the cache is being written, as we no longer use the writeback cache. I'm assuming something similar happens on reads, so we don't get any readahead?
>> >
>>
>> Without getting into it too much and just guessing, I'd say it's
>> either the use of direct I/O or the 1GB limitation in cachefiles,
>> but I'm not sure.  Of course we need to drill down into it because
>> it could be a lot of things.
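
Coming back to the readahead question above: one quick way to check might be
to watch the request sizes on the backing device while re-reading an
already-cached file; lots of small, unmerged reads would point at the direct
I/O path.  Just a sketch, and the device and path names are placeholders for
whatever you have:

  client2 # iostat -x 1 nvme0n1 &
  client2 # dd if=/mnt/server/file.1 of=/dev/null bs=1M
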
>>
>> > Anyway, the quick summary of performance using 10 threads of reads follows. I should mention that the NVMe has a physical limit of ~2,500MB/s writes & 5,000MB/s reads:
>> >
>> > iter fscache:
>> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
>> > cached subsequent reads ~4,200MB/s (reading from nvme ext4)
>> > cached subsequent reads ~3,500MB/s (reading from nvme xfs)
>> >
>> > old fscache:
>> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
>> > cached subsequent reads ~5,000MB/s (reading from nvme ext4)
>> > xfs crashes a lot ...
>> >
>> > I have not done a thorough analysis of CPU usage or perf top differences yet.
>> >
>> > Then I went on to test our rather unique NFS re-export workload, where we take this fscache-backed server and re-export the fsc mounts to many clients. At this point something odd appeared to be happening. The clients were loading software from the fscache-backed mounts but were often segfaulting at various points. This suggested that they were getting corrupted data or that the memory mapping (binaries, libraries) was failing in some way. Perhaps some odd interaction between fscache and knfsd?
>> >
>> > I did a quick test of re-export without the fsc caching enabled on the server mounts (with the same 5.10-rc kernel) and I didn't get any errors. That's as far as I got before I got drawn away by other things. I hope to dig into it a little more next week. But I just thought I'd give some quick feedback of one potential difference I'm seeing compared to the previous version.
>> >
>>
>> Hmmm, interesting.  So just to be clear, you ran my patches without
>> 'fsc' on the mount and it was fine, but with 'fsc' on the mount there
>> was data corruption in this re-export use case?  I've not done any
>> tests with a re-export like that, but off the top of my head I'm not
>> sure why it would be a problem.  What NFS version(s) are you using?
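
Also, so I can set up the same thing here, I'm assuming the re-export server
looks roughly like this (mount path, export path and options are my guesses;
in particular I'm assuming an explicit fsid= on the re-export):

  reexport # mount -o vers=4.2,fsc server:/export /srv/server
  reexport # cat /etc/exports
  /srv/server  *(rw,no_subtree_check,fsid=1)
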
>>
>>
>> > I also totally accept that this is a very niche workload (and hard to reproduce)... I should have more details on it next week.
>> >
>>
>> Ok - thanks again Daire!
>>
>>
>>
>> > Daire
>> >
>> > On Sat, Nov 21, 2020 at 1:50 PM David Wysochanski <dwysocha at redhat.com> wrote:
>> >>
>> >> I just posted patches to linux-nfs but neglected to CC this list.  For
>> >> anyone interested in patches which convert NFS to use the new netfs and
>> >> fscache APIs, please see the following series on linux-nfs:
>> >> [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs
>> >> https://marc.info/?l=linux-nfs&m=160596540022461&w=2
>> >>
>> >> Thanks.
>> >>
>> >> --
>> >> Linux-cachefs mailing list
>> >> Linux-cachefs at redhat.com
>> >> https://www.redhat.com/mailman/listinfo/linux-cachefs
>> >>
>>




