[Linux-cachefs] NFS conversion to new netfs and fscache APIs

Daire Byrne daire.byrne at gmail.com
Fri Dec 4 18:03:29 UTC 2020


David,

Okay, I spent a little more time on this today and I think we can forget
about the re-export thing for a moment.

I looked at what was happening, and the issue seemed to be that once multiple
clients of the re-export server (which runs the fscache-iter kernel with fsc
enabled mounts) all read the same files at the same time, for the first time,
we often ended up with a sequential chunk of data missing from the cached
file.

The size and apparent size seemed to be the same as the original file on the
server, but md5sum and hexdump against the client-mounted file showed the
contents were not.

So then I tried to replicate this scenario in the simplest way possible: a
single (fscache-iter) client with an fsc enabled mountpoint, using multiple
processes to read the same uncached file for the first time (no NFS
re-exporting involved).

* client1 mounts the NFS server without fsc
* client2 mounts the NFS server with fsc (with fscache-iter); see the mount
sketch below.
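
Roughly, the mounts look like this (the server name and export path here are
placeholders rather than my exact setup):

client1 # mount -t nfs -o vers=4.2 server:/export /mnt/server
client2 # mount -t nfs -o fsc,vers=4.2 server:/export /mnt/server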

client1 # md5sum /mnt/server/file.1
9ca99335b6f75a300dc22e45a776440c
client2 # cat /mnt/server/file.1
client2 # md5sum /mnt/server/file.1
9ca99335b6f75a300dc22e45a776440c

All good. The file was cached to disk and looks correct. Now let's read an
uncached file using multiple processes simultaneously:

client1 # md5sum /mnt/server/file.2
9ca99335b6f75a300dc22e45a776440c
client2 # for x in {1..10}; do (cat /mnt/server/file.2 > /dev/null &); done; wait
client2 # md5sum /mnt/server/file.2
26dd67fbf206f734df30fdec72d71429

The file is now different/corrupt. So in my re-export case it's simply that
multiple knfsd processes read the same file into the cache simultaneously for
the first time; the cached copy then remains corrupt and is served out to
multiple NFS clients.
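
For completeness, the re-export setup looks roughly like this (the hostnames,
paths and export options are illustrative, not my exact config):

# fsc enabled client mount of the origin server on the re-export server
reexport # mount -t nfs -o fsc,vers=4.2 origin:/export /mnt/origin
# that path is then re-exported to the downstream clients via knfsd
# (re-exporting an NFS mount needs an explicit fsid)
reexport # grep origin /etc/exports
/mnt/origin *(rw,no_subtree_check,fsid=10)
reexport # exportfs -ra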

In this case the backing filesystem was ext4 and the NFS client mount
options were fsc,vers=4.2 (vers=3 behaves the same). The NFS server is running
RHEL 7.4.
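
In case it's relevant, the cachefiles backend on the fsc client is just a
stock cachefilesd setup pointed at the NVMe-backed ext4 filesystem; something
along these lines (the cache dir is a placeholder, and I'm assuming the
existing cachefilesd.conf format still applies with the rewritten backend):

client2 # grep -v '^#' /etc/cachefilesd.conf
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
client2 # systemctl start cachefilesd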

Daire

On Thu, Dec 3, 2020 at 4:27 PM David Wysochanski <dwysocha at redhat.com>
wrote:

> On Wed, Dec 2, 2020 at 12:01 PM Daire Byrne <daire.byrne at gmail.com> wrote:
> >
> > David,
> >
> > First off, thanks for the work on this - we look forward to this landing.
> >
>
> Yeah no problem - thank you for your interest and testing it!
>
> > I did some very quick tests of just the bandwidth using server class
> networking (40Gbit) and storage (NVMe).
> >
> > Comparing the old fscache with the new one, we saw a minimal degradation
> in reading back from the backing disk. But I am putting this more down to
> the more directIO style of access in the new version.
> >
> > This can be seen when the cache is being written as we no longer use the
> writeback cache. I'm assuming something similar happens on reads so that we
> don't use readahead?
> >
>
> Without getting into it too much and just guessing, I'd guess it's
> either the usage of directIO or the 1GB limitation in cachefiles, but
> I'm not sure.  We need to drill down into it, of course, because it
> could be a lot of things.
>
> > Anyway, the quick summary of performance using 10 threads of reads
> follows. I should mention that the NVMe has a physical limit of ~2,500MB/s
> writes & 5,000MB/s reads:
> >
> > iter fscache:
> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> > cached subsequent reads ~4,200MB/s (reading from nvme ext4)
> > cached subsequent reads ~3,500MB/s (reading from nvme xfs)
> >
> > old fscache:
> > uncached first reads ~2,500MB/s (writing to nvme ext4/xfs)
> > cached subsequent reads ~5,000MB/s (reading from nvme ext4)
> > xfs crashes a lot ...
> >
> > I have not done a thorough analysis of CPU usage or perf top differences
> yet.
> >
> > Then I went on to test our rather unique NFS re-export workload where we
> take this fscache backed server and re-export the fsc mounts to many
> clients. At this point something odd appeared to be happening. The clients
> were loading software from the fscache backed mounts but were often
> segfaulting at various points. This suggested that they were getting
> corrupted data or the memory mapping (binaries, libraries) was failing in
> some way. Perhaps some odd interaction between fscache and knfsd?
> >
> > I did a quick test of re-export without the fsc caching enabled on the
> server mounts (with the same 5.10-rc kernel) and I didn't get any errors.
> That's as far as I got before I got drawn away by other things. I hope to
> dig into it a little more next week. But I just thought I'd give some quick
> feedback of one potential difference I'm seeing compared to the previous
> version.
> >
>
> Hmmm, interesting.  So just to be clear, you ran my patches without
> 'fsc' on the mount and it was fine, but with 'fsc' on the mount there
> were data corruptions in this re-export use case?  I've not done any
> tests with a re-export like that but off the top of my head I'm not
> sure why it would be a problem.  What NFS version(s) are you using?
>
>
> > I also totally accept that this is a very niche workload (and hard to
> reproduce)... I should have more details on it next week.
> >
>
> Ok - thanks again Daire!
>
>
>
> > Daire
> >
> > On Sat, Nov 21, 2020 at 1:50 PM David Wysochanski <dwysocha at redhat.com>
> wrote:
> >>
> >> I just posted patches to linux-nfs but neglected to CC this list.  For
> >> any interested in patches which convert NFS to use the new netfs and
> >> fscache APIs, please see the following series on linux-nfs:
> >> [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs
> >> https://marc.info/?l=linux-nfs&m=160596540022461&w=2
> >>
> >> Thanks.
> >>
> >> --
> >> Linux-cachefs mailing list
> >> Linux-cachefs at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cachefs
> >>
>
>


