[Libguestfs] [PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.

Tue Aug 11 10:49:31 UTC 2020

On Mon, Aug 10, 2020 at 5:01 PM Richard W.M. Jones <rjones at redhat.com> wrote:
>
> On Sat, Aug 08, 2020 at 02:14:22AM +0300, Nir Soffer wrote:
> > On Fri, Aug 7, 2020 at 5:36 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > >
> > > On Fri, Aug 07, 2020 at 05:29:24PM +0300, Nir Soffer wrote:
> > > > On Fri, Aug 7, 2020 at 5:07 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > > > > These ones?
> > > > > https://www.redhat.com/archives/libguestfs/2020-August/msg00078.html
> > > >
> > > > No, we had a bug where copying an image from glance caused sanlock
> > > > timeouts because of the unpredictable page cache flushes.
> > > >
> > > > We tried to use fadvise but it did not help. The only way to avoid such issues
> > > > is with O_SYNC or O_DIRECT. O_SYNC is much slower but this is the path
> > > > we took for now in this flow.
> > >
> > > I'm interested in more background about this, because while it is true
> > > that O_DIRECT and POSIX_FADV_DONTNEED are not exactly equivalent, I
> > > think I've shown here that DONTNEED can be used to avoid polluting the
> > > page cache.
> >
> > This fixes the minor issue of polluting the page cache, but it does not help
> > to avoid stale data in the cache, or uncontrolled flushes.
> >
> > The bug I mentioned is:
> > https://bugzilla.redhat.com/1832967
> >
> > This explains the issue:
> > https://bugzilla.redhat.com/1247135#c29
> >
> > And here you can see how unrelated I/O is affected by uncontrolled flushes:
> > https://bugzilla.redhat.com/1247135#c30
> > https://bugzilla.redhat.com/1247135#c36
>
> Thanks for the explanation.
>
> Our use of file_flush (ie fdatasync) is less than ideal - we should
> probably use sync_file_range, which is what Linus suggested.  However
> in this case it won't be a problem because we're only flushing the few
> pages we have just written.  We control the file and there is no
> chance that the flush will cause an uncontrollable flood of data.

Yes, this is the same solution we used, but it is too slow. We did it
because we did not have time to implement it using direct I/O.

But you are adding it as the only option in nbdkit, while nbdkit could
also support direct I/O.
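
To make the direct I/O alternative concrete, here is a minimal sketch
of a write that bypasses the page cache with O_DIRECT on Linux.
write_direct and DIO_ALIGN are hypothetical names for illustration, and
the 4096-byte alignment is an assumption; real code should query the
alignment required by the underlying storage. This is not how nbdkit
implements anything today.

```c
#define _GNU_SOURCE             /* for O_DIRECT with glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed alignment, see lead-in */

/* Hypothetical helper: write len bytes at offset using O_DIRECT.
 * Both len and offset must be multiples of DIO_ALIGN.  The data is
 * copied into an aligned bounce buffer because O_DIRECT also requires
 * the memory address to be aligned. */
static int
write_direct (const char *path, const void *data, size_t len, off_t offset)
{
  void *buf = NULL;
  size_t done = 0;
  int fd, err, saved, ret = -1;

  if (len % DIO_ALIGN != 0 || offset % DIO_ALIGN != 0) {
    errno = EINVAL;
    return -1;
  }

  err = posix_memalign (&buf, DIO_ALIGN, len);
  if (err != 0) {
    errno = err;
    return -1;
  }
  memcpy (buf, data, len);

  /* Some filesystems reject O_DIRECT; open then fails with EINVAL. */
  fd = open (path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
  if (fd == -1)
    goto out;

  while (done < len) {
    ssize_t r = pwrite (fd, (char *) buf + done, len - done, offset + done);
    if (r == -1) {
      if (errno == EINTR)
        continue;
      goto out_close;
    }
    done += r;
  }
  ret = 0;

 out_close:
  saved = errno;
  close (fd);
  errno = saved;
 out:
  saved = errno;
  free (buf);
  errno = saved;
  return ret;
}
```

Note that O_DIRECT only bypasses the page cache; it does not by itself
guarantee the data is durable. Code that needs durability would still
add O_DSYNC or call fdatasync.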

> The bigger issue to me:
>
> Sanlock should really be using cgroups or some mechanism to prioritize
> its block traffic over everything else.

This is exactly this bug:
[RFE] Ensure quality of service for sanlock io when using file-based storage
https://bugzilla.redhat.com/1247135

But nobody found a way to do this, and the bug was closed. We reopened
it recently because of the glance download issue, but since we fixed this in
our glance import it was closed again.

>
> Any method to solve this
> where every single other process in the system is required to use
> direct I/O is IMO a ridiculous hack.

It is called RHV for a reason - Ridiculous Hack Virtualization :-)

>  What happens if some other
> process on the same machine happens to block the NFS server by writing
> lots?  Perhaps the admin installs some proprietary backup software
> that we are unable to modify?  Same thing would happen.

The oVirt hypervisor is not a playground where you install random stuff,
but yes, this will be a problem if you start to use some program copying
images to shared storage without direct I/O.

I think this should be solved in sanlock. Currently it runs as a
standard service. It probably should run in a different way, in its own
cgroup, to ensure it gets enough CPU time and the highest priority I/O.
But I'm not sure if the kernel provides a solution. Sanlock I/O should
have higher priority than the kernel thread flushing the page cache.

Nir