[Libguestfs] [PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.

Nir Soffer nsoffer at redhat.com
Fri Aug 7 14:29:24 UTC 2020


On Fri, Aug 7, 2020 at 5:07 PM Richard W.M. Jones <rjones at redhat.com> wrote:
>
> On Fri, Aug 07, 2020 at 04:43:12PM +0300, Nir Soffer wrote:
> > On Fri, Aug 7, 2020, 16:16 Richard W.M. Jones <rjones at redhat.com> wrote:
> > > I'm not sure if or even how we could ever do a robust O_DIRECT
> > >
> >
> > We can let the plugin and filter deal with that. The simplest solution is to
> > drop it on the user and require aligned requests.
>
> I mean this is very error prone.  It requires the end user to know
> about the basically unknowable restrictions of O_DIRECT and isn't even
> possible in one common case - if the size of the file isn't an exact
> multiple of the filesystem block size.
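To make the restriction concrete: a filter that handled alignment would have to widen every request to whole blocks and do read-modify-write on the partial blocks at each edge. The following is a minimal C sketch of that arithmetic, assuming a power-of-two alignment such as 512 or 4096. The helper names `align_range` and `is_aligned` are hypothetical and are not part of the nbdkit API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen [offset, offset+count) to the smallest
 * range whose start and length are both multiples of align.
 * Assumes align is a power of two (e.g. 512 or 4096). */
static void
align_range (uint64_t offset, uint64_t count, uint64_t align,
             uint64_t *aligned_offset, uint64_t *aligned_count)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  uint64_t start = offset & ~(align - 1);                      /* round down */
  uint64_t end = (offset + count + align - 1) & ~(align - 1);  /* round up */

  *aligned_offset = start;
  *aligned_count = end - start;
}

/* Hypothetical helper: true if a request can be passed to an O_DIRECT
 * file descriptor unchanged (buffer alignment is a separate issue). */
static int
is_aligned (uint64_t offset, uint64_t count, uint64_t align)
{
  return (offset & (align - 1)) == 0 && (count & (align - 1)) == 0;
}
```

For example, a 100-byte read at offset 1000 with 4096-byte blocks widens to the single block [0, 4096). When the widened range runs past the end of a file whose size is not a block multiple, no aligned request covers the tail, which is exactly the case described above.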

Yes, doing direct I/O is hard; even qemu still has bugs in this area that pop up
from time to time.

It is fine to fail the open if the size of the image is not aligned to the
underlying block size.
However, finding the underlying block size is a can of worms :-)
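A rough C sketch of why this is messy, assuming Linux: for block devices the logical sector size can be read with the `BLKSSZGET` ioctl, but for regular files the only long-standing hint is `st_blksize`, which is the filesystem's preferred I/O size and not a guaranteed O_DIRECT requirement. Newer kernels report the real requirements through statx(2) with `STATX_DIOALIGN`, where the filesystem supports it. The helper `probe_direct_alignment` is hypothetical; it fails with EINVAL when the image size is not a multiple of the guessed alignment.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKSSZGET, BLKGETSIZE64: Linux-only */

/* Hypothetical helper: guess the alignment O_DIRECT needs for path and
 * fail (errno = EINVAL) if the image size is not a multiple of it.
 * The guess for regular files is only a heuristic. */
static int
probe_direct_alignment (const char *path, uint64_t *align_ret)
{
  struct stat st;
  uint64_t size, align;
  int fd, r = -1, saved_errno;

  assert (align_ret != NULL);

  fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;
  if (fstat (fd, &st) == -1)
    goto cleanup;

  if (S_ISBLK (st.st_mode)) {
    /* Block device: st_size is 0, so ask the kernel for size and
     * logical sector size directly. */
    int sector_size;
    if (ioctl (fd, BLKSSZGET, &sector_size) == -1 ||
        ioctl (fd, BLKGETSIZE64, &size) == -1)
      goto cleanup;
    align = (uint64_t) sector_size;
  }
  else {
    /* Regular file: fall back to the preferred I/O block size. */
    size = (uint64_t) st.st_size;
    align = (uint64_t) st.st_blksize;
  }

  if (align == 0 || size % align != 0) {
    errno = EINVAL;
    goto cleanup;
  }
  *align_ret = align;
  r = 0;

 cleanup:
  saved_errno = errno;   /* close() must not clobber the real error */
  close (fd);
  errno = saved_errno;
  return r;
}
```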

> > Maybe a filter can handle alignment?
> >
> > > implementation, but my idea was that it might be an alternate
> > > implementation of cache=none.  But if we thought we might use O_DIRECT
> > > as a separate mode, then maybe we should rename cache=none.
> > > cache=advise?  cache=dontneed?  I can't think of a good name!
> > >
> >
> > Yes, don't call it none if you use the cache.
> >
> > How about advise=?
> >
> > I would keep cache semantics similar to qemu.
>
> qemu uses cache=none as a synonym for O_DIRECT, but AFAIK it has
> nothing that tries to use posix_fadvise(DONTNEED) with or without
> Linus's double buffering technique.

Yes, this is the right way. posix_fadvise is not a replacement for O_DIRECT.
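For comparison, here is a minimal C sketch of the plain posix_fadvise(DONTNEED) pattern: write, flush with fdatasync() so the pages are clean, then advise the kernel to drop them. The helper `write_and_drop` is hypothetical, and this is not Linus's double-buffering technique, which overlaps writeback of one chunk with writing the next. Unlike O_DIRECT, the data still passes through the page cache first, and the kernel is free to ignore the advice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pwrite a buffer, flush it, then ask the kernel
 * to evict the now-clean pages for that range from the page cache. */
static int
write_and_drop (int fd, const void *buf, size_t count, off_t offset)
{
  const char *p = buf;
  size_t done = 0;

  assert (buf != NULL || count == 0);

  while (done < count) {
    ssize_t n = pwrite (fd, p + done, count - done, offset + (off_t) done);
    if (n == -1) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    done += (size_t) n;
  }

  /* DONTNEED only evicts clean pages, so write back dirty data first. */
  if (fdatasync (fd) == -1)
    return -1;

  /* posix_fadvise returns an error number instead of setting errno. */
  int r = posix_fadvise (fd, offset, (off_t) count, POSIX_FADV_DONTNEED);
  if (r != 0) {
    errno = r;
    return -1;
  }
  return 0;
}
```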

> qemu does use
> posix_fadvise(DONTNEED) in one place but AFAICT it is only used for
> live migration.
>
> ...
> > We already tried this with dd and the results were not good.
>
> These ones?
> https://www.redhat.com/archives/libguestfs/2020-August/msg00078.html

No, we had a bug where copying an image from Glance caused sanlock timeouts
because of unpredictable page cache flushes.

We tried to use fadvise but it did not help. The only way to avoid such issues
is with O_SYNC or O_DIRECT. O_SYNC is much slower, but it is the path we took
for now in this flow.
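A minimal C sketch of the O_SYNC approach, assuming a simple sequential copy (the helpers `write_all` and `copy_sync` are hypothetical and not the code used in that flow): opening the destination with O_SYNC makes each write() return only after the data reaches stable storage, so dirty pages cannot accumulate and then be flushed in one unpredictable burst. Using O_DIRECT instead would also require buffers, offsets and lengths aligned as discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int
write_all (int fd, const char *buf, size_t count)
{
  while (count > 0) {
    ssize_t n = write (fd, buf, count);
    if (n == -1) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    buf += n;
    count -= (size_t) n;
  }
  return 0;
}

/* Hypothetical sketch: copy src to dst in chunk-sized pieces with the
 * destination opened O_SYNC, trading throughput for predictable I/O. */
static int
copy_sync (const char *src, const char *dst, size_t chunk)
{
  int in = -1, out = -1, r = -1;
  char *buf;

  assert (chunk > 0);
  buf = malloc (chunk);
  if (buf == NULL)
    return -1;

  in = open (src, O_RDONLY);
  if (in == -1)
    goto cleanup;
  /* O_SYNC: each write() completes only once the data is on stable storage. */
  out = open (dst, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
  if (out == -1)
    goto cleanup;

  for (;;) {
    ssize_t n = read (in, buf, chunk);
    if (n == -1) {
      if (errno == EINTR)
        continue;
      goto cleanup;
    }
    if (n == 0)
      break;                    /* end of source file */
    if (write_all (out, buf, (size_t) n) == -1)
      goto cleanup;
  }
  r = 0;

 cleanup:
  if (out != -1 && close (out) == -1)
    r = -1;
  if (in != -1)
    close (in);
  free (buf);
  return r;
}
```

Larger chunks reduce the number of synchronous waits, which is one way to soften the O_SYNC slowdown.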

>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> virt-builder quickly builds VMs from scratch
> http://libguestfs.org/virt-builder.1.html
>



