[Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

Tue Feb 8 20:29:48 UTC 2022

On Tue, Feb 08, 2022 at 12:39:02PM +0100, Laszlo Ersek wrote:
> On 02/04/22 17:44, Eric Blake wrote:
> > While writing tests for an nbdcopy bug, I found myself wanting a way
> > to easily view an entire image as data, but without disabling extents
> > support altogether.  The existing extentlist filter can do this, but
> > requires a secondary file.
> > 
> > Still an RFC because I need testsuite coverage similar to
> > test-nozero.sh (eek - we don't have any direct test of the noextent
> > filter, but only indirect coverage through other tests).  Also, are
> > there any other extentmode=MODE values that might make sense?
> > ---

> > +=item B<extentmode=mask>
> > +
> > +Extent support is not advertised to the client; clients should not
> > +query for extent information, and must assume the entire disk is
> > +allocated.
> > +
> > +This is the default if the C<extentmode> parameter is not specified.
> > +
> > +=item B<extentmode=data>
> > +
> > +(nbdkit E<ge> 1.30)
> > +
> > +Extent support is advertised, but extent requests from the client will
> > +be answered with a claim that the entire disk forms a single allocated
> > +data extent.
> > +
> > +=back
> > +
> > +All other parameters are passed through to and processed by the
> > +underlying plugin in the normal way.
> 
> Is this necessary to spell out? (Looked at a random other filter's
> documentation, and didn't see such a statement, so I think it's the
> default.)
> 
> (The same question applies to "plugin-args" in the synopsys, more or
> less...)

Hmm, we aren't always consistent, but I agree that it can be pruned
without loss.

> > +/* Produce single data extent. */
> > +static int
> > +noextents_extents (nbdkit_next *next,
> > +                   void *handle, uint32_t count, uint64_t offset,
> > +                   uint32_t flags,
> > +                   struct nbdkit_extents *ret_extents,
> > +                   int *err)
> > +{
> > +  assert (extentmode == DATA);
> > +  return nbdkit_add_extent (ret_extents, offset, count, 0);
> >  }
> 
> I don't have the context in which this function may be called, in my
> head, but this implementation seems to reflect precisely the requested
> offset range back at the client, so a question arises:
> 
> - What if the client requests an offset range that (at least partially)
> exceeds the size of the file? In that case, I think we should not report
> that range as existent. For example, the chunked block status query in
> virt-v2v asks for 2GB offset ranges, and it's expected that the returned
> extent list will not exceed the file size.

The code in server/ guarantees that we cannot call into a filter or
plugin with an extents request that would read out of bounds; ie. on
input, offset+count will never exceed what next->get_size() would tell
us anyways.  Conversely, nbdkit_add_extent() already permits us to
pass in redundant information prior to offset (as long as we make
forward progress by eventually using nbdkit_add_extent for at least
one byte at offset before returning), as well as to provide more
information than needed (the upper layer can set a clamp, such as when
FLAG_REQ_ONE is in use by the client, or at the 32-bit boundary
condition, where our additions beyond that clamp are merely ignored).
So we could just as easily write

  return nbdkit_add_extent (ret_extents, 0, INT64_MAX, 0);

which will be trimmed to size as needed before going over the wire
back to the client.  But now that I've written that, I can see that
there is indeed a useful benchmarking distinction between using
offset/count vs. 0/INT_MAX - the former behaves as if the client had
always passed in FLAG_REQ_ONE (we can never progress faster than the
client wants us to go), while the latter gives as much information to
the client as possible (subject to 32-bit limits, until my 64-bit
block status work is revived).  So instead of just one new mode named
'data', maybe we want a larger family of possible modes:

- mask (default; existing behavior of masking extent support so client
  can't query them at all); client must assume disk is data
- data-match (reply that the extent is data, but match the size of the
  client request; behaves as if the client unconditionally uses
  FLAG_REQ_ONE); client sees disk as data
- data-full (reply that the entire remainder of the extent was data,
  although the client's use of FLAG_REQ_ONE clamps it back to the
  query size); client sees disk as data
- req-one (pass the query on through to the plugin, but then
  purposefully trim it back before replying to the client so that it
  behaves as if the client unconditionally uses FLAG_REQ_ONE); client
  can see holes, but cannot read ahead
- plugin (pass the query on through to the plugin, with no
  alterations to the plugin's result - normalizes the impact of having
  the filter in the stack when comparing to other modes); client can
  see holes, and can benefit from reading ahead

We can also bike-shed on better names for those modes (maybe
'disabled' instead of 'mask'; or 'passthrough' instead of 'plugin',
for example).

In fact, as written above, it means the 'noextents' filter could serve
the role of controlling three orthogonal aspects: are extents
advertised, can extents return additional information beyond the
client's requested length (when the client didn't use REQ_ONE), and
can extents report hole/zero information.

Is there ever a reason that we would want to always call into the
plugin, but then mask some (or all) of the returned extent bits to
only allow certain bits to return to the client?  For example, a mode
that lets the plugin report the 'zero' bit but not the 'hole' bit?
Being able to time how long the plugin takes to answer the query (even
if we then throw away the plugin's answer to claim that we have only
data) may be an interesting benchmark, as well, when comparing to the
speed at which we simulate the entire disk as data without reading
into the plugin.

> 
> (I understand that a server is permitted to cover a larger offset range
> than requested in its reply, unless LIBNBD_CMD_FLAG_REQ_ONE is set;
> however, this is different. Without LIBNBD_CMD_FLAG_REQ_ONE, the
> response may well be permitted to exceed the requested range, but it's
> still not permitted, AIUI, to report nonexistent data.)

The nbdkit core takes care of that.  A filter or plugin can report on
nonexistent data, but it will be clamped back so that the client never
sees more than the advertised disk length.

> 
> In short, I think we should call get_size() (?) on the underlying plugin
> (?), and truncate the requested range accordingly.

Already done implicitly by the core.

> 
> > 
> >  static struct nbdkit_filter filter = {
> >    .name              = "noextents",
> >    .longname          = "nbdkit noextents filter",
> > +  .config            = noextents_config,
> > +  .config_help       = noextents_config_help,
> >    .can_extents       = noextents_can_extents,
> > +  .extents           = noextents_extents,
> >  };
> > 
> >  NBDKIT_REGISTER_FILTER(filter)
> > 
> 
> Thanks,
> Laszlo
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org