[Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter

Wed Feb 9 07:19:39 UTC 2022

On 02/08/22 21:29, Eric Blake wrote:
> On Tue, Feb 08, 2022 at 12:39:02PM +0100, Laszlo Ersek wrote:
>> On 02/04/22 17:44, Eric Blake wrote:
>>> While writing tests for an nbdcopy bug, I found myself wanting a way
>>> to easily view an entire image as data, but without disabling extents
>>> support altogether.  The existing extentlist filter can do this, but
>>> requires a secondary file.
>>>
>>> Still an RFC because I need testsuite coverage similar to
>>> test-nozero.sh (eek - we don't have any direct test of the noextent
>>> filter, but only indirect coverage through other tests).  Also, are
>>> there any other extentmode=MODE values that might make sense?
>>> ---
> 
>>> +=item B<extentmode=mask>
>>> +
>>> +Extent support is not advertised to the client; clients should not
>>> +query for extent information, and must assume the entire disk is
>>> +allocated.
>>> +
>>> +This is the default if the C<extentmode> parameter is not specified.
>>> +
>>> +=item B<extentmode=data>
>>> +
>>> +(nbdkit E<ge> 1.30)
>>> +
>>> +Extent support is advertised, but extent requests from the client will
>>> +be answered with a claim that the entire disk forms a single allocated
>>> +data extent.
>>> +
>>> +=back
>>> +
>>> +All other parameters are passed through to and processed by the
>>> +underlying plugin in the normal way.
>>
>> Is this necessary to spell out? (Looked at a random other filter's
>> documentation, and didn't see such a statement, so I think it's the
>> default.)
>>
>> (The same question applies to "plugin-args" in the synopsys, more or
>> less...)
> 
> Hmm, we aren't always consistent, but I agree that it can be pruned
> without loss.
> 
>>> +/* Produce single data extent. */
>>> +static int
>>> +noextents_extents (nbdkit_next *next,
>>> +                   void *handle, uint32_t count, uint64_t offset,
>>> +                   uint32_t flags,
>>> +                   struct nbdkit_extents *ret_extents,
>>> +                   int *err)
>>> +{
>>> +  assert (extentmode == DATA);
>>> +  return nbdkit_add_extent (ret_extents, offset, count, 0);
>>>  }
>>
>> I don't have the context in which this function may be called, in my
>> head, but this implementation seems to reflect precisely the requested
>> offset range back at the client, so a question arises:
>>
>> - What if the client requests an offset range that (at least partially)
>> exceeds the size of the file? In that case, I think we should not report
>> that range as existent. For example, the chunked block status query in
>> virt-v2v asks for 2GB offset ranges, and it's expected that the returned
>> extent list will not exceed the file size.
> 
> The code in server/ guarantees that we cannot call into a filter or
> plugin with an extents request that would read out of bounds; ie. on
> input, offset+count will never exceed what next->get_size() would tell
> us anyways.

Great!

(This is what I sort of expected, but wasn't sure about, hence "I don't
have the context in which this function may be called".)

> Conversely, nbdkit_add_extent() already permits us to
> pass in redundant information prior to offset (as long as we make
> forward progress by eventually using nbdkit_add_extent for at least
> one byte at offset before returning), as well as to provide more
> information than needed (the upper layer can set a clamp, such as when
> FLAG_REQ_ONE is in use by the client, or at the 32-bit boundary
> condition, where our additions beyond that clamp are merely ignored).
> So we could just as easily write
> 
>   return nbdkit_add_extent (ret_extents, 0, INT64_MAX, 0);
> 
> which will be trimmed to size as needed before going over the wire
> back to the client.

Wow, that's very safe API then!

> But now that I've written that, I can see that
> there is indeed a useful benchmarking distinction between using
> offset/count vs. 0/INT_MAX - the former behaves as if the client had
> always passed in FLAG_REQ_ONE (we can never progress faster than the
> client wants us to go), while the latter gives as much information to
> the client as possible (subject to 32-bit limits, until my 64-bit
> block status work is revived).  So instead of just one new mode named
> 'data', maybe we want a larger family of possible modes:
> 
> - mask (default; existing behavior of masking extent support so client
>   can't query them at all); client must assume disk is data
> - data-match (reply that the extent is data, but match the size of the
>   client request; behaves as if the client unconditionally uses
>   FLAG_REQ_ONE); client sees disk as data
> - data-full (reply that the entire remainder of the extent was data,
>   although the client's use of FLAG_REQ_ONE clamps it back to the
>   query size); client sees disk as data
> - req-one (pass the query on through to the plugin, but then
>   purposefully trim it back before replying to the client so that it
>   behaves as if the client unconditionally uses FLAG_REQ_ONE); client
>   can see holes, but cannot read ahead
> - plugin (pass the query on through to the plugin, with no
>   alterations to the plugin's result - normalizes the impact of having
>   the filter in the stack when comparing to other modes); client can
>   see holes, and can benefit from reading ahead

Hmmmm. This sounds quite exhaustive, but it seems we don't have a use
for all of these modes at the moment. If you add all of them, then I
guess you'd want to write test cases for them too. Seems a bit like
overkill. Maybe capture these options as possible extensions for the
filter in comments, or in a TODO file?

> 
> We can also bike-shed on better names for those modes (maybe
> 'disabled' instead of 'mask'; or 'passthrough' instead of 'plugin',
> for example).
> 
> In fact, as written above, it means the 'noextents' filter could serve
> the role of controlling three orthogonal aspects: are extents
> advertised, can extents return additional information beyond the
> client's requested length (when the client didn't use REQ_ONE), and
> can extents report hole/zero information.
> 
> Is there ever a reason that we would want to always call into the
> plugin, but then mask some (or all) of the returned extent bits to
> only allow certain bits to return to the client?  For example, a mode
> that lets the plugin report the 'zero' bit but not the 'hole' bit?
> Being able to time how long the plugin takes to answer the query (even
> if we then throw away the plugin's answer to claim that we have only
> data) may be an interesting benchmark, as well, when comparing to the
> speed at which we simulate the entire disk as data without reading
> into the plugin.

I think we shouldn't generalize / speculate until there's a need for all
this. It's useful to remain extensible, but the more logic we add, the
more we should keep covered by tests as well. I think there's enough
work already that's actually *needed* for various purposes :)

> 
>>
>> (I understand that a server is permitted to cover a larger offset range
>> than requested in its reply, unless LIBNBD_CMD_FLAG_REQ_ONE is set;
>> however, this is different. Without LIBNBD_CMD_FLAG_REQ_ONE, the
>> response may well be permitted to exceed the requested range, but it's
>> still not permitted, AIUI, to report nonexistent data.)
> 
> The nbdkit core takes care of that.  A filter or plugin can report on
> nonexistent data, but it will be clamped back so that the client never
> sees more than the advertised disk length.

OK!

> 
>>
>> In short, I think we should call get_size() (?) on the underlying plugin
>> (?), and truncate the requested range accordingly.
> 
> Already done implicitly by the core.
> 
>>
>>>
>>>  static struct nbdkit_filter filter = {
>>>    .name              = "noextents",
>>>    .longname          = "nbdkit noextents filter",
>>> +  .config            = noextents_config,
>>> +  .config_help       = noextents_config_help,
>>>    .can_extents       = noextents_can_extents,
>>> +  .extents           = noextents_extents,
>>>  };
>>>
>>>  NBDKIT_REGISTER_FILTER(filter)
>>>
>>
>> Thanks,
>> Laszlo
>>
> 

So I think my only comments that remain relevant are possibly the slight
tweaks to the documentation?

Reviewed-by: Laszlo Ersek <lersek at redhat.com>

Thanks!
Laszlo