[Libguestfs] [nbdkit PATCH] RFC: noextents: Add extentmode=data config parameter
Laszlo Ersek
lersek at redhat.com
Wed Feb 9 07:19:39 UTC 2022
On 02/08/22 21:29, Eric Blake wrote:
> On Tue, Feb 08, 2022 at 12:39:02PM +0100, Laszlo Ersek wrote:
>> On 02/04/22 17:44, Eric Blake wrote:
>>> While writing tests for an nbdcopy bug, I found myself wanting a way
>>> to easily view an entire image as data, but without disabling extents
>>> support altogether. The existing extentlist filter can do this, but
>>> requires a secondary file.
>>>
>>> Still an RFC because I need testsuite coverage similar to
>>> test-nozero.sh (eek - we don't have any direct test of the noextent
>>> filter, but only indirect coverage through other tests). Also, are
>>> there any other extentmode=MODE values that might make sense?
>>> ---
>
>>> +=item B<extentmode=mask>
>>> +
>>> +Extent support is not advertised to the client; clients should not
>>> +query for extent information, and must assume the entire disk is
>>> +allocated.
>>> +
>>> +This is the default if the C<extentmode> parameter is not specified.
>>> +
>>> +=item B<extentmode=data>
>>> +
>>> +(nbdkit E<ge> 1.30)
>>> +
>>> +Extent support is advertised, but extent requests from the client will
>>> +be answered with a claim that the entire disk forms a single allocated
>>> +data extent.
>>> +
>>> +=back
>>> +
>>> +All other parameters are passed through to and processed by the
>>> +underlying plugin in the normal way.
>>
>> Is this necessary to spell out? (Looked at a random other filter's
>> documentation, and didn't see such a statement, so I think it's the
>> default.)
>>
>> (The same question applies to "plugin-args" in the synopsys, more or
>> less...)
>
> Hmm, we aren't always consistent, but I agree that it can be pruned
> without loss.
>
>>> +/* Produce single data extent. */
>>> +static int
>>> +noextents_extents (nbdkit_next *next,
>>> + void *handle, uint32_t count, uint64_t offset,
>>> + uint32_t flags,
>>> + struct nbdkit_extents *ret_extents,
>>> + int *err)
>>> +{
>>> + assert (extentmode == DATA);
>>> + return nbdkit_add_extent (ret_extents, offset, count, 0);
>>> }
>>
>> I don't have the context in which this function may be called, in my
>> head, but this implementation seems to reflect precisely the requested
>> offset range back at the client, so a question arises:
>>
>> - What if the client requests an offset range that (at least partially)
>> exceeds the size of the file? In that case, I think we should not report
>> that range as existent. For example, the chunked block status query in
>> virt-v2v asks for 2GB offset ranges, and it's expected that the returned
>> extent list will not exceed the file size.
>
> The code in server/ guarantees that we cannot call into a filter or
> plugin with an extents request that would read out of bounds; ie. on
> input, offset+count will never exceed what next->get_size() would tell
> us anyways.
Great!
(This is what I sort of expected, but wasn't sure about, hence "I don't
have the context in which this function may be called".)
> Conversely, nbdkit_add_extent() already permits us to
> pass in redundant information prior to offset (as long as we make
> forward progress by eventually using nbdkit_add_extent for at least
> one byte at offset before returning), as well as to provide more
> information than needed (the upper layer can set a clamp, such as when
> FLAG_REQ_ONE is in use by the client, or at the 32-bit boundary
> condition, where our additions beyond that clamp are merely ignored).
> So we could just as easily write
>
> return nbdkit_add_extent (ret_extents, 0, INT64_MAX, 0);
>
> which will be trimmed to size as needed before going over the wire
> back to the client.
Wow, that's very safe API then!
> But now that I've written that, I can see that
> there is indeed a useful benchmarking distinction between using
> offset/count vs. 0/INT_MAX - the former behaves as if the client had
> always passed in FLAG_REQ_ONE (we can never progress faster than the
> client wants us to go), while the latter gives as much information to
> the client as possible (subject to 32-bit limits, until my 64-bit
> block status work is revived). So instead of just one new mode named
> 'data', maybe we want a larger family of possible modes:
>
> - mask (default; existing behavior of masking extent support so client
> can't query them at all); client must assume disk is data
> - data-match (reply that the extent is data, but match the size of the
> client request; behaves as if the client unconditionally uses
> FLAG_REQ_ONE); client sees disk as data
> - data-full (reply that the entire remainder of the extent was data,
> although the client's use of FLAG_REQ_ONE clamps it back to the
> query size); client sees disk as data
> - req-one (pass the query on through to the plugin, but then
> purposefully trim it back before replying to the client so that it
> behaves as if the client unconditionally uses FLAG_REQ_ONE); client
> can see holes, but cannot read ahead
> - plugin (pass the query on through to the plugin, with no
> alterations to the plugin's result - normalizes the impact of having
> the filter in the stack when comparing to other modes); client can
> see holes, and can benefit from reading ahead
Hmmmm. This sounds quite exhaustive, but it seems we don't have a use
for all of these modes at the moment. If you add all of them, then I
guess you'd want to write test cases for them too. Seems a bit like
overkill. Maybe capture these options as possible extensions for the
filter in comments, or in a TODO file?
>
> We can also bike-shed on better names for those modes (maybe
> 'disabled' instead of 'mask'; or 'passthrough' instead of 'plugin',
> for example).
>
> In fact, as written above, it means the 'noextents' filter could serve
> the role of controlling three orthogonal aspects: are extents
> advertised, can extents return additional information beyond the
> client's requested length (when the client didn't use REQ_ONE), and
> can extents report hole/zero information.
>
> Is there ever a reason that we would want to always call into the
> plugin, but then mask some (or all) of the returned extent bits to
> only allow certain bits to return to the client? For example, a mode
> that lets the plugin report the 'zero' bit but not the 'hole' bit?
> Being able to time how long the plugin takes to answer the query (even
> if we then throw away the plugin's answer to claim that we have only
> data) may be an interesting benchmark, as well, when comparing to the
> speed at which we simulate the entire disk as data without reading
> into the plugin.
I think we shouldn't generalize / speculate until there's a need for all
this. It's useful to remain extensible, but the more logic we add, the
more we should keep covered by tests as well. I think there's enough
work already that's actually *needed* for various purposes :)
>
>>
>> (I understand that a server is permitted to cover a larger offset range
>> than requested in its reply, unless LIBNBD_CMD_FLAG_REQ_ONE is set;
>> however, this is different. Without LIBNBD_CMD_FLAG_REQ_ONE, the
>> response may well be permitted to exceed the requested range, but it's
>> still not permitted, AIUI, to report nonexistent data.)
>
> The nbdkit core takes care of that. A filter or plugin can report on
> nonexistent data, but it will be clamped back so that the client never
> sees more than the advertised disk length.
OK!
>
>>
>> In short, I think we should call get_size() (?) on the underlying plugin
>> (?), and truncate the requested range accordingly.
>
> Already done implicitly by the core.
>
>>
>>>
>>> static struct nbdkit_filter filter = {
>>> .name = "noextents",
>>> .longname = "nbdkit noextents filter",
>>> + .config = noextents_config,
>>> + .config_help = noextents_config_help,
>>> .can_extents = noextents_can_extents,
>>> + .extents = noextents_extents,
>>> };
>>>
>>> NBDKIT_REGISTER_FILTER(filter)
>>>
>>
>> Thanks,
>> Laszlo
>>
>
So I think my only comments that remain relevant are possibly the slight
tweaks to the documentation?
Reviewed-by: Laszlo Ersek <lersek at redhat.com>
Thanks!
Laszlo
More information about the Libguestfs
mailing list