[Libguestfs] RFC: nbdkit block size advertisement

Richard W.M. Jones rjones at redhat.com
Thu Jul 23 22:00:53 UTC 2020


On Thu, Jul 23, 2020 at 10:54:31AM -0500, Eric Blake wrote:
> I'm thinking of adding one or more callbacks to nbdkit to let
> plugins/filters enforce various block size alignments (for example,
> the swab filter requires 2/4/8 alignment, or VDDK requires 512
> alignment, etc).  The NBD protocol currently has NBD_INFO_BLOCK_SIZE
> which can be sent in reply to NBD_OPT_GO to tell the client about
> sizing constraints; qemu already implements it as both client and
> server, so we have some reasonable testing setups (although libnbd
> will also need some additions to make it easier to expose
> constraints to the user, and/or add new convenience APIs to do
> blocksize-style read-modify-write at the libnbd client side rather
> than needing the blocksize filter in the NBD server side).
> 
> But NBD_INFO_BLOCK_SIZE is not the full picture: it only covers
> minimum block size, preferred block size, and maximum data size;
> there has been discussion on the NBD list about also advertising
> maximum action size (discussion has mentioned trim and/or zero, but
> even cache could benefit from larger buffer size than pread), which
> means we should be thinking about supporting future protocol
> extensions in whatever we expose to plugins.
> 
> So, I'm thinking something like the following:
> 
> New enum:
> NBDKIT_BLOCK_SIZE_GET_MINIMUM
> NBDKIT_BLOCK_SIZE_GET_PREFERRED
> NBDKIT_BLOCK_SIZE_GET_MAX_DATA
> NBDKIT_BLOCK_SIZE_GET_MAX_TRIM
> NBDKIT_BLOCK_SIZE_GET_MAX_ZERO
> NBDKIT_BLOCK_SIZE_GET_MAX_CACHE

enum or int?  I think there are potential ABI problems with enums
(the compiler is free to pick the underlying integer type), although
probably not unless we have more than 256 cases?

> along with a new callback for plugins:
> 
> int64_t block_size (void *handle, int which);
> 
> where 'which' is one of the enum values.  A future nbdkit might
> request an enum value not recognized at the time the plugin was
> compiled, so the recommended behavior is that a plugin returns -1 on
> error, 0 to let nbdkit pick a sane default (including when 'which'
> was unknown), or a positive value for an actual limit.  For now,
> nbdkit would advertise MIN, PREFERRED, and MAX_DATA limits (to
> clients that use NBD_OPT_GO), while the others are enforced only
> internally.  The idea is that if the plugin installs a limit, a
> client request that violates that limit will fail with EINVAL for
> being unaligned, without even asking the plugin to try the response.
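As a concrete illustration of the proposed callback, a plugin with
fixed 512-byte sectors might look like this.  Note this is only a
sketch of the proposal above: the NBDKIT_BLOCK_SIZE_GET_* values are
written out as defines because nothing here is in nbdkit's headers
yet, and my_block_size is a made-up name.

```c
#include <stdint.h>

/* Hypothetical enum values from the proposal; not (yet) nbdkit API. */
#define NBDKIT_BLOCK_SIZE_GET_MINIMUM   0
#define NBDKIT_BLOCK_SIZE_GET_PREFERRED 1
#define NBDKIT_BLOCK_SIZE_GET_MAX_DATA  2

/* Proposed .block_size callback for a plugin with 512-byte sectors. */
static int64_t
my_block_size (void *handle, int which)
{
  switch (which) {
  case NBDKIT_BLOCK_SIZE_GET_MINIMUM:   return 512;
  case NBDKIT_BLOCK_SIZE_GET_PREFERRED: return 4096;
  case NBDKIT_BLOCK_SIZE_GET_MAX_DATA:  return 32 * 1024 * 1024;
  default:
    /* Unknown 'which' (e.g. a future nbdkit asking about a limit this
     * plugin predates): let nbdkit pick a sane default. */
    return 0;
  }
}
```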

Isn't the plan that the server would try to resolve the problem
itself, e.g. by making an RMW request or splitting a request?  This
would especially be the case for internal requests, but I could also
see it having value for clients which either ignore the block size
information or don't implement it correctly or completely.
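For the splitting case the idea is simple enough to sketch: chop an
oversize request into pieces no larger than the advertised max data
size.  split_pread and do_pread below are illustrative names, not
nbdkit API, and do_pread is a stand-in for the real transport call.

```c
#include <stdint.h>
#include <string.h>

static unsigned calls;          /* counts chunked requests issued */

/* Stand-in for the real read: pretend it always succeeds. */
static int
do_pread (char *buf, uint32_t count, uint64_t offset)
{
  (void) offset;
  memset (buf, 0, count);
  calls++;
  return 0;
}

/* Split a read into chunks honouring an advertised max_data limit, so
 * a client request larger than the limit can still be serviced. */
static int
split_pread (char *buf, uint32_t count, uint64_t offset,
             uint32_t max_data)
{
  while (count > 0) {
    uint32_t n = count > max_data ? max_data : count;
    if (do_pread (buf, n, offset) == -1)
      return -1;
    buf += n;
    offset += n;
    count -= n;
  }
  return 0;
}
```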

> nbdkit calls the plugin callback once per connection per reasonable
> 'which' (caching the results like it does for .get_size, and
> skipping limits where .can_FOO fails). Returning int64_t allows us
> to reuse this callback without change when we add v3 callbacks,
> although values larger than 4G make no difference at present.  I
> thought the use of an enum was nicer than filling in a struct whose
> size might change, or adding one callback per limit.

Yes, the enum/int seems like a better idea than dealing with structs,
and it is also easier to map into other programming languages.  The
overhead of a few extra indirect function calls is negligible because
they happen only once per connection.

The only other alternative I can see would be to put these into struct
nbdkit_plugin (as int64_t fields), but we have always regretted using
plain fields instead of functions in this struct, e.g. thread_model,
config_help, etc.

> Filters can relax limits (such as blocksize turning a plugin MIN 512
> into an advertised MIN 1, by doing read-modify-write as needed) or
> tighten limits (the file plugin has MIN 1, but the swab filter
> imposes a tighter MIN 2).  Constraints between limits are as
> follows:

Can the server do this transparently between layers?  It might be a
lot easier to implement it once.

> MIN: must be power of 2, between 1 and 64k; .get_size and .extents
> must return results aligned to MIN (as any unaligned tail or extent
> transition is inaccessible using only aligned operations).  Defaults
> to 1.
> 
> PREFERRED: must be power of 2, between max(512, MIN) and 2M (this
> upper limit is not specified by NBD spec, but matches qemu's
> implementation of what it uses as the max qcow2 cluster size).
> Defaults to max(4k, MIN).
> 
> MAX_DATA: must be multiple of MIN and at least as large as
> PREFERRED; values larger than 64M are okay but clamped by nbdkit's
> own internal limits. Defaults to 64M.
> 
> MAX_TRIM, MAX_ZERO, MAX_CACHE: must be multiple of MIN, should be at
> least as big as MAX_DATA. Values 4G and larger are clamped by
> nbdkit's own internal limits. Defaults to 4G-MIN for TRIM/ZERO, and
> MAX_DATA for CACHE.

Sounds reasonable otherwise.
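The constraints in the list above are mechanical enough to sketch as a
validity check.  This is just my reading of the proposal, with
illustrative names (limits_valid is not real nbdkit code), covering
only the MIN / PREFERRED / MAX_DATA rules:

```c
#include <stdbool.h>
#include <stdint.h>

static bool
is_pow2 (int64_t v)
{
  return v > 0 && (v & (v - 1)) == 0;
}

/* Check the proposed constraints:
 * MIN:       power of 2, between 1 and 64k.
 * PREFERRED: power of 2, between max(512, MIN) and 2M.
 * MAX_DATA:  multiple of MIN and at least PREFERRED. */
static bool
limits_valid (int64_t min, int64_t preferred, int64_t max_data)
{
  if (!is_pow2 (min) || min > 65536)
    return false;
  int64_t pref_floor = min > 512 ? min : 512;
  if (!is_pow2 (preferred) || preferred < pref_floor ||
      preferred > 2 * 1024 * 1024)
    return false;
  if (max_data % min != 0 || max_data < preferred)
    return false;
  return true;
}
```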

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/



