[Libguestfs] [nbdkit PATCH v2 13/13] RFC: plugins: Add callbacks for FUA semantics

Eric Blake eblake at redhat.com
Fri Jan 19 17:05:18 UTC 2018


On 01/19/2018 10:56 AM, Shaun McDowell wrote:

> Limitation: The kernel will (with today's default settings) typically be
> willing to send up to 128 requests of 128kB size to the driver in parallel.
> We wanted to support 128 parallel read operations on different areas of the
> disk without requiring 128 separate threads and connections for the driver.
> Right now in nbdkit that is impossible. The main loop in connection.c will
> pull an nbd request off the socket and block until that read request is
> complete before sending a response and getting the next request, blocking
> other requests on the socket unless running X connections/threads in
> parallel.

What version of nbdkit are you using?  We recently added parallel reads
in 1.1.17 (although some minor fixes went in later; the current version
is 1.1.25), which should allow a single socket to serve multiple
requests in parallel, in response to your setting of nbdkit's --threads
option, provided your plugin is truly parallel (nbdkit now ships both a
'file' and an 'nbd' plugin that are truly parallel).

> Change: We introduced an additional set of functions to the nbdkit_plugin
> struct that supports asynchronous handling of the requests and a few helper
> functions for the plugin to use to respond when it has finished the
> request. This is very similar to the fuse filesystem low level api (async
> supported) vs the high level fuse fs api (sync only). The design goal here
> is that a single connection/thread on nbdkit can support as many requests
> in parallel as the plugin allows. The nbdkit side pulls the request off the
> socket and, if the async function pointer is non-null, wraps the request in
> an op struct and uses the async plugin call for read/write/etc., capturing
> any allocated buffer and some op details in the op pointer. The plugin's
> async_* starts the op and returns to nbdkit while the plugin works on it in
> the background. Nbdkit will then go back to the socket and begin the next
> request. Our plugin uses 1 connection/nbdkit thread and 2-4 threads
> internally with boost asio over sockets to service the requests to cloud
> storage. We are able to achieve ~1GB/s (yes, bytes) read/write performance
> to aws s3 from an ec2 node with 10 gigabit networking, on < 100MB of
> memory in the driver, with this approach.

Definitely post patches to the list!  My work to add parallel support
via --threads still spawns multiple threads (the plugin operates
concurrently on multiple threads), while yours takes a different
approach of breaking things into smaller stages that piece together,
possibly with fewer threads.

> 
> Here is what some of our function prototypes look like in support of an
> asynchronous nbdkit model:
> 
>  #define CBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS        2
>  #define CBDKIT_THREAD_MODEL_PARALLEL                  3
>  #define CBDKIT_THREAD_MODEL_ASYNC                     4
> 
>  struct cbdkit_plugin {
>  ...
>   int (*pread) (void *handle, void *buf, uint32_t count, uint64_t offset);
>   int (*pwrite) (void *handle, const void *buf, uint32_t count, uint64_t offset);
>   int (*flush) (void *handle);
>   int (*trim) (void *handle, uint32_t count, uint64_t offset);
>   int (*zero) (void *handle, uint32_t count, uint64_t offset, int may_trim);
> 
>   int errno_is_preserved;
> 
>   void (*async_pread) (void *op, void *handle, void *buf, uint32_t count, uint64_t offset);
>   void (*async_pwrite) (void *op, void *handle, const void *buf, uint32_t count, uint64_t offset, int fua);
>   void (*async_flush) (void *op, void *handle);
>   void (*async_trim) (void *op, void *handle, uint32_t count, uint64_t offset, int fua);
>   void (*async_zero) (void *op, void *handle, uint32_t count, uint64_t offset, int may_trim, int fua);
>  ...
>  };
> 
> Additionally, there are a few helper functions for the plugin to use to
> respond back to nbdkit when the job eventually finishes. The plugin
> contract when using the async functions is that every async function
> guarantees it will eventually call an appropriate async_reply function.
> 
>  /* call for completion of successful async_pwrite, async_flush, async_trim, or async_zero */
>  extern CBDKIT_CXX_LANG_C int cbdkit_async_reply (void *op);
>  /* call for completion of successful async_pread */
>  extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_read (void *op);
>  /* call for completion of any async operation with error */
>  extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_error (void *op, uint32_t error);
> 
> If there is any interest in supporting async ops in the next API version,
> I can share the entire modified nbdkit (cbdkit) source that we use, which
> supports this async op framework, FUA, as well as some buffer pooling.

Yes, please post patches.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
