[Libguestfs] How to speed up Kernel Client - S3 plugin use-case

Richard W.M. Jones rjones at redhat.com
Mon Jun 13 17:43:04 UTC 2022


On Mon, Jun 13, 2022 at 01:25:39PM -0400, Josef Bacik wrote:
> On Mon, Jun 13, 2022 at 6:24 AM Richard W.M. Jones <rjones at redhat.com> wrote:
> >
> > On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> > > Hello,
> > >
> > > I am trying to improve performance of the scenario where the kernel's
> > > NBD client talks to NBDKit's S3 plugin.
> > >
> > > For me, the main bottleneck is currently that the kernel aligns
> > > requests to only 512 B, no matter what block size nbdkit reports.
> > >
> > > Using a 512 B object size is not feasible (due to latency and request
> > > overhead). However, with a larger object size there are two conflicting
> > > objectives:
> > >
> > > 1. To maximize parallelism (which is important to reduce the effects of
> > > connection latency), it's best to limit the size of the kernel's NBD
> > > requests to the object size.
> > >
> > > 2. To minimize unaligned writes, it's best to allow arbitrarily large
> > > NBD requests, because the larger the request, the more full blocks it
> > > writes. Unfortunately, this means that all objects touched by the
> > > request are written sequentially.
> > >
> > > I see a number of ways to address that:
> > >
> > > 1. Change the kernel's NBD code to honor the blocksize reported by the
> > >    NBD server. This would be ideal, but I don't feel up to making this
> > >    happen. Theoretical solution only.
> >
> > This would be the ideal solution.  I wonder how technically
> > complicated it would actually be.
> >
> > AIUI you'd have to modify nbd-client to query the block limits from
> > the server, which is the hardest part of this, but it's all userspace
> > code.  Then you'd pass those down to the kernel via the ioctl (see
> > drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
> > blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
> > you set the max request size, or if that's possible).  See
> > block/blk-settings.c for details of these functions.
> >
> 
> Exactly this.  The kernel just does what the client tells it to do,
> and the kernel can be configured for whatever blocksize.
> Unfortunately there's no way for the server to advertise to the
> client what to do; you have to configure it on the client.  The right
> thing to do here is to add some code to the userspace negotiation to
> pull the blocksize, then simply pass it into the nbd-client
> configuration, which uses the appropriate netlink tag to set the
> blocksize.

For context, the NBD protocol can now advertise minimum, preferred and
maximum block sizes during the initial handshake:

https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints

nbdkit (since 1.30) supports this, for example:

$ nbdkit eval get_size='echo 256M' block_size='echo 64k 1M 32M' 

$ nbdinfo nbd://localhost
protocol: newstyle-fixed without TLS
export="":
	export-size: 268435456 (256M)
	uri: nbd://localhost:10809/
	contexts:
		base:allocation
		is_rotational: false
		is_read_only: true
		can_cache: false
		can_df: true
		can_fast_zero: false
		can_flush: false
		can_fua: false
		can_multi_conn: false
		can_trim: false
		can_zero: false
		block_size_minimum: 65536      <---
		block_size_preferred: 1048576  <---
		block_size_maximum: 33554432   <---

Rich.

> > As a quick test you could try calling blk_queue_io_* in the kernel
> > driver with hard-coded values, to see if that modifies the requests
> > that are seen by nbdkit.  Should give you some confidence before
> > making the full change.
> >
> > BTW I notice that the kernel NBD driver always reports that it's a
> > non-rotational device, ignoring the server setting ...
> 
> That I can fix easily, I'll get that done.  Thanks,
> 
> Josef

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
