[Libguestfs] How to speed up Kernel Client - S3 plugin use-case

Nikolaus Rath Nikolaus at rath.org
Mon Jun 13 18:35:26 UTC 2022


On Jun 13 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> On Mon, Jun 13, 2022 at 11:58:11AM +0100, Nikolaus Rath wrote:
>> On Jun 13 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
>> > On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
>> >> Hello,
>> >> 
>> >> I am trying to improve performance of the scenario where the kernel's
>> >> NBD client talks to NBDKit's S3 plugin.
>> >> 
>> >> For me, the main bottleneck is currently due to the fact that the kernel
>> >> aligns requests to only 512 B, no matter the blocksize reported by
>> >> nbdkit.
>> >> 
>> >> Using a 512 B object size is not feasible (due to latency and request
>> >> overhead). However, with a larger object size there are two conflicting
>> >> objectives:
>> >> 
>> >> 1. To maximize parallelism (which is important to reduce the effects of
>> >> connection latency), it's best to limit the size of the kernel's NBD
>> >> requests to the object size.
>> >> 
>> >> 2. To minimize un-aligned writes, it's best to allow arbitrarily large
>> >> NBD requests, because the larger the requests the larger the amount of
>> >> full blocks that are written. Unfortunately this means that all objects
>> >> touched by the request are written sequentially.
>> >> 
>> >> I see a number of ways to address that:
>> >> 
>> >> 1. Change the kernel's NBD code to honor the blocksize reported by the
>> >>    NBD server. This would be ideal, but I don't feel up to making this
>> >>    happen. Theoretical solution only.
>> >
>> > This would be the ideal solution.  I wonder how technically
>> > complicated it would actually be?
>> >
>> > AIUI you'd have to modify nbd-client to query the block limits from
>> > the server, which is the hardest part of this, but it's all userspace
>> > code.  Then you'd pass those down to the kernel via the ioctl (see
>> > drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
>> > blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
>> > you set the max request size, or if that's possible).  See
>> > block/blk-settings.c for details of these functions.
>> 
>> If it's only about getting the blocksize from the NBD server, then I
>> certainly feel up to the task.
>> 
>> However, nbd-client already has:
>> 
>>        -block-size block size
>> 
>>        -b     Use a blocksize of "block size". Default is 1024; allowed values
>>               are either 512, 1024, 2048 or 4096
>> 
>> So my worry is that more complicated in-kernel changes will be needed to
>> make other values work. In particular, nbd_is_valid_blksize() (in nbd.c)
>> checks that the block size is less than or equal to PAGE_SIZE.
>> 
>> (I'm interested in 32 kB and 512 kB block sizes)
>
> This setting controls ioctl(nbd, NBD_SET_BLKSIZE,...) which inside the
> kernel calls:
>
>   blk_queue_logical_block_size(nbd->disk->queue, blksize);
>   blk_queue_physical_block_size(nbd->disk->queue, blksize);
>
> These functions are documented in block/blk-settings.c, but basically
> control the size of LBAs.  For most devices that would be 512.  (ISTR
> we changed the default in NBD a while back too, since 1024 caused
> problems for creating and reading some filesystems.)
>
> You can't really increase this setting to 2M or whatever S3 needs,
> because firstly it has to be smaller than the page size as you pointed
> out above, but mainly it'll radically change how filesystems get
> created since they use the block size as a basic unit to size other
> disk structures.  In fact I wouldn't be surprised if most filesystems
> just don't function at all if the block size is massive.
>
> Nevertheless, tracing the code which sets this is instructive to see
> how you would adjust the same kernel code to set the minimum and
> preferred I/O settings via blk_queue_io_min / blk_queue_io_opt.  These
> settings are separate from the block size (although they must be
> multiples of the block size).

Ah, this is helpful. Thank you for clarifying!

I'll probably start with some experiments where I just hardcode a larger
value in the kernel and see what happens.
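To make the tradeoff from earlier in the thread concrete, here is a small Python sketch (my own illustration, not code from nbdkit or the kernel; the 32 kB object size is just an assumption) that splits a write request into object-sized pieces and counts how many are full-object writes (a plain PUT) versus partial ones (which need a read-modify-write cycle):

```python
OBJECT_SIZE = 32 * 1024  # assumed S3 object size (32 kB)

def classify_write(offset, length, obj_size=OBJECT_SIZE):
    """Split a write into per-object pieces and report how many are
    full-object writes (cheap PUT) vs partial (read-modify-write)."""
    full, partial = 0, 0
    end = offset + length
    pos = offset
    while pos < end:
        obj_start = (pos // obj_size) * obj_size
        obj_end = obj_start + obj_size
        piece_end = min(end, obj_end)
        if pos == obj_start and piece_end == obj_end:
            full += 1        # piece covers the whole object
        else:
            partial += 1     # head or tail fragment: needs RMW
        pos = piece_end
    return full, partial

# A 512 B write, aligned to 512 B but not to the object size:
# one object touched, entirely as a read-modify-write.
print(classify_write(512, 512))                   # (0, 1)

# A large unaligned request: at most two partial pieces (head and
# tail), everything in between is a full-object write.
print(classify_write(1024, 10 * OBJECT_SIZE))     # (9, 2)
```

This is why larger NBD requests help objective 2: the partial pieces stay bounded at two per request, so the fraction of read-modify-write traffic shrinks as requests grow.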

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


