[Libguestfs] nbdcpy: from scratch nbdcopy using io_uring

Eric Blake eblake at redhat.com
Mon Aug 23 17:20:26 UTC 2021


[adding the NBD list into cc]

On Mon, Aug 23, 2021 at 09:26:34PM +0530, Abhay Raj Singh wrote:
> I had an idea for optimizing my current approach, it's good in some
> ways but can be faster with some breaking changes to the protocol.
> 
> Currently, we read (from the socket connected to the source) one
> request at a time.  The simple flow looks like `read_header(io_uring)
> ---- success ---> recv(data) --- success ---> send(data) & queue
> another read header`, but it's not as efficient as it could be; at
> best it's a hack.
> 
> Another approach I am thinking about is a large buffer into which we
> read all of the socket's data, processing packets from that buffer
> once the I/O is handled.  This minimizes the number of read requests
> to the kernel, as we do one read for multiple NBD packets.
> 
> Further optimization requires changing the NBD protocol a bit
> Current protocol
> 1. Memory representation of a response (20-byte header + data)
> 2. Memory representation of a request (28-byte header + data)
> 
> HHHHH_DDDDDDDDD...
> HHHHHHH_DDDDDDDDD...
> 
> Each H and D represents 4 bytes; the _ is just a separator (0 bytes)

You are correct that requests are currently a 28-byte header plus any
payload (where a payload currently exists only for NBD_CMD_WRITE).
But responses come in two different lengths: simple responses are 16
bytes + payload (payload only for NBD_CMD_READ, and only if structured
replies were not negotiated), while structured responses are 20 bytes
+ payload (but while NBD_CMD_READ and NBD_CMD_BLOCK_STATUS require
structured replies, a compliant server can still send simple replies
to other commands).  So it's even trickier than you represent here:
always reading a 20-byte reply header will not do the right thing.
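
For reference, the three fixed-size layouts in play look like this
(informal C renderings of the protocol doc; the field names are mine,
and all integers are sent in network byte order):

#include <stdint.h>

struct nbd_request {            /* 28 bytes */
    uint32_t magic;             /* NBD_REQUEST_MAGIC (0x25609513) */
    uint16_t flags;             /* command flags */
    uint16_t type;              /* NBD_CMD_READ, NBD_CMD_WRITE, ... */
    uint64_t cookie;            /* a.k.a. handle */
    uint64_t offset;
    uint32_t length;
} __attribute__((packed));      /* followed by data for NBD_CMD_WRITE */

struct nbd_simple_reply {       /* 16 bytes */
    uint32_t magic;             /* NBD_SIMPLE_REPLY_MAGIC (0x67446698) */
    uint32_t error;
    uint64_t cookie;
} __attribute__((packed));      /* data follows only for NBD_CMD_READ */

struct nbd_structured_reply {   /* 20 bytes, one per chunk */
    uint32_t magic;             /* NBD_STRUCTURED_REPLY_MAGIC (0x668e33ef) */
    uint16_t flags;             /* NBD_REPLY_FLAG_DONE, ... */
    uint16_t type;              /* NBD_REPLY_TYPE_OFFSET_DATA, ... */
    uint64_t cookie;
    uint32_t length;            /* length of this chunk's payload */
} __attribute__((packed));

A copier that always peels 20 bytes off the reply stream will
mis-parse a 16-byte simple reply; it has to dispatch on the magic
number first.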

> 
> With the large buffer approach, we read data into the large buffer,
> then copy the NBD packet's data to a new buffer, strap a new header
> onto it, and send it.  This copying is what we wanted to avoid in the
> first place.
> 
> If the response header were 28 bytes, or the first 8 bytes of data
> were useless, we could have just overwritten the header part and sent
> the data directly from the large buffer, thereby avoiding the copy.
> 
> What are your thoughts?
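
For concreteness, the copy being described looks roughly like this in
the large-buffer scheme (only a sketch, not nbdcpy code; next_cookie()
and send_all() are hypothetical helpers, and struct nbd_request is the
28-byte layout shown above):

#include <endian.h>   /* htobe16/htobe32/htobe64 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* pkt points at one reply inside the large receive buffer; hdr_len is
   16 (simple reply) or 20+ (structured reply chunk); data_len is the
   payload length; offset is where it belongs on the destination. */
static void forward_with_copy(int dst_sock, const char *pkt,
                              size_t hdr_len, size_t data_len,
                              uint64_t offset)
{
    struct nbd_request req = {
        .magic  = htobe32(0x25609513),      /* NBD_REQUEST_MAGIC */
        .type   = htobe16(1),               /* NBD_CMD_WRITE */
        .cookie = htobe64(next_cookie()),   /* hypothetical helper */
        .offset = htobe64(offset),
        .length = htobe32(data_len),
    };

    /* The reply header is shorter than the request header, so the new
       header cannot simply overwrite it in place; the payload has to
       be copied next to a freshly built header before sending. */
    char *out = malloc(sizeof req + data_len);
    memcpy(out, &req, sizeof req);
    memcpy(out + sizeof req, pkt + hdr_len, data_len);  /* the copy to avoid */
    send_all(dst_sock, out, sizeof req + data_len);     /* hypothetical helper */
    free(out);
}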

There are already discussions about what it would take to extend the
NBD protocol to support 64-bit requests (not that we'd want to go
beyond current server restrictions of 32M or 64M maximum NBD_CMD_READ
and NBD_CMD_WRITE, but more so that we can permit quick image zeroing
via a 64-bit NBD_CMD_WRITE_ZEROES).  Your observation that making the
request and response headers equally sized would allow more efficient
handling is worth considering as part of such a protocol extension.
Of necessity, it would have to be done via an NBD_OPT_* option
requested by the client during negotiation and responded to
affirmatively by the server, with both sides then using the new-size
packets in both directions after NBD_OPT_GO (and a client would still
have to be prepared to fall back to the unequal-sized headers if the
server doesn't understand the option).
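
To make that concrete: if such an extension were negotiated, the
in-place rewrite from your mail becomes possible, roughly like this
(again only a sketch under that hypothetical extension, ignoring
alignment and aliasing details; helpers as above):

/* With equal-sized headers in both directions, the reply header
   sitting in the receive buffer can be rewritten in place into a
   request header and sent together with the payload behind it. */
static void forward_in_place(int dst_sock, char *pkt, size_t data_len,
                             uint64_t offset)
{
    struct nbd_request *req = (struct nbd_request *)pkt;  /* overwrite reply header */
    req->magic  = htobe32(0x25609513);      /* NBD_REQUEST_MAGIC */
    req->flags  = 0;
    req->type   = htobe16(1);               /* NBD_CMD_WRITE */
    req->cookie = htobe64(next_cookie());   /* hypothetical helper */
    req->offset = htobe64(offset);
    req->length = htobe32(data_len);

    /* No memcpy of the payload: it already sits right after the header. */
    send_all(dst_sock, pkt, sizeof *req + data_len);      /* hypothetical helper */
}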

For that matter, is there a benefit to having cache-line-optimized
sizing, where all headers are exactly 32 bytes (both requests and
responses, and both simple and structured replies)?  I'm thinking
maybe NBD_OPT_FIXED_SIZE_HEADER might be a sane name for such an
option.
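
Purely as a strawman (not part of any spec), such a fixed-size header
could look like this, which also happens to leave room for the 64-bit
lengths mentioned above:

struct nbd_fixed_header {       /* 32 bytes, both directions */
    uint32_t magic;             /* distinguishes request vs. reply */
    uint16_t flags;
    uint16_t type;              /* command type or reply chunk type */
    uint64_t cookie;
    uint64_t offset;
    uint64_t length;            /* 64-bit, enabling huge WRITE_ZEROES */
} __attribute__((packed));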

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



