[Libguestfs] nbdcpy: from scratch nbdcopy using io_uring

Eric Blake eblake at redhat.com
Mon Jun 28 17:18:37 UTC 2021


On Sun, Jun 27, 2021 at 05:00:00PM +0530, Abhay Raj Singh wrote:
> I ran into a problem while working on receiving data from nbd source
> (Reply to NBD_CMD_READ)
> 
> As you know we need to parse the error in Reply Header before we can
> proceed reading the data.
> Let's say an error occurred, so instead of
> HEADER1_DATA1..._HEADER2_DATA2... we will get
> HEADER1_HEADER2_DATA2... (as far as I know), so submitting a recv
> request to io_uring with length = sizeof(HEADER1+DATA1) would cause
> problems, as it won't detect NBD packet boundaries and will give us
> as many bytes as we ask for (I may be wrong here, that's what I have
> read so far).

Two problems with requesting length = sizeof(HEADER1+DATA1):

First, as you pointed out, if the server errors out, you will only get
HEADER1 bytes back (including the error indication), and no data
bytes.

Second, once you have multiple requests in flight to the server,
rather than synchronously waiting for each reply before beginning the
next request, you are also at risk that the server might answer with
HEADER2 before HEADER1.

> 
> A remedy to this would be just submit 'header reads' to io_uring when
> we get a read and if header says there were no errors
> we can be sure there is length bytes ready to be read in the
> buffer(rest of the NBD packet) and read won't block.

Yes, you really DO have to submit a read request for JUST a header;
only once that header is parsed can you decipher what to expect next
on the wire (another header, or the payload for WHICH read you are
getting a reply to).  As Rich said, processing the headers via
user-space copies is probably fine; the real savings come when
processing the data payloads.
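
As a rough illustration of the header-first step, here is a minimal
sketch that parses a 16-byte NBD simple reply header and decides how
many payload bytes to read next.  It assumes simple replies only (no
structured replies), and it leaves out the io_uring submission itself.
The names reply_header, parse_simple_reply and payload_after_header
are made up for this example; they are not from nbdcopy or libnbd.

```c
#include <stddef.h>
#include <stdint.h>

#define NBD_SIMPLE_REPLY_MAGIC 0x67446698u
#define NBD_SIMPLE_REPLY_LEN   16   /* magic(4) + error(4) + handle(8) */

/* Hypothetical in-memory form of a parsed simple reply header. */
struct reply_header {
    uint32_t error;    /* 0 means success */
    uint64_t handle;   /* echoes the handle sent with the request */
};

/* NBD fields are big-endian on the wire. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Parse one header; returns 0 on success, -1 if the magic is wrong. */
int parse_simple_reply(const unsigned char buf[NBD_SIMPLE_REPLY_LEN],
                       struct reply_header *out)
{
    if (be32(buf) != NBD_SIMPLE_REPLY_MAGIC)
        return -1;
    out->error = be32(buf + 4);
    out->handle = be64(buf + 8);
    return 0;
}

/* For a reply to NBD_CMD_READ: the payload follows the header only
 * when the server reported no error; otherwise the next bytes on the
 * wire are already the next header. */
size_t payload_after_header(const struct reply_header *h,
                            size_t request_len)
{
    return h->error == 0 ? request_len : 0;
}
```

With something like this, the receive loop would read exactly
NBD_SIMPLE_REPLY_LEN bytes, parse them, and only then queue a second
read of payload_after_header() bytes aimed at the right buffer.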

> Now, as far as I can tell this would work as I expect but our main
> concern is avoiding copy_user_enhanced_fast_string
> so this won't be nice.
> 
> Also, attaching metadata (Operation) to the read SQE doesn't make
> sense because, as far as I know, io_uring won't be able to tell
> which request a given read belongs to; the Reply Header's handle
> will tell us which operation in the operations vector this NBD
> packet belongs to.

Yeah, because of the out-of-order potential of the NBD protocol, you
have to process each header before you know where to send the payload
that follows it.
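
One way to picture this is a small table of in-flight operations keyed
by handle, so that whichever reply arrives first can be matched to its
destination buffer.  This is only a sketch under my own assumptions:
the struct operation layout, MAX_INFLIGHT, and the op_start / op_find /
op_finish helpers are hypothetical, and a real implementation would
likely use something faster than a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 16   /* hypothetical limit on parallel commands */

/* Hypothetical record of one outstanding NBD_CMD_READ. */
struct operation {
    int in_use;
    uint64_t handle;   /* handle sent in the request header */
    void *dest;        /* where the payload should land */
    size_t len;        /* number of bytes requested */
};

static struct operation ops[MAX_INFLIGHT];

/* Record a request before sending it; returns NULL if the table is full. */
struct operation *op_start(uint64_t handle, void *dest, size_t len)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!ops[i].in_use) {
            ops[i] = (struct operation){
                .in_use = 1, .handle = handle, .dest = dest, .len = len,
            };
            return &ops[i];
        }
    }
    return NULL;
}

/* Look up the operation a reply header refers to, in any arrival order. */
struct operation *op_find(uint64_t handle)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (ops[i].in_use && ops[i].handle == handle)
            return &ops[i];
    }
    return NULL;
}

/* Release the slot once the payload (or error) has been consumed. */
void op_finish(struct operation *op)
{
    op->in_use = 0;
}
```

After parsing a header, the receiver would call op_find() with the
header's handle and direct the payload read at op->dest, regardless of
whether that request was the first or the last one submitted.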

> 
> Another solution would be opening multiple sockets one for each slot
> in operations vector, only one NBD operation runs on a socket
> i.e. only one inflight request per socket, that too sounds like a bad idea.

To some extent, multiple sockets is what Rich mentioned in the
multi-conn approach, but having one socket per parallel operation is
going to be slower than properly handling out-of-order traffic on one
socket (there may still be savings by having multiple sockets, but you
also want to be sure to handle multiple in-flight commands per
socket).


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



