[Libguestfs] nbdcpy: from scratch nbdcopy using io_uring
Eric Blake
eblake at redhat.com
Mon Jun 28 17:18:37 UTC 2021
On Sun, Jun 27, 2021 at 05:00:00PM +0530, Abhay Raj Singh wrote:
> I ran into a problem while working on receiving data from nbd source
> (Reply to NBD_CMD_READ)
>
> As you know we need to parse the error in Reply Header before we can
> proceed reading the data.
> Let's say an error occurred, so instead of
> HEADER1_DATA1..._HEADER2_DATA2... we will get
> HEADER1_HEADER2_DATA2... (as far as I know). Submitting a recv
> request to io_uring with length = sizeof(HEADER1+DATA1) would then
> cause a problem, as it won't detect NBD packet boundaries and will
> give us as many bytes as we ask for (I may be wrong here, that's
> what I have read so far).
Two problems with requesting length = sizeof(HEADER1+DATA1):
First, as you pointed out, if the server errors out, you will only get
HEADER1 bytes back (including the error indication), and no data
bytes.
Second, once you start keeping multiple requests in flight to the
server, rather than synchronously waiting for a reply before beginning
the next request, you are also at risk that the server might answer
HEADER2 prior to answering HEADER1.
>
> A remedy to this would be to submit only 'header reads' to io_uring
> when we get a read; if the header says there were no errors,
> we can be sure there are length bytes ready to be read in the
> buffer (the rest of the NBD packet) and the read won't block.
Yes, you really DO have to submit a read request for JUST a header.
Only once you have parsed that header can you decipher what to expect
next on the wire (another header or a data payload, and WHICH read you
are getting a reply to). As Rich said, processing the headers via
user-space copies is probably fine; the real savings come into play
when processing the data payloads.
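As a rough sketch of that header-first step (this is not nbdcopy code):
the names struct reply_header, parse_simple_reply and payload_len_after
are made up for illustration, and it assumes the server sends simple
replies (16 bytes: magic, error, handle, all big-endian) rather than
structured replies.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple reply layout from the NBD protocol: 32-bit magic, 32-bit error,
 * 64-bit handle, all big-endian, 16 bytes in total. */
#define NBD_SIMPLE_REPLY_MAGIC 0x67446698u
#define SIMPLE_REPLY_LEN 16

/* Hypothetical decoded form of a reply header (illustration only). */
struct reply_header {
    uint32_t error;   /* 0 means the command succeeded */
    uint64_t handle;  /* echoes the handle sent with the request */
};

static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t load_be64(const unsigned char *p)
{
    return ((uint64_t)load_be32(p) << 32) | load_be32(p + 4);
}

/* Decode a SIMPLE_REPLY_LEN-byte header that was read on its own.
 * Returns 0 on success, -1 if the magic does not match. */
static int parse_simple_reply(const unsigned char *buf,
                              struct reply_header *out)
{
    if (load_be32(buf) != NBD_SIMPLE_REPLY_MAGIC)
        return -1;
    out->error = load_be32(buf + 4);
    out->handle = load_be64(buf + 8);
    return 0;
}

/* Number of payload bytes to read next for an NBD_CMD_READ reply: the
 * requested length on success, nothing if the server reported an error. */
static size_t payload_len_after(const struct reply_header *h,
                                size_t request_len)
{
    return h->error == 0 ? request_len : 0;
}
```

Only the result of payload_len_after() tells you whether the next
thing to submit is a payload read or another header read.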
> Now, as far as I can tell this would work as I expect but our main
> concern is avoiding copy_user_enhanced_fast_string
> so this won't be nice.
>
> Also, attaching metadata (Operation) to the read SQE doesn't make
> sense, because as far as I know io_uring won't be able to tell
> which request the read is for; the Reply Header's handle will tell
> us which operation in the operations vector this NBD packet
> belongs to.
Yeah, because of the out-of-order potential of the NBD protocol, you
have to process each header before you know where to send its
payload.
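To make that concrete, here is a minimal sketch of matching a reply's
handle against the in-flight operations. The struct operation layout
and find_operation are hypothetical, not taken from nbdcopy, and a
linear scan is used only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one in-flight NBD_CMD_READ (illustration only). */
struct operation {
    uint64_t handle;  /* handle sent in the request header */
    void *buf;        /* destination for the payload */
    size_t len;       /* number of bytes requested */
    int in_flight;    /* nonzero while a reply is still expected */
};

/* Find the in-flight operation that a reply header's handle refers to.
 * Returns NULL if no in-flight operation has that handle. */
static struct operation *find_operation(struct operation *ops, size_t n,
                                        uint64_t handle)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i].in_flight && ops[i].handle == handle)
            return &ops[i];
    }
    return NULL;
}
```

Once the operation is found, the payload read can be submitted with
its buf and len as the destination, regardless of the order in which
the server chose to answer.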
>
> Another solution would be opening multiple sockets one for each slot
> in operations vector, only one NBD operation runs on a socket
> i.e. only one inflight request per socket, that too sounds like a bad idea.
To some extent, multiple sockets is what Rich mentioned in the
multi-conn approach, but having one socket per parallel operation is
going to be slower than properly handling out-of-order traffic on one
socket (there may still be savings by having multiple sockets, but you
also want to be sure to handle multiple in-flight commands per
socket).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org