[Libguestfs] [PATCH libnbd] states: In recv_into_rbuf and send_from_wbuf loop until EAGAIN.

Eric Blake eblake at redhat.com
Mon Jun 10 22:48:03 UTC 2019


On 6/9/19 1:34 PM, Richard W.M. Jones wrote:
> Previously we performed a single call to recv(2) or send(2) (or the
> GnuTLS equivalents), and even if more data/space was immediately
> available to receive/send we would return to poll.  Instead of this,
> loop until the socket returns EAGAIN.

In general, if you get a short read() or write(), it's BECAUSE the
socket blocked (and an immediate additional read or write will most
likely just fail with EAGAIN, because you are still blocked).  So I
don't think this patch as-is is worthwhile.
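
To put that concretely, a single non-blocking recv() behaves roughly
like this (a sketch, not libnbd's actual code):

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sketch: one non-blocking recv.  A short count means the kernel's
 * receive queue ran dry, so an immediate second recv will normally
 * fail with EAGAIN rather than return more data. */
static void
read_once (int fd, char *buf, size_t len)
{
  ssize_t r = recv (fd, buf, len, 0);

  if (r > 0)
    /* got r bytes; r < len is the common case when we drained the queue */;
  else if (r == 0)
    /* peer performed an orderly shutdown (EOF) */;
  else if (errno == EAGAIN || errno == EWOULDBLOCK)
    /* nothing available right now - return to poll() */;
  else
    /* real error, errno set */;
}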

But unless we have a larger buffer to read into, we're really limiting
ourselves by the length we read.  I think you might actually get a
performance boost on the read side if you rework things to maintain a
4k or similar buffer of pre-read data and consult that instead of
calling read() (I already added the .pending() callback since gnutls in
fact does something like that already) - but then you have to try to
read a full buffer's worth each time.

So, pre-patch, you had something like:

Most replies:
read(4) = 4
next state
read(4) = 4
next state
back to READY
poll()

vs. NBD_CMD_READ reply:
read(4) = 4
next state
read(1M) = 64k
block
poll()
read(1M-64k) = 64k
block
poll()
...

This patch doesn't really change the short replies, but it turns large
reads into:

read(4) = 4
next state
read(1M) = 64k
read(1M-64k) = -1 (EAGAIN)
block
poll()
read(1M-64k) = 64k
...

so yeah, this patch does NOT look beneficial as-is.  But with my idea of
reading into a 4k buffer or so, we change to something like:

buffer is empty
service 4 to read simple_reply.magic
  read(4k) = 16; buffer contains 16 bytes pre-read
  consume 4 from buffer; buffer contains 12 bytes
next state
service 4 to read simple_reply.error
  consume 4 from buffer; buffer contains 8 bytes - we skipped read :)
next state
service 8 to read simple_reply.handle
  consume 8 from buffer; buffer is empty
next state
back to READY, time to block
poll()

or even:
service 4 to read simple_reply.magic
  read(4k) = 32; buffer contains 32 bytes pre-read
  consume 4 from buffer; buffer contains 28 bytes
next state
service 4 to read simple_reply.error
  consume 4 from buffer; buffer contains 24 bytes - we skipped read :)
next state
service 8 to read simple_reply.handle
  consume 8 from buffer; buffer contains 16 bytes
next state
back to READY, buffer is non-empty, so .pending returns true
service 4 to read simple_reply.magic
  consume 4 from buffer - we again skipped read
etc.
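
A rough sketch of a pre-read buffer that would produce traces like the
above (hypothetical names, not the actual libnbd state machine; the
real .pending() check would simply test whether unconsumed bytes
remain):

#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define PREBUF_SIZE 4096

struct prebuf {
  char data[PREBUF_SIZE];
  size_t start, used;        /* unconsumed bytes: data[start..start+used) */
};

/* Serve n bytes from the buffer, refilling with one full-sized recv
 * only when it is empty.  Returns bytes copied, 0 if we would block,
 * -1 on error or EOF (sketch-level error handling only). */
static ssize_t
prebuf_consume (struct prebuf *b, int fd, void *out, size_t n)
{
  if (b->used == 0) {
    ssize_t r = recv (fd, b->data, PREBUF_SIZE, 0); /* always ask for 4k */
    if (r == -1)
      return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    if (r == 0)
      return -1;                                    /* EOF */
    b->start = 0;
    b->used = r;
  }
  if (n > b->used)
    n = b->used;             /* caller re-enters for the remainder */
  memcpy (out, b->data + b->start, n);
  b->start += n;
  b->used -= n;
  return n;
}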

We do, however, lose a bit of efficiency in that we are no longer
reading NBD_CMD_READ replies directly into the user's buffer - we'd
have to memcpy() any prefix (up to the length of the buffer, which is
why sizing the buffer larger than 4k is probably a loss) before doing a
direct read() into the tail of the user's buffer.  So we'd have to
measure the effects, both with a client that does lots of 512-byte
reads and with a client that does lots of large reads, to see whether
pre-buffered read()ing followed by memcpy() saves enough syscalls to be
noticeable.
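
For the large-read case that would look roughly like this (again a
sketch, reusing the hypothetical struct prebuf from above):

/* Drain any prefix already sitting in the pre-read buffer (at most
 * one memcpy of up to PREBUF_SIZE bytes), then recv() the remainder
 * directly into the caller's buffer so the bulk of a large payload
 * still avoids the extra copy. */
static ssize_t
read_payload (struct prebuf *b, int fd, char *ubuf, size_t len)
{
  size_t prefix = b->used < len ? b->used : len;
  ssize_t r;

  memcpy (ubuf, b->data + b->start, prefix);  /* the extra copy */
  b->start += prefix;
  b->used -= prefix;
  if (prefix == len)
    return len;                               /* satisfied from buffer */

  r = recv (fd, ubuf + prefix, len - prefix, 0);
  if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    r = 0;                                    /* caller re-enters later */
  return r == -1 ? -1 : (ssize_t) (prefix + r);
}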

Then on the write side, there MIGHT be a minor performance boost if we
set MSG_MORE any time we complete a request but know that more requests
are pending, at which point we can continue the state machine onto that
next request and pack multiple commands into one TCP segment.  But
that's different from trying to call write() in a loop after a short
write.
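
The MSG_MORE idea would look something like this (sketch only;
MSG_MORE is Linux-specific):

#include <sys/types.h>
#include <sys/socket.h>

/* Sketch: when sending a request while more requests are already
 * queued, pass MSG_MORE so the kernel delays the partial TCP segment
 * and packs the next request into it. */
static ssize_t
send_request (int fd, const void *req, size_t len, int more_queued)
{
  return send (fd, req, len, more_queued ? MSG_MORE : 0);
}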

Loops like this are good for blocking sockets, but we are intentionally
using non-blocking sockets, where a short read/write more likely means
we are actually blocked than that a signal interrupted us and our next
attempt would make progress.
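
Compare the classic retry loop that makes sense on a *blocking* socket
(sketch):

#include <errno.h>
#include <unistd.h>

/* Retrying after EINTR is productive on a blocking socket; on a
 * non-blocking socket the same loop would mostly spin into EAGAIN. */
static ssize_t
read_fully (int fd, char *buf, size_t len)
{
  size_t got = 0;

  while (got < len) {
    ssize_t r = read (fd, buf + got, len - got);
    if (r == -1 && errno == EINTR)
      continue;              /* interrupted: retry makes progress */
    if (r <= 0)
      return r;              /* error or EOF */
    got += r;
  }
  return got;
}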

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
