[Libguestfs] [PATCH libnbd v2 3/4] api: Implement concurrent writer.

Tue Jun 4 12:02:29 UTC 2019

On 6/4/19 6:16 AM, Richard W.M. Jones wrote:
> There are several races / deadlocks which I've thought about.  Let's
> see if I can remember them all ...
> 
> (1) This I experienced: nbd_aio_get_fd deadlocks if there are
> concurrent synchronous APIs going on.  A typical case is where you set
> up the concurrent writer thread before connecting, and then call a
> synchronous connect function such as connect_tcp.  The synchronous
> function grabs h->lock, then writes something, which eventually
> invokes the writer thread which calls nbd_aio_get_fd and deadlocks on
> h->lock.
> 
> -> Probably the writer thread should be forbidden from using
> nbd_handle.

Or rather, forbidden from using nbd_handle in a locking manner.  We
should basically document a whitelist of functions that are safe to call...

> 
> (2) The writer thread calls nbd_aio_get_fd, but the fd returned might
> might be closed before we use it, resulting in either EBADF or worse
> using another fd that happens to be opened around the same time.
> 
> -> I think the solution to this would be to allow the writer callback
> to signal that the socket is about to be closed (eg. add an extra flag
> parameter to the callback), which would kill the writer thread.

Yes, I think a flag argument to the writer callback can serve a couple
of additional purposes:

Right now, the writer callback is reached exactly once for most commands
(on data from h->request), and exactly twice for NBD_CMD_WRITE
(h->request, then the user's persistent buffer). A flag argument would
let us inform the user whether the buffer is transient (h->request is
volatile; the writer thread MUST memcpy it aside; but it is fixed in
length so the memcpy is not performance sensitive) or persistent (the
NBD_CMD_WRITE buffer was provided by the caller, and may be several
megabytes; memcpy() can negatively impact the cache and is pointless
since the original buffer is not going to be scribbled on); it also lets
us inform the writer thread on whether TCP_CORK would be worthwhile
(corking the outgoing socket for NBD_CMD_WRITE would let the OS buffer
things into a single wire transaction on both the request header and the
payload, rather than more overhead for two separate TCP packets,
particularly since we've disabled Nagle's algorithm).

Or, in pseudo-code:

nbd_aio_pread(,userbuf,) - >
  write_callback(data, h->request, len, 0);
nbd_aio_pwrite(,userbuf,) - >
  write_callback(data, h->request, len, LIBNBD_WRITE_CORK)
  write_callback(data, userbuf, len, LIBNBD_WRITE_PERSISTENT_BUF)
nbd_aio_shutdown() - >
  write_callback(data, NULL, 0, LIBNBD_WRITE_SHUTDOWN)

> 
> (3) nbd_concurrent_writer_error could lose errors.  This might happen
> if the socket is closed normally without writing anything, which would
> never check h->writer_error.
> 
> (4) nbd_concurrent_writer_error possibly deadlocks too since it needs
> to grab h->lock.  Basically the same as (1).

...and this function must be in the documented safe whitelist, but that
means that instead of grabbing h->lock, we have to implement error
reporting by use of atomics.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190604/f7d178b6/attachment.sig>