[Libguestfs] [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()

Richard W.M. Jones rjones at redhat.com
Fri Mar 27 23:28:40 UTC 2020


On Fri, Mar 27, 2020 at 05:33:28PM -0500, Eric Blake wrote:
> We have been seeing sporadic hangs on test-nbd-tls-psk.sh, where even
> though the client to the 'nbdkit nbd' process has cleanly exited,
> things are stalled in .close where nbd is trying to pthread_join() the
> reader thread, while the reader thread is itself blocked on a poll()
> that will never make additional progress.  Tracing the race is
> difficult: nbd_shutdown() sends NBD_CMD_DISC to the server, and the
> NBD protocol does not require the server to send a response but does
> not forbid the server from using gnutls_bye to at least gracefully end
> the TLS session.  So where a plaintext server just closes the socket
> (and the resulting EOF exits our poll()), it appears that with TLS, we
> can sometimes hit the situation where the server is waiting for a
> response to gnutls_bye() (note that qemu does not use this function,
> but nbdkit does), while at the same time our client is waiting to see
> EOF on the socket, resulting in deadlock.  In the past (see commit
> 430f8141), we added shutdown(SHUT_WR) for plaintext clients to prod
> the server into closing the socket faster, but libnbd does not yet
> have that counterpart action.
> 
> But I can at least reuse the same mechanism we have for waking up the
> poll() loop when first sending a command to the server.  That is,
> since we already have a pipe-to-self in addition to reading from the
> server, it's trivial to argue that closing the pipe-to-self will
> guarantee that the reader thread sees something interesting to break
> out of its poll() loop, regardless of whether it also sees something
> interesting from the server after having sent NBD_CMD_DISC, and
> regardless of whether I need to add in more gnutls_bye() calls to
> either nbdkit or libnbd.
> 
> Fixes: ab7760fc
> Signed-off-by: Eric Blake <eblake at redhat.com>
> ---
> 
> May be incomplete: I might also need to break out of the reader loop
> when read() returns 0.
> 
>  plugins/nbd/nbd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
> index e5b8f338..23a7da06 100644
> --- a/plugins/nbd/nbd.c
> +++ b/plugins/nbd/nbd.c
> @@ -537,10 +537,10 @@ nbdplug_close_handle (struct handle *h)
>  {
>    if (nbd_shutdown (h->nbd, 0) == -1)
>      nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ());
> +  close (h->fds[1]);
>    if ((errno = pthread_join (h->reader, NULL)))
>      nbdkit_debug ("failed to join reader thread: %m");
>    close (h->fds[0]);
> -  close (h->fds[1]);
>    nbd_close (h->nbd);
>    free (h);
>  }

Probably deserves a comment in the code to mention to future Eric (and
Richard) not to move that close call.  But yes this is also fine, so ACK.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




More information about the Libguestfs mailing list