[Libguestfs] [PATCH libnbd] lib: Kill subprocess in nbd_close.

Thu Jul 25 19:10:48 UTC 2019

On 7/25/19 1:28 PM, Eric Blake wrote:

> 
> Note that this patch is trying to speed up a slow waitpid() issued
> _after_ we call h->sock->ops->close().  You'd think that after we
> close() our end, that the remote end doing a read() would see an
> immediate EOF, and that would be enough for the server to shut down
> quickly instead of waiting for the delay to expire and finally notice
> during its write() that we are no longer reading.  However, POSIX says
> that when sockets are involved, a close() on our end does not trigger an
> immediate EOF on the other side's read end, but rather may start an
> asynchronous timer to try to finish the TCP connection gracefully to
> flush any pending traffic, where both directions have to agree that the
> connection is dead before either direction sees EOF/EPIPE.  And that's
> where the shutdown(2) API comes in handy: it forces the kernel to
> realize that one or both directions of traffic are no longer needed,
> making this asynchronous handling in close() much faster.  At least
> nbdkit is known to react favorably to a shutdown() request - see nbdkit
> commit 430f8141.
> 
> We may want to add h->sock->ops->shutdown() (especially if we want to
> handle SHUT_WR through adding more states for more graceful shutdown
> paths, including when TLS is involved), but even in the short-term,
> something as simple as adding shutdown(h->fd, SHUT_RDWR) immediately
> before the h->sock->ops->close() would be helpful.
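
For reference, the short-term idea quoted above might look roughly like
the sketch below.  This is not the libnbd code: the helper name
sketch_shutdown_then_close() is made up for illustration, and it takes
a plain socket fd rather than going through h->sock->ops.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a libnbd function): tell the kernel that
 * neither direction of the socket is needed any more, then close it.
 */
static int
sketch_shutdown_then_close (int fd)
{
  /* Errors from shutdown() (for example ENOTCONN) are deliberately
   * ignored here; we are closing the descriptor regardless.
   */
  (void) shutdown (fd, SHUT_RDWR);
  return close (fd);
}
```

On a connected pair of local stream sockets, a read() on the peer
returns 0 (EOF) once this helper has run on the other end.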

The benefit of using shutdown() is still real, but it may not be the
only thing at play here.  As we discussed on IRC, adding shutdown()
alone was not enough to make this specific nbdkit invocation reach
exit() any faster; the waitpid() still stalled without a kill().
So we've identified a second issue: nbdkit refuses to exit() until the
plugin and all filters unload, but the filter cannot unload while it
still has a pending transaction, and that transaction is stuck in a
"non-interruptible" usleep().  (All the signal handlers installed by
nbdkit use SA_RESTART, so the usleep() doesn't exit early due to EINTR;
it takes the external SIGHUP to force nbdkit to die without waiting for
the filter to clean up.)  So the thought here is that maybe nbdkit
should be patched to add an nbdkit_usleep() function, which returns 0
if the full sleep happened, or -1 if the sleep is interrupted early in
one thread because another thread noticed EOF on the read()s from the
client.  The delay filter may be the only code to use nbdkit_usleep(),
but an easy way to recognize that the client has gone away, and that
waiting out the rest of the sleep is therefore pointless, will make
nbdkit much more responsive to a client close()ing the connection.
I'll work on the nbdkit patch.

Finally, calling kill() unconditionally in nbd_close() may not be the
nicest behavior, so we have the idea of adding a new API (maybe named
nbd_kill) that lets the user choose which signal to send to the
underlying command pid and how long to wait before trying a second
signal, all prior to calling nbd_close().  Rich will tackle something
along those lines.
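
As a rough picture of what such a call could do internally, here is a
sketch under assumed names and parameters.  Nothing here is a libnbd
API: sketch_kill_wait() and its arguments are hypothetical, and the
10ms polling interval is an arbitrary choice.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: send first_sig to pid, wait up to grace_ms for
 * it to exit, then send second_sig and wait for it to be reaped.
 * Returns 0 with *status filled in once the child is reaped, else -1.
 */
static int
sketch_kill_wait (pid_t pid, int first_sig, unsigned grace_ms,
                  int second_sig, int *status)
{
  const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10ms */
  unsigned waited_ms = 0;
  pid_t r;

  if (kill (pid, first_sig) == -1)
    return -1;

  /* Poll for exit during the grace period. */
  while (waited_ms < grace_ms) {
    r = waitpid (pid, status, WNOHANG);
    if (r == pid)
      return 0;
    if (r == -1 && errno != EINTR)
      return -1;
    nanosleep (&tick, NULL);
    waited_ms += 10;
  }

  /* Grace period expired: escalate.  Note that this blocks for as
   * long as the child survives second_sig, so callers would normally
   * pass SIGKILL here.
   */
  if (kill (pid, second_sig) == -1)
    return -1;
  do
    r = waitpid (pid, status, 0);
  while (r == -1 && errno == EINTR);
  return r == pid ? 0 : -1;
}
```

With something like this, a caller could try SIGTERM first, fall back
to SIGKILL after a short grace period, and only then call nbd_close().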

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
