[Libguestfs] [nbdkit PATCH 2/2] nbd: Reorder cleanup to avoid getting stuck in poll()

Sat Mar 28 20:56:09 UTC 2020

On 3/27/20 6:28 PM, Richard W.M. Jones wrote:

>>
>> But I can at least reuse the same mechanism we have for waking up the
>> poll() loop when first sending a command to the server.  That is,
>> since we already have a pipe-to-self in addition to reading from the
>> server, it's trivial to argue that closing the pipe-to-self will
>> guarantee that the reader thread sees something interesting to break
>> out of its poll() loop, regardless of whether it also sees something
>> interesting from the server after having sent NBD_CMD_DISC, and
>> regardless of whether I need to add in more gnutls_bye() calls to
>> either nbdkit or libnbd.
>>
>> Fixes: ab7760fc
>> Signed-off-by: Eric Blake <eblake at redhat.com>
>> ---
>>
>> May be incomplete: I might also need to break out of the reader loop
>> when read() returns 0.

After some soak time, I was able to reproduce the hangs fairly reliably
(after reverting my two commits that reordered testsuite cleanup) by
running test-nbd-tls{,-psk}.sh in a loop 100 times without this patch,
and I could not reproduce them with this patch. But as mentioned on the
other thread, I also finally saw what the real problem was (things often
look so much simpler in hindsight!): nbd_shutdown is synchronous, so
calling it while the reader thread is still polling results in two
threads competing on poll() on the same fd, which is never a good idea.
Switching to nbd_aio_disconnect removes that competition and also passed
my stress test of 100 cycles without hitting the hang, so v2 of this
patch will take that approach instead.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org