[Libguestfs] [libnbd PATCH] info: Try harder for graceful disconnect from server

Wed Sep 27 21:22:48 UTC 2023

On Fri, Sep 22, 2023 at 04:03:19PM -0500, Eric Blake wrote:
> On Fri, Sep 22, 2023 at 09:47:55PM +0100, Richard W.M. Jones wrote:
> > On Fri, Sep 22, 2023 at 12:33:36PM -0500, Eric Blake wrote:
> > > There are a number of ways in which gathering information can fail.
> > > But when possible, it makes sense to let the server know that we are
> > > done talking, as it minimizes the likelihood that nbdinfo's error
> > > message will be obscured by an accompanying error message by the
> > > server complaining about an unclean disconnect.
> > > 
> > > For example, with a one-off qemu temporarily patched as mentioned in
> > > commit 0f8ee8c6 to advertise sizes larger than 2^63, kicking off
> > > 'qemu-nbd -f raw -r file &' produces:
> > > 
> > > pre-patch:
> > > 
> > > $ ./run nbdinfo --size nbd://localhost
> > > /home/eblake/libnbd/info/.libs/nbdinfo: nbd_get_size: server claims size 9223372036854781440 which does not fit in signed result: Value too large for defined data type
> > > qemu-nbd: option negotiation failed: Failed to read opts magic: Unexpected end-of-file before all data were read
> > 
> > This doesn't necessarily seem like a bug?
> 
> It's a quality of service thing - qemu is just being noisy that the
> client closed the socket abruptly without giving any reason why.  It
> doesn't change libnbd's behavior (nbdinfo has already reported its
> error message), but may confuse people reading qemu-nbd logs.
> 
> It may also be a case where we could patch qemu-nbd to not output a
> complaint if the client exited at a clean point in the back-and-forth
> transactions.  We still want to be noisy if the socket closes after
> the first byte has been read, but if there are zero bytes available,
> announcing that the client no longer cares does not add much value.
> 
> > 
> > This feels like quite a significant change in behaviour to me.  In
> > particular I'm worried if it interacts in some subtle way with the
> > forking done by the "[ ... ]" syntax for running servers on the
> > command line (or any of the other ways that libnbd might fork/exec).
> 
> Interesting observation.  atexit() handlers are not preserved across
> exec, and I think all our fork() paths end in either exec or _exit
> (also where atexit handlers are ignored), so I don't think we are
> risking calling the handler twice.
> 
> > 
> > Can we hold this patch until after 1.18 is released and then put it
> > in?  Should only be a week or two.
> 
> Sure, being conservative on this one is fine by me.  I'll delay it
> until after 1.18.
> 
> > 
> > Provisionally ACKed for post-1.18

Now in as commit fd4f3fea, so we can start getting some soak time under CI.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org