[Libguestfs] nbdkit & qemu 2.12: qemu-img: Protocol error: simple reply when structured reply chunk was expected

Richard W.M. Jones rjones at redhat.com
Sat Mar 23 17:00:32 UTC 2019


On Sat, Mar 23, 2019 at 09:52:41AM -0500, Eric Blake wrote:
> On 3/23/19 7:58 AM, Richard W.M. Jones wrote:
> > NB During this I discovered another (but minor) bug in qemu.  If you
> > feed qemu a long series of block status replies then it eventually
> > closes the connection.  However it does not print an error message.
> > 
> 
> Which version(s) of qemu? Was REQ_ONE in force or not? Was it a long
> series of status replies that could have been coalesced, or were they
> alternating status? Did the replies extend beyond the client's original
> request?
> 
> I'd like to reproduce the failure, to determine if qemu needs more patches.

Here's how to reproduce it.

You will need nbdkit block-status

  https://github.com/rwmjones/nbdkit/tree/block-status @ 8c3b6cbe81bdf

with the small patch attached to this email on top.

Then run it against qemu (from git) like this:

  ./nbdkit -fv memory size=64M \
      --run '/path/to/qemu-img convert $nbd /var/tmp/out'

You should see:

nbdkit: memory.0: debug: pread count=512 offset=0
nbdkit: memory.1: debug: extents count=67108864 offset=0 req_one=1
nbdkit: memory.3: debug: client sent NBD_CMD_DISC, closing connection
nbdkit: memory.3: debug: exiting worker thread memory.3

and qemu-img exits with code 1, but there is no message printed.
(You can use --run '... || echo FAIL' if you don't believe that
qemu-img is exiting with a failure.)

While preparing this email I noticed that v3.1.0 printed an
error message:

  qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS

So I have bisected this to the following commit:

d8b4bad846f08ff0f167b46dc156a5310b750484 is the first bad commit
commit d8b4bad846f08ff0f167b46dc156a5310b750484
Author: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
Date:   Fri Nov 2 18:11:52 2018 +0300

    block/nbd-client: use traces instead of noisy error_report_err
    
    Reduce extra noise of nbd-client, change 083 correspondingly.
    
    In various commits (be41c100 in 2.10, f140e300 in 2.11, 78a33ab
    in 2.12), we added spots where qemu as an NBD client would report
    problems communicating with the server to stderr, because there
    was no where else to send the error to.  However, this is racy,
    particularly since the most common source of these errors is when
    either the client or the server abruptly hangs up, leaving one
    coroutine to report the error only if it wins (or loses) the
    race in attempting the read from the server before another
    thread completes its cleanup of a protocol error that caused the
    disconnect in the first place.  The race is also apparent in the
    fact that differences in the flush behavior of the server can
    alter the frequency of encountering the race in the client (see
    commit 6d39db96).
    
    Rather than polluting stderr, it's better to just trace these
    situations, for use by developers debugging a flaky connection,
    particularly since the real error that either triggers the abrupt
    disconnection in the first place, or that results from the EIO
    when a request can't receive a reply, DOES make it back to the
    user in the normal Error propagation channels.
    
    Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov at virtuozzo.com>
    Message-Id: <20181102151152.288399-4-vsementsov at virtuozzo.com>
    [eblake: drop depedence on error hint, enhance commit message]
    Signed-off-by: Eric Blake <eblake at redhat.com>

:040000 040000 8cabc7689a7ea1667850d7d670ad4791873ce0a5 80fcc304538a7c7d47d8cb0d15e69cf5f1472ea9 M	block
:040000 040000 eb3965069b953ae0a5e7f00169a22c91cc428d81 0dbd45dc86117204869fb8f5bae584be07806d5b M	tests

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
-------------- next part --------------
diff --git a/server/protocol.c b/server/protocol.c
index ac6eb2c..d406dc8 100644
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -503,6 +503,9 @@ send_structured_reply_block_status (struct connection *conn,
   if (blocks == NULL)
     return connection_set_status (conn, -1);
 
+  nr_blocks = 32;
+  blocks[0].length = htobe32 (4096);
+
   reply.magic = htobe32 (NBD_STRUCTURED_REPLY_MAGIC);
   reply.handle = handle;
   reply.flags = htobe16 (NBD_REPLY_FLAG_DONE);
@@ -526,7 +529,7 @@ send_structured_reply_block_status (struct connection *conn,
 
   /* Send each block descriptor. */
   for (i = 0; i < nr_blocks; ++i) {
-    r = conn->send (conn, &blocks[i], sizeof blocks[i]);
+    r = conn->send (conn, &blocks[0], sizeof blocks[0]);
     if (r == -1) {
       nbdkit_error ("write reply: %s: %m", name_of_nbd_cmd (cmd));
       return connection_set_status (conn, -1);


More information about the Libguestfs mailing list