[Libguestfs] [libnbd PATCH] tests: Enhance errors test

Tue Jul 2 14:01:35 UTC 2019

On 6/30/19 12:54 PM, Richard W.M. Jones wrote:
> On Thu, Jun 27, 2019 at 10:18:30PM -0500, Eric Blake wrote:
>> +  /* Queue up a write command so large that we block on POLLIN, then queue
>> +   * multiple disconnects. XXX The last one should fail.
>> +   */
>> +  if (nbd_aio_pwrite (nbd, buf, 2 * 1024 * 1024, 0, 0) == -1) {
>> +    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
>> +    exit (EXIT_FAILURE);
>> +  }
>> +  if ((nbd_aio_get_direction (nbd) & LIBNBD_AIO_DIRECTION_WRITE) == 0) {
>> +    fprintf (stderr, "%s: test failed: "
>> +             "expect to be blocked on write\n",
>> +             argv[0]);
>> +    exit (EXIT_FAILURE);
>> +  }
> 
> This test fails when run under valgrind.  An abbreviated log shows
> what's happening:
> 
> libnbd: debug: nbd_aio_pwrite: event CmdIssue: READY -> ISSUE_COMMAND.START
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.START -> ISSUE_COMMAND.SEND_REQUEST
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_REQUEST -> ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD -> ISSUE_COMMAND.SEND_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_WRITE_PAYLOAD -> ISSUE_COMMAND.FINISH
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.FINISH -> READY
> /home/rjones/d/libnbd/tests/.libs/lt-errors: test failed: expect to be blocked on write
> 
> It seems as if this is caused by valgrinded code running more slowly,
> rather than an actual valgrind/memory error.

Or it may be that valgrind's interception of send()/recv() buffers
differently from what we get by default from the kernel.  I don't know
whether running strace on valgrind is a sensible way to observe the
syscall behavior.
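
To make the buffering question concrete, here is a minimal sketch that
measures how many bytes a non-blocking socket accepts before send() would
block when the peer never reads.  It assumes a local AF_UNIX socketpair as
a stand-in for the connection (the transport the test really uses may
differ), and bytes_until_blocked() is a hypothetical helper, not part of
libnbd.  If the value it reports is at or above the 2M payload, the pwrite
in the test can be sent in full instead of leaving the handle blocked on
write.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libnbd): write to a non-blocking
 * socket whose peer never reads, and return how many bytes the kernel
 * accepted before send() would block, or -1 on error.  At most 'limit'
 * bytes are attempted.
 */
static ssize_t
bytes_until_blocked (size_t limit)
{
  static char chunk[4096];      /* zero-filled payload */
  int sv[2];
  int flags;
  size_t total = 0;

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;

  /* Make the sending side non-blocking, as an aio-style client would. */
  flags = fcntl (sv[0], F_GETFL);
  if (flags == -1 || fcntl (sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
    close (sv[0]);
    close (sv[1]);
    return -1;
  }

  while (total < limit) {
    size_t want = limit - total;
    ssize_t n;

    if (want > sizeof chunk)
      want = sizeof chunk;
    n = send (sv[0], chunk, want, 0);
    if (n == -1) {
      if (errno == EINTR)
        continue;
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        break;                  /* kernel buffer is full: we would block */
      close (sv[0]);
      close (sv[1]);
      return -1;
    }
    total += (size_t) n;
  }

  close (sv[0]);
  close (sv[1]);
  return (ssize_t) total;
}
```

Calling bytes_until_blocked (2 * 1024 * 1024) natively and again under
valgrind would show whether the two environments accept different amounts
before blocking; the number depends on kernel socket buffer settings, so
I would not hard-code any particular result.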

> 
> I wonder if we could remove the race using a custom nbdkit-sh-plugin
> which would block on writes until (eg) a local trigger file was
> touched?  Even that seems as if it would depend on the amount of data
> that the kernel is able to buffer.

I don't know how to make an nbdkit plugin stop the code in nbdkit/server
from read()ing from the client (the plugin code doesn't get to run until
the core has learned that the client wants a command serviced).  But it
may be possible to tweak things to send back-to-back write requests,
where even if the first write request gets sent completely, the plugin
can delay responding to that first write and use --filter=noparallel to
prevent the second command from reaching nbdkit.  I'll play with that,
to see if I can reproduce the valgrind race, as well as work around it
with back-to-back write commands to increase the likelihood of actually
preventing nbdkit from consuming the second command.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
