[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.

Sat Mar 7 11:18:12 UTC 2020

On Thu, Feb 27, 2020 at 03:12:11PM +0000, Richard W.M. Jones wrote:
> In tests/test-eval.sh we have a test which looks something like this:
> 
>   nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd'
>   if ! grep 'closed' file; then fail; fi
> 
> However there was a race condition here.  nbdkit exits when the --run
> command exits without waiting for the captive nbdkit subprocess.  Thus
> we couldn't be sure that the final 'closed' message got written to the
> file.  It worked most of the time, but on slow machines the test
> failed.
> 
> This indicates that we ought to wait for the captive nbdkit to exit.
> One reason is so that plugin cleanup is done before we continue after
> the captive nbdkit.  That should make shell scripts using captive
> nbdkit + any plugin that does significant cleanup on exit more
> predictable.
> 
> This adds the appropriate call to waitpid but we still ignore the exit
> status of the captive nbdkit subprocess (in fact it is most likely to
> fail since it will sent a SIGTERM signal).
> ---
>  server/captive.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/server/captive.c b/server/captive.c
> index 102a77ea..0a243d2b 100644
> --- a/server/captive.c
> +++ b/server/captive.c
> @@ -171,12 +171,14 @@ run_command (void)
>        r = EXIT_FAILURE;
>        break;
>      case 0:
> -      /* Captive nbdkit still running; kill it, but no need to wait
> -       * for it, as the --run program's exit status is good enough (if
> -       * the captive nbdkit fails to exit after SIGTERM, we have a
> -       * bigger bug to fix).
> +      /* Captive nbdkit is still running; kill it.  We want to wait
> +       * for nbdkit to exit since that ensures all cleanup is done in
> +       * the plugin before we return.  However we don't care if nbdkit
> +       * returns an error, the exit code we return always comes from
> +       * the --run command.
>         */
>        kill (pid, SIGTERM);
> +      waitpid (pid, NULL, 0);
>        break;
>      default:
>        /* Captive nbdkit exited unexpectedly; update the exit status. */

I *thought* we'd fixed this, but it failed on a recent ppc64le build
(of 1.19.2, so including this patch).  At the moment I can't think of
a plausible reason for why this is still failing.

https://github.com/libguestfs/nbdkit/blob/be44e94a7d0494ba562cb4c5c08bc5825d612881/tests/test-eval.sh

Rich.

FAIL: test-eval.sh
==================
+ requires qemu-img --version
+ files='eval.out eval.missing'
+ rm -f eval.out eval.missing
+ cleanup_fn rm -f eval.out eval.missing
+ _cleanup_hook[${#_cleanup_hook[@]}]='rm -f eval.out eval.missing'
+ nbdkit -U - eval 'get_size=echo 64M' 'pread=dd if=/dev/zero count=$3 iflag=count_bytes' 'missing=echo "in missing: $@" >> eval.missing; exit 2' unload= --run 'qemu-img info $nbd'
+ cat eval.out
image: nbd+unix://?socket=/tmp/nbdkitExyrix/socket
file format: raw
virtual size: 64 MiB (67108864 bytes)
disk size: unavailable
+ grep '67108864 bytes' eval.out
virtual size: 64 MiB (67108864 bytes)
+ cat eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false 
in missing: open false 
in missing: can_write 
in missing: can_flush 
in missing: is_rotational 
in missing: can_multi_conn 
in missing: can_cache 
in missing: can_extents 
+ grep 'in missing' eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false 
in missing: open false 
in missing: can_write 
in missing: can_flush 
in missing: is_rotational 
in missing: can_multi_conn 
in missing: can_cache 
in missing: can_extents 
+ grep 'in missing: config_complete' eval.missing
in missing: config_complete
+ grep 'in missing: thread_model' eval.missing
in missing: thread_model
in missing: thread_model
+ grep 'in missing: can_write' eval.missing
in missing: can_write 
+ grep 'in missing: is_rotational' eval.missing
in missing: is_rotational 
+ grep 'in missing: get_ready' eval.missing
in missing: get_ready
+ grep 'in missing: preconnect' eval.missing
in missing: preconnect false 
+ grep 'in missing: open' eval.missing
in missing: open false 
+ grep 'in missing: close' eval.missing
++ _run_cleanup_hooks
++ status=1
++ set +e
++ trap '' INT QUIT TERM EXIT ERR
++ echo ./test-eval.sh: run cleanup hooks: exit code 1
./test-eval.sh: run cleanup hooks: exit code 1
++ (( i = 0 ))
++ (( i < 1 ))
++ rm -f eval.out eval.missing
++ (( ++i  ))
++ (( i < 1 ))
++ exit 1
FAIL test-eval.sh (exit status: 1)

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org