[Libguestfs] [PATCH nbdkit] server: When using --run, wait for captive nbdkit to exit.
Richard W.M. Jones
rjones at redhat.com
Sat Mar 7 11:18:12 UTC 2020
On Thu, Feb 27, 2020 at 03:12:11PM +0000, Richard W.M. Jones wrote:
> In tests/test-eval.sh we have a test which looks something like this:
>
> nbdkit eval close='echo closed > file' --run 'qemu-img info $nbd'
> if ! grep 'closed' file; then fail; fi
>
> However there was a race condition here. nbdkit exits when the --run
> command exits without waiting for the captive nbdkit subprocess. Thus
> we couldn't be sure that the final 'closed' message got written to the
> file. It worked most of the time, but on slow machines the test
> failed.
>
> This indicates that we ought to wait for the captive nbdkit to exit.
> One reason is so that plugin cleanup is done before we continue after
> the captive nbdkit. That should make shell scripts using captive
> nbdkit + any plugin that does significant cleanup on exit more
> predictable.
>
> This adds the appropriate call to waitpid but we still ignore the exit
> status of the captive nbdkit subprocess (in fact it is most likely to
> fail since it will sent a SIGTERM signal).
> ---
> server/captive.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/server/captive.c b/server/captive.c
> index 102a77ea..0a243d2b 100644
> --- a/server/captive.c
> +++ b/server/captive.c
> @@ -171,12 +171,14 @@ run_command (void)
> r = EXIT_FAILURE;
> break;
> case 0:
> - /* Captive nbdkit still running; kill it, but no need to wait
> - * for it, as the --run program's exit status is good enough (if
> - * the captive nbdkit fails to exit after SIGTERM, we have a
> - * bigger bug to fix).
> + /* Captive nbdkit is still running; kill it. We want to wait
> + * for nbdkit to exit since that ensures all cleanup is done in
> + * the plugin before we return. However we don't care if nbdkit
> + * returns an error, the exit code we return always comes from
> + * the --run command.
> */
> kill (pid, SIGTERM);
> + waitpid (pid, NULL, 0);
> break;
> default:
> /* Captive nbdkit exited unexpectedly; update the exit status. */
I *thought* we'd fixed this, but it failed on a recent ppc64le build
(of 1.19.2, so including this patch). At the moment I can't think of
a plausible reason for why this is still failing.
https://github.com/libguestfs/nbdkit/blob/be44e94a7d0494ba562cb4c5c08bc5825d612881/tests/test-eval.sh
Rich.
FAIL: test-eval.sh
==================
+ requires qemu-img --version
+ files='eval.out eval.missing'
+ rm -f eval.out eval.missing
+ cleanup_fn rm -f eval.out eval.missing
+ _cleanup_hook[${#_cleanup_hook[@]}]='rm -f eval.out eval.missing'
+ nbdkit -U - eval 'get_size=echo 64M' 'pread=dd if=/dev/zero count=$3 iflag=count_bytes' 'missing=echo "in missing: $@" >> eval.missing; exit 2' unload= --run 'qemu-img info $nbd'
+ cat eval.out
image: nbd+unix://?socket=/tmp/nbdkitExyrix/socket
file format: raw
virtual size: 64 MiB (67108864 bytes)
disk size: unavailable
+ grep '67108864 bytes' eval.out
virtual size: 64 MiB (67108864 bytes)
+ cat eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false
in missing: open false
in missing: can_write
in missing: can_flush
in missing: is_rotational
in missing: can_multi_conn
in missing: can_cache
in missing: can_extents
+ grep 'in missing' eval.missing
in missing: config_complete
in missing: thread_model
in missing: get_ready
in missing: thread_model
in missing: preconnect false
in missing: open false
in missing: can_write
in missing: can_flush
in missing: is_rotational
in missing: can_multi_conn
in missing: can_cache
in missing: can_extents
+ grep 'in missing: config_complete' eval.missing
in missing: config_complete
+ grep 'in missing: thread_model' eval.missing
in missing: thread_model
in missing: thread_model
+ grep 'in missing: can_write' eval.missing
in missing: can_write
+ grep 'in missing: is_rotational' eval.missing
in missing: is_rotational
+ grep 'in missing: get_ready' eval.missing
in missing: get_ready
+ grep 'in missing: preconnect' eval.missing
in missing: preconnect false
+ grep 'in missing: open' eval.missing
in missing: open false
+ grep 'in missing: close' eval.missing
++ _run_cleanup_hooks
++ status=1
++ set +e
++ trap '' INT QUIT TERM EXIT ERR
++ echo ./test-eval.sh: run cleanup hooks: exit code 1
./test-eval.sh: run cleanup hooks: exit code 1
++ (( i = 0 ))
++ (( i < 1 ))
++ rm -f eval.out eval.missing
++ (( ++i ))
++ (( i < 1 ))
++ exit 1
FAIL test-eval.sh (exit status: 1)
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
More information about the Libguestfs
mailing list