[Libguestfs] [PATCH nbdkit 4/4] tests: Add a helper function which waits for nbdkit to start up.

Tue Sep 11 21:14:36 UTC 2018

On Tue, Sep 11, 2018 at 03:02:47PM -0500, Eric Blake wrote:
[Lots of issues]

I'll fix all these, thanks.

> >  files="blocksize1.img blocksize1.log blocksize1.sock blocksize1.pid
> >@@ -72,27 +73,13 @@ cleanup ()
> >  trap cleanup INT QUIT TERM EXIT ERR
> >  # Run two parallel nbdkit; to compare the logs and see what changes.
> >-nbdkit -P blocksize1.pid -U blocksize1.sock \
> >+start_nbdkit -P blocksize1.pid -U blocksize1.sock \
> >         --filter=log file logfile=blocksize1.log blocksize1.img
> >-nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \
> >+start_nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \
> >         --filter=log file logfile=blocksize2.log blocksize2.img \
> >         minblock=1024 maxdata=512k maxlen=1M
> 
> If the first process starts but the second fails, then you have not
> set pid1=, and leak the first process.  This is a regression from
> the old code, which managed to capture pid1 before triggering the
> trap to cleanup(), and thus killed the successful nbdkit.  Several
> tests are impacted.

So one other idea I had is to have start_nbdkit create a trap function
to clean up the nbdkit process (we'd still need trap functions in the
main program for deleting temporary files).  I don't know if this
would work, but if it does it would fix the above problem.

> Less importantly, the old code performed startup in parallel (kick
> off both nbdkit processes, then start the loop; if a pid file was
> not produced by either one of the nbdkit processes, the test
> determined that within 10 seconds); now startup is serial (you don't
> start a second nbdkit until the first one has succeeded). For a test
> with two nbdkit processes, that could expand typical test time from
> 1 second to 2 (if the pid file creation is not instantaneous, but
> the first sleep 1 was sufficient for both processes to be ready);
> and expand worst-case failure detection under heavy load from 10
> into 20 seconds (if the first succeeded on the last iteration, but
> the second timed out).  I don't see anything wrong with the change,
> although it might be worth a mention in the commit message as
> intentional.

I'm not too worried about this since it should cause only a minor
increase in time.

> I'm also seeing a common pattern of assigning pid=$(cat file.pid)
> right after calling start_nbdkt.  Would it be worth changing the
> signature of start_nbdkit() to also populate a pid variable on
> success, as in:
> 
> start_nbdkit blocksize1.pid pid1 myarg1
> 
> start_nbdkit()
> {
>   pidfile=$1
>   pidvar=$2
>   shift 2
>   ...
>   pid=$(cat $pidfile)
>   eval $pidvar=$pid

It was nice that start_nbdkit has the same "signature" as
a regular nbdkit command.

> which would also have the added benefit of avoiding the regression
> of pid1 not being set if the second nbdkit process doesn't start?
> 
> But overall, I like the idea of refactoring the common code for less
> repetition.

I'll have a think about the trap function stuff, but most
likely tomorrow.

Thanks!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW