[Libguestfs] [PATCH v2 00/11] rhv-upload: Various fixes and cleanups

Martin Kletzander mkletzan at redhat.com
Tue Nov 19 14:24:35 UTC 2019


On Tue, Nov 19, 2019 at 02:14:32PM +0000, Richard W.M. Jones wrote:
>On Tue, Nov 19, 2019 at 02:36:36PM +0100, Martin Kletzander wrote:
>> nbdkit: python[1]: error: /var/tmp/rhvupload.jngN1W/rhv-upload-plugin.py: close: error: ['Traceback (most recent call last):\n', '  File "/var/tmp/rhvupload.jngN1W/rhv-upload-plugin.py", line 362, in close\n', "FileNotFoundError: [Errno 2] No such file or directory: '/var/tmp/rhvupload.jngN1W/diskid.0'\n"]
>> nbdkit: debug: python: unload plugin
>>
>> So it might be because virt-v2v already removed that directory and
>> did not wait for nbdkit to exit completely.  I'm testing with an
>> older commit of virt-v2v now.
>
>This is very likely.
>
>Shutdown on error is complicated.  Virt-v2v starts one or more nbdkit
>processes in the background and then simply runs “qemu-img convert”.
>If nbdkit notices an error then it returns an error over NBD to
>qemu-img.  If qemu-img exits with an error then virt-v2v exits.
>
>Before virt-v2v exits, it runs any exit handlers.  In particular if
>you're using OCaml's at_exit, C's atexit(3) or wrappers like
>Tools_utils.unlink_on_exit or Tools_utils.rmdir_on_exit, then those
>have already run before nbdkit starts to shut down.
>
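To illustrate the ordering problem described above, here is a minimal
Python sketch (not virt-v2v's actual OCaml code) of a cleanup that stops
a helper process and waits for it before removing the temporary
directory.  The function name stop_then_remove and the timeout value are
made up for illustration.

```python
import shutil
import subprocess


def stop_then_remove(child, tmpdir, timeout=10.0):
    """Stop a helper process, wait for it, then remove its directory.

    `child` is a subprocess.Popen object (hypothetical helper, e.g. a
    server that still uses files under `tmpdir`).  The directory is
    only removed once the child is known to have exited.
    """
    if child.poll() is None:
        # Ask the child to shut down and give it time to clean up.
        child.terminate()
        try:
            child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child ignored SIGTERM; force it and reap it.
            child.kill()
            child.wait()
    # Only now is it safe to delete files the child may have been using.
    shutil.rmtree(tmpdir, ignore_errors=True)
    return child.returncode
```

Registered with something like atexit.register(stop_then_remove, child,
tmpdir), this keeps the directory alive until the child is gone, at the
cost of delaying the parent's exit by up to the timeout.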
>Nbdkit should receive a signal from the kernel when its parent process
>(virt-v2v) goes away, because we're using prctl + PR_SET_PDEATHSIG +
>SIGTERM (via ‘nbdkit --exit-with-parent’).  Note this happens *after*
>virt-v2v has fully exited.
>
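For reference, this is roughly what the prctl + PR_SET_PDEATHSIG
mechanism looks like, sketched in Python via ctypes.  It is not nbdkit's
code (nbdkit is written in C); the helper names are hypothetical, the
two constants are taken from <linux/prctl.h>, and the sketch is
Linux-only.

```python
import ctypes
import os
import signal
import subprocess

# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_PDEATHSIG = 2

_libc = ctypes.CDLL(None, use_errno=True)


def set_parent_death_signal(sig=signal.SIGTERM):
    """Ask the kernel to send `sig` to this process when its parent dies.

    Passing 0 clears the setting.
    """
    rc = _libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_parent_death_signal():
    """Return the currently configured parent-death signal (0 if none)."""
    current = ctypes.c_int(0)
    rc = _libc.prctl(PR_GET_PDEATHSIG, ctypes.byref(current),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return current.value


def spawn_exit_with_parent(argv):
    """Start `argv` so that it receives SIGTERM once this process is gone.

    preexec_fn runs in the child between fork and exec; it is not safe
    in multi-threaded parents, so treat this as illustration only.
    """
    return subprocess.Popen(argv, preexec_fn=set_parent_death_signal)
```

The signal is only delivered once the parent has actually died, so
anything the parent deleted in its exit handlers is already gone by the
time the child starts shutting down.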
>That is why I think your explanation above is likely, since virt-v2v
>calls rmdir_on_exit "/var/tmp/rhvupload.XXXXXX" on exit:
>
>  https://github.com/libguestfs/virt-v2v/blob/b8b9dcc90dbd91aec4b6bb82dd511d453f77aab9/v2v/output_rhv_upload.ml#L104
>
>To further complicate things, in nbdkit < 1.16 the shutdown path from
>a signal was pretty racy.  nbdkit 1.16 attempts to fix the shutdown
>path so that we now properly wait for all threads to exit before
>exiting nbdkit.  The upstream commit is:
>
>  https://github.com/libguestfs/nbdkit/commit/07806d6d5511bb5da2dfae2bf0009a5edd992f3a
>
>nbdkit 1.16 is available in Fedora 31+ and RHEL 8.2 AV (out of brew at
>the moment), and while it probably won't make any difference here, if
>possible you should upgrade to it.  It's fully backwards compatible.
>
>Oh, and finally, if we're running in a systemd unit, then systemd
>might try to kill everything when virt-v2v exits (but before nbdkit
>exits) and it's anyone's guess what happens then.  Good luck!
>Probably best to try to make the code as bulletproof as possible so
>it doesn't depend on cleanups always running correctly.
>
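One way to read the "bulletproof" advice above, as a minimal Python
sketch: a close-time write that tolerates the temporary directory having
already been removed.  This is not the real rhv-upload-plugin.py code;
write_disk_id and its arguments are hypothetical, and it assumes the
close callback only needs to drop a small file into that directory.

```python
def write_disk_id(path, disk_id):
    """Best-effort write of `disk_id` to `path`.

    Returns True if the file was written, False if the containing
    directory no longer exists (for example because the parent process
    already removed it on exit).  Other errors still propagate.
    """
    try:
        with open(path, "w") as f:
            f.write(disk_id)
    except FileNotFoundError:
        # The directory is gone; nobody is left to read the file, so
        # do not turn this into a second error on the clean-up path.
        return False
    return True
```

Whether silently skipping the write is acceptable depends on whether
anything still needs the file at that point; in the failure case from
the log above, the consumer has already exited.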

I am running nbdkit from current master there, so that should be fine.  But
since it is run by virt-v2v-wrapper on a Fedora VM inside oVirt, it is running
under a systemd unit.

I should say this is not the main issue; it's just something that happens on a
clean-up path after another error has happened.

>Rich.
>
>-- 
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>Fedora Windows cross-compiler. Compile Windows programs, test, and
>build Windows installers. Over 100 libraries supported.
>http://fedoraproject.org/wiki/MinGW