[libvirt] Quantifying libvirt errors in launching the libguestfs appliance

Martin Kletzander mkletzan at redhat.com
Wed Jan 13 15:25:14 UTC 2016


On Wed, Jan 13, 2016 at 10:18:42AM +0000, Richard W.M. Jones wrote:
>As people may know, we frequently encounter errors caused by libvirt
>when running the libguestfs appliance.
>
>I wanted to find out exactly how frequently these happen and classify
>the errors, so I ran the 'virt-df' tool overnight 1700 times.  This
>tool runs several parallel qemu:///session libvirt connections, each
>creating a short-lived appliance guest.
>
>Note that I have added Cole's patch to fix https://bugzilla.redhat.com/1271183
>"XML-RPC error : Cannot write data: Transport endpoint is not connected"
>
>Results:
>
>The test failed 538 times (32% of the time), which is pretty dismal.
>To be fair, virt-df is aggressive about how it launches parallel
>libvirt connections.  Most other virt-* tools use only a single
>libvirt connection and are consequently more reliable.
>
>Of the failures, 518 (96%) were of the form:
>
>  process exited while connecting to monitor: qemu: could not load kernel '/home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel': Permission denied
>
>which is https://bugzilla.redhat.com/921135 or maybe
>https://bugzilla.redhat.com/1269975.  It's not clear to me if these
>bugs have different causes, but if they do then potentially we're
>seeing a mix of both since my test has no way to distinguish them.
>

These look like the same problem to me.  It is also the same problem we
have discussed a number of times before without, apparently, reaching a
conclusion.

Libvirt labels each kernel (with both DAC and SELinux labels) and then
proceeds to launch qemu.  If this is done in parallel on the same file,
the race is pretty obvious.  Could you remind me why you couldn't use
<seclabel model='none'/> or <seclabel relabel='no'/> or something else
that would mitigate this?  If that cannot be used, then we need to
implement the <seclabel/> element for kernel and initrd.
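
To make the suspected interleaving concrete, here is a toy model in
Python.  It is only a sketch and not libvirt code: the label names, the
SharedKernel class and the helper functions are all hypothetical, and it
assumes that the label is restored when a domain goes away, with no
reference counting across domains that share the file.

```python
# Toy model only: this is not libvirt code.  It assumes the label is
# applied before qemu starts and restored when a domain goes away.

QEMU_LABEL = "qemu"        # hypothetical label that lets qemu read the file
DEFAULT_LABEL = "default"  # hypothetical label the file has otherwise


class SharedKernel:
    """A kernel file shared by every appliance, carrying a single label."""

    def __init__(self):
        self.label = DEFAULT_LABEL


def relabel(kernel):
    # First step of a launch: make the file readable by qemu.
    kernel.label = QEMU_LABEL


def restore(kernel):
    # Teardown: put the original label back, without reference counting.
    kernel.label = DEFAULT_LABEL


def qemu_open(kernel):
    # Second step of a launch: qemu tries to load the kernel.
    if kernel.label != QEMU_LABEL:
        return "Permission denied"
    return "ok"


def interleaved_launch():
    """Replay one unlucky ordering of two domains using the same kernel."""
    kernel = SharedKernel()
    relabel(kernel)               # domain A: label the kernel
    result_a = qemu_open(kernel)  # domain A: qemu loads the kernel
    relabel(kernel)               # domain B: label the kernel
    restore(kernel)               # domain A: goes away, label restored
    result_b = qemu_open(kernel)  # domain B: qemu loads the kernel
    return result_a, result_b


if __name__ == "__main__":
    print(interleaved_launch())
```

In this ordering domain A starts fine, while domain B fails because the
label it just set has been undone by the time its qemu opens the file.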

>19 of the failures (4%) were of the form:
>
>  process exited while connecting to monitor: fread() failed
>
>which I believe is a previously unknown bug.  I have filed it as
>https://bugzilla.redhat.com/1298122
>

I think even this one might be the same case; maybe SELinux stops qemu
from reading the kernel/initrd.

>Finally there was 1 failure:
>
>  Unable to read from monitor: Connection reset by peer
>
>which I believe is also a new bug.  I have filed it as
>https://bugzilla.redhat.com/1298124
>

This, I believe, means QEMU exited (as in the previous one), just at a
different point in time.
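
For anyone wanting to repeat the tally, the three failure classes quoted
in this thread could be counted with something like the following Python
sketch.  The patterns are taken from the error messages above; the
category names and the classify, tally and percent helpers are made up
for illustration, and how the messages are collected from the virt-df
runs is left out.

```python
import re
from collections import Counter

# Patterns taken from the error messages quoted in this thread.
# The category names are hypothetical.
PATTERNS = [
    ("kernel permission denied",
     re.compile(r"could not load kernel .*: Permission denied")),
    ("fread() failed",
     re.compile(r"fread\(\) failed")),
    ("monitor connection reset",
     re.compile(r"Unable to read from monitor: Connection reset by peer")),
]


def classify(message):
    """Return the name of the first matching category, or 'other'."""
    for name, pattern in PATTERNS:
        if pattern.search(message):
            return name
    return "other"


def tally(messages):
    """Count how many messages fall into each category."""
    return Counter(classify(m) for m in messages)


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100.0 * part / whole)


if __name__ == "__main__":
    # Counts reported in the quoted mail: 538 failures in 1700 runs.
    print(percent(538, 1700), percent(518, 538), percent(19, 538))
```

With the counts from the quoted mail this gives 32, 96 and 4, which
matches the percentages reported there.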

>It would be good if libvirt could routinely test the case of multiple
>parallel launches of qemu:///session, since it still contains bugs
>even after Cole's fixes.
>
>Rich.
>
>--
>Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
>Read my programming and virtualization blog: http://rwmj.wordpress.com
>virt-top is 'top' for virtual machines.  Tiny program with many
>powerful monitoring features, net stats, disk stats, logging, etc.
>http://people.redhat.com/~rjones/virt-top