libvirt segfaults with "internal error: Missing monitor reply object" during block live-migration

Alex Walender awalende at cebitec.uni-bielefeld.de
Thu Jul 30 14:13:09 UTC 2020


Dear libvirt community,


Using the recent Ubuntu Cloud Archive packages for Stein, we are observing
random libvirtd crashes on the target host during live migration.
libvirtd segfaults in the qemu driver. Transferring the block
devices usually works without issues.
However, the subsequent memory transfer randomly causes the target libvirtd
to close its socket, and the migration is rolled back.
I can reproduce this with large VMs that have a large amount of memory.

The last error message we see in libvirt logs is:
error : qemuMonitorJSONCommandWithFd:315 : internal error: Missing
monitor reply object

After this message, libvirtd segfaults and restarts.
Before we encountered this issue, we were running an older nova-compute
package (19.0.3).
I am not sure whether the upgrade changed how nova uses the libvirt API.
Since the upgrade, we also see many recurring errors during migration:

warning : qemuDomainObjBeginJobInternal:7044 : Cannot start job (query,
none, none) for domain instance-00008f56; current job is (none, none,
migration in) owned by (0 <null>, 0 <null>, 0
remoteDispatchDomainMigratePrepare3Params (flags=0x809b)) for (0s, 0s,
14834s)
error : qemuDomainObjBeginJobInternal:7066 : Timed out during operation:
cannot acquire state change lock (held by
monitor=remoteDispatchDomainMigratePrepare3Params)

They do not abort the running migration, but they are logged to the
systemd journal every minute.
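
As a rough aid for triaging this warning, here is a minimal Python sketch
that extracts the domain name, the current job triple, and the job ages from
one such line. The regular expression is an assumption based only on the
message quoted above (unwrapped to a single line), and parse_job_warning is
a hypothetical helper name, not part of libvirt.

```python
import re

# Assumed pattern, derived only from the warning text quoted above
# (unwrapped to one line); other libvirt versions may word it differently.
JOB_WARNING = re.compile(
    r"Cannot start job \((?P<wanted>[^)]*)\) for domain (?P<domain>[^;\s]+); "
    r"current job is \((?P<current>[^)]*)\) "
    r"owned by \((?P<owners>.*)\) for \((?P<ages>[^)]*)\)"
)


def parse_job_warning(line):
    """Return a dict of fields from one warning line, or None if no match."""
    match = JOB_WARNING.search(line)
    if match is None:
        return None
    return {
        "domain": match.group("domain"),
        "wanted": match.group("wanted").split(", "),
        "current": match.group("current").split(", "),
        # Ages look like "0s, 0s, 14834s"; strip the unit and convert.
        "ages_seconds": [
            int(age.rstrip("s")) for age in match.group("ages").split(", ")
        ],
    }
```

Applied to the line quoted above, this gives the domain instance-00008f56
and a largest age of 14834 seconds, so the "migration in" job had been held
for a little over four hours when the warning was logged.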

Source and destination run the same packages:

Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-99-generic x86_64)
OpenStack Stein (Ubuntu Cloud Archive)
Libvirt+QEMU_x86
keystone-common 2:15.0.1-0ubuntu1~cloud0
libvirt-daemon 5.0.0-1ubuntu2.6~cloud0
qemu-system-x86 1:3.1+dfsg-2ubuntu3.7~cloud0
neutron-linuxbridge-agent 2:14.2.0-0ubuntu1~cloud0
neutron-plugin-ml2 2:14.2.0-0ubuntu1~cloud0
nova-compute 2:19.2.0-0ubuntu1~cloud0
nova-compute-libvirt 2:19.2.0-0ubuntu1~cloud0
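
To double-check that both hosts really run identical packages, the two
package listings can be diffed. This is a minimal sketch, assuming each
listing is plain text with one "name version" pair per line, in the style of
the list above (for example saved from dpkg-query -W on each host).
parse_listing and diff_packages are hypothetical helper names.

```python
def parse_listing(text):
    """Map package name -> version from lines of 'name version'."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or free-form lines
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(source_text, target_text):
    """Return {name: (source_version, target_version)} for every mismatch."""
    source = parse_listing(source_text)
    target = parse_listing(target_text)
    return {
        name: (source.get(name), target.get(name))
        for name in sorted(set(source) | set(target))
        if source.get(name) != target.get(name)
    }
```

An empty result means the two listings agree; a package missing on one side
shows up with None as its version.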

I have attached source/destination debug logs from libvirtd and
nova-compute here:

https://denzelx.ddns.net/index.php/s/KPJ7vv4aTcb69XD

Any help would be nice!


Best Regards

-- 
M.Sc Alex Walender
de.NBI Cloud Bielefeld Administrator
Center for Biotechnology (CeBiTec)

University of Bielefeld
33594 Bielefeld
Germany
room: M3-118
phone: +49 (521) 106 2907
