libvirt-6.5.0 breaks host passthrough migration
jdenemar at redhat.com
Tue Jul 14 08:31:03 UTC 2020
On Mon, Jul 13, 2020 at 14:04:25 +0200, Jiri Denemark wrote:
> On Sat, Jul 11, 2020 at 13:44:19 -0400, Mark Mielke wrote:
> > On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke <mark.mielke at gmail.com> wrote:
> > > On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke <mark.mielke at gmail.com> wrote:
> > >
> > >> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark <jdenemar at redhat.com>
> > >> wrote:
> > >>
> > >>> The implementation seems to be doing exactly what the commit message
> > >>> says. The migratable=off default should be used only when QEMU does not
> > >>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> > >>> Every non-ancient version of libvirt should have the
> > >>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> > >>> migratable=on default.
> > >>>
> > >> QEMU_CAPS_CPU_MIGRATABLE only from the <cpu> element? If so, doesn't this
> > >> mean that it is not explicitly listed for host-passthrough, and this means
> > >> the check is not detecting whether it is enabled or not properly?
> > >>
> > > Trying to understand what is going on more - I see "migratable" seems to
> > > be ok when launching a new machine, but the failure scenario was live
> > > migration from 6.4.0 to 6.5.0.
> > >
> > > Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> > > and live migration grabs the capabilities from the source, where the
> > > absence of this capability makes it presume an older Qemu in the above code?
> > >
> > Sorry all - I am having trouble reproducing now. The expected use cases are
> > now working.
> > Is it possible that the "migratable" flag might have been missing on some
> > of the instances, although migration worked fine, and despite having used
> > Qemu 4.2 and Qemu 5.0?
> When an updated libvirtd which knows about this new capability starts,
> it would reprobe all QEMU capabilities (lazily, i.e., once they are
> needed). However, if there is a running domain, libvirt will use cached
> capabilities probed when the domain was started. I suspect migrating
> such domain could be a problem. I'll try to reproduce locally.
OK, I did not reproduce the failure, because migratable=off doesn't
enable anything more than migratable=on (likely because the L1 VM in my
nested environment does not have any non-migratable features enabled).
But I was able to reproduce the issue itself and the migration could
clearly fail if migratable=off enabled some non-migratable features. The
reproducer is actually easy and one doesn't even need to migrate to see
libvirt did something wrong:
1. run libvirtd older than 6.5.0
2. start a domain with host-passthrough CPU (QEMU would default to
   migratable=on)
3. upgrade libvirt to 6.5.0 and restart libvirtd
4. virsh dumpxml $DOMAIN_STARTED_IN_STEP_2
Now you would see
    <cpu mode='host-passthrough' check='none' migratable='off'/>
which differs from the default used by QEMU. Migrating such a domain would
succeed anyway, because it was actually started with migratable='on'.
But when such a domain is migrated to libvirt 6.5.0, we would honor the
migratable attribute and start QEMU with -cpu host,migratable=off which
could cause failures when trying to migrate this domain again.
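To spot affected domains, one could look for the attribute in the dumpxml
output. The following is only a minimal sketch, assuming the XML text was
already obtained with virsh dumpxml; the helper name cpu_migratable and the
sample domain are hypothetical and not part of libvirt:

```python
import xml.etree.ElementTree as ET


def cpu_migratable(domain_xml):
    """Return the 'migratable' attribute of a host-passthrough <cpu>.

    Returns None when the domain has no <cpu> element, uses another CPU
    mode, or does not carry the attribute at all.
    """
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None or cpu.get("mode") != "host-passthrough":
        return None
    return cpu.get("migratable")


# Hypothetical domain XML, reduced to the part relevant here.
sample = """<domain type='kvm'>
  <name>demo</name>
  <cpu mode='host-passthrough' check='none' migratable='off'/>
</domain>"""

print(cpu_migratable(sample))
```

A result of 'off' for a domain that was started before the upgrade would
match the symptom described above.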
The problem is exactly where I was afraid it could be. When libvirtd
starts, it reads the QEMU capabilities probed by older libvirt
(QEMU_CAPS_CPU_MIGRATABLE would be off) and wrongly updates the XML of
the running domain. I'll prepare a patch to fix this.
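The mismatch can be illustrated with a toy model. This is not libvirt's
code, just a sketch of the rule discussed in this thread under the
assumption that the default is derived purely from the presence of the
capability flag; default_migratable and the two capability sets are
hypothetical:

```python
CPU_MIGRATABLE = "QEMU_CAPS_CPU_MIGRATABLE"


def default_migratable(caps):
    """Rule from the thread: use 'off' only if QEMU lacks the capability."""
    return "on" if CPU_MIGRATABLE in caps else "off"


# Capabilities cached when the domain was started by libvirt older than
# 6.5.0: the flag was not probed yet, so it is absent even though QEMU
# itself supports migratable=on|off.
cached_caps = set()

# Capabilities a freshly started 6.5.0 libvirtd would probe for the same
# QEMU binary.
reprobed_caps = {CPU_MIGRATABLE}

print(default_migratable(cached_caps))    # value written into the live XML
print(default_migratable(reprobed_caps))  # value matching QEMU's default
```

With the cached set the model yields 'off', while the running QEMU was in
fact started with its own default of migratable=on.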