[libvirt] [PATCH] RFC: Support QEMU live upgrade

Wed Nov 13 13:10:31 UTC 2013

On Wed, Nov 13, 2013 at 12:15:30PM +0800, Zheng Sheng ZS Zhou wrote:
> Hi Daniel,
> 
> on 2013/11/12 20:23, Daniel P. Berrange wrote:
> > On Tue, Nov 12, 2013 at 08:14:11PM +0800, Zheng Sheng ZS Zhou wrote:
> >> Hi all,
> >>
> >> QEMU developers have recently been working on a feature to allow
> >> upgrading a live QEMU instance to a new version without restarting
> >> the VM. This is implemented as live migration between the old and
> >> new QEMU processes on the same host [1]. Here is the use case:
> >>
> >> 1) Guests are running QEMU release 1.6.1.
> >> 2) Admin installs QEMU release 1.6.2 via RPM or deb.
> >> 3) Admin starts a new VM using the updated QEMU binary, and asks the old
> >> QEMU process to migrate the VM to the newly started VM.
> >>
> >> I think it would be very useful to support QEMU live upgrade in
> >> libvirt. After some investigation, I found that migrating to the
> >> same host breaks the current migration code. I'd like to propose a
> >> new work flow for QEMU live migration, which implements step 3) above.
> > 
> > How does it break the migration code? Your patch below is effectively
> > re-implementing the multistep migration workflow, leaving out many
> > important features (seamless reconnect to SPICE clients for example)
> > which is really bad for our ongoing code support burden, so not
> > something I want to see.
> > 
> > Daniel
> > 
> 
> Actually I wrote another hack patch to investigate how we
> can re-use the existing framework to do local migration. I found
> the following problems.
> 
> (1) When migrating to a different host, the destination domain
> uses the same UUID and name as the source, and this is OK. When
> migrating to localhost, the destination domain's UUID and name
> conflict with the source's. The QEMU driver maintains a hash
> table of domain objects, keyed by the UUID of the virtual
> machine. closeCallbacks is also a hash table with the domain
> UUID as key, and there may be other data structures using the
> UUID as key. This implies we must use a different name and UUID
> for the destination domain. In the migration framework, the
> Begin and Prepare stages call virDomainDefCheckABIStability,
> which prevents us from using a different UUID, and they also
> check that the hostname and host UUID are different. If we want
> to enable local migration, we have to skip these checks and
> generate a new UUID and name for the destination domain. Of
> course we would restore the original UUID after migration. The
> UUID is used by higher level management software to identify
> virtual machines, so it should stay the same after a QEMU live
> upgrade.

This point is something that needs to be solved regardless of
whether we use the migration framework or re-invent it. The QEMU
driver fundamentally assumes that there is only ever one single VM
with a given UUID, and that a VM has only 1 process. IMHO name +
uuid must be preserved during any live upgrade process, otherwise
mgmt will get confused. This causes further problems because 'name'
is used for various resources created by QEMU on disk - eg the
monitor command path. We can't have 2 QEMUs using the same name,
but at the same time that's exactly what we'd need here.
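
To make the conflict concrete, here is a minimal sketch in Python
rather than libvirt's C. Everything in it is a hypothetical stand-in:
DomainTable, monitor_path() and the /tmp/example-qemu directory are
invented for illustration and are not libvirt's real data structures
or paths. It only shows why a UUID-keyed table and a name-derived
path cannot hold a second QEMU process for the same VM.

```python
class DomainTable:
    """Toy UUID-keyed domain table (hypothetical, not libvirt code)."""

    def __init__(self):
        self._by_uuid = {}

    def add(self, dom_uuid, name, pid):
        # One entry per UUID: a second process for the same VM is refused.
        if dom_uuid in self._by_uuid:
            raise ValueError(f"domain with UUID {dom_uuid} already exists")
        self._by_uuid[dom_uuid] = (name, pid)


def monitor_path(name):
    # Hypothetical layout: the path depends only on the domain name,
    # so two processes with the same name would collide on disk.
    return f"/tmp/example-qemu/{name}.monitor"


table = DomainTable()
vm_uuid = "c7a5fdbd-edaf-9455-926a-d65c16db1809"
table.add(vm_uuid, "guest1", pid=100)      # old QEMU process
try:
    table.add(vm_uuid, "guest1", pid=200)  # new QEMU process, same VM
except ValueError as err:
    print("rejected:", err)
print(monitor_path("guest1") == monitor_path("guest1"))
```

Any real fix would have to let one domain object track two processes
for the duration of the upgrade, rather than inventing a second
UUID/name pair.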

> (2) If I understand the code correctly, libvirt uses a thread
> pool to handle RPC requests. This means local migration may
> cause deadlock in P2P migration mode. Suppose there are some
> concurrent local migration requests and all the worker threads
> are occupied by these requests. When the source libvirtd connects
> to the destination libvirtd on the same host to negotiate the
> migration, the negotiation request is queued, but it will never
> be handled: the original migration request from the client is
> waiting for the negotiation request to finish before it can
> progress, while the negotiation request is queued waiting for
> the original request to end. This is one of the deadlock risks
> I can think of.
> I guess in traditional migration mode, in which the client
> opens two connections to the source and destination libvirtd,
> there is also a risk of deadlock.

Yes, it sounds like you could get deadlock even with 2 separate
libvirtds, if both of them were migrating to the other concurrently.

> (3) Libvirt supports Unix domain socket transport, but
> this is only used in a tunnelled migration. For native
> migration, it only supports TCP. We need to enable Unix
> domain socket transport in native migration. Now we already
> have a hypervisor migration URI argument in the migration
> API, but there is no support for parsing and verifying a
> "unix:/full/path" URI and passing that URI transparently
> to QEMU. We can add this to current migration framework
> but direct Unix socket transport looks meaningless for
> normal migration.

Actually as far as QEMU is concerned libvirt uses fd: migration
only. Again though this point seems pretty much unrelated to
the question of how we design the APIs & structure the code.

> (4) When migration fails, the source domain is resumed, and
> this may not work if we enable page-flipping in QEMU. With
> page-flipping enabled, QEMU transfers memory page ownership
> to the destination QEMU, so when the migration fails the source
> virtual machine would have to be restarted rather than resumed.

IMHO that is not an acceptable approach. The whole point of doing
live upgrades in place, is that you consider the VMs to be
"precious". If you were OK with VMs being killed & restarted then
we'd not bother doing any of this live upgrade pain at all.

So if we're going to support live upgrades, we *must* be able to
guarantee that they will either succeed, or the existing QEMU is
left intact.  Killing the VM and restarting is not an option on
failure.

> So I propose a new and compact work flow dedicated to QEMU
> live upgrade. After all, it's an upgrade operation based on
> tricky migration. When developing the previous RFC patch for
> the new API, I focused on the correctness of the work flow,
> so many other things are missing. I think I can add things
> like Spice seamless migration when I submit new versions.

This way lies madness. We do not want 2 impls of the internal
migration framework.

> I would also be really happy if you could give me some advice
> on re-using the migration framework. Re-using the current
> framework would save a lot of effort.

I consider using the internal migration framework a mandatory
requirement here, even if the public API is different.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|