[libvirt] [PATCH] RFC: Support QEMU live upgrade

Wed Nov 13 13:14:51 UTC 2013

On Tue, Nov 12, 2013 at 09:54:44PM -0700, Eric Blake wrote:
> On 11/12/2013 05:14 AM, Zheng Sheng ZS Zhou wrote:
> > From 2b659584f2cbe676c843ddeaf198c9a8368ff0ff Mon Sep 17 00:00:00 2001
> > From: Zhou Zheng Sheng <zhshzhou at linux.vnet.ibm.com>
> > Date: Wed, 30 Oct 2013 15:36:49 +0800
> > Subject: [PATCH] RFC: Support QEMU live upgrade
> > 
> > This patch adds support for upgrading the QEMU version without
> > restarting the virtual machine.
> > 
> > Add a new API, virDomainQemuLiveUpgrade(), and a new virsh command,
> > qemu-live-upgrade. virDomainQemuLiveUpgrade() migrates a running VM to
> > the same host as a new VM with a new name and a new UUID. It then shuts
> > down the original VM and drops the new VM definition without shutting
> > down the new VM's QEMU process. Finally, it attaches the original VM to
> > the new QEMU process.
> > 
> > First, the admin installs the new QEMU package, then runs
> >   virsh qemu-live-upgrade domain_name
> > to trigger the virDomainQemuLiveUpgrade() upgrade flow.
> 
> In general, I agree that we need a new API (in fact, I think I helped
> suggest why we need it as opposed to reusing the existing migration API,
> precisely for some of the deadlock reasons you called out in your reply
> to Dan).  But the new API should still reuse as much of the existing
> migration code as possible (refactor that code to be reusable, rather
> than bulk-copying it into completely new code).  A high-level review
> follows (I didn't test whether things work or look for details like
> memory leaks; this is more a first impression of style problems and
> even some major design problems).
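The four-step flow described in the quoted patch can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not the patch's code: Host, QemuProcess, live_upgrade and the "-upgrade-tmp" suffix are hypothetical names, not libvirt APIs, and the "migration" is reduced to copying an opaque guest-state value into a newly started process. It only shows how the name/UUID bookkeeping ends up: the original definition is bound to a QEMU process started from the newly installed binary.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class QemuProcess:
    version: str      # QEMU binary version the process was started from
    guest_state: str  # opaque stand-in for guest RAM and device state


@dataclass
class Host:
    """Toy in-memory host: domain definitions plus running QEMU processes.
    Nothing here is a libvirt API; it only mirrors the patch's four steps."""
    qemu_version: str  # version of the currently installed QEMU package
    domains: dict = field(default_factory=dict)    # name -> (uuid, pid or None)
    processes: dict = field(default_factory=dict)  # pid -> QemuProcess
    next_pid: int = 1000

    def start_qemu(self, guest_state):
        pid = self.next_pid
        self.next_pid += 1
        self.processes[pid] = QemuProcess(self.qemu_version, guest_state)
        return pid


def live_upgrade(host, name):
    orig_uuid, old_pid = host.domains[name]

    # 1. "Migrate" to the same host as a temporary VM with a new name and a
    #    new UUID; its QEMU process uses the newly installed binary.
    tmp_name = name + "-upgrade-tmp"  # hypothetical naming scheme
    new_pid = host.start_qemu(host.processes[old_pid].guest_state)
    host.domains[tmp_name] = (str(uuid.uuid4()), new_pid)

    # 2. Shut down the original VM's old QEMU process.
    del host.processes[old_pid]
    host.domains[name] = (orig_uuid, None)

    # 3. Drop the temporary definition but leave its QEMU process running.
    del host.domains[tmp_name]

    # 4. Attach the original definition (same name and UUID) to the new process.
    host.domains[name] = (orig_uuid, new_pid)
    return new_pid
```

For example, with a domain "vm1" bound to a process started while host.qemu_version was "old", setting host.qemu_version to "new" (the admin installing the new package) and calling live_upgrade(host, "vm1") leaves "vm1" with its original UUID bound to a process whose version is "new". Note that between steps 2 and 4 the original definition is not attached to any running process.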

I really don't like the idea of adding a new API for this - IMHO we
need to address the deadlock scenario and fit this into our existing
migration APIs. In particular calling this "live upgrades" is wrong,
as that is just a specific use case. Functionally this is "localhost
migration" and so belongs in the migration APIs.

As mentioned in my other message, I believe the deadlock scenario
could even occur in non-localhost migration, if two libvirtds were
performing concurrent migrations in opposite directions. So this seems
like something we need to look at fixing somehow. Perhaps it needs a
dedicated thread pool, or a thread spawned on demand, just for the
specific migration RPC call that could deadlock, so we can guarantee
that call always succeeds?
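The deadlock discussed above is a form of thread-pool starvation: every worker in a daemon is busy running a migration job that is blocked on a nested RPC which only a worker of the peer daemon (or, for localhost migration, of the same daemon) can service. The minimal Python sketch below models that, assuming each daemon has a single worker thread and that the nested call is a hypothetical "prepare" step; Daemon and run_migrations are illustrative names, not libvirt code. Timeouts turn the hang into a reported "deadlock" so the example terminates, and the dedicated=True case shows the reserved-pool idea.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class Daemon:
    """Toy stand-in for a libvirtd: a fixed-size pool of RPC worker threads,
    plus an optional pool reserved for the nested migration RPC."""

    def __init__(self, workers=1, dedicated=False):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.migration_pool = ThreadPoolExecutor(max_workers=1) if dedicated else None

    def submit_nested_rpc(self, fn):
        # Without a reserved pool, the nested call queues behind whatever
        # jobs already occupy the general workers.
        if self.migration_pool is not None:
            return self.migration_pool.submit(fn)
        return self.pool.submit(fn)

    def shutdown(self):
        self.pool.shutdown(wait=True)
        if self.migration_pool is not None:
            self.migration_pool.shutdown(wait=True)


def run_migrations(topology, dedicated, timeout=0.5):
    """topology: list of (source, destination) daemon names; a pair with
    source == destination models localhost migration."""
    names = {name for pair in topology for name in pair}
    daemons = {name: Daemon(workers=1, dedicated=dedicated) for name in names}
    # Every migration job holds its source worker before any nested RPC is
    # sent, and keeps holding it until every job has seen its outcome.
    started = threading.Barrier(len(topology))
    finished = threading.Barrier(len(topology))

    def migrate(dst):
        try:
            started.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            return "stuck"
        nested = daemons[dst].submit_nested_rpc(lambda: "prepared")
        try:
            outcome = nested.result(timeout=timeout)
        except FutureTimeout:
            outcome = "deadlock"  # nested RPC never got a worker
        try:
            finished.wait(timeout=4 * timeout)
        except threading.BrokenBarrierError:
            pass
        return outcome

    jobs = [daemons[src].pool.submit(migrate, dst) for src, dst in topology]
    results = [job.result() for job in jobs]
    for daemon in daemons.values():
        daemon.shutdown()
    return results
```

With two daemons migrating in opposite directions, run_migrations([("A", "B"), ("B", "A")], dedicated=False) reports a deadlock for both jobs, while dedicated=True lets both nested calls complete; a single ("host", "host") pair, the localhost case from the patch, behaves the same way. One reserved thread suffices in this model because the nested call never blocks on anything else; a thread spawned on demand would give the same guarantee without a fixed reservation.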

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|