[libvirt] RFC: libvirt support for QEMU live patching

Daniel P. Berrange berrange at redhat.com
Mon Sep 18 09:03:34 UTC 2017


On Mon, Sep 18, 2017 at 09:37:14AM +0200, Martin Kletzander wrote:
> On Fri, Sep 15, 2017 at 09:18:18AM +0100, Daniel P. Berrange wrote:
> > On Fri, Sep 15, 2017 at 01:27:31PM +0530, Madhu Pavan wrote:
> > > Hi,
> > > QEMU live patching should be just a matter of updating the QEMU RPM package
> > > and then live migrating the VMs to another QEMU instance on the same host
> > > (which would point to the just-installed new QEMU executable).
> > > I think it would be useful to support this from the libvirt side. After some
> > > searching I found an RFC patch posted in Nov 2013. Here is the link to it:
> > > https://www.redhat.com/archives/libvir-list/2013-November/msg00372.html
> > > The approach followed in the above mentioned link is as follows:
> > > 1. newDef = deep copy of oldVm definition
> > > 2. newVm = create VM using newDef, start QEMU process with all vCPUs paused
> > > 3. oldVm migrates to newVm using a unix socket
> > > 4. shutdown oldVm
> > > 5. newPid = newVm->pid
> > > 6. finalDef = live deep copy of newVm definition
> > > 7. Drop the newVm from the qemu domain table without shutting down the QEMU process
> > > 8. Assign finalDef to oldVm
> > > 9. oldVm attaches to QEMU process newPid using finalDef
> > > 10. resume all vCPUs in oldVm
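> > > 
> > > The mechanism behind step 3 is just QEMU's normal migration support: the
> > > new QEMU is started with "-incoming unix:/some/path", and the old QEMU is
> > > told, via its QMP monitor, to migrate to that unix socket. As a rough,
> > > illustrative sketch of that one step (socket paths are made up, and real
> > > code would need proper QMP response and event handling):
> > > 
> > >   import json, socket
> > > 
> > >   def qmp(sock, cmd, **args):
> > >       # send one QMP command and read back a single reply
> > >       sock.sendall(json.dumps({'execute': cmd, 'arguments': args}).encode())
> > >       return json.loads(sock.recv(65536).decode())
> > > 
> > >   mon = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> > >   mon.connect('/run/old-qemu-qmp.sock')   # old QEMU's QMP monitor socket
> > >   mon.recv(65536)                         # discard the QMP greeting
> > >   qmp(mon, 'qmp_capabilities')            # enter command mode
> > >   qmp(mon, 'migrate', uri='unix:/run/upgrade.sock')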
> > > 
> > > I can see it didn't get the community's approval because of problems in
> > > handling the UUIDs of the VMs. To fix the problem we need to teach libvirt
> > > to manage two QEMU processes at once, both tied to the same UUID. I would
> > > like to know if there is any interest in an approach to get this done. I
> > > would like to send patches on this.
> > > 
> > > Is there any specific reason why it has not been pursued for the last 4 years?
> > 
> > It isn't possible to make it work correctly in the general case, because
> > both QEMU processes want to own the same files on disk. e.g. both might
> > want to listen on a UNIX socket /foo/bar, but only one of them can. If you
> > let the new QEMU delete the original QEMU's sockets, then you either break
> > or delay incoming connections during the migration, or make it impossible
> > to roll back on failure, or both. This kind of thing is not acceptable for
> > the usage scenario described, which would need to be bulletproof to be
> > usable in production.
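> > 
> > To make the conflict concrete: only one process can listen on a given UNIX
> > socket path; a second bind() fails unless the original socket is unlinked
> > first, which breaks the still-running QEMU. A trivial sketch (the path is
> > made up):
> > 
> >   import os, socket, tempfile
> > 
> >   path = os.path.join(tempfile.mkdtemp(), 'foo-bar.sock')
> > 
> >   s1 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> >   s1.bind(path)                    # first QEMU owns the path
> >   s1.listen(1)
> > 
> >   s2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> >   try:
> >       s2.bind(path)                # second QEMU: fails with EADDRINUSE
> >   except OSError as err:
> >       print('second bind failed:', err)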
> > 
> 
> Can't we utilize namespaces for this?  A lot of the things could be
> separated, so we could fire up a new VM that's "containerized" like
> this, migrate to it, and then fire up a new one and migrate back.  If the
> first migration fails then we can still fall back.  If it does not, then
> the second one "should" not fail either.

As well as increasing complexity, and thus the chances of failure, this also
doubles the performance impact on the guest.


More generally I think the high level idea of in-place live-upgrades for
guests on a host is flawed.  Even if we had the ability to in-place upgrade
QEMU for a running guest, there are always going to be cases where a security
upgrade requires you to reboot the host system - a kernel/glibc flaw, or fixes
to other services that can't be restarted, e.g. dbus. Anyone hosting VMs
will always need to be able to handle migrating VMs to a *separate*
host to deal with such security updates. For added complexity, determining
which security upgrades can be handled by restarting QEMU vs which need
a full host restart is not trivial, unless you are intimately familiar with
the software stack.

So from an administrative / operational POV, defining two separate procedures
for dealing with upgrades is unwise. You risk picking the wrong one and
leaving a security fix accidentally unapplied, or, if you take one path
95% of the time and the other path 5% of the time, chances are you're going
to screw up the less used path due to lack of practice. It makes more sense
to simplify the operational protocols by standardizing on cross-host migration,
followed by a host reboot, for applying patches. That simplifies the decision
matrix, removing complexity, and ensures you are well practiced in following
the same approach every time. The only cost is having one spare server against
which to perform the migrations, but if your VMs are at all important, you
will already have spare hardware to deal with unforeseen hardware problems.
If you absolutely can't afford spare hardware, you could just run your
entire "host" OS in a container, and then spin up a second "host" OS on
the same machine and migrate into that. From libvirt's POV this is still
cross-host migration, so it needs no special case.
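
As a rough illustration, the whole cross-host procedure can be driven with
the ordinary migration API, e.g. with the libvirt Python bindings (the host
and guest names here are placeholders):

  import libvirt

  src = libvirt.open('qemu:///system')          # host about to be patched
  dom = src.lookupByName('guest1')              # hypothetical guest name

  flags = (libvirt.VIR_MIGRATE_LIVE |
           libvirt.VIR_MIGRATE_PEER2PEER |
           libvirt.VIR_MIGRATE_PERSIST_DEST |
           libvirt.VIR_MIGRATE_UNDEFINE_SOURCE)

  # live-migrate to the spare host; once every guest has been moved, the
  # source host can be rebooted to pick up kernel/glibc/QEMU fixes
  dom.migrateToURI('qemu+ssh://spare-host/system', flags, None, 0)
  src.close()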

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|