[libvirt] KVM processes -- should we be able to attach them to the libvirtd process?

Wed May 6 08:17:43 UTC 2009

On Tue, May 05, 2009 at 11:38:13PM -0500, Matthew Farrellee wrote:
> Daniel P. Berrange wrote:
> > On Tue, May 05, 2009 at 04:13:38PM -0400, Hugh O. Brock wrote:
> >> Not too long ago we took a patch that allowed QEMU VMs to keep running
> >> even if libvirtd died or was restarted.
> >>
> >> I was talking to Matt Farrellee (cc'd) this afternoon about
> >> manageability, and he feels fairly strongly that this behavior should be
> >> optional -- in other words, it should be possible to guarantee that if
> >> libvirtd dies, it will take all the VMs with the "die-with-libvirtd"
> >> flag set down with it.
> >>
> >> I'm not sure this API is portable to Xen, but it would work on any
> >> hypervisor that represents the VM as a normal process.
> >>
> >> Does this strike anyone else as useful behavior?
> > 
> > This isn't really a model we want in the architecture. That the QEMU
> > instances used to die when libvirtd died was an unfortunate artifact
> > of the fact that QEMU was the parent process leader. These days all VMs
> > are fully daemonized, so there is no parent/child relationship. In fact
> > QEMU was really the odd-ball in this respect, because with Xen/OpenVZ/LXC
> > and VirtualBox, VMs have always happily continued when libvirtd stopped
> > or died, as do storage pools and virtual networks.
> > 
> > This is important because it ensures we can automatically restart the
> > libvirtd daemon during RPM upgrades, and provides robustness should a
> > bug cause the daemon to crash - the daemon can be trivially restarted
> > and continue with no interruption to services being managed. 
> > 
>
> It doesn't appear to be the case that the libvirtd daemon can trivially
> restart and continue with no interruptions. Right now it loses track of VMs.

That is a bug then. If you can reproduce it, please file a BZ ticket
so we can track it down & fix it.

> In a scenario where VMs are not deployed and locked to specific physical
> nodes, it can be highly valuable to have ways to ensure a VM is no
> longer running when a layer of its management stops functioning.

IMHO this is a problem to be solved by clustering software. If the
clustering software detects a failure with the management service,
then it should power fence the entire node. Relying on management
service failure to kill the VMs will never be reliable enough.
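
To make the fencing idea concrete, here is a minimal sketch of the kind of
watchdog loop clustering software might run. It is only an illustration under
stated assumptions: `probe` and `fence` are hypothetical callables supplied by
the caller (a health check against the management service, and whatever
power-fencing action the cluster uses), and the names `watch_and_fence`,
`max_failures` and `interval` are invented for this example. None of this is a
libvirt API.

```python
import time
from typing import Callable


def watch_and_fence(
    probe: Callable[[], bool],
    fence: Callable[[], None],
    max_failures: int = 3,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll probe() until it fails max_failures times in a row, then fence.

    probe and fence are hypothetical hooks: probe returns True while the
    management service looks healthy, fence power-fences the whole node.
    Returns the total number of probes performed before fencing.
    """
    consecutive_failures = 0
    probes = 0
    while True:
        probes += 1
        if probe():
            # Any successful probe resets the failure counter.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                # Fence the node rather than trusting the failed
                # management layer to clean up its own VMs.
                fence()
                return probes
        sleep(interval)
```

Requiring several consecutive failures before fencing keeps a single missed
health check, for example during a daemon restart on upgrade, from taking down
the whole node. The decision is made outside the failing service, which is the
point of pushing this into the clustering layer.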

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|