[libvirt] Loosing lxc guests when restarting libvirt

Daniel P. Berrange berrange at redhat.com
Thu Jan 5 10:46:53 UTC 2017


On Sun, Dec 25, 2016 at 12:21:18AM +0100, Guido Günther wrote:
> On Sat, Dec 24, 2016 at 05:14:44PM +0100, Guido Günther wrote:
> > Hi Cedric,
> > On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> > > Hey Christian,
> > > 
> > > On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > > > Hi,
> > > > I found an issue in libvirt related to libvirt-lxc, but fail to find the root cause.
> > > > 
> > > > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to "internal error: No valid cgroup for machine"
> > > > 
> > > > I was able to reproduce it with libvirt 1.3.1, 2.4 and 2.5 as packaged in Ubuntu and Debian.
> > > > I wanted to ask for two things:
> > > > - wider coverage where this does reproduce
> > > 
> > > I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5 packages.
> > 
> > I had a short look and it seems like this sequence is killing all running
> > libvirt-lxc guests reliably:
> > 
> >   # no lxc guest running yet
> >   export LIBVIRT_DEFAULT_URI=lxc:///
> >   DOMAIN=sl
> >   systemctl daemon-reload
> > 
> >   # start lxc guest
> >   virsh start ${DOMAIN}
> >   sleep 1  # give vm some time to start
> >   systemctl restart libvirtd
> 
> Using ftrace I can see that systemd moves the process into the wrong
> cgroup on start:
> 
> systemd-1     [000] ....   652.333068: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1     [000] ....   652.333117: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1     [000] ....   652.333160: cgroup_attach_task: dst_root=6 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1     [000] ....   652.333203: cgroup_attach_task: dst_root=4 dst_id=107 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1     [000] ....   652.333245: cgroup_attach_task: dst_root=8 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> systemd-1     [000] ....   652.333286: cgroup_attach_task: dst_root=7 dst_id=84 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
> 
> I've attached the script to reproduce this and would be happy about
> ideas of the root cause.

Ok, so when libvirt starts an LXC guest, it creates a machine slice with
systemd to hold the container processes. The machine slice has the container
PID 1 as its leader, but libvirt also adds the libvirt_lxc controller and
any qemu-nbd processes to the cgroups associated with this machine
slice... except it only does this for the resource cgroups it's using and
does *not* do this for the systemd cgroup.

So if you query libvirtd.service status, it'll show libvirt_lxc being
associated with that, instead of the machine slice:

# systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-01-05 10:38:02 GMT; 10s ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 6723 (libvirtd)
    Tasks: 20 (limit: 4915)
   CGroup: /system.slice/libvirtd.service
           ├─1547 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─1548 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─6723 /usr/sbin/libvirtd --listen
           └─6888 /usr/libexec/libvirt_lxc --name sl --console 25 --security=selinux --handshake 28


# systemctl status machine-lxc\\x2d6888\\x2dsl.scope 
● machine-lxc\x2d6888\x2dsl.scope - Container lxc-6888-sl
   Loaded: loaded (/run/systemd/transient/machine-lxc\x2d6888\x2dsl.scope; transient; vendor preset: disabled)
Transient: yes
   Active: active (running) since Thu 2017-01-05 10:38:04 GMT; 13s ago
    Tasks: 1 (limit: 16384)
   Memory: 812.0K
      CPU: 25ms
   CGroup: /machine.slice/machine-lxc\x2d6888\x2dsl.scope
           └─6889 /bin/bash
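
The same split should be visible per hierarchy in /proc/<pid>/cgroup for
the libvirt_lxc process. A rough sketch, assuming a cgroup v1 host where
each line of that file has the form "hierarchy-id:controllers:path";
cgroup_path_for is just a helper name made up for this example, not
anything libvirt or systemd ships:

```shell
# Print the cgroup path for one controller, reading /proc/<pid>/cgroup
# style lines ("hierarchy-id:controllers:path") from stdin.
# cgroup_path_for is a hypothetical helper, not part of libvirt or systemd.
cgroup_path_for() {
    awk -F: -v want="$1" '
        {
            # The controller field can be a comma-separated list (cpu,cpuacct)
            n = split($2, ctrl, ",")
            for (i = 1; i <= n; i++)
                if (ctrl[i] == want) {
                    # The path is everything after the second colon
                    print substr($0, length($1) + length($2) + 3)
                    exit
                }
        }'
}

# Example use, with the libvirt_lxc PID from the status output (6888 there):
#   cgroup_path_for name=systemd < /proc/6888/cgroup
#   cgroup_path_for memory       < /proc/6888/cgroup
```

If the analysis above is right, the name=systemd line would report
/system.slice/libvirtd.service while the resource controllers libvirt
uses report the machine-lxc scope.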



Now, when you do a restart of libvirtd.service, systemd will ensure that all
the processes associated with that service are in the right cgroups, moving
them if needed. systemd only refreshes its view of cgroup placement when
you do a daemon-reload. Hence it only notices that libvirt moved libvirt_lxc
after doing a daemon-reload. Anyway, systemd moves libvirt_lxc back into
the cgroups associated with libvirtd.service.
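
One way to watch this happen would be to list, before and after the
restart, which hierarchies have libvirt_lxc outside the machine slice.
Again only a sketch under the cgroup v1 assumption, and cgroups_outside
is a made-up helper name:

```shell
# Read /proc/<pid>/cgroup style lines from stdin and print
# "controllers<TAB>path" for every hierarchy whose path does not start
# with the given prefix. cgroups_outside is a hypothetical helper.
cgroups_outside() {
    awk -F: -v prefix="$1" '
        {
            # The path is everything after the second colon
            path = substr($0, length($1) + length($2) + 3)
            if (index(path, prefix) != 1)
                print $2 "\t" path
        }'
}

# Example use, with the libvirt_lxc PID in $pid:
#   cgroups_outside /machine.slice/ < "/proc/$pid/cgroup"   # before
#   systemctl daemon-reload && systemctl restart libvirtd
#   cgroups_outside /machine.slice/ < "/proc/$pid/cgroup"   # after
```

Before the restart I'd expect name=systemd to be listed (possibly along
with controllers libvirt does not use); afterwards every hierarchy
should show up under /system.slice/libvirtd.service, matching the
cgroup_attach_task events in the trace.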

I think to fix this, we will need to ensure that we move libvirt_lxc into
the machine slice for the systemd cgroup controller too.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|



