[libvirt] Losing lxc guests when restarting libvirt

Guido Günther agx at sigxcpu.org
Sat Dec 24 23:21:18 UTC 2016


On Sat, Dec 24, 2016 at 05:14:44PM +0100, Guido Günther wrote:
> Hi Cedric,
> On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> > Hey Christian,
> > 
> > On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > > Hi,
> > > I found an issue in libvirt related to libvirt-lxc, but fail to find the root cause.
> > > 
> > > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to "internal error: No valid cgroup for machine"
> > > 
> > > I was able to reproduce it with the libvirt 1.3.1, 2.4 and 2.5 packages in Ubuntu and Debian.
> > > I wanted to ask for two things:
> > > - wider coverage where this does reproduce
> > 
> > I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5 packages.
> 
> I had a short look and it seems like this sequence is killing all running
> libvirt-lxc guests reliably:
> 
>   # no lxc guest running yet
>   export LIBVIRT_DEFAULT_URI=lxc:///
>   DOMAIN=sl
>   systemctl daemon-reload
> 
>   # start lxc guest
>   virsh start ${DOMAIN}
>   sleep 1  # give vm some time to start
>   systemctl restart libvirtd

Using ftrace I can see that systemd moves the process into the wrong
cgroup on start:

systemd-1     [000] ....   652.333068: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1     [000] ....   652.333117: cgroup_attach_task: dst_root=3 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1     [000] ....   652.333160: cgroup_attach_task: dst_root=6 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1     [000] ....   652.333203: cgroup_attach_task: dst_root=4 dst_id=107 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1     [000] ....   652.333245: cgroup_attach_task: dst_root=8 dst_id=80 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1     [000] ....   652.333286: cgroup_attach_task: dst_root=7 dst_id=84 dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
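
The same thing can be cross-checked from userspace by looking at
/proc/<pid>/cgroup for the container's process. What follows is only a
rough sketch, not part of the attached script: the helper name
controllers_in_unit is made up for illustration, and it assumes the
usual "hierarchy-ID:controller-list:path" line format and that the
unit's cgroup lives directly under /system.slice, as in the trace above.

```shell
#!/bin/bash
# Hypothetical helper: list the cgroup controllers for which a process
# sits in the given unit's cgroup. Reads text in /proc/<pid>/cgroup
# format (hierarchy-ID:controller-list:path) from stdin.
controllers_in_unit() {
  local unit=$1
  # Field 3 is the cgroup path, field 2 the controller list. Print the
  # controllers whose path is exactly /system.slice/<unit>.
  awk -F: -v path="/system.slice/${unit}" '$3 == path { print $2 }'
}

# Usage against a live process, where PID is the process to inspect:
#   controllers_in_unit libvirtd.service < "/proc/${PID}/cgroup"
```

Any output for the libvirt_lxc process would mean it sits in libvirtd's
own cgroup for those controllers rather than in a separate cgroup for
the guest.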

I've attached the script to reproduce this and would welcome any
ideas about the root cause.
Cheers,
 -- Guido
-------------- next part --------------
#!/bin/bash
set -e

export LIBVIRT_DEFAULT_URI=lxc:///
DOMAIN=sl

function cleanup () {
  set +x
  echo "Running cleanup"
  echo 0 > /sys/kernel/debug/tracing/events/cgroup/enable
  virsh -c lxc:/// destroy sl || true
  if [ -n "$SUCCESS" ]; then
    echo "Finished successfully"
  else
    echo "Got an error."
  fi
}

trap cleanup exit

cat <<EOF >dom.xml
<domain type='lxc'>
  <name>sl</name>
  <memory unit='KiB'>256000</memory>
  <currentMemory unit='KiB'>256000</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <os>
    <type>exe</type>
    <init>/bin/bash</init>
  </os>
  <features>
    <privnet/>
  </features>
  <clock offset='utc'/>
  <devices>
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/'/>
      <target dir='/'/>
    </filesystem>
    <console type='pty'>
      <target type='lxc' port='0'/>
    </console>
  </devices>
</domain>
EOF
virsh define dom.xml || true

echo 1 > /sys/kernel/debug/tracing/events/cgroup/enable
# Reload the systemd manager configuration; this triggers the problem
echo "systemctl daemon-reload start" > /sys/kernel/debug/tracing/trace_marker
systemctl daemon-reload
echo "systemctl daemon-reload finished" > /sys/kernel/debug/tracing/trace_marker

set -x
# Start the lxc container
echo "virsh start ${DOMAIN} start" > /sys/kernel/debug/tracing/trace_marker
virsh start ${DOMAIN}
echo "virsh start ${DOMAIN} finished" > /sys/kernel/debug/tracing/trace_marker

virsh list
PID=$(virsh -c lxc:/// list --state-running | sed -ne 's/ \([0-9]\+\) .*/\1/p')
WATCH=/proc/$PID/cgroup
echo "Before ${WATCH}"
cat ${WATCH}
sleep 1

# Restart libvirtd
echo "systemctl stop libvirtd start" > /sys/kernel/debug/tracing/trace_marker
systemctl stop libvirtd
echo "systemctl stop libvirtd finished" > /sys/kernel/debug/tracing/trace_marker

echo "systemctl start libvirtd start" > /sys/kernel/debug/tracing/trace_marker
systemctl start libvirtd
echo "systemctl start libvirtd finished" > /sys/kernel/debug/tracing/trace_marker

# Check if container is still there
echo "After"
cat ${WATCH}
if ! virsh list | grep -qs "${DOMAIN}[[:space:]]\+running"; then
  echo 'Domain disappeared!'
  exit 1
fi
SUCCESS=1

