[PATCH 1/1] vircgroup: Cleanup nested cgroups

Eric Farman farman at linux.ibm.com
Thu Apr 8 03:28:48 UTC 2021



On 4/7/21 9:07 AM, Pavel Hrdina wrote:
> On Fri, Mar 26, 2021 at 05:25:03PM +0100, Eric Farman wrote:
>> The introduction of nested cgroups used a little macro
>> virCgroupGetNested() to retrieve the nested cgroup
>> pointer, if one exists. But this macro isn't used when
>> removing cgroups, resulting in some messages:
>>
>>    Mar 25 20:55:17 fedora33 libvirtd[955]: unable to open '/sys/fs/cgroup/machine.slice/machine-qemu\x2d1\x2dguest.scope/': No such file or directory
>>    Mar 25 20:55:17 fedora33 libvirtd[955]: Failed to remove cgroup for guest
>>
>> That directory exists while the guest is running, as it
>> was created by systemd/machined, so the code probably meant
>> to open the libvirt/ subdirectory from that point.
>>
>> Similarly, there happen to be BPF-related file descriptors
>> that don't get cleaned up in this process too, because they
>> are anchored off the nested cgroup location:
>>
>>    [test at fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
>>    35
>>    [test at fedora33 ~]# virsh create guest.xml
>>    Domain 'guest' created from guest.xml
>>
>>    [test at fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
>>    42
>>    [test at fedora33 ~]# virsh shutdown guest
>>    Domain 'guest' is being shutdown
>>
>>    [test at fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
>>    37
>>    [test at fedora33 ~]# virsh create guest.xml
>>    Domain 'guest' created from guest.xml
>>
>>    [test at fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
>>    44
>>    [test at fedora33 ~]# virsh shutdown guest
>>    Domain 'guest' is being shutdown
>>
>>    [test at fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
>>    39
>>
>> Let's fix this by using the same macro when removing cgroups,
>> so that it picks up the right structure and can remove the
>> associated resources properly.
>>
>> Fixes: 184245f53b94 ("vircgroup: introduce nested cgroup to properly work with systemd")
>> Signed-off-by: Eric Farman <farman at linux.ibm.com>
>> ---
>>   src/util/vircgroup.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> I don't thing this patch is correct. With systemd we would get the same
> error without the nested cgroup as well. It's because we terminate the
> qemu process which makes systemd remove the VM root cgroup as well.

I don't experience any problems reverting the blamed patch. The qemu 
cgroup code doesn't make any distinction about systemd or not; it just 
calls the virCgroupRemove() to clean up the resources that were set up 
here during init:

qemuInitCgroup()
   virCgroupNewMachine()
     virCgroupNewMachineSystemd()
       virCgroupNewNested()

The group pointer that's stashed in qemu's struct is that of the 
machine-qemu...scope group, rather than the nested group, but nothing in 
the cleanup path touches group->nested. My initial patch is certainly 
flawed (as you explain below), so maybe something like this is better?

@@ -2615,6 +2615,9 @@ virCgroupRemove(virCgroupPtr group)
  {
      size_t i;

+    if (group->nested)
+       virCgroupRemove(group->nested);
+
      for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
          if (group->backends[i]) {
              int rc = group->backends[i]->remove(group);

Not great, since it cleans up the nested group but then still attempts 
to clean up the machine-qemu...scope group that was setup by systemd. 
This group wasn't setup by virCgroupV2MakeGroup(), so calling 
virCgroupV2Remove() seems wrong too. Not sure how to address that.

> 
> This happens only on cgroup controllers managed by systemd. For example
> if you switch to cgroups v1 where each controller is in separate
> directory not all controllers supported by libvirt are also supported by
> systemd. In this case libvirt creates all the cgroups by itself and is
> responsible to cleanup as well. With this patch we would not remove the
> VM root cgroups in these controllers. This would affect following
> controllers:
> 
>      cpuset
>      freezer
>      net_cls
>      net_prio
>      perf_event
> 
> You can verify what happens with cgroups v1 by adding
> systemd.unified_cgroup_hierarchy=0 to your kernel cmdline.

Neat, thanks for that tip.

Thanks,
Eric

> 
> Pavel
> 




More information about the libvir-list mailing list