[libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

Daniel P. Berrange berrange at redhat.com
Fri Feb 26 12:21:02 UTC 2016


On Fri, Feb 26, 2016 at 01:16:15PM +0100, Henning Schild wrote:
> On Fri, 26 Feb 2016 11:13:13 +0000
> "Daniel P. Berrange" <berrange at redhat.com> wrote:
> 
> > On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild wrote:
> > > virCgroupNewMachine used to add the pidleader to the newly created
> > > machine cgroup. Do not do this implicit anymore.
> > > 
> > > Signed-off-by: Henning Schild <henning.schild at siemens.com>
> > > ---
> > >  src/lxc/lxc_cgroup.c   | 11 +++++++++++
> > >  src/qemu/qemu_cgroup.c | 11 +++++++++++
> > >  src/util/vircgroup.c   | 22 ----------------------
> > >  3 files changed, 22 insertions(+), 22 deletions(-)  
> > 
> > NACK to this patch once again.
> > 
> > This does not actually work as you think it does.
> > 
> > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > > index 11f33ab..aef8e8c 100644
> > > --- a/src/util/vircgroup.c
> > > +++ b/src/util/vircgroup.c
> > > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char *name,
> > >          }
> > >      }
> > >  
> > > -    if (virCgroupAddTask(*group, pidleader) < 0) {
> > > -        virErrorPtr saved = virSaveLastError();
> > > -        virCgroupRemove(*group);
> > > -        virCgroupFree(group);
> > > -        if (saved) {
> > > -            virSetError(saved);
> > > -            virFreeError(saved);
> > > -        }
> > > -    }  
> > 
> > Just above this we called virSystemdCreateMachine.  Systemd will
> > create the cgroup and add the pidleader to those cgroups. Systemd
> > may add the pidleader to just the 'systemd' controller, or it may
> > add the pidleader to *ALL* controllers. We have no way of knowing.
> > 
> > This virCgroupAddTask call deals with whatever systemd chose not
> > todo, so we can guarantee consistent behaviour with the pidleader
> > in all cgroups.
> > 
> > By removing this you make this method non-deterministic - the pid
> > may or may not be in the cpu controller now. THis is bad because
> > it can lead to QEMU/LXC driver code working in some cases but
> > failing in other cases.
> > 
> > Furthermore, this existing does not cause any problems for the
> > scenario you care about. THis cgroup placement is being set
> > in between the time libvirtd calls fork() and exec(). With your
> > later patch 5, we ensure that the PID is moved across into the
> > emulator cgroup, before we call exec(). When we call exec all
> > memory mappings will be replaced, so QEMU will stil start with
> > the correct vCPU placement and memory allocation placement.
> 
> I agree having the task in the wrong cgroup before the exec() seems
> harmless. But i am not sure all the fiddling with cgroups is indeed
> harmless and does not cause i.e. kernel work on cores that should be
> left alone. I have the feeling allowing the task in the parent cgroup
> is a bad idea, no matter how short the window seems to be.
> 
> Right now the parent cgroup contains all cpus found in machine.slice,
> which for pinned VMs is too much. How about we calculate the size of the
> child cgroups before and make the parent the union of them. Or give the
> parent the emulator pinning and extend it for the vcpus later.
> But that might turn out pretty complicated as well, getting the order
> right with the mix of cpusets and sched_setaffinity().
> 
> > Just just drop this patch please.

The point is though that we have *no* choice. Systemd can put the task
in the cpu controller and we've no way to prevent that. So the code *has*
to be able to cope with that happening. Therefore this patch is wrong
it just makes behaviour non-deterministic increasing the chances that
we don't correctly handle the case where systemd adds the task to the
cpu controllers

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list