[libvirt] [PATCH 2/3] qemu: Set cpuset.cpus for domain process

Martin Kletzander mkletzan at redhat.com
Fri Jun 7 09:08:32 UTC 2013


On 05/24/2013 11:08 AM, Osier Yang wrote:
> When either "cpuset" of <vcpu> is specified, or the "placement" of
> <vcpu> is "auto", setting only cpuset.mems might cause the guest
> to fail to start. E.g. ("placement" of both <vcpu> and <numatune> is
> "auto"):
> 

After spending a lot of time with this, I'm still not convinced that
this makes the domain unbootable.  Even when the mempolicy is set to
'strict' and cpuset.cpus contains CPUs from a different node than
cpuset.mems, the allocation shouldn't fail; it just slows down access
to the memory.  Might this be a kernel bug?
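To make the "slower, not failing" case concrete, here is a minimal sketch that counts how many of the allowed CPUs sit on a node outside cpuset.mems, i.e. CPUs whose tasks can only get memory from a remote node. It assumes the 8-node/64-CPU host topology from the quoted `numactl --hardware` output, hard-coded as a table, and represents cpusets as plain bitmasks; cpu_to_node() and count_remote_cpus() are hypothetical names, not libvirt functions.

```c
#include <assert.h>
#include <stdint.h>

/* Host topology transcribed from the quoted `numactl --hardware` output:
 * row n lists the CPUs of NUMA node n (8 nodes x 8 CPUs, CPUs 0-63). */
static const int node_cpus[8][8] = {
    { 0,  4,  8, 12, 16, 20, 24, 28},
    {32, 36, 40, 44, 48, 52, 56, 60},
    { 2,  6, 10, 14, 18, 22, 26, 30},
    {34, 38, 42, 46, 50, 54, 58, 62},
    { 3,  7, 11, 15, 19, 23, 27, 31},
    {35, 39, 43, 47, 51, 55, 59, 63},
    { 1,  5,  9, 13, 17, 21, 25, 29},
    {33, 37, 41, 45, 49, 53, 57, 61},
};

/* Hypothetical helper: return the NUMA node that owns `cpu`, or -1 if the
 * CPU is not in the table. */
int cpu_to_node(int cpu)
{
    for (int n = 0; n < 8; n++)
        for (int i = 0; i < 8; i++)
            if (node_cpus[n][i] == cpu)
                return n;
    return -1;
}

/* Hypothetical helper: count the CPUs allowed by `cpus` (bit c = CPU c)
 * whose local node is not in `mems` (bit n = node n).  A task running on
 * one of these CPUs can only be given memory from a remote node. */
int count_remote_cpus(uint64_t cpus, unsigned int mems)
{
    int remote = 0;

    for (int cpu = 0; cpu < 64; cpu++) {
        int node;

        if (!(cpus & (UINT64_C(1) << cpu)))
            continue;
        node = cpu_to_node(cpu);
        assert(node >= 0);  /* the table covers every CPU 0-63 */
        if (!(mems & (1u << node)))
            remote++;
    }
    return remote;
}
```

With cpuset.cpus set to '0-63' and cpuset.mems restricted to node 1 (the nodeset numad returned), count_remote_cpus(UINT64_MAX, 1u << 1) gives 56: every allowed CPU except the eight on node 1 would access the guest's memory remotely.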

> 1) Related XMLs
>   <vcpu placement='auto'>4</vcpu>
>   <numatune>
>     <memory mode='strict' placement='auto'/>
>   </numatune>
> 
> 2) Host NUMA topology
>   % numactl --hardware
>   available: 8 nodes (0-7)
>   node 0 cpus: 0 4 8 12 16 20 24 28
>   node 0 size: 16374 MB
>   node 0 free: 11899 MB
>   node 1 cpus: 32 36 40 44 48 52 56 60
>   node 1 size: 16384 MB
>   node 1 free: 15318 MB
>   node 2 cpus: 2 6 10 14 18 22 26 30
>   node 2 size: 16384 MB
>   node 2 free: 15766 MB
>   node 3 cpus: 34 38 42 46 50 54 58 62
>   node 3 size: 16384 MB
>   node 3 free: 15347 MB
>   node 4 cpus: 3 7 11 15 19 23 27 31
>   node 4 size: 16384 MB
>   node 4 free: 15041 MB
>   node 5 cpus: 35 39 43 47 51 55 59 63
>   node 5 size: 16384 MB
>   node 5 free: 15202 MB
>   node 6 cpus: 1 5 9 13 17 21 25 29
>   node 6 size: 16384 MB
>   node 6 free: 15197 MB
>   node 7 cpus: 33 37 41 45 49 53 57 61
>   node 7 size: 16368 MB
>   node 7 free: 15669 MB
> 
> 4) cpuset.cpus will be set as: (from debug log)
> 
> 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
> Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
> to '0-63'
> 
> 5) The advisory nodeset returned by numad (from debug log)
> 
> 2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
> Nodeset returned from numad: 1
> 
> 6) cpuset.mems will be set as: (from debug log)
> 
> 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
> Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
> to '0-7'
> 
> I.e., the domain process's memory is restricted to the first NUMA node;
> however, it can use all of the CPUs, which will likely cause the domain
> process to fail to start because the kernel fails to allocate
> memory with the memory policy set to "strict".
> 
> % tail -n 20 /var/log/libvirt/qemu/toy.log
> ...
> 2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 :
> Handshake with parent is done
> char device redirected to /dev/pts/2 (label charserial0)
> kvm_init_vcpu failed: Cannot allocate memory
> ...
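The values written to cpuset.cpus and cpuset.mems in the quoted debug log ('0-63' and '0-7') use the kernel's cpuset list format: comma-separated CPU or node numbers and inclusive ranges. To compare the two settings mechanically, a minimal parser for that format can be sketched as follows, assuming at most 64 CPUs or nodes so that a uint64_t bitmask suffices; parse_cpuset_list() is a hypothetical name, not a libvirt or kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a cpuset list such as "0-63", "0-7" or
 * "34,38,42" into a bitmask where bit i stands for CPU (or node) i.
 * Values above 63 are rejected; a single trailing newline, as read back
 * from a cgroup file, is tolerated.  Returns 0 on success or -1 on
 * malformed input. */
int parse_cpuset_list(const char *s, uint64_t *mask)
{
    uint64_t out = 0;

    assert(s && mask);
    while (*s) {
        char *end;
        long lo, hi;

        lo = strtol(s, &end, 10);
        if (end == s || lo < 0 || lo > 63)
            return -1;
        hi = lo;
        s = end;

        /* An inclusive range such as "0-7". */
        if (*s == '-') {
            s++;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo || hi > 63)
                return -1;
            s = end;
        }

        for (long i = lo; i <= hi; i++)
            out |= UINT64_C(1) << i;

        if (*s == ',')
            s++;
        else if (*s == '\0' || (*s == '\n' && s[1] == '\0'))
            break;
        else
            return -1;
    }
    *mask = out;
    return 0;
}
```

Parsing '0-63' sets all 64 bits and '0-7' sets the low eight, so the resulting masks can be checked against the host topology directly.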
> ---
>  src/qemu/qemu_cgroup.c | 39 +++++++++++++++++++++++++++++++++------
>  1 file changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
> index cf46993..5bfae77 100644
> --- a/src/qemu/qemu_cgroup.c
> +++ b/src/qemu/qemu_cgroup.c
> @@ -635,7 +635,8 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>                        virBitmapPtr nodemask)
>  {
>      qemuDomainObjPrivatePtr priv = vm->privateData;
> -    char *mask = NULL;
> +    char *mem_mask = NULL;
> +    char *cpu_mask = NULL;
>      int rc;
>      int ret = -1;
>  
> @@ -649,17 +650,17 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>  
>          if (vm->def->numatune.memory.placement_mode ==
>              VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)
> -            mask = virBitmapFormat(nodemask);
> +            mem_mask = virBitmapFormat(nodemask);
>          else
> -            mask = virBitmapFormat(vm->def->numatune.memory.nodemask);
> +            mem_mask = virBitmapFormat(vm->def->numatune.memory.nodemask);
>  
> -        if (!mask) {
> +        if (!mem_mask) {
>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                             _("failed to convert memory nodemask"));
>              goto cleanup;
>          }
>  
> -        rc = virCgroupSetCpusetMems(priv->cgroup, mask);
> +        rc = virCgroupSetCpusetMems(priv->cgroup, mem_mask);
>  
>          if (rc != 0) {
>              virReportSystemError(-rc,
> @@ -669,9 +670,35 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>          }
>      }
>  
> +    if (vm->def->cpumask ||
> +        (vm->def->placement_mode ==
> +         VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)) {
> +        if (vm->def->placement_mode ==
> +            VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)
> +            cpu_mask = virBitmapFormat(nodemask);

You're right that we are not setting cpuset.cpus at all and we
should, but in the scenario you described in the commit message this
will set cpuset.cpus to the same value as cpuset.mems, i.e. node numbers
end up being used as CPU numbers.  Let's say 'numad -w X:Y' returns
'3'.  In this case cpuset.mems will be set correctly, but cpuset.cpus
would become '3' (CPU 3, which is on node 4), whereas it must be set to
all CPUs on node 3 ('34,38,42,46,50,54,58,62' in this case).
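The node-to-CPU conversion requested here can be sketched as below. This is plain C rather than libvirt's virBitmap code; it assumes the 8-node/64-CPU topology from the quoted `numactl --hardware` output as a constant table, and nodemask_to_cpumask() and format_cpumask() are hypothetical names. A real fix would need to obtain the topology from the host instead of hard-coding it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Host topology transcribed from the quoted `numactl --hardware` output:
 * row n lists the CPUs of NUMA node n. */
static const int node_cpus[8][8] = {
    { 0,  4,  8, 12, 16, 20, 24, 28},
    {32, 36, 40, 44, 48, 52, 56, 60},
    { 2,  6, 10, 14, 18, 22, 26, 30},
    {34, 38, 42, 46, 50, 54, 58, 62},
    { 3,  7, 11, 15, 19, 23, 27, 31},
    {35, 39, 43, 47, 51, 55, 59, 63},
    { 1,  5,  9, 13, 17, 21, 25, 29},
    {33, 37, 41, 45, 49, 53, 57, 61},
};

/* Hypothetical helper: expand a NUMA nodemask (bit n = node n) into the
 * mask of every CPU on those nodes (bit c = CPU c). */
uint64_t nodemask_to_cpumask(unsigned int nodemask)
{
    uint64_t cpus = 0;

    for (int n = 0; n < 8; n++) {
        if (!(nodemask & (1u << n)))
            continue;
        for (int i = 0; i < 8; i++)
            cpus |= UINT64_C(1) << node_cpus[n][i];
    }
    return cpus;
}

/* Hypothetical helper: format a CPU mask as a comma-separated list such
 * as "34,38,42", a form cpuset.cpus accepts.  Returns 0 on success or -1
 * if `buf` is too small. */
int format_cpumask(uint64_t cpus, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < 64; cpu++) {
        int n;

        if (!(cpus & (UINT64_C(1) << cpu)))
            continue;
        n = snprintf(buf + used, len - used, "%s%d", used ? "," : "", cpu);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Expanding nodeset 3 and formatting the result yields "34,38,42,46,50,54,58,62"; the string, not the node number, is what would be written to cpuset.cpus.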

ACK if you convert the nodes to CPUs as described.

Martin.