[libvirt] [RFC PATCH 8/8] qemu: Set cpuset.mems even if the numatune mode is not strict

Osier Yang jyang at redhat.com
Mon May 13 09:02:03 UTC 2013


On 13/05/13 14:46, Hu Tao wrote:
> On Thu, May 09, 2013 at 06:22:17PM +0800, Osier Yang wrote:
>> When the numatune memory mode is not "strict", cpuset.mems
>> inherits the parent's setting, which causes problems like:
>>
>> % virsh dumpxml rhel6_local | grep interleave -2
>>    <vcpu placement='static'>2</vcpu>
>>    <numatune>
>>      <memory mode='interleave' nodeset='1-2'/>
>>    </numatune>
>>    <os>
>>
>> % cat /proc/3713/status | grep Mems_allowed_list
>>    Mems_allowed_list:	0-3
>>
>> % virsh numatune rhel6_local
>>    numa_mode      : interleave
>>    numa_nodeset   : 0-3
> Yes the information is misleading.
>
>> Though the domain process's memory binding is set with libnuma
>> after the cgroup setting.
>>
>> The reason for only allowing "strict" mode in the current code is that
>> cpuset.mems doesn't understand the memory policy modes (interleave,
>> preferred, strict); it is effectively equivalent to "strict" mode ("strict"
>> means the allocation will fail if the memory cannot be allocated on
>> the target node. Default operation is to fall back to other nodes.
> Default is localalloc.
>> From man numa(3)). However, writing cpuset.mems even if the
>> numatune memory mode is not strict should be better than blind
>> inheritance anyway.
> It's OK for interleave mode, combined with cpuset.memory_spread_xxx.

  - cpuset.memory_spread_page flag: if set, spread page cache evenly on
    allowed nodes
  - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on
    allowed nodes

Looks reasonable.
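
To make the idea concrete, here is a rough sketch of turning on one of the
two flags above by writing "1" to the flag file in the domain's cpuset
directory. This is only an illustration, not libvirt code: set_cpuset_flag
is a made-up helper, and the cgroup directory passed in is an assumption
(the real location depends on where the cpuset controller is mounted and
how the domain's group is named).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a libvirt API): write "1" or "0" to a flag
 * file inside a cpuset cgroup directory.
 *
 * cgroup_dir: assumed path of the domain's cpuset group
 * flag:       file name such as "cpuset.memory_spread_page"
 *
 * Returns 0 on success, -1 on failure. */
static int
set_cpuset_flag(const char *cgroup_dir, const char *flag, int enable)
{
    char path[4096];
    FILE *fp;
    int n;
    int ret = 0;

    assert(cgroup_dir != NULL && flag != NULL);

    /* Build "<cgroup_dir>/<flag>" and reject paths that do not fit. */
    n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, flag);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    if (!(fp = fopen(path, "w")))
        return -1;

    if (fputs(enable ? "1" : "0", fp) == EOF)
        ret = -1;

    /* The kernel may reject the value only when the write is flushed,
     * so the result of fclose() matters too. */
    if (fclose(fp) == EOF)
        ret = -1;

    return ret;
}
```

For interleave mode the caller would invoke it once for
"cpuset.memory_spread_page" and once for "cpuset.memory_spread_slab",
after cpuset.mems has been written.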

> But what about preferred mode? Comparing:
>
> strict:  Strict means the allocation will fail if the memory cannot be
>           allocated on the target node.
>
> preferred: The system will attempt to allocate memory from the
>             preferred node, but will fall back to other nodes if no
>             memory is available on the preferred node.

For "preferred" mode I have no idea; there are no related cgroup files
like memory_spread_*. If we set cpuset.mems to the nodeset, memory
allocation will behave like 'strict', which is not what is expected.
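
Related to the single-node restriction for 'preferred' in the patch below:
the strlen(mem_mask) != 1 test would also reject a single node numbered 10
or higher, since its string is two characters long. A minimal sketch of
counting the nodes in the string instead, assuming the usual "a-b,c" list
format and small node numbers; count_nodes is a hypothetical helper, not
an existing libvirt function:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: count the nodes named by a list such as "3",
 * "1-2" or "0-1,4". Returns the number of nodes, or -1 if the string
 * is empty or malformed. Overlapping ranges are not merged, so a node
 * listed twice is counted twice. */
static int
count_nodes(const char *nodeset)
{
    const char *p = nodeset;
    int count = 0;

    assert(nodeset != NULL);

    while (*p) {
        char *end;
        long first, last;

        /* Each element starts with a node number. */
        if (!isdigit((unsigned char)*p))
            return -1;
        first = last = strtol(p, &end, 10);
        p = end;

        /* Optional "-N" turns the element into a range. */
        if (*p == '-') {
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            last = strtol(p, &end, 10);
            p = end;
        }
        if (last < first)
            return -1;

        count += (int)(last - first + 1);

        /* Elements are separated by commas; no trailing comma. */
        if (*p == ',') {
            p++;
            if (*p == '\0')
                return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }

    return count > 0 ? count : -1;
}
```

With something like this, the 'preferred' check could compare the count
against 1 rather than looking at the string length.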

>> ---
>> However, I'm not comfortable with the solution, since the modes
>> other than "strict" are not meaningful for cpuset.mems anyway.
>>
>> Another problem I'm not sure about: will the cpuset.cpus
>> affect the libnuma setting? Assume that without this patch, the domain
>> process's cpuset.mems is set to '0-7' (8 NUMA nodes, each with 8
>> CPUs), the numatune memory mode is "interleave", and libnuma sets
>> the memory binding to "1-2". Even with this patch applied, setting
>> cpuset.mems to "1-2", is there any potential problem?
>>
>> So this patch is mainly for raising the problem, and to see if
>> you guys have any opinions. @hutao, since this code is from you, any
>> opinions/ideas? Thanks.
>> ---
>>   src/qemu/qemu_cgroup.c | 18 +++++++++++++-----
>>   1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
>> index 33eebd7..22fe25b 100644
>> --- a/src/qemu/qemu_cgroup.c
>> +++ b/src/qemu/qemu_cgroup.c
>> @@ -597,11 +597,9 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>>       if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
>>           return 0;
>>   
>> -    if ((vm->def->numatune.memory.nodemask ||
>> -         (vm->def->numatune.memory.placement_mode ==
>> -          VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) &&
>> -        vm->def->numatune.memory.mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
>> -
>> +    if (vm->def->numatune.memory.nodemask ||
>> +        (vm->def->numatune.memory.placement_mode ==
>> +         VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) {
>>           if (vm->def->numatune.memory.placement_mode ==
>>               VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)
>>               mem_mask = virBitmapFormat(nodemask);
>> @@ -614,6 +612,16 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
>>               goto cleanup;
>>           }
>>   
>> +        if (vm->def->numatune.memory.mode ==
>> +            VIR_DOMAIN_NUMATUNE_MEM_PREFERRED &&
>> +            strlen(mem_mask) != 1) {
>> +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>> +                           _("NUMA memory tuning in 'preferred' mode "
>> +                             "only supports single node"));
>> +            goto cleanup;
>> +
>> +        }
>> +
>>           rc = virCgroupSetCpusetMems(priv->cgroup, mem_mask);
>>   
>>           if (rc != 0) {
>> -- 
>> 1.8.1.4
>>



