[libvirt] [PATCH] qemu: don't setup cpuset.mems if memory mode in numatune is 'preferred'

Martin Kletzander mkletzan at redhat.com
Fri Nov 7 11:18:45 UTC 2014

On Fri, Nov 07, 2014 at 05:36:43PM +0800, Wang Rui wrote:
>On 2014/11/5 16:07, Martin Kletzander wrote:
>>>>> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
>>>>> index b5bdb36..8685d6f 100644
>>>>> --- a/src/qemu/qemu_cgroup.c
>>>>> +++ b/src/qemu/qemu_cgroup.c
>>>>> @@ -618,6 +618,11 @@ qemuSetupCpusetMems(virDomainObjPtr vm,
>>>>>     if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
>>>>>         return 0;
>>>>> +    if (virDomainNumatuneGetMode(vm->def->numatune, -1) !=
>>>>> +        VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>> One question: is it a problem only for 'preferred', or for 'interleave'
>>>> as well?  Because if it's a problem only for 'preferred', then the check
>>>> is wrong.  If it's a problem for 'interleave' as well, then the commit
>>>> message is wrong.
>>> 'interleave' with a single node (such as nodeset='0') will cause the same error.
>>> But 'interleave' mode should not be used with a single node. So maybe another
>>> bugfix is needed to check for 'interleave' with a single node.
>> Well, I'd be OK with just changing the commit message to mention that.
>> This fix is still a valid one and will fix both issues, won't it?
>>> If configured with 'interleave' and multiple nodes(such as nodeset='0-1'),
>>> VM can be started successfully. And cpuset.mems is set to the same nodeset.
>>> So I'll revise my patch.
>>> I'll send patches V2. Conclusion:
>>> 1/3 : add check for 'interleave' mode with single numa node
>>> 2/3 : fix this problem in qemu
>>> 3/3 : fix this problem in lxc
>>> Is it OK?
>>>> Anyway, after either one is fixed, I can push this.
>I tested this problem again and found that the error occurs with every
>memory mode. It was broken by commit 411cea638f6ec8503b7142a31e58b1cd85dbeaba,
>which I authored:
>    qemu: move setting emulatorpin ahead of monitor showing up
>I'm sorry for that.
>That patch moved qemuSetupCgroupForEmulator before qemuSetupCgroupPostInit.
>I have some ideas for fixing that.
>1. Move qemuSetupCgroupPostInit ahead of monitor showing up, too.
>   Of course, it would still go before qemuSetupCgroupForEmulator.
>   This would fix the bug that I introduced.
>   (RFC)

That cannot be done, IIRC, because we need the monitor to get the
vCPU <-> thread mapping from it.

>2. Once the first problem is fixed, there is still the second problem, which
>   is the one I originally wanted to fix. If the memory mode is 'preferred'
>   with one node (such as nodeset='0'), the domain's memory is not
>   guaranteed to be in node 0. Assuming node 0 doesn't have enough memory,
>   memory can be allocated on node 1. Then if we set cpuset.mems to '0', it
>   may cause OOM.
>   The solution is to check the memory mode in (lxc)qemuSetupCpusetMems, as
>   in my patch on Tuesday.  Such as
>   +    if (virDomainNumatuneGetMode(vm->def->numatune, -1) !=

Either this (as it makes sense to restrict qemu even for 'interleave')
or the previous check is fine too (just because that was what we did
before; I just rewrote it with a few problems).

>3. After the first problem has been fixed, we can start domains with xml:
>  <numatune>
>    <memory mode='interleave' nodeset='0'/>
>  </numatune>
>  Is a single node '0' valid for 'interleave' ? I take 'interleave' as
>  'at least two nodes'.

Well, interleave of 1 node is effectively 'strict', isn't it?  What
errors do you get if you try that?  (my kernel stopped accepting
numa=fake=2 as a cmdline parameter :( )

Anyway, I think the best way would be to mimic the old behaviour by
just adding your first proposed fix, "if (mode != STRICT) return 0",
with the commit message fixed up accordingly.
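
To make the intended behaviour concrete, here is a minimal standalone
sketch of that guard. It is not the libvirt code: the enum and the
function name below are hypothetical stand-ins for the numatune mode and
for the early returns in qemuSetupCpusetMems, and it assumes that the
only two conditions that matter are "cpuset controller present" and
"mode is strict".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the numatune memory modes discussed above. */
typedef enum {
    MEM_MODE_STRICT,
    MEM_MODE_PREFERRED,
    MEM_MODE_INTERLEAVE
} mem_mode;

/*
 * Hypothetical helper: decide whether cpuset.mems should be written.
 * Mirrors "if (mode != STRICT) return 0": anything other than strict
 * skips the cgroup restriction, so a 'preferred' domain can still fall
 * back to another node instead of running out of memory.
 */
bool should_set_cpuset_mems(bool has_cpuset_controller, mem_mode mode)
{
    if (!has_cpuset_controller)
        return false;   /* nothing to configure without the controller */

    if (mode != MEM_MODE_STRICT)
        return false;   /* 'preferred' and 'interleave' are left alone */

    return true;
}
```

With this shape, both 'preferred' and 'interleave' domains leave
cpuset.mems untouched, which is why the commit message needs to mention
more than just 'preferred'.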

