[libvirt] [PATCH 5/6] qemuProcessHook: Call virNuma*() iff needed

Michal Privoznik mprivozn at redhat.com
Mon Mar 30 13:46:18 UTC 2015


On 30.03.2015 15:15, Ján Tomko wrote:
> On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote:
>> Once upon a time, there was a little domain. And the domain was pinned
>> onto a NUMA node and hasn't fully allocated its memory:
>>
>>   <memory unit='KiB'>2355200</memory>
>>   <currentMemory unit='KiB'>1048576</currentMemory>
>>
>>   <numatune>
>>     <memory mode='strict' nodeset='0'/>
>>   </numatune>
>>
>> Oh little me, said the domain, what will I do with so few memory.
> 
> s/few/little/
> 
>> If I only had a few megabytes more. But the old admin noticed
>> whimpering, barely audible to untrained human ear. And good admin he
> 
> the whimpering
> the good admin (?)
> 
>> was, he gave the domain yet more memory. But the old NUMA topology
>> witch forbidden to allocate more memory on the node zero. So he
> 
> forbidden -> forbade or forbid
> 
>> decided to allocate it on a different node:
>>
>> virsh # numatune little_domain --nodeset 0-1
>>
>> virsh # setmem little_domain 2355200
>>
>> The little domain was happy. For a while. Until bad, sharp teeth
> 
> a bad? the bad?
> 
>> shaped creature came. Every process in the system was afraid of him.
>> The OOM Killer they called him. Oh no, he's after the little domain.
>> There's no escape.
>>
>> Do you kids know why? Because when the little domain was born, her
>> father, Libvirt, called numa_set_membind(). So even if the admin
>> allowed her to allocate memory from other nodes in the cgroups, the
>> membind() forbid it.
>>
>> So what's the lesson? Libvirt should rely on cgroups, whenever
>> possible and use numa_set_membind() as the last ditch effort.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>  src/qemu/qemu_process.c | 32 ++++++++++++++++++++++++++++----
>>  1 file changed, 28 insertions(+), 4 deletions(-)
>>
> 
> I don't have much experience proofreading children's books,
> but the logic looks okay to me.
> 
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 79f763e..cba042d 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data)
>>      int fd;
>>      virBitmapPtr nodeset = NULL;
>>      virDomainNumatuneMemMode mode;
>> +    bool doNuma = true;
>>  
>>      /* This method cannot use any mutexes, which are not
>>       * protected across fork()
>> @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data)
>>          goto cleanup;
>>  
>>      mode = virDomainNumatuneGetMode(h->vm->def->numa, -1);
>> -    nodeset = virDomainNumatuneGetNodeset(h->vm->def->numa,
>> -                                          priv->autoNodeset, -1);
>>  
>> -    if (virNumaSetupMemoryPolicy(mode, nodeset) < 0)
>> -        goto cleanup;
>> +    if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
>> +        virCgroupPtr cgroup = NULL;
>> +
>> +        /* Create dummy cgroup ... */
>> +        if (virCgroupNewSelf(&cgroup) < 0)
>> +            goto cleanup;
> 
> The domain's cgroup is accessible under priv->cgroup here, you can use
> that one instead.

Good point, I though that CGroups are created later in the process. But
due to handshaking with child, we are guaranteed that we can access
priv->cgroup. Will send v2 in a while.

Michal




More information about the libvir-list mailing list