[libvirt] [PATCH 5/6] qemuProcessHook: Call virNuma*() iff needed

Ján Tomko jtomko at redhat.com
Mon Mar 30 13:15:38 UTC 2015


On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote:
> Once upon a time, there was a little domain. And the domain was pinned
> onto a NUMA node and hasn't fully allocated its memory:
> 
>   <memory unit='KiB'>2355200</memory>
>   <currentMemory unit='KiB'>1048576</currentMemory>
> 
>   <numatune>
>     <memory mode='strict' nodeset='0'/>
>   </numatune>
> 
> Oh little me, said the domain, what will I do with so few memory.

s/few/little/

> If I only had a few megabytes more. But the old admin noticed
> whimpering, barely audible to untrained human ear. And good admin he

the whimpering
the good admin (?)

> was, he gave the domain yet more memory. But the old NUMA topology
> witch forbidden to allocate more memory on the node zero. So he

forbidden -> forbade or forbid

> decided to allocate it on a different node:
> 
> virsh # numatune little_domain --nodeset 0-1
> 
> virsh # setmem little_domain 2355200
> 
> The little domain was happy. For a while. Until bad, sharp teeth

a bad? the bad?

> shaped creature came. Every process in the system was afraid of him.
> The OOM Killer they called him. Oh no, he's after the little domain.
> There's no escape.
> 
> Do you kids know why? Because when the little domain was born, her
> father, Libvirt, called numa_set_membind(). So even if the admin
> allowed her to allocate memory from other nodes in the cgroups, the
> membind() forbid it.
> 
> So what's the lesson? Libvirt should rely on cgroups, whenever
> possible and use numa_set_membind() as the last ditch effort.
> 
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
>  src/qemu/qemu_process.c | 32 ++++++++++++++++++++++++++++----
>  1 file changed, 28 insertions(+), 4 deletions(-)
> 

I don't have much experience proofreading children's books,
but the logic looks okay to me.

> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 79f763e..cba042d 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data)
>      int fd;
>      virBitmapPtr nodeset = NULL;
>      virDomainNumatuneMemMode mode;
> +    bool doNuma = true;
>  
>      /* This method cannot use any mutexes, which are not
>       * protected across fork()
> @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data)
>          goto cleanup;
>  
>      mode = virDomainNumatuneGetMode(h->vm->def->numa, -1);
> -    nodeset = virDomainNumatuneGetNodeset(h->vm->def->numa,
> -                                          priv->autoNodeset, -1);
>  
> -    if (virNumaSetupMemoryPolicy(mode, nodeset) < 0)
> -        goto cleanup;
> +    if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
> +        virCgroupPtr cgroup = NULL;
> +
> +        /* Create dummy cgroup ... */
> +        if (virCgroupNewSelf(&cgroup) < 0)
> +            goto cleanup;

The domain's cgroup is accessible under priv->cgroup here, you can use
that one instead.

> +
> +        /* ... just to detect if cpuset cgroup is present */
> +        if (virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) {
> +            /* Because if it's not, we must use virNuma* APIs to bind
> +             * memory onto desired nodes. CGroup way is preferred, as
> +             * it allows runtime tuning, while virNuma - well, once
> +             * set and child (qemu) is exec()-ed, we can't do
> +             * anything about the settings. virNuma* does not take
> +             * any PID argument after all. */

Can this comment be shortened?

Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20150330/24c0d185/attachment-0001.sig>


More information about the libvir-list mailing list