[libvirt-users] HugePages - can't start guest that requires them

G. Richard Bellamy rbellamy at pteradigm.com
Fri Feb 20 20:32:57 UTC 2015


On Tue, Feb 10, 2015 at 1:14 AM, Michal Privoznik <mprivozn at redhat.com> wrote:
> On 09.02.2015 18:19, G. Richard Bellamy wrote:
>> First I'll quickly summarize my understanding of how to configure numa...
>>
>> In "//memoryBacking/hugepages/page[@nodeset]" I am telling libvirt to
>> use hugepages for the guest, and to get those hugepages from a
>> particular host NUMA node.
>
> No, @nodeset refers to guest NUMA nodes.
>
>>
>> In "//numatune/memory[@nodeset]" I am telling libvirt to pin the
>> memory allocation to the guest from a particular host numa node.
>
> The <memory/> element tells what to do with not explicitly pinned guest
> NUMA nodes.
>
>> In "//numatune/memnode[@nodeset]" I am telling libvirt which guest
>> NUMA node (cellid) should come from which host NUMA node (nodeset).
>
> Correct. This way you can explicitly pin guest onto host NUMA nodes.
>
>>
>> In "//cpu/numa/cell[@id]" I am telling libvirt how much memory to
>> allocate to each guest NUMA node (cell).
>
> Yes. Each <cell/> creates guest NUMA node. It interconnects vCPUs and
> guest memory - which vCPUs should lie in which guest NUMA node, and how
> much memory should be available for that particular guest NUMA node.
>
>>
>> Basically, I thought "nodeset", regardless of where it existed in the
>> domain xml, referred to the host's NUMA node, and "cell" (<cell id=/>
>> or @cellid) refers to the guest's NUMA node.
>>
>> However....
>>
>> Atlas [1] starts without issue, prometheus [2] fails with "libvirtd[]:
>> hugepages: node 2 not found". I found a patch that contains the code
>> responsible for throwing this error [3],
>>
>> +        if (def->cpu && def->cpu->ncells) {
>> +            /* Fortunately, we allow only guest NUMA nodes to be continuous
>> +             * starting from zero. */
>> +            pos = def->cpu->ncells - 1;
>> +        }
>> +
>> +        next_bit = virBitmapNextSetBit(page->nodemask, pos);
>> +        if (next_bit >= 0) {
>> +            virReportError(VIR_ERR_XML_DETAIL,
>> +                           _("hugepages: node %zd not found"),
>> +                           next_bit);
>> +            return -1;
>> +        }
>>
>> Without digging too deeply into the actual code, and just inferring
>> from the above, it looks like we are reading the number of cells set
>> in "//cpu/numa" with def->cpu->ncells, and comparing it to the number
>> of nodesets in "//memoryBacking//hugepages". I think this means that I
>> misunderstand what the nodeset is for in that element...
>>
>> Of note is the fact that my host has non-contiguous NUMA node numbers:
>> 2015-02-09 08:53:06
>> root at eanna i ~ # numastat
>>                            node0           node2
>> numa_hit               216225024       440311113
>> numa_miss                      0          795018
>> numa_foreign              795018               0
>> interleave_hit             15835           15783
>> local_node             214029815       221903122
>> other_node               2195209       219203009
>>
>> Thanks again for any help.
>>
>
> Libvirt should be perfectly able to cope with noncontinuous host NUMA
> nodes. However, noncontinuous guest NUMA nodes are not supported yet -
> but it shouldn't matter since users have full control over creating
> guest NUMA nodes.
>
> Anyway, if you find the documentation incomplete in any sense, any part,
> or you feel that rewording some paragraphs may help, feel free to
> propose a patch and I'll review it.

Thanks again, Michal. I'm slowly zeroing in on a good resolution here.
I think the documentation is clear enough. What threw me is that a
guest NUMA node can be referred to either by cell (@id or @cellid) or
by nodeset, depending on element context.
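
To make that concrete, here is a minimal sketch of how I now read the
four elements. It is illustrative only and is not my actual config
from [1]: the two-cell guest, the cell sizes, the vCPU ranges and the
page size are all made-up values. Only the host node numbers 0 and 2
come from the numastat output above.

```xml
<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <!-- nodeset here = GUEST NUMA nodes (cells 0 and 1 below) -->
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
  <numatune>
    <!-- nodeset here = HOST nodes; applies to guest cells that have
         no explicit memnode entry -->
    <memory mode='strict' nodeset='0,2'/>
    <!-- cellid = GUEST cell, nodeset = HOST node it is pinned to -->
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='2'/>
  </numatune>
  <cpu>
    <numa>
      <!-- each cell creates one GUEST NUMA node; ids must be
           contiguous starting from zero (memory is in KiB) -->
      <cell id='0' cpus='0-3' memory='4194304'/>
      <cell id='1' cpus='4-7' memory='4194304'/>
    </numa>
  </cpu>
</domain>
```

Read this way, a hugepages nodeset='2' in such a guest would name a
guest cell that doesn't exist, which seems consistent with the
"node 2 not found" check in the patch quoted above.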

I've modified my config [1] based on that understanding, and am now
running into a different error: the guest hits the oom-killer [2] even
though the memtune hard_limit [3] is below the total size of the
hugepages set aside for that NUMA nodeset.

[1] http://sprunge.us/BadI
[2] http://sprunge.us/eELZ
[3] http://sprunge.us/GYXM



