[libvirt-users] HugePages - can't start guest that requires them

G. Richard Bellamy rbellamy at pteradigm.com
Mon Feb 9 17:19:19 UTC 2015


First, I'll quickly summarize my understanding of how to configure NUMA...

In "//memoryBacking/hugepages/page[@nodeset]" I am telling libvirt to
use hugepages for the guest, and to get those hugepages from a
particular host NUMA node.

In "//numatune/memory[@nodeset]" I am telling libvirt to pin the
memory allocation to the guest from a particular host numa node.
In "//numatune/memnode[@nodeset]" I am telling libvirt which guest
NUMA node (cellid) should come from which host NUMA node (nodeset).

In "//cpu/numa/cell[@id]" I am telling libvirt how much memory to
allocate to each guest NUMA node (cell).

Basically, I thought "nodeset", regardless of where it appears in the
domain XML, referred to a host NUMA node, and that "cell" (<cell id=/>
or @cellid) referred to a guest NUMA node.
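
Concretely, here is a minimal sketch of how I have been reading those
elements together (the sizes, cpus and node numbers are placeholders
for illustration, not my actual config):

  <memoryBacking>
    <hugepages>
      <!-- hugepages for the guest, taken (as I read it) from host NUMA node 0 -->
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <numatune>
    <!-- pin the guest's memory allocation to host NUMA node 0 -->
    <memory mode='strict' nodeset='0'/>
    <!-- guest cell 0 gets its memory from host NUMA node 0 -->
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <!-- guest NUMA cell 0 with 8 GiB of memory (KiB is the default unit) -->
      <cell id='0' cpus='0-7' memory='8388608'/>
    </numa>
  </cpu>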

However....

Atlas [1] starts without issue; prometheus [2] fails with "libvirtd[]:
hugepages: node 2 not found". I found a patch that contains the code
responsible for throwing this error [3]:

+        if (def->cpu && def->cpu->ncells) {
+            /* Fortunately, we allow only guest NUMA nodes to be continuous
+             * starting from zero. */
+            pos = def->cpu->ncells - 1;
+        }
+
+        next_bit = virBitmapNextSetBit(page->nodemask, pos);
+        if (next_bit >= 0) {
+            virReportError(VIR_ERR_XML_DETAIL,
+                           _("hugepages: node %zd not found"),
+                           next_bit);
+            return -1;
+        }

Without digging too deeply into the actual code, and just inferring
from the above, it looks like the check reads the number of guest
cells defined in "//cpu/numa" via def->cpu->ncells, and then errors
out if the nodeset in "//memoryBacking/hugepages" has any bit set at
or beyond that count, i.e. it treats the nodeset values as guest NUMA
cell IDs, numbered contiguously from zero. I think this means that I
misunderstand what the nodeset is for in that element...
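
If that inference is right, then the nodeset in
"//memoryBacking/hugepages/page" has to stay within the guest cell IDs
defined under "//cpu/numa", something like the following sketch
(again, sizes and cpus are placeholders only):

  <cpu>
    <numa>
      <cell id='0' cpus='0-7' memory='8388608'/>
      <cell id='1' cpus='8-15' memory='8388608'/>
    </numa>
  </cpu>
  <memoryBacking>
    <hugepages>
      <!-- nodeset would then name guest cells 0 and 1, not host node 2 -->
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>

...which would explain why atlas (nodeset='0') starts and prometheus
(nodeset='2') does not.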

Of note is the fact that my host has non-contiguous NUMA node numbers:
2015-02-09 08:53:06
root@eanna i ~ # numastat
                           node0           node2
numa_hit               216225024       440311113
numa_miss                      0          795018
numa_foreign              795018               0
interleave_hit             15835           15783
local_node             214029815       221903122
other_node               2195209       219203009

Thanks again for any help.

[1]: http://sprunge.us/jZgS
[2]: http://sprunge.us/iETF
[3]: https://www.redhat.com/archives/libvir-list/2014-September/msg00090.html

On Wed, Feb 4, 2015 at 12:03 PM, G. Richard Bellamy
<rbellamy at pteradigm.com> wrote:
> *facepalm*
>
> Now that I'm re-reading the documentation, it's obvious that <page/>
> and @nodeset are for the guest: "This tells the hypervisor that the
> guest should have its memory allocated using hugepages instead of the
> normal native page size." Pretty clear there.
>
> Thank you SO much for the guidance, I'll return to my tweaking. I'll
> report back here with my results.
>
>
>
>
> On Wed, Feb 4, 2015 at 12:17 AM, Michal Privoznik <mprivozn at redhat.com> wrote:
>> On 04.02.2015 01:59, G. Richard Bellamy wrote:
>>> As I mentioned, I got the instances to launch... but they're only
>>> taking HugePages from "Node 0", when I believe my setup should pull
>>> from both nodes.
>>>
>>> [atlas] http://sprunge.us/FSEf
>>> [prometheus] http://sprunge.us/PJcR
>>
>> [pasting interesting nits from both XMLs]
>>
>> <domain type='kvm' id='2'>
>>   <name>atlas</name>
>>   <uuid>d9991b1c-2f2d-498a-9d21-51f3cf8e6cd9</uuid>
>>   <memory unit='KiB'>16777216</memory>
>>   <currentMemory unit='KiB'>16777216</currentMemory>
>>   <memoryBacking>
>>     <hugepages>
>>       <page size='2048' unit='KiB' nodeset='0'/>
>>     </hugepages>
>>     <nosharepages/>
>>   </memoryBacking>
>>   <!-- no numa pinning -->
>> </domain>
>>
>>
>> <domain type='kvm' id='3'>
>>   <name>prometheus</name>
>>   <uuid>dda7d085-701b-4d0a-96d4-584678104fb3</uuid>
>>   <memory unit='KiB'>16777216</memory>
>>   <currentMemory unit='KiB'>16777216</currentMemory>
>>   <memoryBacking>
>>     <hugepages>
>>       <page size='2048' unit='KiB' nodeset='2'/>
>>     </hugepages>
>>     <nosharepages/>
>>   </memoryBacking>
>>   <!-- again no numa pinning -->
>> </domain>
>>
>> So, to start, the @nodeset attribute of the <page/> element refers to guest
>> NUMA nodes, not host ones. And since you don't define any NUMA nodes for
>> your guests, it's useless. Side note: I wonder if we should make
>> libvirt fail explicitly in this case.
>>
>> Moreover, you haven't pinned your guests onto any host NUMA nodes. This
>> means it's up to the host kernel and its scheduler where the guest will take
>> memory from, and subsequently its hugepages as well. I think you want to add:
>>
>>   <numatune>
>>     <memory mode='strict' nodeset='0'/>
>>   </numatune>
>>
>> to the guest XMLs, where @nodeset refers to host NUMA nodes and tells where
>> the guest should be placed. There are other modes too, so please see the
>> documentation to tune the XML to match your use case.
>>
>> Michal



