[libvirt] [RFC PATCH] NUMA tuning support

Bill Gray bgray at redhat.com
Thu May 5 20:43:22 UTC 2011


Thanks for the feedback, Lee!

One reason to use "membind" instead of "preferred" is that "preferred" can 
take only a single node.  For large guests, you can specify multiple nodes 
with "membind".  I think "preferred" would be the preferred choice if it 
allowed multiple nodes.
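
For illustration, the difference on the numactl command line is roughly the
following (the qemu-kvm arguments are elided and the node numbers are just
examples):

	# multiple nodes allowed; allocations are hard-restricted to them
	numactl --membind=0-3 qemu-kvm ...

	# only a single node can be given; allocations may spill to other nodes
	numactl --preferred=0 qemu-kvm ...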

- Bill


On 05/05/2011 10:33 AM, Lee Schermerhorn wrote:
> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
>> Hi, All,
>>
>> This is a simple implementation of NUMA tuning support based on the binary
>> program 'numactl'.  Currently it only supports binding memory to specified
>> nodes via the "--membind" option; it probably needs to support more, but I'd
>> like to send it out early to make sure the principle is correct.
>>
>> Ideally, NUMA tuning support should be added to qemu-kvm first, so that it
>> could provide command-line options and all libvirt would need to do is pass
>> those options through to qemu-kvm.  Unfortunately, qemu-kvm doesn't support
>> it yet, so all we can do currently is use numactl.  That forks a process,
>> which is a bit more expensive than qemu-kvm supporting NUMA tuning internally
>> with libnuma, but I guess it shouldn't affect things much.
>>
>> The NUMA tuning XML looks like this:
>>
>> <numatune>
>>    <membind nodeset='+0-4,8-12'/>
>> </numatune>
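>>
>> With that, libvirt would basically wrap the emulator with numactl, roughly
>> like this (the command line is heavily abbreviated and the emulator path is
>> just an example):
>>
>>    numactl --membind=+0-4,8-12 /usr/bin/qemu-kvm -name guest ...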
>>
>> Any thoughts or feedback would be appreciated.
>
> Osier:
>
> A couple of thoughts/observations:
>
> 1) you can accomplish the same thing -- restricting a domain's memory to
> a specified set of nodes -- using the cpuset cgroup that is already
> associated with each domain.  E.g.,
>
> 	cgset -r cpuset.mems=<nodeset>  /libvirt/qemu/<domain>
>
> Or the equivalent libcgroup call.
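>
> For reference, the same thing done by writing through the cgroup filesystem
> directly looks roughly like this (the /cgroup/cpuset mount point is just an
> assumption; it depends on where the cpuset controller is mounted):
>
> 	# restrict future allocations of the domain's tasks to nodes 0-1
> 	echo 0-1 > /cgroup/cpuset/libvirt/qemu/<domain>/cpuset.mems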
>
> However, numactl is more flexible, especially if you intend to support
> more policies:  preferred, interleave.  Which leads to the question:
>
> 2) Do you really want the full "membind" semantics as opposed to
> "preferred" by default?  Membind policy will restrict the VM's pages to
> the specified nodeset; when all of the nodes in that nodeset reach their
> minimum watermark, it will initiate reclaim/stealing and wait for pages
> to become available, or the task will be OOM-killed because of the
> mempolicy.  Membind works the same as cpuset.mems in this respect.
> Preferred policy will keep memory allocations [but not vcpu execution]
> local to the specified set of nodes as long as there is sufficient
> memory, and will silently "overflow" allocations to other nodes when
> necessary.  I.e., it's a little more forgiving under memory pressure.
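>
> One way to see where a VM's pages actually end up under either policy is
> /proc/<pid>/numa_maps; the per-node page counts there make any "overflow"
> under the preferred policy easy to spot (the PID is just a placeholder):
>
> 	# the N<node>=<pages> fields show the per-node placement of each mapping
> 	cat /proc/<pid>/numa_maps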
>
> But then pinning a VM's vcpus to the physical cpus of a set of nodes and
> retaining the default local allocation policy will have the same effect
> as "preferred" while ensuring that the VM component tasks execute
> locally to the memory footprint.  Currently, I do this by looking up the
> cpulist associated with the node[s] from, e.g.,
> /sys/devices/system/node/node<i>/cpulist and using that list with the
> vcpu.cpuset attribute.  Adding a 'nodeset' attribute to the
> cputune.vcpupin element would simplify specifying that configuration.
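>
> Concretely, something like the following (the node number, the resulting
> cpulist, and the vcpu count are all illustrative):
>
> 	cat /sys/devices/system/node/node1/cpulist
> 	4-7
>
> and then, in the guest XML:
>
> 	<vcpu cpuset='4-7'>4</vcpu>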
>
> Regards,
> Lee
>
>



