[libvirt] [numatune PATCH v2] Support NUMA tuning

Fri May 20 07:22:10 UTC 2011

On 2011-05-19 15:34, Daniel Veillard wrote:
> On Sun, May 15, 2011 at 09:37:21PM -0400, Mark Wagner wrote:
>> On 05/12/2011 06:45 AM, Daniel P. Berrange wrote:
>>> On Thu, May 12, 2011 at 06:22:49PM +0800, Osier Yang wrote:
>>>> Hi, All
>>>>
>>>> This series adopts Daniel's suggestion on v1, using libnuma
>>>> instead of invoking numactl to set the NUMA policy. It adds
>>>> support for "interleave" and "preferred" modes, in addition to
>>>> the "strict" mode supported in v1.
>>>>
>>>> The new XML is like:
>>>>
>>>> <numatune>
>>>>    <memory model="interleave" nodeset="+0-4,8-12"/>
>>>> </numatune>
>>>>
>>>> I am keeping the numactl nodeset syntax to represent the
>>>> "nodeset", as I think the purpose of adding NUMA tuning
>>>> support is to serve NUMA users, and keeping the syntax the
>>>> same as numactl will be more comfortable for them.
>>>
>>> Compatibility with numactl syntax is an explicit non-goal.
>>> numactl is just one platform specific impl.  Compatibility
>>> with numactl syntax is of no interest to the ESX or VirtualBox
>>> drivers. The libvirt NUMA syntax should be using other
>>> existing libvirt XML as the design compatibility target.
>>>
>>
>> I won't argue the semantics of XML with you, but please keep in mind
>> that one of the main differences between using a numactl like
>> mechanism and taskset is that the NUMA mechanisms also let you
>> bind to specific, NUMA node memory, as well as specifying the
>> access type.
>>
>> So from the outside looking in, keeping things in terms of cpusets
>> would seem to not be in full agreement with the RFE for NUMA support.
>> I would think that the specification of NUMA binding would need to
>> include NUMA nodes and specify memory bindings as well as the
>> access type. From a performance perspective, support for true
>> NUMA is what is the last hurdle that is keeping libvirt from being
>> used in high performance situations.
>>
>> I think that specifying things in terms of nodes instead of
>> cpus will make it easier for the end user. So I guess I need
>> to withdraw the part about not arguing XML...
>
>    Hi Mark,
>
> I'm not 100% sure I understand what you are disagreeing with:
>    - it seems to me that the proposed model does allow the specification
>      of the nodes and the memory binding associated
>    - I wonder if you just object to the "nodeset" attribute name here
>    - please note that "Node" in the context of libvirt has the specific
>      meaning of the whole physical machine http://libvirt.org/goals.html
>      that terminology was set up 5 years ago and present in many places
>      of the libvirt API. On the other hand "nodeset" is being used in
>      other places to specify a set of cpu nodes in a NUMA context.
>

I guess Mark is not objecting to the attribute name "nodeset". It
seems he means that if we use the same syntax as "cpuset", it is not
in full agreement with the RFE for "NUMA support", as we will lose
some of the syntax that libnuma uses.

To conclude the discussion: we will use "nodeset" as the attribute
name, with the same syntax as "cpuset", and we won't use the node
string parsing function "numa_parse_nodestring" provided by libnuma,
to avoid making things a mess:

"numa_parse_nodestring" only accepts "!" (and "+", which we won't
support, so I skip it here) at the beginning of the node string,
e.g. "0-4,!8-12" is not valid. Our current "cpuset" syntax, however,
allows "^" to be specified anywhere, e.g. "0-8,^2-4" is valid. So
even if we convert "^" to "!" before passing the string to
"numa_parse_nodestring", it still doesn't work, unless we state in
the documentation that we use the same syntax as "cpuset" except
that "^" must be specified at the beginning, and that is no better
than introducing a different syntax. On the other hand,
"numa_parse_nodestring" doesn't support syntax like "!6". In short,
if we use the same syntax as "cpuset", we can't and won't use the
libnuma parsing function.
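
To make the "cpuset"-style semantics concrete, here is a minimal
sketch of a parser that accepts "^" on any entry. It is only an
illustration: "nodeset_parse" is a hypothetical helper, not
"virDomainCpuSetParse", and it assumes that entries are applied from
left to right and that "^" may prefix either a single number or a
range.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a libvirt function).
 * Parses a string such as "0-8,^2-4" into map[0..maplen-1], where
 * map[i] == 1 means node i is selected. Entries are applied from left
 * to right; an entry prefixed with '^' clears nodes instead of setting
 * them. Returns the number of selected nodes, or -1 on a syntax error
 * or an out-of-range node number. */
int nodeset_parse(const char *str, char *map, int maplen)
{
    const char *p = str;
    int count = 0;

    if (maplen <= 0 || *p == '\0')
        return -1;
    memset(map, 0, (size_t)maplen);

    while (*p != '\0') {
        int neg = 0;
        long start, end, i;
        char *endp;

        if (*p == '^') {            /* negation is allowed on any entry */
            neg = 1;
            p++;
        }
        if (!isdigit((unsigned char)*p))
            return -1;
        start = strtol(p, &endp, 10);
        p = endp;
        end = start;

        if (*p == '-') {            /* optional range "start-end" */
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            end = strtol(p, &endp, 10);
            p = endp;
        }
        if (start > end || end >= maplen)
            return -1;

        for (i = start; i <= end; i++)
            map[i] = neg ? 0 : 1;

        if (*p == ',') {
            p++;
            if (*p == '\0')         /* trailing comma */
                return -1;
        } else if (*p != '\0') {
            return -1;              /* e.g. '!' or '+' is rejected here */
        }
    }

    for (int n = 0; n < maplen; n++)
        count += map[n];
    return count;
}
```

With a 16-entry map, "0-8,^2-4" selects nodes 0, 1, 5, 6, 7 and 8,
while "0-4,!8-12" is rejected because "!" is not part of this syntax.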

We will use "virDomainCpuSetParse" to parse the value of "nodeset"
into a bit mask and then pass it to the NUMA setting functions. Some
conversion is needed before passing it on, though, as the data types
are different.
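
As a rough sketch of the kind of conversion meant here, the code
below packs a byte-per-node map into bit-per-node words. The helper
name "nodemap_to_words" and the plain "unsigned long" array are
assumptions for illustration only; the real code would presumably
fill libnuma's own mask type through its helpers rather than packing
words by hand.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper (not a libvirt or libnuma function).
 * Converts a byte-per-node map (map[i] != 0 means node i is selected)
 * into a packed bit mask stored in words[0..nwords-1], where bit i of
 * the mask corresponds to node i. Returns 0 on success, or -1 if the
 * word buffer is too small to hold maplen bits. */
int nodemap_to_words(const char *map, int maplen,
                     unsigned long *words, size_t nwords)
{
    if (maplen < 0 || (size_t)maplen > nwords * BITS_PER_WORD)
        return -1;

    memset(words, 0, nwords * sizeof(*words));

    for (int i = 0; i < maplen; i++) {
        if (map[i])
            words[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
    }
    return 0;
}
```

For example, a map with nodes 0, 1 and 5 selected becomes the single
word 0x23.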

Even if we modify the current "cpuset" parsing function to support
"^2-4", that will still be different from what "!" means in libnuma.

That means libvirt will represent NUMA nodes with a syntax almost
completely different from libnuma's, losing the semantics of both
"+" and "!" in the presentation layer.

Thoughts?

Regards
Osier