[libvirt] [RFC] phi support in libvirt

Feng, Shaohe shaohe.feng at intel.com
Mon Dec 26 11:57:42 UTC 2016

Thanks, Daniel.

So how about:

For the NUMA format, we still use "memory" to describe the MCDRAM,
but we remove the "cpus" attribute:
<numa>
   <cell id='3' memory='8' unit='GiB'/>
   <cell id='4' memory='8' unit='GiB'/>
</numa>

At present, for this kind of CPU-less NUMA node, we only support mcdram as
the memory backend.

     <mcdram nodeset="3-4"/>

And we reject a CPU-less NUMA node without a memory backend.
Maybe we will allow it in the future, once qemu can handle it well.
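
To make the rejection rule concrete, here is a rough sketch in Python. It is for illustration only and is not libvirt code: the function names and the dict layout for a cell are hypothetical, and it assumes the nodeset string uses the usual "3-4" / "0,2-3" form.

```python
def parse_nodeset(spec):
    """Expand a nodeset string such as "3-4" or "0,2-3" into a sorted list."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def check_cpuless_cells(cells, mcdram_nodeset):
    """Raise ValueError for a cell without cpus that has no mcdram backing.

    cells: hypothetical layout, e.g. {"id": 3, "cpus": None, "memory": 8}
    mcdram_nodeset: the nodeset string from <mcdram nodeset="..."/>, or None
    """
    backed = set(parse_nodeset(mcdram_nodeset)) if mcdram_nodeset else set()
    for cell in cells:
        if not cell.get("cpus") and cell["id"] not in backed:
            raise ValueError(
                "CPU-less NUMA cell %d has no mcdram backing" % cell["id"])
```

With cells 3 and 4 defined without cpus, nodeset="3-4" passes the check, while nodeset="3" would be rejected because cell 4 is left without a backend.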

A question:
1. Should libvirt probe the "host-nodes" for this kind of memory to make 
a smart map?

The qemu arguments will be as follows:
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \

-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4 \

2. Or should we let the user specify the host-nodes?
     <mcdram nodeset="3-4" host-nodes="0-1"/>
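
For option 2, a minimal sketch of how the two attributes could be turned into qemu arguments. It assumes the guest nodes in "nodeset" are paired with the host nodes in "host-nodes" by position (3 with 0, 4 with 1); that pairing is only one possible interpretation, and the function names are hypothetical. Memory backends are passed to qemu with "-object".

```python
def expand(spec):
    """Expand "3-4" or "0,2" into a list of ints, keeping the given order."""
    out = []
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        out.extend(range(int(lo), int(hi or lo) + 1))
    return out


def mcdram_qemu_args(nodeset, host_nodes, size_gib):
    """Pair guest nodes with host nodes by position and build qemu arguments."""
    guest, host = expand(nodeset), expand(host_nodes)
    if len(guest) != len(host):
        raise ValueError("nodeset and host-nodes must have the same length")
    args = []
    for g, h in zip(guest, host):
        args += [
            "-object",
            "memory-backend-ram,size=%dG,prealloc=yes,host-nodes=%d,"
            "policy=bind,id=node%d" % (size_gib, h, g),
            "-numa",
            "node,nodeid=%d,memdev=node%d" % (g, g),
        ]
    return args
```

Calling mcdram_qemu_args("3-4", "0-1", 8) gives one -object/-numa pair per guest node, in the same shape as the arguments listed under question 1.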

ShaoHe Feng

On 2016-12-21 18:25, Daniel P. Berrange wrote:
> On Wed, Dec 21, 2016 at 12:51:29PM +0800, Feng, Shaohe wrote:
>> Thanks, Dolpher.
>> Reply inline.
>> On 2016-12-21 11:56, Du, Dolpher wrote:
>>> Shaohe was dropped from the loop, adding him back.
>>>> -----Original Message-----
>>>> From: He Chen [mailto:he.chen at linux.intel.com]
>>>> Sent: Friday, December 9, 2016 3:46 PM
>>>> To: Daniel P. Berrange <berrange at redhat.com>
>>>> Cc: libvir-list at redhat.com; Du, Dolpher <dolpher.du at intel.com>; Zyskowski,
>>>> Robert <robert.zyskowski at intel.com>; Daniluk, Lukasz
>>>> <lukasz.daniluk at intel.com>; Zang, Rui <rui.zang at intel.com>;
>>>> jdenemar at redhat.com
>>>> Subject: Re: [libvirt] [RFC] phi support in libvirt
>>>>> On Mon, Dec 05, 2016 at 04:12:22PM +0000, Feng, Shaohe wrote:
>>>>>> Hi all:
>>>>>> As we all know, Intel® Xeon Phi targets high-performance computing and
>>>>>> other parallel workloads.
>>>>>> Now that qemu supports Phi virtualization, it is time for libvirt to
>>>>>> support Phi.
>>>>> Can you provide a pointer to the relevant QEMU changes?
>>>> Xeon Phi Knights Landing (KNL) has 2 primary hardware features. One
>>>> is up to 288 CPUs, which needs patches to support, and we are pushing them.
>>>> The other is Multi-Channel DRAM (MCDRAM), which does not need any changes
>>>> currently.
>>>> Let me introduce MCDRAM in more detail. MCDRAM is on-package
>>>> high-bandwidth memory (~500GB/s).
>>>> On the KNL platform, hardware exposes MCDRAM as a separate, CPU-less and
>>>> remote NUMA node to the OS, so that MCDRAM will not be allocated by default
>>>> (since the MCDRAM node has no CPU, every CPU regards the MCDRAM node as a
>>>> remote node). In this way, MCDRAM can be reserved for certain specific
>>>> applications.
>>>>>> Unlike a traditional x86 server, Phi has a special NUMA
>>>>>> node with Multi-Channel DRAM (MCDRAM) but without any CPUs.
>>>>>> Currently libvirt requires a non-empty cpus attribute for each NUMA
>>>>>> cell, such as:
>>>>>> <numa>
>>>>>>     <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>>>>>>     <cell id='1' cpus='240-243' memory='16' unit='GiB'/>
>>>>>> </numa>
>>>>>> In order to support Phi virtualization, libvirt needs to allow a NUMA
>>>>>> cell definition without the 'cpus' attribute,
>>>>>> such as:
>>>>>> <numa>
>>>>>>     <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>>>>>>     <cell id='1' memory='16' unit='GiB'/>
>>>>>> </numa>
>>>>>> When a cell has no 'cpus', qemu will by default allocate its memory
>>>>>> from MCDRAM instead of DDR.
>>>>> There are separate concepts at play which your description here is mixing up.
>>>>> First is the question of whether the guest NUMA node can be created with
>>>>> only RAM or CPUs, or a mix of both.
>>>>> Second is the question of what kind of host RAM (MCDRAM vs DDR) is used
>>>>> as the backing store for the guest.
>>>> The guest NUMA node should be created with memory only (the same as on
>>>> the host), and more importantly, the memory should be bound to (come
>>>> from) the host MCDRAM node.
>> So I suggest libvirt distinguish the MCDRAM.
>> The MCDRAM NUMA config would be as follows, adding a "mcdram" attribute to
>> the "cell" element:
>> <numa>
>>    <cell id='1'  mcdram='16' unit='GiB'/>
>>    <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>> </numa>
> No, that is not backwards compatible for applications using libvirt.
> We already have a place for storing info about memory backing type,
> which we use for huge pages. mcdram should use the same approach
> IMHO. eg
> <domain>
>    ...
>    <memoryBacking>
>      <mcdram nodeset="3-4"/>
>    </memoryBacking>
> </domain>
> to indicate that nodes 3 & 4 should use mcdram
> Regards,
> Daniel
