[libvirt] [RFC] phi support in libvirt

Du, Dolpher dolpher.du at intel.com
Tue Dec 27 07:21:03 UTC 2016


To answer your question, I would suggest the second form. It is consistent with qemu and does not bring platform-specific knowledge into the libvirt layer:
2. Or we let the user specify the host-nodes.
<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4" host-nodes="0-1"/>
  </memoryBacking>
</domain>
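
For comparison, the first form would need libvirt to discover candidate host nodes on its own. The following is only a rough sketch of what such probing could look like, assuming the Linux sysfs layout where each node directory under /sys/devices/system/node has a cpulist file that is empty for a CPUless node. The function name and the sysfs_root parameter are hypothetical, and a CPUless node is not necessarily MCDRAM, which is exactly the platform-specific guess the second form avoids.

```python
import os
import re


def cpuless_host_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the ids of host NUMA nodes whose cpulist is empty.

    Assumption: an empty cpulist marks a memory-only node. This does
    not prove the node is MCDRAM; it only narrows the candidates.
    """
    nodes = []
    for name in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip entries such as "online" or "possible"
        with open(os.path.join(sysfs_root, name, "cpulist")) as f:
            if not f.read().strip():
                nodes.append(int(match.group(1)))
    return sorted(nodes)
```

Passing a different sysfs_root makes it possible to exercise the function against a fake directory tree instead of the real host.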

Regards,
Dolpher
From: Feng, Shaohe
Sent: Monday, December 26, 2016 7:58 PM
To: Daniel P. Berrange <berrange at redhat.com>
Cc: Du, Dolpher <dolpher.du at intel.com>; He Chen <he.chen at linux.intel.com>; libvir-list at redhat.com; Zyskowski, Robert <robert.zyskowski at intel.com>; Daniluk, Lukasz <lukasz.daniluk at intel.com>; Zang, Rui <rui.zang at intel.com>; jdenemar at redhat.com
Subject: Re: [libvirt] [RFC] phi support in libvirt

Thanks, Daniel.

So how about:

For the NUMA format, we still use "memory" to describe the MCDRAM, but we remove the cpus attribute.
<numa>
  <cell id='3' memory='8' unit='GiB'/>
  <cell id='4' memory='8' unit='GiB'/>
</numa>

At present, for this kind of CPUless NUMA node, we only support mcdram as the memory backend.

<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4"/>
  </memoryBacking>
</domain>

And we reject a CPUless NUMA node without a memory backend.
Maybe we will allow it in the future, once qemu can handle it well.
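
The rejection rule above could be checked roughly as follows. This is a minimal sketch, not libvirt code: the <mcdram> element is only the proposal from this thread, and the function names are hypothetical. It reports every <cell> that has no cpus attribute and is not covered by an mcdram nodeset.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(spec):
    """Expand a nodeset string such as "3-4,7" into a set of ints."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def cpuless_cells_without_backing(domain_xml):
    """Return ids of CPUless <cell> elements with no mcdram backing."""
    root = ET.fromstring(domain_xml)
    backed = set()
    # <mcdram nodeset="..."/> is the element proposed in this thread.
    for elem in root.findall("./memoryBacking/mcdram"):
        backed |= parse_nodeset(elem.get("nodeset", ""))
    rejected = []
    for cell in root.iter("cell"):
        cell_id = int(cell.get("id"))
        if not cell.get("cpus") and cell_id not in backed:
            rejected.append(cell_id)
    return rejected
```

A domain definition would be accepted only when the returned list is empty.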


A question:
1. Should libvirt probe the "host-nodes" for this kind of memory to build a smart mapping?

The qemu arguments would be as follows:
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4


2. Or we let the user specify the host-nodes.
<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4" host-nodes="0-1"/>
  </memoryBacking>
</domain>
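
To make the second option concrete, here is a small sketch of how a user-supplied mapping might be turned into qemu arguments. It assumes a one-to-one pairing between the sorted guest nodes in nodeset and the sorted host nodes in host-nodes; the thread does not define the pairing, so that rule, the function names, and the fixed per-node size are all assumptions.

```python
def parse_nodeset(spec):
    """Expand a nodeset string such as "3-4,7" into a set of ints."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def qemu_numa_args(nodeset, host_nodes, size="8G"):
    """Pair guest nodes with host nodes and build the qemu options.

    Assumption: the i-th guest node binds to the i-th host node.
    """
    guest = sorted(parse_nodeset(nodeset))
    host = sorted(parse_nodeset(host_nodes))
    if len(guest) != len(host):
        raise ValueError("nodeset and host-nodes must have the same length")
    args = []
    for g, h in zip(guest, host):
        args += [
            "-object",
            f"memory-backend-ram,size={size},prealloc=yes,"
            f"host-nodes={h},policy=bind,id=node{g}",
            "-numa",
            f"node,nodeid={g},memdev=node{g}",
        ]
    return args
```

With nodeset="3-4" and host-nodes="0-1" this binds guest node 3 to host node 0 and guest node 4 to host node 1. Binding every guest node to the whole host set would be an equally plausible reading.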


BR
ShaoHe Feng

On 21 December 2016 18:25, Daniel P. Berrange wrote:

On Wed, Dec 21, 2016 at 12:51:29PM +0800, Feng, Shaohe wrote:

Thanks, Dolpher.

Reply inline.

On 21 December 2016 11:56, Du, Dolpher wrote:

Shaohe was dropped from the loop, adding him back.



-----Original Message-----
From: He Chen [mailto:he.chen at linux.intel.com]
Sent: Friday, December 9, 2016 3:46 PM
To: Daniel P. Berrange <berrange at redhat.com>
Cc: libvir-list at redhat.com; Du, Dolpher <dolpher.du at intel.com>; Zyskowski, Robert <robert.zyskowski at intel.com>; Daniluk, Lukasz <lukasz.daniluk at intel.com>; Zang, Rui <rui.zang at intel.com>; jdenemar at redhat.com
Subject: Re: [libvirt] [RFC] phi support in libvirt

On Mon, Dec 05, 2016 at 04:12:22PM +0000, Feng, Shaohe wrote:

Hi all:

As we know, Intel® Xeon Phi targets high-performance computing and other parallel workloads. Now that qemu supports Phi virtualization, it is time for libvirt to support Phi.

Can you provide a pointer to the relevant QEMU changes?

Xeon Phi Knights Landing (KNL) has two primary hardware features. One is up to 288 CPUs, which needs patches to support, and we are pushing those. The other is Multi-Channel DRAM (MCDRAM), which does not currently need any changes.

To introduce MCDRAM in more detail: MCDRAM is on-package high-bandwidth memory (~500GB/s).

On the KNL platform, the hardware exposes MCDRAM to the OS as a separate, CPUless, remote NUMA node, so that MCDRAM is not allocated by default (since the MCDRAM node has no CPU, every CPU regards it as a remote node). In this way, MCDRAM can be reserved for specific applications.

Unlike a traditional x86 server, Phi has a special NUMA node that contains Multi-Channel DRAM (MCDRAM) but no CPUs.

Currently libvirt requires a non-empty cpus attribute for each NUMA node, such as:

<numa>
  <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
  <cell id='1' cpus='240-243' memory='16' unit='GiB'/>
</numa>

In order to support Phi virtualization, libvirt needs to allow a NUMA cell definition without the 'cpus' attribute, such as:

<numa>
  <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
  <cell id='1' memory='16' unit='GiB'/>
</numa>

When a cell has no 'cpus' attribute, qemu will by default allocate its memory from MCDRAM instead of DDR.

There are separate concepts at play which your description here is mixing up.

First is the question of whether the guest NUMA node can be created with only RAM or CPUs, or a mix of both.

Second is the question of what kind of host RAM (MCDRAM vs DDR) is used as the backing store for the guest.

The guest NUMA node should be created with memory only (the same as the host's), and more importantly, its memory should be bound to (come from) the host MCDRAM node.

So I suggest libvirt distinguish the MCDRAM.

The MCDRAM NUMA config would be as follows, adding a "mcdram" attribute to the "cell" element:

<numa>
  <cell id='1' mcdram='16' unit='GiB'/>
  <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
</numa>

No, that is not backwards compatible for applications using libvirt.

We already have a place for storing info about memory backing type, which we use for huge pages. mcdram should use the same approach IMHO, e.g.

<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4"/>
  </memoryBacking>
</domain>

to indicate that nodes 3 & 4 should use mcdram.

Regards,
Daniel

