[libvirt] [RFC PATCH auto partition NUMA guest domains v1 0/2] auto partition guests providing the host NUMA topology

Wim Ten Have wim.ten.have at oracle.com
Tue Sep 25 10:02:40 UTC 2018


From: Wim ten Have <wim.ten.have at oracle.com>

This patch series extends guest domain administration with support
for automatically advertising the host NUMA node capabilities to a
guest by creating a vNUMA copy of the host topology.

The mechanism is enabled by setting the check='numa' attribute on the
'host-passthrough' CPU mode:
  <cpu mode='host-passthrough' check='numa' .../>

When enabled, the mechanism automatically renders the NUMA architecture
provided by the host capabilities, evenly balances the guest's reserved
vcpus and memory amongst the composed vNUMA cells, and pins each cell's
vcpus to the physical cpusets of the corresponding host NUMA node.  This
way the host NUMA topology remains in effect under the partitioned
guest domain.
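The balancing arithmetic described above can be sketched as follows.  This
is a hypothetical illustration, not libvirt code: the guest's vcpus and
memory are split evenly across as many vNUMA cells as the host has NUMA
nodes, with vcpu i landing in cell i modulo the node count (so on the
8-node host below, vcpus 0 and 8 share cell 0).

```python
def partition_guest(vcpus, memory_kib, host_nodes):
    """Round-robin split of guest vcpus and memory across vNUMA cells.

    Returns one (vcpu list, memory in KiB) pair per cell.  Assumes the
    vcpu count and memory size divide evenly, as in the 'anuma' example.
    """
    cells = []
    for node in range(host_nodes):
        # vCPU i belongs to cell (i % host_nodes).
        cell_vcpus = [v for v in range(vcpus) if v % host_nodes == node]
        cells.append((cell_vcpus, memory_kib // host_nodes))
    return cells

# The 'anuma' example: 16 vcpus and 64 GiB on an 8-node host.
cells = partition_guest(vcpus=16, memory_kib=67108864, host_nodes=8)
print(cells[0])  # ([0, 8], 8388608) -- matches cell id='0' below
```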

The example below auto partitions the physical NUMA detail listed by the
host's 'lscpu' into a guest domain vNUMA description.

    [root@host ]# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                240
    On-line CPU(s) list:   0-239
    Thread(s) per core:    2
    Core(s) per socket:    15
    Socket(s):             8
    NUMA node(s):          8
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 62
    Model name:            Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
    Stepping:              7
    CPU MHz:               3449.555
    CPU max MHz:           3600.0000
    CPU min MHz:           1200.0000
    BogoMIPS:              5586.28
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              38400K
    NUMA node0 CPU(s):     0-14,120-134
    NUMA node1 CPU(s):     15-29,135-149
    NUMA node2 CPU(s):     30-44,150-164
    NUMA node3 CPU(s):     45-59,165-179
    NUMA node4 CPU(s):     60-74,180-194
    NUMA node5 CPU(s):     75-89,195-209
    NUMA node6 CPU(s):     90-104,210-224
    NUMA node7 CPU(s):     105-119,225-239
    Flags:                 ...

The guest 'anuma' without the auto partition rendering enabled
reads:   "<cpu mode='host-passthrough' check='none'/>"

    <domain type='kvm'>
      <name>anuma</name>
      <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
      <memory unit='KiB'>67108864</memory>
      <currentMemory unit='KiB'>67108864</currentMemory>
      <vcpu placement='static'>16</vcpu>
      <os>
        <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
        <boot dev='hd'/>
      </os>
      <features>
        <acpi/>
        <apic/>
        <vmport state='off'/>
      </features>
      <cpu mode='host-passthrough' check='none'/>
      <clock offset='utc'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <pm>
        <suspend-to-mem enabled='no'/>
        <suspend-to-disk enabled='no'/>
      </pm>
      <devices>
        <emulator>/usr/bin/qemu-system-x86_64</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/var/lib/libvirt/images/anuma.qcow2'/>

With auto partitioning enabled, the guest 'anuma' XML is rewritten
as listed below:   "<cpu mode='host-passthrough' check='numa'>"

    <domain type='kvm'>
      <name>anuma</name>
      <uuid>3f439f5f-1156-4d48-9491-945a2c0abc6d</uuid>
      <memory unit='KiB'>67108864</memory>
      <currentMemory unit='KiB'>67108864</currentMemory>
      <vcpu placement='static'>16</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='0-14,120-134'/>
        <vcpupin vcpu='1' cpuset='15-29,135-149'/>
        <vcpupin vcpu='2' cpuset='30-44,150-164'/>
        <vcpupin vcpu='3' cpuset='45-59,165-179'/>
        <vcpupin vcpu='4' cpuset='60-74,180-194'/>
        <vcpupin vcpu='5' cpuset='75-89,195-209'/>
        <vcpupin vcpu='6' cpuset='90-104,210-224'/>
        <vcpupin vcpu='7' cpuset='105-119,225-239'/>
        <vcpupin vcpu='8' cpuset='0-14,120-134'/>
        <vcpupin vcpu='9' cpuset='15-29,135-149'/>
        <vcpupin vcpu='10' cpuset='30-44,150-164'/>
        <vcpupin vcpu='11' cpuset='45-59,165-179'/>
        <vcpupin vcpu='12' cpuset='60-74,180-194'/>
        <vcpupin vcpu='13' cpuset='75-89,195-209'/>
        <vcpupin vcpu='14' cpuset='90-104,210-224'/>
        <vcpupin vcpu='15' cpuset='105-119,225-239'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
        <boot dev='hd'/>
      </os>
      <features>
        <acpi/>
        <apic/>
        <vmport state='off'/>
      </features>
      <cpu mode='host-passthrough' check='numa'>
        <topology sockets='8' cores='1' threads='2'/>
        <numa>
          <cell id='0' cpus='0,8' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='10'/>
              <sibling id='1' value='21'/>
              <sibling id='2' value='31'/>
              <sibling id='3' value='21'/>
              <sibling id='4' value='21'/>
              <sibling id='5' value='31'/>
              <sibling id='6' value='31'/>
              <sibling id='7' value='31'/>
            </distances>
          </cell>
          <cell id='1' cpus='1,9' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='21'/>
              <sibling id='1' value='10'/>
              <sibling id='2' value='21'/>
              <sibling id='3' value='31'/>
              <sibling id='4' value='31'/>
              <sibling id='5' value='21'/>
              <sibling id='6' value='31'/>
              <sibling id='7' value='31'/>
            </distances>
          </cell>
          <cell id='2' cpus='2,10' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='31'/>
              <sibling id='1' value='21'/>
              <sibling id='2' value='10'/>
              <sibling id='3' value='21'/>
              <sibling id='4' value='31'/>
              <sibling id='5' value='31'/>
              <sibling id='6' value='21'/>
              <sibling id='7' value='31'/>
            </distances>
          </cell>
          <cell id='3' cpus='3,11' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='21'/>
              <sibling id='1' value='31'/>
              <sibling id='2' value='21'/>
              <sibling id='3' value='10'/>
              <sibling id='4' value='31'/>
              <sibling id='5' value='31'/>
              <sibling id='6' value='31'/>
              <sibling id='7' value='21'/>
            </distances>
          </cell>
          <cell id='4' cpus='4,12' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='21'/>
              <sibling id='1' value='31'/>
              <sibling id='2' value='31'/>
              <sibling id='3' value='31'/>
              <sibling id='4' value='10'/>
              <sibling id='5' value='21'/>
              <sibling id='6' value='21'/>
              <sibling id='7' value='31'/>
            </distances>
          </cell>
          <cell id='5' cpus='5,13' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='31'/>
              <sibling id='1' value='21'/>
              <sibling id='2' value='31'/>
              <sibling id='3' value='31'/>
              <sibling id='4' value='21'/>
              <sibling id='5' value='10'/>
              <sibling id='6' value='31'/>
              <sibling id='7' value='21'/>
            </distances>
          </cell>
          <cell id='6' cpus='6,14' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='31'/>
              <sibling id='1' value='31'/>
              <sibling id='2' value='21'/>
              <sibling id='3' value='31'/>
              <sibling id='4' value='21'/>
              <sibling id='5' value='31'/>
              <sibling id='6' value='10'/>
              <sibling id='7' value='21'/>
            </distances>
          </cell>
          <cell id='7' cpus='7,15' memory='8388608' unit='KiB'>
            <distances>
              <sibling id='0' value='31'/>
              <sibling id='1' value='31'/>
              <sibling id='2' value='31'/>
              <sibling id='3' value='21'/>
              <sibling id='4' value='31'/>
              <sibling id='5' value='21'/>
              <sibling id='6' value='21'/>
              <sibling id='7' value='10'/>
            </distances>
          </cell>
        </numa>
      </cpu>
      <clock offset='utc'>
        <timer name='rtc' tickpolicy='catchup'/>
        <timer name='pit' tickpolicy='delay'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <pm>
        <suspend-to-mem enabled='no'/>
        <suspend-to-disk enabled='no'/>
      </pm>
      <devices>
        <emulator>/usr/bin/qemu-system-x86_64</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/var/lib/libvirt/images/anuma.qcow2'/>

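The <cputune> pinning shown above follows directly from the host 'lscpu'
output: vcpu i is pinned to the physical cpuset of host NUMA node
i modulo the node count.  A hypothetical sketch (not libvirt code) of
deriving those vcpupin entries from the host node cpusets:

```python
# Host NUMA node -> physical cpuset, taken from the host 'lscpu' output.
host_node_cpus = {
    0: "0-14,120-134",   1: "15-29,135-149",
    2: "30-44,150-164",  3: "45-59,165-179",
    4: "60-74,180-194",  5: "75-89,195-209",
    6: "90-104,210-224", 7: "105-119,225-239",
}

def vcpupin_xml(vcpus, host_node_cpus):
    """Emit one <vcpupin> element per vcpu, pinned to node (vcpu % nodes)."""
    nodes = len(host_node_cpus)
    return [
        "<vcpupin vcpu='%d' cpuset='%s'/>" % (v, host_node_cpus[v % nodes])
        for v in range(vcpus)
    ]

for line in vcpupin_xml(16, host_node_cpus):
    print(line)
# First line: <vcpupin vcpu='0' cpuset='0-14,120-134'/>
```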
Finally, 'lscpu' run inside the auto partitioned guest 'anuma' lists the
virtual vNUMA detail.

    [root@anuma ~]# lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              16
    On-line CPU(s) list: 0-15
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           8
    NUMA node(s):        8
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               62
    Model name:          Intel(R) Xeon(R) CPU E7-8895 v2 @ 2.80GHz
    Stepping:            7
    CPU MHz:             2793.268
    BogoMIPS:            5586.53
    Virtualization:      VT-x
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            4096K
    L3 cache:            16384K
    NUMA node0 CPU(s):   0,8
    NUMA node1 CPU(s):   1,9
    NUMA node2 CPU(s):   2,10
    NUMA node3 CPU(s):   3,11
    NUMA node4 CPU(s):   4,12
    NUMA node5 CPU(s):   5,13
    NUMA node6 CPU(s):   6,14
    NUMA node7 CPU(s):   7,15
    Flags:               ...

Wim ten Have (2):
  domain: auto partition guests providing the host NUMA topology
  qemuxml2argv: add tests that exercise vNUMA auto partition topology

 docs/formatdomain.html.in                     |   7 +
 docs/schemas/cputypes.rng                     |   1 +
 src/conf/cpu_conf.c                           |   3 +-
 src/conf/cpu_conf.h                           |   1 +
 src/conf/domain_conf.c                        | 166 ++++++++++++++++++
 .../cpu-host-passthrough-nonuma.args          |  25 +++
 .../cpu-host-passthrough-nonuma.xml           |  18 ++
 .../cpu-host-passthrough-numa.args            |  29 +++
 .../cpu-host-passthrough-numa.xml             |  18 ++
 tests/qemuxml2argvtest.c                      |   2 +
 10 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

-- 
2.17.1
