[PATCH v2 2/4] qemu_domain.c: NUMA CPUs auto-fill for incomplete topologies

Michal Privoznik mprivozn at redhat.com
Thu Jun 18 10:34:32 UTC 2020


On 6/10/20 8:35 PM, Daniel Henrique Barboza wrote:
> Libvirt allows the user to define an incomplete NUMA topology, where
> the sum of all CPUs in each cell is less than the total of VCPUs.
> What ends up happening is that QEMU allocates the non-enumerated CPUs
> in the first NUMA node. This behavior has been flagged as 'to be
> deprecated' since at least QEMU commit ec78f8114bc4 ("numa: use
> possible_cpus for not mapped CPUs check").
> 
> In [1], Maxiwell suggested that we forbid users from defining such
> topologies. In his review [2], Peter Krempa pointed out that we can't
> break existing guests, and suggested that Libvirt should emulate the
> QEMU behavior of putting the remaining vCPUs in the first NUMA node
> in these cases.
> 
> This patch implements Peter Krempa's suggestion. Since the auto-fill
> will most likely leave node 0 with a disjoint NUMA CPU range, the
> auto-fill is made dependent on QEMU_CAPS_NUMA.
> 
> A follow-up patch will update the documentation, both to describe
> the auto-fill mechanic for incomplete NUMA topologies and to
> discourage users from creating such topologies in the future. This
> approach also makes Libvirt independent of whether QEMU changes
> its current behavior: either we auto-fill the CPUs in node 0, or
> the user (hopefully) is aware that incomplete topologies, although
> supported in Libvirt, are to be avoided.
> 
> [1] https://www.redhat.com/archives/libvir-list/2019-June/msg00224.html
> [2] https://www.redhat.com/archives/libvir-list/2019-June/msg00263.html
> Signed-off-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
> ---
>   src/qemu/qemu_domain.c | 47 ++++++++++++++++++++++++++++++++++++++++++
>   src/qemu/qemu_domain.h |  4 ++++
>   src/qemu/qemu_driver.c |  9 ++++++++
>   3 files changed, 60 insertions(+)
> 
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 2dad823a86..76191e028b 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -4963,6 +4963,50 @@ qemuDomainDefTsegPostParse(virDomainDefPtr def,
>   }
>   
>   
> +/**
> + * qemuDomainDefNumaCPUsRectify:
> + * @def: domain definition
> + * @qemuCaps: QEMU capabilities
> + *
> + * This function emulates the (to be deprecated) QEMU behavior of
> + * filling up node 0 with the remaining CPUs in case of an incomplete
> + * NUMA setup, up to virDomainDefGetVcpusMax().
> + *
> + * Returns: 0 on success, -1 on error
> + */
> +int
> +qemuDomainDefNumaCPUsRectify(virDomainDefPtr def, virQEMUCapsPtr qemuCaps)
> +{
> +    unsigned int vcpusMax, numacpus;
> +
> +    /* QEMU_CAPS_NUMA tells us whether QEMU is able to handle disjoint
> +     * NUMA CPU ranges. The filling process will create a disjoint
> +     * setup in node 0 most of the time. Do not proceed if QEMU
> +     * can't handle it. */
> +    if (virDomainNumaGetNodeCount(def->numa) == 0 ||
> +        !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA))
> +        return 0;
> +
> +    vcpusMax = virDomainDefGetVcpusMax(def);
> +    numacpus = virDomainNumaGetCPUCountTotal(def->numa);
> +
> +    if (numacpus < vcpusMax) {
> +        if (virDomainNumaFillCPUsInNode(def->numa, 0, vcpusMax) < 0)
> +            return -1;
> +    }
> +
> +    return 0;
> +}
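
To make the disjoint outcome concrete, here is a toy, standalone
sketch (not libvirt code; the 8-vCPU topology with two 2-CPU cells is
made up for illustration) of what the fill-up produces:

#include <stdio.h>

int main(void)
{
    /* user-defined placement: cell 0 holds cpus 0-1, cell 1 holds
     * cpus 2-3; cpus 4-7 are left out of the NUMA topology */
    int in_node0[8] = { 1, 1, 0, 0, 0, 0, 0, 0 };
    int in_node1[8] = { 0, 0, 1, 1, 0, 0, 0, 0 };
    size_t i;

    /* the auto-fill: every CPU not present in any cell lands in node 0 */
    for (i = 0; i < 8; i++) {
        if (!in_node0[i] && !in_node1[i])
            in_node0[i] = 1;
    }

    printf("node 0 cpus:");
    for (i = 0; i < 8; i++) {
        if (in_node0[i])
            printf(" %zu", i);
    }
    printf("\n");  /* node 0 cpus: 0 1 4 5 6 7, i.e. the range '0-1,4-7' */
    return 0;
}

So node 0 ends up with the disjoint range 0-1,4-7, which is exactly
what QEMU_CAPS_NUMA gates.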
> +
> +
> +static int
> +qemuDomainDefNumaCPUsPostParse(virDomainDefPtr def,
> +                               virQEMUCapsPtr qemuCaps)
> +{
> +    return qemuDomainDefNumaCPUsRectify(def, qemuCaps);
> +}
> +
> +
>   static int
>   qemuDomainDefPostParseBasic(virDomainDefPtr def,
>                               void *opaque G_GNUC_UNUSED)
> @@ -5049,6 +5093,9 @@ qemuDomainDefPostParse(virDomainDefPtr def,
>       if (qemuDomainDefTsegPostParse(def, qemuCaps) < 0)
>           return -1;
>   
> +    if (qemuDomainDefNumaCPUsPostParse(def, qemuCaps) < 0)
> +        return -1;
> +
>       return 0;
>   }
>   
> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
> index 41d3f1561d..e78a2b935d 100644
> --- a/src/qemu/qemu_domain.h
> +++ b/src/qemu/qemu_domain.h
> @@ -1297,3 +1297,7 @@ qemuDomainInitializePflashStorageSource(virDomainObjPtr vm);
>   bool
>   qemuDomainDiskBlockJobIsSupported(virDomainObjPtr vm,
>                                     virDomainDiskDefPtr disk);
> +
> +int
> +qemuDomainDefNumaCPUsRectify(virDomainDefPtr def,
> +                             virQEMUCapsPtr qemuCaps);
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 88517ba6a7..ff9414f3c4 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -4999,6 +4999,7 @@ qemuDomainSetVcpusMax(virQEMUDriverPtr driver,
>                         unsigned int nvcpus)
>   {
>       g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
> +    g_autoptr(virQEMUCaps) qemuCaps = NULL;
>       unsigned int topologycpus;
>   
>       if (def) {
> @@ -5029,6 +5030,14 @@ qemuDomainSetVcpusMax(virQEMUDriverPtr driver,
>       if (virDomainDefSetVcpusMax(persistentDef, nvcpus, driver->xmlopt) < 0)
>           return -1;
>   
> +    /* re-adjust NUMA nodes if needed */
> +    if (!(qemuCaps = virQEMUCapsCacheLookup(driver->qemuCapsCache,
> +                                            persistentDef->emulator)))

This will not do what you think it will do. Although, perhaps for 
QEMU_CAPS_NUMA (introduced in QEMU 2.1.0? the point is that it is 
ancient QEMU) it doesn't matter, because the likelihood of hitting 
the problem is very small. But virQEMUCapsCacheLookup() will return 
the caps as detected now, by the current version of libvirt, whereas 
what we need is to check the capability against the set of caps from 
when the domain was started (vm->priv->qemuCaps).

For instance (and this shows how unlikely this corner case is, but still):

1) a domain is started with qemu-1.5.0 (or whatever the minimal supported version is)
2) yum update qemu
3) virQEMUCapsCacheLookup() will now fetch new caps for the updated QEMU, even though the domain was started with the old binary.
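
Something like this rough, untested sketch is what I have in mind
(assuming the virDomainObjPtr is reachable here, which may need extra
plumbing; the cache lookup is kept as a fallback for inactive
domains):

    qemuDomainObjPrivatePtr priv = vm->privateData;
    g_autoptr(virQEMUCaps) qemuCaps = NULL;

    /* prefer the caps snapshotted when the domain was started over
     * a fresh cache lookup */
    if (priv->qemuCaps)
        qemuCaps = virQEMUCapsNewCopy(priv->qemuCaps);
    else
        qemuCaps = virQEMUCapsCacheLookup(driver->qemuCapsCache,
                                          persistentDef->emulator);

    if (!qemuCaps)
        return -1;

    if (qemuDomainDefNumaCPUsRectify(persistentDef, qemuCaps) < 0)
        return -1;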


Michal



