[PATCH] qemu: Relax memory pre-allocation rules

Daniel P. Berrangé berrange at redhat.com
Mon Nov 30 10:16:24 UTC 2020


On Mon, Nov 30, 2020 at 11:06:14AM +0100, Michal Privoznik wrote:
> Currently, we configure QEMU to prealloc memory almost by
> default. Well, by default for NVDIMMs, hugepages and if user
> asked us to (via memoryBacking <allocation mode="immediate"/>).
> 
> However, there are two cases where this approach is not the best:
> 
> 1) When the guest's NVDIMM is backed by a real-life NVDIMM. In
>    this case users should put <pmem/> into the <source/> of the
>    <memory/> device, like this:
> 
>    <memory model='nvdimm' access='shared'>
>      <source>
>        <path>/dev/pmem0</path>
>        <pmem/>
>      </source>
>    </memory>
> 
>    Instructing QEMU to preallocate in this case means that every
>    page of the NVDIMM is "touched" (its first byte is read and
>    written back; see QEMU commit v2.9.0-rc1~26^2), which needlessly
>    wears out the device.
> 
> 2) When free-page-reporting is turned on. While the
>    free-page-reporting feature might not have a catchy or obvious
>    name, when enabled it instructs KVM and subsequently QEMU to
>    free pages no longer used by the guest, resulting in a smaller
>    memory footprint. Preallocating all of the guest memory works
>    against this.
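To make the wear concern in point 1 concrete, here is a minimal C sketch of the "touch every page" pattern, assuming only the behaviour described above (read the first byte of each page, write it back). It is not QEMU's code; `touch_pages` is a hypothetical helper. Each iteration stores to the page, so on a DAX-mapped real NVDIMM a preallocation pass dirties every page of the device even though the data is unchanged.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not QEMU's implementation): "touch" every page
 * of a mapped area by reading its first byte and writing it back.
 * Returns the number of pages touched; returns 0 if pagesize is 0.
 */
static size_t touch_pages(volatile unsigned char *area, size_t size,
                          size_t pagesize)
{
    size_t touched = 0;

    if (pagesize == 0)
        return 0;

    for (size_t off = 0; off < size; off += pagesize) {
        unsigned char b = area[off];  /* read: faults the page in if mapped lazily */
        area[off] = b;                /* write back: dirties the page, data unchanged */
        touched++;
    }
    return touched;
}
```

With a 4 KiB page size, touching a 512 MiB NVDIMM such as the one in the test data below (size=536870912) means 131072 page writes.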
> 
> Comment 11 in the BZ mentions a third case, 'virtio-mem', but
> that is not implemented in libvirt yet.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
>  src/qemu/qemu_command.c                               | 11 +++++++++--
>  .../memory-hotplug-nvdimm-pmem.x86_64-latest.args     |  2 +-
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 479bcc0b0c..3df8b5ac76 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
>      if (discard == VIR_TRISTATE_BOOL_ABSENT)
>          discard = def->mem.discard;
>  
> -    if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE)
> +    /* The whole point of free_page_reporting is that as soon as guest frees
> +     * any memory it is freed in the host too. Prealloc doesn't make much sense
> +     * then. */
> +    if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE &&
> +        def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON)
>          prealloc = true;

If the user asked for allocation == immediate, we should not be
silently ignoring that request. Isn't the scenario described simply
a weird user configuration? If they don't want preallocation, they
can set <allocation mode="ondemand"/> instead.
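A minimal C sketch contrasting the two policies, assuming simplified stand-ins for libvirt's types: the enum `alloc_mode` and both functions are hypothetical, not libvirt API. The first mirrors the condition in the patch as posted; the second honors an explicit immediate request and leaves it to the user to pick ondemand when free page reporting is enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the <allocation mode=.../> setting;
 * not libvirt's actual enum. */
typedef enum {
    ALLOC_ABSENT,
    ALLOC_IMMEDIATE,
    ALLOC_ONDEMAND,
} alloc_mode;

/* Condition as posted: free page reporting silently overrides an
 * explicit request for immediate allocation. */
static bool prealloc_as_posted(alloc_mode mode, bool free_page_reporting)
{
    return mode == ALLOC_IMMEDIATE && !free_page_reporting;
}

/* Condition as suggested in review: an explicit request is always
 * honored; users who want free page reporting to be effective choose
 * mode="ondemand" themselves. */
static bool prealloc_as_suggested(alloc_mode mode, bool free_page_reporting)
{
    (void)free_page_reporting;  /* deliberately not consulted */
    return mode == ALLOC_IMMEDIATE;
}
```

The two functions disagree only for ALLOC_IMMEDIATE with free page reporting on, which is exactly the configuration in question.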

>      if (virDomainNumatuneGetMode(def->numa, mem->targetNode, &mode) < 0 &&
> @@ -3064,7 +3068,10 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
>  
>          if (mem->nvdimmPath) {
>              memPath = g_strdup(mem->nvdimmPath);
> -            prealloc = true;



> +            /* If the NVDIMM is a real device then there's nothing to prealloc.
> +             * If anything, we would only be wearing out the device. */
> +            if (!mem->nvdimmPmem)
> +                prealloc = true;

I wonder whether QEMU itself should apply this optimization and skip
its preallocation logic for pmem-backed memory?
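If QEMU did take this on, the check could plausibly live in the file memory backend itself. A minimal sketch, assuming a simplified backend struct whose fields loosely mirror the prealloc= and pmem= properties seen on the command line in the test data; `struct file_backend` and `backend_should_prealloc` are hypothetical, not QEMU code.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a memory-backend-file object;
 * the fields loosely mirror its prealloc= and pmem= properties. */
struct file_backend {
    bool prealloc;  /* prealloc=yes was requested */
    bool pmem;      /* pmem=yes: mem-path is in host persistent memory */
};

/* Decide inside the backend whether to touch every page. With real
 * pmem there is nothing to preallocate, so touching pages would only
 * add writes to the device. */
static bool backend_should_prealloc(const struct file_backend *be)
{
    if (!be->prealloc)
        return false;
    if (be->pmem)
        return false;
    return true;
}
```

Placing the check in QEMU would apply it to any management application, not only libvirt.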

>          } else if (useHugepage) {
>              if (qemuGetDomainHupageMemPath(priv->driver, def, pagesize, &memPath) < 0)
>                  return -1;
> diff --git a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-pmem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-pmem.x86_64-latest.args
> index cac02a6f6d..fb4ae4b518 100644
> --- a/tests/qemuxml2argvdata/memory-hotplug-nvdimm-pmem.x86_64-latest.args
> +++ b/tests/qemuxml2argvdata/memory-hotplug-nvdimm-pmem.x86_64-latest.args
> @@ -20,7 +20,7 @@ file=/tmp/lib/domain--1-QEMUGuest1/master-key.aes \
>  -object memory-backend-ram,id=ram-node0,size=224395264 \
>  -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
>  -object memory-backend-file,id=memnvdimm0,mem-path=/tmp/nvdimm,share=no,\
> -prealloc=yes,size=536870912,pmem=yes \
> +size=536870912,pmem=yes \
>  -device nvdimm,node=0,memdev=memnvdimm0,id=nvdimm0,slot=0 \
>  -uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
>  -display none \
> -- 
> 2.26.2
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list