[PATCH] qemu: Relax memory pre-allocation rules

Michal Privoznik mprivozn at redhat.com
Mon Nov 30 10:58:46 UTC 2020


On 11/30/20 11:21 AM, Peter Krempa wrote:
> On Mon, Nov 30, 2020 at 11:06:14 +0100, Michal Privoznik wrote:
>> Currently, we configure QEMU to prealloc memory almost by
>> default. Well, by default for NVDIMMs, hugepages and if user
>> asked us to (via memoryBacking <allocation mode="immediate"/>).
>>
>> However, there are two cases where this approach is not the best:
>>
>> 1) in case when guest's NVDIMM is backed by real life NVDIMM. In
>>     this case users should put <pmem/> into the <memory/> device
>>     <source/>, like this:
>>
>>     <memory model='nvdimm' access='shared'>
>>       <source>
>>         <path>/dev/pmem0</path>
>>         <pmem/>
>>       </source>
>>     </memory>
>>
>>     Instructing QEMU to do prealloc in this case means that each
>>     page of the NVDIMM is "touched" (the first byte is read and
>>     written back - see QEMU commit v2.9.0-rc1~26^2) which cripples
>>     device wear.
>>
>> 2) if free-page-reporting is turned on. While the
>>     free-page-reporting feature might not have a catchy or obvious
>>     name, when enabled it instructs KVM and subsequently QEMU to
>>     free pages no longer used by guest resulting in smaller memory
>>     footprint. And preallocating whole memory goes against this.
>>
>> The BZ comment 11 mentions another, third case 'virtio-mem' but
>> that is not implemented in libvirt, yet.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>   src/qemu/qemu_command.c                               | 11 +++++++++--
>>   .../memory-hotplug-nvdimm-pmem.x86_64-latest.args     |  2 +-
>>   2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>> index 479bcc0b0c..3df8b5ac76 100644
>> --- a/src/qemu/qemu_command.c
>> +++ b/src/qemu/qemu_command.c
>> @@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr *backendProps,
>>       if (discard == VIR_TRISTATE_BOOL_ABSENT)
>>           discard = def->mem.discard;
>>   
>> -    if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE)
>> +    /* The whole point of free_page_reporting is that as soon as guest frees
>> +     * any memory it is freed in the host too. Prealloc doesn't make much sense
>> +     * then. */
>> +    if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE &&
>> +        def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON)
>>           prealloc = true;
> 
> IIUC def->mem.allocation is an explicit user-specified configuration, in
> such case we should not override the user wish based on a different
> configuration.
> 
> If they don't make sense together technically, we should reject the
> config rather than silently changing it. If it's just semantically wrong
> and pre-existing we may leave it be.
> 
> Additionally the patch is missing a test case for this scenario.
> 

Yeah, Dan already pointed out that this is not really desired. So I will 
leave this hunk out. But to address your point - would it make sense to 
add a valiador check? I mean, something like:

   if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE &&
       def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON) {
       virReportError("this combination doesn't make much sense");
       return -1;
   }


Technically, it is possible to fully allocate memory on domain startup 
and then leave QEMU to free pages (which happens as soon virtio_balloon 
module is loaded), but IMO it doesn't make much sense. Semantically, at 
least to me these two options are mutually exclusive.

Michal




More information about the libvir-list mailing list