[PATCH 03/10] conf: Introduce virtio-mem <memory/> model

David Hildenbrand david at redhat.com
Mon Jan 25 14:17:22 UTC 2021


On 22.01.21 19:16, Daniel Henrique Barboza wrote:
> 
> 
> On 1/22/21 9:50 AM, Michal Privoznik wrote:
>> The virtio-mem is paravirtualized mechanism of adding/removing
>> memory to/from a VM. A virtio-mem-pci device is split into blocks
>> of equal size which are then exposed (all or only a requested
>> portion of them) to the guest kernel to use as regular memory.
>> Therefore, the device has two important attributes:
>>
>>    1) block-size, which defines the size of a block
>>    2) requested-size, which defines how much memory (in bytes)
>>       is the device requested to expose to the guest.
>>
>> The 'block-size' is configured on command line and immutable
>> throughout device's lifetime. The 'requested-size' can be set on
>> the command line too, but also is adjustable via monitor. In
>> fact, that is how management software places its requests to
>> change the memory allocation. If it wants to give more memory to
>> the guest it changes 'requested-size' to a bigger value, and if it
>> wants to shrink guest memory it changes the 'requested-size' to a
>> smaller value. Note, value of zero means that guest should
>> release all memory offered by the device. Of course, guest has to
>> cooperate. Therefore, there is a third attribute 'size' which is
>> read only and reflects how much memory the guest still has. This
>> can be different to 'requested-size', obviously. Because of name
>> clash, I've named it 'actualsize' and it is dealt with in future
>> commits (it is a runtime information anyway).
>>
>> In the backend, memory for virtio-mem is backed by usual objects:
>> memory-backend-{ram,file,memfd} and their size puts the cap on
>> the amount of memory that a virtio-mem device can offer to a
>> guest. But we are already able to express this info using <size/>
>> under <target/>.
>>
>> Therefore, we need only two more elements to cover 'block-size'
>> and 'requested-size' attributes. This is the XML I've came up
>> with:
>>
>>    <memory model='virtio-mem'>
>>      <source>
>>        <nodemask>1-3</nodemask>
>>        <pagesize unit='KiB'>2048</pagesize>
>>      </source>
>>      <target>
>>        <size unit='KiB'>2097152</size>
>>        <node>0</node>
>>        <block unit='KiB'>2048</block>
>>        <requested unit='KiB'>1048576</requested>
>>      </target>
>>      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>>    </memory>
>>
>> I hope by now it is obvious that:
>>
>>    1) 'requested-size' must be an integer multiple of
>>       'block-size', and
>>    2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
>>       address.
>>
>> Then there is a limitation that the minimal 'block-size' is
>> transparent huge page size (I'll leave this without explanation).
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>   docs/formatdomain.rst                         | 35 ++++++++--
>>   docs/schemas/domaincommon.rng                 | 11 ++++
>>   src/conf/domain_conf.c                        | 53 ++++++++++++++-
>>   src/conf/domain_conf.h                        |  3 +
>>   src/conf/domain_validate.c                    | 39 +++++++++++
>>   src/qemu/qemu_alias.c                         |  1 +
>>   src/qemu/qemu_command.c                       |  1 +
>>   src/qemu/qemu_domain.c                        | 10 +++
>>   src/qemu/qemu_domain_address.c                | 37 ++++++++---
>>   src/qemu/qemu_validate.c                      |  8 +++
>>   src/security/security_apparmor.c              |  1 +
>>   src/security/security_dac.c                   |  2 +
>>   src/security/security_selinux.c               |  2 +
>>   tests/domaincapsmock.c                        |  9 +++
>>   .../memory-hotplug-virtio-mem.xml             | 66 +++++++++++++++++++
>>   ...emory-hotplug-virtio-mem.x86_64-latest.xml |  1 +
>>   tests/qemuxml2xmltest.c                       |  1 +
>>   17 files changed, 264 insertions(+), 16 deletions(-)
>>   create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
>>   create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
>>
>> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
>> index af540391db..2938758ec2 100644
>> --- a/docs/formatdomain.rst
>> +++ b/docs/formatdomain.rst
>> @@ -7267,6 +7267,18 @@ Example: usage of the memory devices
>>            <size unit='KiB'>524288</size>
>>          </target>
>>        </memory>
>> +     <memory model='virtio-mem'>
>> +       <source>
>> +         <nodemask>1-3</nodemask>
>> +         <pagesize unit='KiB'>2048</pagesize>
>> +       </source>
>> +       <target>
>> +         <size unit='KiB'>2097152</size>
>> +         <node>0</node>
>> +         <block unit='KiB'>2048</block>
>> +         <requested unit='KiB'>1048576</requested>
>> +       </target>
>> +     </memory>
>>      </devices>
>>      ...
>>   
>> @@ -7274,7 +7286,8 @@ Example: usage of the memory devices
>>      Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since
>>      1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
>>      :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized
>> -   persistent memory device. :since:`Since 7.1.0`
>> +   persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model
>> +   to add paravirtualized memory device. :since: `Since 7.1.0`
>>   
>>   ``access``
>>      An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
>> @@ -7297,10 +7310,11 @@ Example: usage of the memory devices
>>      allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0`
>>   
>>   ``source``
>> -   For model ``dimm`` this element is optional and allows to fine tune the
>> -   source of the memory used for the given memory device. If the element is not
>> -   provided defaults configured via ``numatune`` are used. If ``dimm`` is
>> -   provided, then the following optional elements can be provided as well:
>> +   For model ``dimm`` and model ``virtio-mem`` this element is optional and
>> +   allows to fine tune the source of the memory used for the given memory
>> +   device. If the element is not provided defaults configured via ``numatune``
>> +   are used. If the element is provided, then the following optional elements
>> +   can be provided:
>>   
>>      ``pagesize``
>>         This element can be used to override the default host page size used for
>> @@ -7366,6 +7380,17 @@ Example: usage of the memory devices
>>         so other backend types should use the ``readonly`` element. :since:`Since
>>         5.0.0`
>>   
>> +   ``block``
>> +     For ``virtio-mem`` only.
>> +     The size of an individual block, granularity of division of memory module.
>> +     Must be power of two and at least equal to size of a transparent hugepage
>> +     (2MiB on x84_64). The default is hypervisor dependant.
> 
> I don't think that 'dependant' is wrong in this context but 'dependent' is more
> common.
> 
>> +
>> +   ``requested``
>> +     For ``virtio-mem`` only.
>> +     The total size of blocks exposed to the guest. Must respect ``block``
>> +     granularity.
>> +
>>   :anchor:`<a id="elementsIommu"/>`
>>   
>>   IOMMU devices
>> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
>> index a4bddcf132..5bc120073e 100644
>> --- a/docs/schemas/domaincommon.rng
>> +++ b/docs/schemas/domaincommon.rng
>> @@ -6020,6 +6020,7 @@
>>             <value>dimm</value>
>>             <value>nvdimm</value>
>>             <value>virtio-pmem</value>
>> +          <value>virtio-mem</value>
>>           </choice>
>>         </attribute>
>>         <optional>
>> @@ -6104,6 +6105,16 @@
>>               <ref name="unsignedInt"/>
>>             </element>
>>           </optional>
>> +        <optional>
>> +          <element name="block">
>> +            <ref name="scaledInteger"/>
>> +          </element>
>> +        </optional>
>> +        <optional>
>> +          <element name="requested">
>> +            <ref name="scaledInteger"/>
>> +          </element>
>> +        </optional>
>>           <optional>
>>             <element name="label">
>>               <element name="size">
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index dab4f10326..f8c5a40b24 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -1310,6 +1310,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel,
>>                 "dimm",
>>                 "nvdimm",
>>                 "virtio-pmem",
>> +              "virtio-mem",
>>   );
>>   
>>   VIR_ENUM_IMPL(virDomainShmemModel,
>> @@ -5359,6 +5360,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem,
>>           }
>>           break;
>>   
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>>       case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>>       case VIR_DOMAIN_MEMORY_MODEL_NONE:
>>       case VIR_DOMAIN_MEMORY_MODEL_LAST:
>> @@ -15322,6 +15324,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node,
>>   
>>       switch (def->model) {
>>       case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>>           if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt,
>>                                    &def->pagesize, false, false) < 0)
>>               return -1;
>> @@ -15388,7 +15391,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
>>                                &def->size, true, false) < 0)
>>           return -1;
>>   
>> -    if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
>> +    switch (def->model) {
>> +    case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
>>           if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt,
>>                                    &def->labelsize, false, false) < 0)
>>               return -1;
>> @@ -15407,6 +15411,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
>>   
>>           if (virXPathBoolean("boolean(./readonly)", ctxt))
>>               def->readonly = true;
>> +        break;
>> +
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> +        if (virDomainParseMemory("./block", "./block/@unit", ctxt,
>> +                                 &def->blocksize, false, false) < 0)
>> +            return -1;
>> +
>> +        if (virDomainParseMemory("./requested", "./requested/@unit", ctxt,
>> +                                 &def->requestedsize, false, false) < 0)
>> +            return -1;
>> +        break;
>> +
>> +    case VIR_DOMAIN_MEMORY_MODEL_NONE:
>> +    case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
>> +    case VIR_DOMAIN_MEMORY_MODEL_LAST:
>> +        break;
>>       }
>>   
>>       return 0;
>> @@ -17214,11 +17235,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def,
>>           /* target info -> always present */
>>           if (tmp->model != mem->model ||
>>               tmp->targetNode != mem->targetNode ||
>> -            tmp->size != mem->size)
>> +            tmp->size != mem->size ||
>> +            tmp->blocksize != mem->blocksize ||
>> +            tmp->requestedsize != mem->requestedsize)
>>               continue;
>>   
>>           switch (mem->model) {
>>           case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> +        case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>>               /* source stuff -> match with device */
>>               if (tmp->pagesize != mem->pagesize)
>>                   continue;
>> @@ -22784,6 +22808,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr src,
>>           return false;
>>       }
>>   
>> +    if (src->blocksize != dst->blocksize) {
>> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                       _("Target memory device block size '%llu' doesn't match "
>> +                         "source memory device block size '%llu'"),
>> +                       dst->blocksize, src->blocksize);
>> +        return false;
>> +    }
>> +
>> +    if (src->requestedsize != dst->requestedsize) {
>> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> +                       _("Target memory device requested size '%llu' doesn't match "
>> +                         "source memory device requested size '%llu'"),
>> +                       dst->requestedsize, src->requestedsize);
>> +        return false;
>> +    }
>> +
>>       if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
>>           if (src->labelsize != dst->labelsize) {
>>               virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> @@ -26507,6 +26547,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf,
>>   
>>       switch (def->model) {
>>       case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>>           if (def->sourceNodes) {
>>               if (!(bitmap = virBitmapFormat(def->sourceNodes)))
>>                   return -1;
>> @@ -26563,6 +26604,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf,
>>       if (def->readonly)
>>           virBufferAddLit(&childBuf, "<readonly/>\n");
>>   
>> +    if (def->blocksize) {
>> +        virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n",
>> +                          def->blocksize);
>> +
>> +        virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n",
>> +                          def->requestedsize);
>> +    }
>> +
>>       virXMLFormatElement(buf, "target", NULL, &childBuf);
>>   }
>>   
>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>> index 95ad052891..5d89ecfe9d 100644
>> --- a/src/conf/domain_conf.h
>> +++ b/src/conf/domain_conf.h
>> @@ -2308,6 +2308,7 @@ typedef enum {
>>       VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */
>>       VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
>>       VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */
>> +    VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */
>>   
>>       VIR_DOMAIN_MEMORY_MODEL_LAST
>>   } virDomainMemoryModel;
>> @@ -2328,6 +2329,8 @@ struct _virDomainMemoryDef {
>>       int targetNode;
>>       unsigned long long size; /* kibibytes */
>>       unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */
>> +    unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */
>> +    unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */
>>       bool readonly; /* valid only for NVDIMM */
>>   
>>       /* required for QEMU NVDIMM ppc64 support */
>> diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
>> index 649fc335ac..b5a0c09468 100644
>> --- a/src/conf/domain_validate.c
>> +++ b/src/conf/domain_validate.c
>> @@ -25,6 +25,7 @@
>>   #include "virconftypes.h"
>>   #include "virlog.h"
>>   #include "virutil.h"
>> +#include "virhostmem.h"
>>   
>>   #define VIR_FROM_THIS VIR_FROM_DOMAIN
>>   
>> @@ -1389,6 +1390,8 @@ static int
>>   virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
>>                              const virDomainDef *def)
>>   {
>> +    unsigned long long thpSize;
>> +
>>       switch (mem->model) {
>>       case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
>>           if (!mem->nvdimmPath) {
>> @@ -1442,6 +1445,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
>>                              _("virtio-pmem does not support NUMA nodes"));
>>               return -1;
>>           }
>> +        break;
>> +
>> +    case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> +        if (mem->requestedsize > mem->size) {
>> +            virReportError(VIR_ERR_XML_DETAIL, "%s",
>> +                           _("requested size must be smaller than @size"));
>> +            return -1;
>> +        }
>> +
>> +        if (!VIR_IS_POW2(mem->blocksize)) {
>> +            virReportError(VIR_ERR_XML_DETAIL, "%s",
>> +                           _("block size must be a power of two"));
>> +            return -1;
>> +        }
>> +
>> +        if (virHostMemGetTHPSize(&thpSize) < 0) {
>> +            /* We failed to get THP size, fall back to a sane default. On
>> +             * almost every architecture the size will be 2MiB, except for some
>> +             * funky arches like sparc and m68k. Use 2MiB and refine later if
>> +             * somebody complains. */
>> +            thpSize = 2048;
> 
> FWIW, a Power 9 server uses 2MiB too:
> 
> $  cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 2097152
> 
> 
> I don't think you should worry about too much since x86 is the only arch that is
> supporting virtio-mem (for now).


So, in QEMU we use the following logic right now:

get_block_size() {
	block_size = backend_pagesize();
	if (block_size == NATIVE_PAGE_SIZE)
		return MAX(block_size, native_thp_size());
	return MAX(block_size, 1 * MiB);
}

detect_thp_size() {
	thp_size = read_from_file();
	if (!thp_size || thp_size > 16 * MiB) {
		if (s390x)
			return 1 * MiB;
		return 2 * MiB;
	}
	return thp_size;
}


Especially, we also cap big block sizes (esp. arm64 with currently 512
MiB THP), as we prefer flexibility at this point.


So yes, on a x86-4 *host* we'll usually end up 2 MiB in QEMU. On arm64
it can be quite different.

> 
> 
> Reviewed-by: Daniel Henrique Barboza <danielhb413 at gmail.com>


-- 
Thanks,

David / dhildenb




More information about the libvir-list mailing list