[PATCH 03/10] conf: Introduce virtio-mem <memory/> model
David Hildenbrand
david at redhat.com
Mon Jan 25 14:17:22 UTC 2021
On 22.01.21 19:16, Daniel Henrique Barboza wrote:
>
>
> On 1/22/21 9:50 AM, Michal Privoznik wrote:
>> virtio-mem is a paravirtualized mechanism for adding/removing
>> memory to/from a VM. A virtio-mem-pci device is split into blocks
>> of equal size which are then exposed (all or only a requested
>> portion of them) to the guest kernel to use as regular memory.
>> Therefore, the device has two important attributes:
>>
>> 1) block-size, which defines the size of a block
>> 2) requested-size, which defines how much memory (in bytes)
>> is the device requested to expose to the guest.
>>
>> The 'block-size' is configured on command line and immutable
>> throughout device's lifetime. The 'requested-size' can be set on
>> the command line too, but also is adjustable via monitor. In
>> fact, that is how management software places its requests to
>> change the memory allocation. If it wants to give more memory to
>> the guest it changes 'requested-size' to a bigger value, and if it
>> wants to shrink guest memory it changes the 'requested-size' to a
>> smaller value. Note, value of zero means that guest should
>> release all memory offered by the device. Of course, guest has to
>> cooperate. Therefore, there is a third attribute 'size' which is
>> read only and reflects how much memory the guest still has. This
>> can be different to 'requested-size', obviously. Because of name
>> clash, I've named it 'actualsize' and it is dealt with in future
>> commits (it is a runtime information anyway).
>>
>> In the backend, memory for virtio-mem is backed by usual objects:
>> memory-backend-{ram,file,memfd} and their size puts the cap on
>> the amount of memory that a virtio-mem device can offer to a
>> guest. But we are already able to express this info using <size/>
>> under <target/>.
>>
>> Therefore, we need only two more elements to cover 'block-size'
>> and 'requested-size' attributes. This is the XML I've come up
>> with:
>>
>> <memory model='virtio-mem'>
>> <source>
>> <nodemask>1-3</nodemask>
>> <pagesize unit='KiB'>2048</pagesize>
>> </source>
>> <target>
>> <size unit='KiB'>2097152</size>
>> <node>0</node>
>> <block unit='KiB'>2048</block>
>> <requested unit='KiB'>1048576</requested>
>> </target>
>> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>> </memory>
>>
>> I hope by now it is obvious that:
>>
>> 1) 'requested-size' must be an integer multiple of
>> 'block-size', and
>> 2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
>> address.
>>
>> Then there is a limitation that the minimal 'block-size' is
>> transparent huge page size (I'll leave this without explanation).
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>> docs/formatdomain.rst | 35 ++++++++--
>> docs/schemas/domaincommon.rng | 11 ++++
>> src/conf/domain_conf.c | 53 ++++++++++++++-
>> src/conf/domain_conf.h | 3 +
>> src/conf/domain_validate.c | 39 +++++++++++
>> src/qemu/qemu_alias.c | 1 +
>> src/qemu/qemu_command.c | 1 +
>> src/qemu/qemu_domain.c | 10 +++
>> src/qemu/qemu_domain_address.c | 37 ++++++++---
>> src/qemu/qemu_validate.c | 8 +++
>> src/security/security_apparmor.c | 1 +
>> src/security/security_dac.c | 2 +
>> src/security/security_selinux.c | 2 +
>> tests/domaincapsmock.c | 9 +++
>> .../memory-hotplug-virtio-mem.xml | 66 +++++++++++++++++++
>> ...emory-hotplug-virtio-mem.x86_64-latest.xml | 1 +
>> tests/qemuxml2xmltest.c | 1 +
>> 17 files changed, 264 insertions(+), 16 deletions(-)
>> create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
>> create mode 120000 tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
>>
>> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
>> index af540391db..2938758ec2 100644
>> --- a/docs/formatdomain.rst
>> +++ b/docs/formatdomain.rst
>> @@ -7267,6 +7267,18 @@ Example: usage of the memory devices
>> <size unit='KiB'>524288</size>
>> </target>
>> </memory>
>> + <memory model='virtio-mem'>
>> + <source>
>> + <nodemask>1-3</nodemask>
>> + <pagesize unit='KiB'>2048</pagesize>
>> + </source>
>> + <target>
>> + <size unit='KiB'>2097152</size>
>> + <node>0</node>
>> + <block unit='KiB'>2048</block>
>> + <requested unit='KiB'>1048576</requested>
>> + </target>
>> + </memory>
>> </devices>
>> ...
>>
>> @@ -7274,7 +7286,8 @@ Example: usage of the memory devices
>> Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since
>> 1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
>> :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized
>> - persistent memory device. :since:`Since 7.1.0`
>> + persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model
>> + to add a paravirtualized memory device. :since:`Since 7.1.0`
>>
>> ``access``
>> An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
>> @@ -7297,10 +7310,11 @@ Example: usage of the memory devices
>> allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0`
>>
>> ``source``
>> - For model ``dimm`` this element is optional and allows to fine tune the
>> - source of the memory used for the given memory device. If the element is not
>> - provided defaults configured via ``numatune`` are used. If ``dimm`` is
>> - provided, then the following optional elements can be provided as well:
>> + For model ``dimm`` and model ``virtio-mem`` this element is optional and
>> + allows to fine tune the source of the memory used for the given memory
>> + device. If the element is not provided defaults configured via ``numatune``
>> + are used. If the element is provided, then the following optional elements
>> + can be provided:
>>
>> ``pagesize``
>> This element can be used to override the default host page size used for
>> @@ -7366,6 +7380,17 @@ Example: usage of the memory devices
>> so other backend types should use the ``readonly`` element. :since:`Since
>> 5.0.0`
>>
>> + ``block``
>> + For ``virtio-mem`` only.
>> + The size of an individual block, granularity of division of memory module.
>> + Must be power of two and at least equal to size of a transparent hugepage
>> + (2MiB on x86_64). The default is hypervisor dependant.
>
> I don't think that 'dependant' is wrong in this context but 'dependent' is more
> common.
>
>> +
>> + ``requested``
>> + For ``virtio-mem`` only.
>> + The total size of blocks exposed to the guest. Must respect ``block``
>> + granularity.
>> +
>> :anchor:`<a id="elementsIommu"/>`
>>
>> IOMMU devices
>> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
>> index a4bddcf132..5bc120073e 100644
>> --- a/docs/schemas/domaincommon.rng
>> +++ b/docs/schemas/domaincommon.rng
>> @@ -6020,6 +6020,7 @@
>> <value>dimm</value>
>> <value>nvdimm</value>
>> <value>virtio-pmem</value>
>> + <value>virtio-mem</value>
>> </choice>
>> </attribute>
>> <optional>
>> @@ -6104,6 +6105,16 @@
>> <ref name="unsignedInt"/>
>> </element>
>> </optional>
>> + <optional>
>> + <element name="block">
>> + <ref name="scaledInteger"/>
>> + </element>
>> + </optional>
>> + <optional>
>> + <element name="requested">
>> + <ref name="scaledInteger"/>
>> + </element>
>> + </optional>
>> <optional>
>> <element name="label">
>> <element name="size">
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index dab4f10326..f8c5a40b24 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -1310,6 +1310,7 @@ VIR_ENUM_IMPL(virDomainMemoryModel,
>> "dimm",
>> "nvdimm",
>> "virtio-pmem",
>> + "virtio-mem",
>> );
>>
>> VIR_ENUM_IMPL(virDomainShmemModel,
>> @@ -5359,6 +5360,7 @@ virDomainMemoryDefPostParse(virDomainMemoryDefPtr mem,
>> }
>> break;
>>
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> case VIR_DOMAIN_MEMORY_MODEL_NONE:
>> case VIR_DOMAIN_MEMORY_MODEL_LAST:
>> @@ -15322,6 +15324,7 @@ virDomainMemorySourceDefParseXML(xmlNodePtr node,
>>
>> switch (def->model) {
>> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> if (virDomainParseMemory("./pagesize", "./pagesize/@unit", ctxt,
>> &def->pagesize, false, false) < 0)
>> return -1;
>> @@ -15388,7 +15391,8 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
>> &def->size, true, false) < 0)
>> return -1;
>>
>> - if (def->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
>> + switch (def->model) {
>> + case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
>> if (virDomainParseMemory("./label/size", "./label/size/@unit", ctxt,
>> &def->labelsize, false, false) < 0)
>> return -1;
>> @@ -15407,6 +15411,23 @@ virDomainMemoryTargetDefParseXML(xmlNodePtr node,
>>
>> if (virXPathBoolean("boolean(./readonly)", ctxt))
>> def->readonly = true;
>> + break;
>> +
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> + if (virDomainParseMemory("./block", "./block/@unit", ctxt,
>> + &def->blocksize, false, false) < 0)
>> + return -1;
>> +
>> + if (virDomainParseMemory("./requested", "./requested/@unit", ctxt,
>> + &def->requestedsize, false, false) < 0)
>> + return -1;
>> + break;
>> +
>> + case VIR_DOMAIN_MEMORY_MODEL_NONE:
>> + case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
>> + case VIR_DOMAIN_MEMORY_MODEL_LAST:
>> + break;
>> }
>>
>> return 0;
>> @@ -17214,11 +17235,14 @@ virDomainMemoryFindByDefInternal(virDomainDefPtr def,
>> /* target info -> always present */
>> if (tmp->model != mem->model ||
>> tmp->targetNode != mem->targetNode ||
>> - tmp->size != mem->size)
>> + tmp->size != mem->size ||
>> + tmp->blocksize != mem->blocksize ||
>> + tmp->requestedsize != mem->requestedsize)
>> continue;
>>
>> switch (mem->model) {
>> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> /* source stuff -> match with device */
>> if (tmp->pagesize != mem->pagesize)
>> continue;
>> @@ -22784,6 +22808,22 @@ virDomainMemoryDefCheckABIStability(virDomainMemoryDefPtr src,
>> return false;
>> }
>>
>> + if (src->blocksize != dst->blocksize) {
>> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> + _("Target memory device block size '%llu' doesn't match "
>> + "source memory device block size '%llu'"),
>> + dst->blocksize, src->blocksize);
>> + return false;
>> + }
>> +
>> + if (src->requestedsize != dst->requestedsize) {
>> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> + _("Target memory device requested size '%llu' doesn't match "
>> + "source memory device requested size '%llu'"),
>> + dst->requestedsize, src->requestedsize);
>> + return false;
>> + }
>> +
>> if (src->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM) {
>> if (src->labelsize != dst->labelsize) {
>> virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>> @@ -26507,6 +26547,7 @@ virDomainMemorySourceDefFormat(virBufferPtr buf,
>>
>> switch (def->model) {
>> case VIR_DOMAIN_MEMORY_MODEL_DIMM:
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> if (def->sourceNodes) {
>> if (!(bitmap = virBitmapFormat(def->sourceNodes)))
>> return -1;
>> @@ -26563,6 +26604,14 @@ virDomainMemoryTargetDefFormat(virBufferPtr buf,
>> if (def->readonly)
>> virBufferAddLit(&childBuf, "<readonly/>\n");
>>
>> + if (def->blocksize) {
>> + virBufferAsprintf(&childBuf, "<block unit='KiB'>%llu</block>\n",
>> + def->blocksize);
>> +
>> + virBufferAsprintf(&childBuf, "<requested unit='KiB'>%llu</requested>\n",
>> + def->requestedsize);
>> + }
>> +
>> virXMLFormatElement(buf, "target", NULL, &childBuf);
>> }
>>
>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>> index 95ad052891..5d89ecfe9d 100644
>> --- a/src/conf/domain_conf.h
>> +++ b/src/conf/domain_conf.h
>> @@ -2308,6 +2308,7 @@ typedef enum {
>> VIR_DOMAIN_MEMORY_MODEL_DIMM, /* dimm hotpluggable memory device */
>> VIR_DOMAIN_MEMORY_MODEL_NVDIMM, /* nvdimm memory device */
>> VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM, /* virtio-pmem memory device */
>> + VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM, /* virtio-mem memory device */
>>
>> VIR_DOMAIN_MEMORY_MODEL_LAST
>> } virDomainMemoryModel;
>> @@ -2328,6 +2329,8 @@ struct _virDomainMemoryDef {
>> int targetNode;
>> unsigned long long size; /* kibibytes */
>> unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */
>> + unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */
>> + unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM */
>> bool readonly; /* valid only for NVDIMM */
>>
>> /* required for QEMU NVDIMM ppc64 support */
>> diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
>> index 649fc335ac..b5a0c09468 100644
>> --- a/src/conf/domain_validate.c
>> +++ b/src/conf/domain_validate.c
>> @@ -25,6 +25,7 @@
>> #include "virconftypes.h"
>> #include "virlog.h"
>> #include "virutil.h"
>> +#include "virhostmem.h"
>>
>> #define VIR_FROM_THIS VIR_FROM_DOMAIN
>>
>> @@ -1389,6 +1390,8 @@ static int
>> virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
>> const virDomainDef *def)
>> {
>> + unsigned long long thpSize;
>> +
>> switch (mem->model) {
>> case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
>> if (!mem->nvdimmPath) {
>> @@ -1442,6 +1445,42 @@ virDomainMemoryDefValidate(const virDomainMemoryDef *mem,
>> _("virtio-pmem does not support NUMA nodes"));
>> return -1;
>> }
>> + break;
>> +
>> + case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
>> + if (mem->requestedsize > mem->size) {
>> + virReportError(VIR_ERR_XML_DETAIL, "%s",
>> + _("requested size must be smaller than @size"));
>> + return -1;
>> + }
>> +
>> + if (!VIR_IS_POW2(mem->blocksize)) {
>> + virReportError(VIR_ERR_XML_DETAIL, "%s",
>> + _("block size must be a power of two"));
>> + return -1;
>> + }
>> +
>> + if (virHostMemGetTHPSize(&thpSize) < 0) {
>> + /* We failed to get THP size, fall back to a sane default. On
>> + * almost every architecture the size will be 2MiB, except for some
>> + * funky arches like sparc and m68k. Use 2MiB and refine later if
>> + * somebody complains. */
>> + thpSize = 2048;
>
> FWIW, a Power 9 server uses 2MiB too:
>
> $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 2097152
>
>
> I don't think you should worry about it too much since x86 is the only arch that is
> supporting virtio-mem (for now).
So, in QEMU we use the following logic right now:

    get_block_size() {
        block_size = backend_pagesize();
        if (block_size == NATIVE_PAGE_SIZE)
            return MAX(block_size, native_thp_size());
        return MAX(block_size, 1 * MiB);
    }

    detect_thp_size() {
        thp_size = read_from_file();
        if (!thp_size || thp_size > 16 * MiB) {
            if (s390x)
                return 1 * MiB;
            return 2 * MiB;
        }
        return thp_size;
    }

Especially, we also cap big block sizes (esp. arm64 with currently 512
MiB THP), as we prefer flexibility at this point.
So yes, on an x86-64 *host* we'll usually end up with 2 MiB in QEMU. On arm64
it can be quite different.
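
For illustration only, the pseudocode can be written as a small self-contained
C sketch. This is not the actual QEMU code. It assumes that native_thp_size()
resolves to detect_thp_size(). The parameter names (backend_pagesize,
native_pagesize, thp_size, is_s390x) are hypothetical and stand in for values
that QEMU obtains from the memory backend, the host and sysfs. All sizes are
in bytes.

```c
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)

/* Hypothetical helper: larger of two sizes. */
static uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/*
 * thp_size is assumed to be the value read from
 * /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, or 0 if it could
 * not be read. Missing or very large values (> 16 MiB) fall back to a
 * per-architecture default.
 */
static uint64_t detect_thp_size(uint64_t thp_size, int is_s390x)
{
    if (thp_size == 0 || thp_size > 16 * MiB)
        return is_s390x ? 1 * MiB : 2 * MiB;
    return thp_size;
}

/*
 * Pick the default block size: with native pages behind the backend the
 * (possibly capped) THP size is used, otherwise at least 1 MiB.
 */
static uint64_t get_block_size(uint64_t backend_pagesize,
                               uint64_t native_pagesize,
                               uint64_t thp_size,
                               int is_s390x)
{
    if (backend_pagesize == native_pagesize)
        return max_u64(backend_pagesize,
                       detect_thp_size(thp_size, is_s390x));
    return max_u64(backend_pagesize, 1 * MiB);
}
```

With 4 KiB native pages and a 2 MiB THP size this yields 2 MiB. With 64 KiB
native pages and a 512 MiB THP size (the arm64 case mentioned above) the THP
size exceeds the 16 MiB cap, so the sketch also yields 2 MiB.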
>
>
> Reviewed-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
--
Thanks,
David / dhildenb