[PATCH v4 for v7.6.0 00/14] Introduce virtio-mem <memory/> model

David Hildenbrand david at redhat.com
Wed Jul 7 10:30:19 UTC 2021


On 23.06.21 12:12, Michal Privoznik wrote:
> v4 of:
> 
> https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
> 
> diff to v3:
> - Rebased code on the top of master
> - Tried to work in all Peter's review suggestions
> - Fixed a bug where adjusting <requested/> was viewed as hotplug of new
>    <memory/> by XML validator and thus if <maxMemory/> was close enough to
>    <currentMemory/> the validator reported an error (this was reported by
>    David).
> 

Hi Michal,

I just retested with this version and it mostly works as expected. I 
tested quite a few memory configurations and have some comments / reports :)

I tested successfully:
- 1 node with one device
- 2 nodes with one device on each node
- 2 nodes with two devices on one node
- "virsh update-memory-device" on live domains -- works great
- huge pages and anonymous memory with access=private and access=shared.
   There is only one issue with hugepages and memfd (prealloc=on gets
   set).
- shared memory on memfd and anonymous memory (-> shared file) with
   access=shared

I only tested on a single host NUMA node so far, but I don't expect 
surprises with host NUMA policies.


1. "virsh update-memory-device" and stopped domains

Once I have more than one virtio-mem device defined for a VM, "virsh 
update-memory-device" cannot be used anymore as aliases don't seem to be 
available on stopped VMs. If I manually define an alias on a stopped VM, 
the alias silently gets dropped. Is there any way to identify a 
virtio-mem device on a stopped domain?


2. "virsh update-memory-device" with --config on a running domain

# virsh update-memory-device "Fedora34" --config --alias "virtiomem1" --requested-size 16G
error: no memory device found

I guess the issue is, again, that aliases don't apply to the "!live" XML. 
So the "--config" option doesn't really work when more than one 
virtio-mem device is defined for a VM.


3. "virsh update-memory-device" and nodes

In addition to "--alias", something like "--node" would also be nice to 
have -- assuming there is only a single virtio-mem device per NUMA node, 
which is usually the case. For example,

# virsh update-memory-device "Fedora34" --node 1 --requested-size 16G

could come in handy. This would also work on "!live" domains.
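
To make the idea a bit more concrete, here is a rough Python sketch of the lookup such a "--node" option would need on the inactive XML. It is not libvirt code: the function name and the sample domain are made up, and the element layout (<target/> containing <node/> and <requested/>) is my assumption based on the XML used in this series.

```python
import xml.etree.ElementTree as ET

# Hypothetical inactive domain XML with one virtio-mem device per NUMA node.
DOMAIN_XML = """
<domain type='kvm'>
  <name>Fedora34</name>
  <devices>
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>0</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>0</requested>
      </target>
    </memory>
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>1</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>0</requested>
      </target>
    </memory>
  </devices>
</domain>
"""


def set_requested_by_node(domain_xml, node, requested_kib):
    """Set <requested/> of the only virtio-mem device on guest NUMA node `node`."""
    root = ET.fromstring(domain_xml)
    matches = [
        mem
        for mem in root.findall("./devices/memory[@model='virtio-mem']")
        if (mem.findtext("./target/node") or "").strip() == str(node)
    ]
    if len(matches) != 1:
        # Zero or several devices on the node: the node alone cannot identify one.
        raise LookupError(
            f"expected exactly one virtio-mem device on node {node}, "
            f"found {len(matches)}"
        )
    requested = matches[0].find("./target/requested")
    if requested is None:
        raise LookupError("virtio-mem device has no <requested/> element")
    requested.set("unit", "KiB")
    requested.text = str(requested_kib)
    return ET.tostring(root, encoding="unicode")


# 16G on node 1, expressed in KiB.
updated_xml = set_requested_by_node(DOMAIN_XML, 1, 16 * 1024 * 1024)
```

The point is only that the node number is part of the persistent XML, so the match does not depend on aliases being assigned. With more than one device on a node the lookup is ambiguous and has to fail.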


4. "actual" vs. "current"

"<actual unit='KiB'>16777216</actual>" I wonder if "current" instead of 
"actual" would be more in line with "currentMemory". But no strong opinion.


5. Slot handling.

As already discussed, virtio-mem and virtio-pmem don't need slots. Yet, 
the "slots" definition is required and libvirt reserves one slot for 
each such device ("error: unsupported configuration: memory device count 
'2' exceeds slots count '1'"). Changing that, if we ever want to, is 
certainly future work.
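
For illustration, a small Python sketch of the two accounting policies. This is not how libvirt implements the check; the function name and the "exempt_slot_free" switch are hypothetical, and I'm assuming the usual <maxMemory slots='N'/> layout.

```python
import xml.etree.ElementTree as ET

# Models that, per the discussion above, do not actually need a slot.
SLOT_FREE_MODELS = {"virtio-mem", "virtio-pmem"}

# Hypothetical domain: one slot declared, two virtio-mem devices defined.
DOMAIN_XML = """
<domain type='kvm'>
  <maxMemory slots='1' unit='KiB'>67108864</maxMemory>
  <devices>
    <memory model='virtio-mem'/>
    <memory model='virtio-mem'/>
  </devices>
</domain>
"""


def slots_needed(domain_xml, exempt_slot_free=False):
    """Return (needed, available) slot counts for a domain XML.

    With exempt_slot_free=False every <memory/> device counts, which matches
    the behaviour I observed. With exempt_slot_free=True, virtio-mem and
    virtio-pmem devices are skipped.
    """
    root = ET.fromstring(domain_xml)
    max_memory = root.find("./maxMemory")
    available = int(max_memory.get("slots", "0")) if max_memory is not None else 0
    models = [mem.get("model") for mem in root.findall("./devices/memory")]
    if exempt_slot_free:
        models = [m for m in models if m not in SLOT_FREE_MODELS]
    return len(models), available


observed = slots_needed(DOMAIN_XML)                         # 2 needed, 1 available
relaxed = slots_needed(DOMAIN_XML, exempt_slot_free=True)   # 0 needed, 1 available
```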


6. 4096 KiB page size source results in an error

     <source>
       <pagesize unit='KiB'>4096</pagesize>
       <nodemask>0-1</nodemask>
     </source>

"error: internal error: Unable to find any usable hugetlbfs mount for 
4096 KiB"

This example is taken from https://libvirt.org/formatdomain.html for 
DIMMs. Not sure what the expected behavior is.


7. File source gets silently dropped

     <source>
       <path>/dev/shmem/vm0</path>
     </source>

The statement gets silently dropped, which is somewhat surprising. 
However, I did not test what happens with DIMMs; maybe it's the same.


8. Global preallocation of memory

With

<memoryBacking>
	<allocation mode="immediate"/>
</memoryBacking>

we also get "prealloc=on" set for the memory backends of the virtio-mem 
devices, which is sub-optimal, because we end up preallocating all 
memory of the memory backend (which is unexpected for a virtio-mem 
device) and virtio-mem will then discard all memory immediately again. 
So it's essentially a dangerous NOP -- dangerous because we temporarily 
consume a lot of memory.

In an ideal world, we would not set this for the memory backend used for 
the virtio-mem devices, but for the virtio-mem devices themselves, such 
that preallocation happens when new memory blocks are actually exposed 
to the VM.

As QEMU does not support "prealloc=on" for virtio-mem devices yet, 
this is future work; I'm planning on implementing that in QEMU soon. 
For now, we might want to error out if <allocation mode="immediate"/> 
is used along with virtio-mem devices. Until then, it might also be 
good enough to simply document that this setup should be avoided.
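
The "error out" variant could be as simple as the following check, sketched here in Python rather than in libvirt's C. The function name is hypothetical and the sample domain is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain combining immediate allocation with a virtio-mem device.
DOMAIN_XML = """
<domain type='kvm'>
  <memoryBacking>
    <allocation mode="immediate"/>
  </memoryBacking>
  <devices>
    <memory model='virtio-mem'/>
  </devices>
</domain>
"""


def immediate_alloc_conflicts(domain_xml):
    """Return True if immediate allocation is combined with a virtio-mem device."""
    root = ET.fromstring(domain_xml)
    alloc = root.find("./memoryBacking/allocation")
    immediate = alloc is not None and alloc.get("mode") == "immediate"
    has_virtio_mem = root.find("./devices/memory[@model='virtio-mem']") is not None
    return immediate and has_virtio_mem


conflict = immediate_alloc_conflicts(DOMAIN_XML)
```

Whether a conflict should be a hard error or only a warning is a policy question I'd leave to you.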


9. Memfd and huge pages

<memoryBacking>
     <source type="memfd"/>
</memoryBacking>

and

<memory model='virtio-mem' access='shared'>
   <source>
     <pagesize unit='KiB'>2048</pagesize>
   </source>
   ...
</memory>


I get on the QEMU cmdline

"-object 
{"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}"

Dropping the "memfd" source, I get on the QEMU cmdline:

-object {"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184}

"prealloc":true should not have been added for virtio-mem in case of 
memfd. !memfd does what's expected.
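
In case it helps with a test case, this is roughly how I compare the two command lines. A minimal Python sketch: the function name is made up, and the set of backend ids used by virtio-mem devices is passed in by hand instead of being parsed from the "-device" arguments.

```python
import json

# The two "-object" arguments quoted above.
MEMFD_BACKEND = (
    '{"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,'
    '"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}'
)
FILE_BACKEND = (
    '{"qom-type":"memory-backend-file","id":"memvirtiomem0",'
    '"mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,'
    '"size":17179869184}'
)


def backends_with_prealloc(object_args, virtio_mem_memdevs):
    """Return ids of virtio-mem memory backends that have prealloc enabled."""
    flagged = []
    for arg in object_args:
        props = json.loads(arg)
        if props.get("id") in virtio_mem_memdevs and props.get("prealloc", False):
            flagged.append(props["id"])
    return flagged


memfd_flagged = backends_with_prealloc([MEMFD_BACKEND], {"memvirtiomem0"})
file_flagged = backends_with_prealloc([FILE_BACKEND], {"memvirtiomem0"})
```

Only the memfd variant gets flagged; the file-backed one comes back empty.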


10. Memory locking

With

<memoryBacking>
	<locked/>
</memoryBacking>

virtio-mem fails with

"qemu-system-x86_64: -device 
virtio-mem-pci,node=0,block-size=2097152,requested-size=0,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2: 
Incompatible with mlock"

Unfortunately, with shmem, for example:

<memoryBacking>
	<locked/>
	<access mode="shared"/>
	<source type="memfd"/>
</memoryBacking>

it seems to fail only after essentially (temporarily) preallocating all 
memory for the memory backend of the virtio-mem device. In the future, 
virtio-mem might be able to support mlock; until then, this is 
suboptimal but at least fails at some point.


11. Reservation of memory

With new QEMU versions we'll want to pass "reserve=off" for the memory 
backend used, especially with hugepages and private mappings. While this 
change was merged into QEMU, it's not part of an official release yet. 
Future work.

https://lore.kernel.org/qemu-devel/20210510114328.21835-1-david@redhat.com/

Otherwise, when the backend "size" is not currently available in free and 
"unreserved" hugepages, we'll fail with "qemu-system-x86_64: unable to 
map backing store for guest RAM: Cannot allocate memory". The same thing 
can easily happen on anonymous memory when memory overcommit isn't disabled.

So this is future work, but at least the QEMU part is already upstream.
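
As a back-of-the-envelope illustration of why the mapping fails without "reserve=off", here is the arithmetic in Python. This is a simplified model of the hugetlb reservation accounting as I understand it: it ignores surplus pages, and the function names are made up. On a real host the two counters would come from free_hugepages and resv_hugepages under /sys/kernel/mm/hugepages/hugepages-2048kB/.

```python
def hugepages_needed(size_bytes, page_bytes):
    """Number of huge pages needed to back size_bytes, rounded up."""
    return -(-size_bytes // page_bytes)


def reservation_fits(size_bytes, page_bytes, free_pages, resv_pages):
    """Return True if the whole backend can be reserved up front.

    Pages that are free but already reserved by someone else are not
    available for a new reservation.
    """
    return hugepages_needed(size_bytes, page_bytes) <= free_pages - resv_pages


# The 16 GiB backend on 2 MiB huge pages from the cmdline in section 9.
needed = hugepages_needed(17179869184, 2097152)
```

So a 16 GiB virtio-mem backend wants 8192 free, unreserved 2 MiB pages at startup even with requested-size=0, which is exactly what "reserve=off" is meant to avoid.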



I'm planning on adding some libvirt documentation to 
https://virtio-mem.gitlab.io/ soon, where I'll document some of this, 
including care that has to be taken with mlock and preallocation.

Thanks for all your work!

-- 
Thanks,

David / dhildenb



