[libvirt] Exposing mem-path in domain XML

Mon Sep 25 14:17:10 UTC 2017

On 09/15/2017 03:49 PM, Zack Cornelius wrote:
> For the Kove integration, the memory is allocated on external devices, similar to a SAN device LUN allocation. As such, each virt will have its own separate allocation, and will need its memory file(s) managed independently of other virts. We also use information from the virtual machine management layer (such as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or project) with the allocation, to assist the administrators with monitoring, tracking, and billing memory usage. This data also assists in maintenance and troubleshooting by identifying which VMs and which hosts are utilizing memory on a given external device. I don't believe we could (easily) get this data just from the process information of the process creating or opening files, but would need to do some significant work to trace this information from the qemu/libvirt/management layer stack. Pre-allocating the files at the integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows us to collect this information from the upper layers or domain XML directly.
> 
> 
> We don't actually need the file path exposed within the domain XML itself. All that's really needed is just to have some mechanism for using predictable filename(s) for qemu, instead of memory filenames that are currently generated within qemu itself using mktemp. Our original proposal for this was to use the domain UUID for the filename, and using the file within the "memory_backing_dir" directory from qemu.conf. This does have the limitation of not supporting multiple memory backing files or hotplug. An adaptation of this would be to use the domain UUID (or domain name), plus the memory device id in the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the same generation for the mem_id that is already in use for creating the memory device in qemu. Any other mechanism which would result in well-defined filenames would also work, as long as the filename is predictable prior to qemu startup.

I think qemu generates a random file name only if the provided path is a
directory. I've tried this locally and indeed, when given a full path
ending with a file name, qemu just used it. So I've written a patch that
constructs the mem-path argument with the following structure:

$memory_backing_dir/$alias

The problem with this approach is that $alias is not stable: it may
change on device hot(un)plug. Moreover, we'd like to keep the freedom to
change how aliases are generated in the future, should we ever need to.
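
For illustration, the construction above amounts to roughly the
following Python sketch. This is not the actual patch (libvirt is
written in C): build_mem_path is a hypothetical helper, and the
directory in the usage line is only an example value for
memory_backing_dir.

```python
import os.path


def build_mem_path(memory_backing_dir: str, alias: str) -> str:
    """Return $memory_backing_dir/$alias as a single file path.

    Hypothetical helper: the alias is expected to be a plain object
    alias such as 'ram-node0' or 'memdimm0', never a path.
    """
    # Reject anything that is not a single path component.
    if not alias or "/" in alias or alias in (".", ".."):
        raise ValueError(f"unsuitable alias: {alias!r}")
    return os.path.join(memory_backing_dir, alias)


# Example directory only; the real value comes from qemu.conf.
print(build_mem_path("/var/lib/libvirt/qemu/ram", "ram-node0"))
```

Since the path ends with a file name rather than a directory, qemu
would use it as given, which is what makes the name predictable before
qemu starts.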

> 
> We may wish to add an additional flag in qemu.conf to enable this behavior, defaulting to the current random filename generation if not specified. As the path is in qemu.conf, and the filename would be generated internally within libvirt, this avoids exposing any file paths within the domain XML, keeping it system agnostic.

I don't think we need such a switch. Other users don't really care what
the file is named.

> 
> An alternative would be to allow specification of the filename directly in the domain XML, while continuing to use the path from qemu.conf's memory_backing_dir directive. With this approach, libvirt would need to sanitize the filename input to prevent escaping the memory_backing_dir directory with "..". This method does expose the filenames (but not the path) in the XML, but allows the management layer (such as oVirt, RHV, or Openstack) to control the file creation locations directly.
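
The sanitization this alternative requires could look roughly like the
Python sketch below. It is only an illustration under assumptions, not
libvirt code: resolve_backing_file is a hypothetical name, and the
check assumes the file name must be a single path component that stays
inside memory_backing_dir.

```python
import os.path


def resolve_backing_file(memory_backing_dir: str, filename: str) -> str:
    """Return the full path for filename, refusing to leave the directory.

    Hypothetical helper illustrating the '..' check discussed above.
    """
    # A user-supplied name must be one plain path component.
    if (not filename or filename in (".", "..")
            or "/" in filename or "\0" in filename):
        raise ValueError(f"invalid memory file name: {filename!r}")

    base = os.path.realpath(memory_backing_dir)
    candidate = os.path.realpath(os.path.join(base, filename))

    # After resolving symlinks, the file must still live under base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"{filename!r} escapes {memory_backing_dir!r}")
    return candidate
```

With this, a name such as "vm1_mem0" resolves to a file inside the
directory, while "../etc/passwd" or a bare ".." is rejected before any
path is built.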

Well, this is an interesting idea. However, we may end up using
memory-backend-file even if the domain has no <memory model='dimm'/>
device. The code that decides this is pretty complex:

libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l3234

Therefore we can't always rely on the user defining the file name.

Personally, I like the approach I've implemented locally. The problem is
that we can't promise stable file names. However, as a fix for a
different, unrelated bug we might generate the aliases at define time.
If we did that, we could sort of make a promise about the file naming.
Only sort of, because, for instance, for the aforementioned
<memory model='dimm'/> the alias of the corresponding
memory-backend-file object is 'memdimmX' (not the 'dimmX' alias of the
device itself), so the constructed path is different:

-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
-numa node,nodeid=0,cpus=0-3,memdev=ram-node0
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
-device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0

The corresponding XML looks like this:

<domain type='kvm'>
  <name>fedora</name>
  <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
  <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
  <memory unit='KiB'>4717568</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages/>
    <access mode='shared'/>
  </memoryBacking>
  ...
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='2' threads='2'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
    </numa>
  </cpu>
  ...
  <devices>
    ...
    <memory model='dimm'>
      <target>
        <size unit='KiB'>523264</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>
</domain>

Michal