[libvirt] Exposing mem-path in domain XML

Mon Sep 25 22:00:59 UTC 2017

----- Original Message -----
> From: "Michal Privoznik" <mprivozn at redhat.com>
> To: "Zack Cornelius" <zack.cornelius at kove.net>
> Cc: "Daniel P. Berrange" <berrange at redhat.com>, "libvir-list" <libvir-list at redhat.com>
> Sent: Monday, September 25, 2017 9:17:10 AM
> Subject: Re: [libvirt] Exposing mem-path in domain XML

> On 09/15/2017 03:49 PM, Zack Cornelius wrote:
>> For the Kove integration, the memory is allocated on external devices, similar
>> to a SAN device LUN allocation. As such, each virt will have its own separate
>> allocation, and will need its memory file(s) managed independently of other
>> virts. We also use information from the virtual machine management layer (such
>> as RHV, oVirt, or OpenStack) to associate VM metadata (such as VM ID, owner, or
>> project) with the allocation, to assist the administrators with monitoring,
>> tracking, and billing memory usage. This data also assists in maintenance and
>> troubleshooting by identifying which VMs and which hosts are utilizing memory
>> on a given external device. I don't believe we could (easily) get this data
>> just from the process information of the process creating or opening files, but
>> would need to do some significant work to trace this information from the
>> qemu/libvirt/management layer stack. Pre-allocating the files at the
>> integration point, such as oVirt / RHV hooks or libvirt prepare hooks, allows
>> us to collect this information from the upper layers or domain XML directly.
>> 
>> 
>> We don't actually need the file path exposed within the domain XML itself. All
>> that's really needed is just to have some mechanism for using predictable
>> filename(s) for qemu, instead of memory filenames that are currently generated
>> within qemu itself using mktemp. Our original proposal for this was to use the
>> domain UUID for the filename, and using the file within the
>> "memory_backing_dir" directory from qemu.conf. This does have the limitation of
>> not supporting multiple memory backing files or hotplug. An adaptation of this
>> would be to use the domain UUID (or domain name), plus the memory device id in
>> the filename (for example: <domain_uuid>_<mem_id1>). This would utilize the
>> same generation for the mem_id that is already in use for creating the memory
>> device in qemu. Any other mechanism which would result in well-defined
>> filenames would also work, as long as the filename is predictable prior to qemu
>> startup.
> 
> I think qemu uses random file names only if the path provided ends with
> a directory. I've tried this locally and indeed, when full path ending
> with a file was provided qemu just used it. So I've written a patch that
> creates mem-path argument with the following structure:
> 
> $memory_backing_dir/$alias
> 
> Problem with this approach is that $alias is not stable. It may change
> on device hot(un-)plug. Moreover, we'd like to keep the possibility to
> be able to change it in the future should we find ourselves in such
> situation.
> 
>> 
>> We may wish to add an additional flag in qemu.conf to enable this behavior,
>> defaulting to the current random filename generation if not specified. As the
>> path is in qemu.conf, and the filename would be generated internally within
>> libvirt, this avoids exposing any file paths within the domain XML, keeping it
>> system agnostic.
> 
> I don't think we need such switch. Others don't really care what the
> file is named really.
> 
>> 
>> An alternative would be to allow specification of the filename directly in the
>> domain XML, while continuing to use the path from qemu.conf's
>> memory_backing_dir directive. With this approach, libvirt would need to
>> sanitize the filename input to prevent escaping the memory_backing_dir
>> directory with "..". This method does expose the filenames (but not the path)
>> in the XML, but allows the management layer (such as oVirt, RHV, or Openstack)
>> to control the file creation locations directly.
> 
> Well, this is interesting idea. However, it may happen that we use
> memory-backend-file even if no <memory model='dimm'/> device. The code
> that decides this is pretty complex:
> 
> libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_command.c;hb=HEAD#l3234
> 
> Therefore we might not always have user define the file name.

Kove would only use our integration with domains that use file memory backing, configured via the following XML, which I think narrows down the cases where memory-backend-file gets used.

 <memoryBacking>
   <source type='file'/>
   <access mode='shared'/>
 </memoryBacking>

The Kove integration is not compatible with huge pages, so we're only interested in the <memoryBacking> case with <source type='file'/>, and not the hugepages cases, if that simplifies things.
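
To illustrate how narrow that case is, here is a rough sketch of the check an integration hook could make against the domain XML. The function name uses_shared_file_backing is made up for illustration (it is not a libvirt API), and the sketch assumes the hook is handed the domain XML as a string.

```python
import xml.etree.ElementTree as ET


def uses_shared_file_backing(domain_xml: str) -> bool:
    """Return True if the domain asks for file-backed, shared memory.

    Hypothetical helper for illustration only, not part of libvirt.
    """
    root = ET.fromstring(domain_xml)
    backing = root.find("memoryBacking")
    if backing is None:
        return False
    # Huge pages are not compatible with the integration, so skip them.
    if backing.find("hugepages") is not None:
        return False
    source = backing.find("source")
    access = backing.find("access")
    return (
        source is not None
        and source.get("type") == "file"
        and access is not None
        and access.get("mode") == "shared"
    )
```

Any domain for which this returns False would simply be left alone by the hook.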

> 
> Personally, I like the idea I've locally implemented. But, problem is we
> can't make such promise. Although, as a fix for a different unrelated
> bug we might generate the aliases at define time. If we did that, then
> we sort of can make the promise about the file naming. Well, sort of,
> because for instance for aforementioned <memory model='dimm'/> the alias
> for the corresponding memory-backend-file object is 'memdimmX' therefore
> the constructed path is different:
> 
> -object
> memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/ram-node0,share=yes,size=4294967296
> -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
> -object
> memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/hugepages2M/libvirt/qemu/13-fedora/memdimm0,share=yes,size=536870912
> -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
> 
> The corresponding XML looks like this:
> 
> <domain type='kvm'>
>  <name>fedora</name>
>  <uuid>63840878-0deb-4095-97e6-fc444d9bc9fa</uuid>
>  <maxMemory slots='16' unit='KiB'>8388608</maxMemory>
>  <memory unit='KiB'>4717568</memory>
>  <currentMemory unit='KiB'>4194304</currentMemory>
>  <memoryBacking>
>    <hugepages/>
>    <access mode='shared'/>
>  </memoryBacking>
>  ...
>  <cpu mode='host-passthrough' check='none'>
>    <topology sockets='1' cores='2' threads='2'/>
>    <numa>
>      <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>    </numa>
>  </cpu>
>  ...
>  <devices>
>    ...
>    <memory model='dimm'>
>      <target>
>        <size unit='KiB'>523264</size>
>        <node>0</node>
>      </target>
>      <address type='dimm' slot='0'/>
>    </memory>
>  </devices>
> 
>

With the other bugfix that defines the aliases within the XML, and your locally implemented idea, would the filenames then be predictable, or readable from the XML, when using file-backed memory in all three cases: memory defined in the <memory> element, memory defined as part of a NUMA node, and memory defined as a dimm device?
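
To make the question concrete, this is roughly what we would compute in a prepare hook if the answer is yes. It is only a sketch: it assumes the $memory_backing_dir/$alias layout from your patch, it takes the aliases (such as 'ram-node0' and 'memdimm0' from your example) as already known, and the helper name predicted_mem_paths and the directory in the usage note are made up.

```python
import posixpath


def predicted_mem_paths(memory_backing_dir, aliases):
    """Map each memory object alias to the file path we expect qemu to use.

    Hypothetical helper: assumes the path is $memory_backing_dir/$alias.
    """
    paths = {}
    for alias in aliases:
        # Only accept plain file names, so a path can never leave the directory.
        if alias in ("", ".", "..") or "/" in alias:
            raise ValueError("alias is not a plain file name: %r" % alias)
        paths[alias] = posixpath.join(memory_backing_dir, alias)
    return paths
```

Calling predicted_mem_paths("/path/to/memory_backing_dir", ["ram-node0", "memdimm0"]) would give us the two files to pre-allocate before qemu starts.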

--Zack