[libvirt] Exposing mem-path in domain XML

Thu Sep 14 11:46:48 UTC 2017

On 09/06/2017 01:42 PM, Daniel P. Berrange wrote:
> On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
>> On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
>>> On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
>>>> On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
>>>>> On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
>>>>>> On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
>>>>>>> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> there is the following bug [1] which I'm not quite sure how to grasp. So
>>>>>>>> there is this application/infrastructure called Kove [2] that allows you
>>>>>>>> to have memory for your application stored on a remote host on the
>>>>>>>> network and basically fetch the needed region on pagefault. Now imagine
>>>>>>>> that somebody wants to use it for backing domain memory. However, the way
>>>>>>>> that the tool works is it has some kernel module and then some userland
>>>>>>>> binary that is fed with the path of the mmaped file. I don't know all
>>>>>>>> the details, but the point is, in order to let users use this we need to
>>>>>>>> expose the paths for mem-path for the guest memory. I know we did not
>>>>>>>> want to do this in the past, but now it looks like we don't have a way
>>>>>>>> around it, do we?
>>>>>>>
>>>>>>> We don't want to expose the concept of paths in the XML because this is
>>>>>>> a Linux-specific way to configure hugepages / shared memory. So we hide
>>>>>>> the particular path used in the internal impl of the QEMU driver,
>>>>>>> and/or via the qemu.conf global config file. I don't really want to change
>>>>>>> that approach, particularly if the only reason is to integrate with a
>>>>>>> closed source binary like Kove.
>>>>>>
>>>>>> Yep, I agree with that. However, if you read the discussion in the
>>>>>> linked bug you'll find that they need to know what file in the
>>>>>> memory_backing_dir (from qemu.conf) corresponds to which domain. The
>>>>>> reporter suggested using UUID-based filenames, which I fear is not
>>>>>> enough because one can have multiple <memory model='dimm'/> -s configured
>>>>>> for their domain. But I guess we could go with:
>>>>>>
>>>>>> ${memory_backing_dir}/${domName}        for generic memory
>>>>>> ${memory_backing_dir}/${domName}_N      for Nth <memory/>
>>>>>
>>>>> This feels like it is going to lead to hell when you add in memory
>>>>> hotplug/unplug, with inevitable races.
>>>>>
>>>>>> BTW: IIUC they want predictable names because they need to create the
>>>>>> files before spawning qemu so that they are picked by qemu instead of
>>>>>> using temporary names.
>>>>>
>>>>> I would like to know why they even need to associate particular memory
>>>>> files with particular QEMU processes. eg if they're just exposing a
>>>>> new type of tmpfs filesystem from the kernel why does it matter what
>>>>> each file is used for.
>>>>
>>>> This might give you the answer:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
>>>>
>>>> So the way I understand it is that they will create the files, and
>>>> provide us with paths. So luckily, we don't have to make up the paths on
>>>> our own.
>>>
>>> IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
>>> This doesn't really make me any more inclined to support this closed
>>> source stuff in libvirt.
>>
>> Yeah, that's my feeling too. So, what about the following: let's assume
>> they will fix their code so that it is proper tmpfs. Libvirt can then
>> treat it just like it already treats hugetlbfs. For us
>> it'll be just yet another type of hugepages. I mean, for hugepages we
>> already create /hugepages/mount/point/libvirt/$domain for each domain so
>> the separation is there (even though this is considered internal impl),
>> since it would be a proper tmpfs they can see the pid of qemu which is
>> trying to mmap() (and take the name or whatever unique ID they want from
>> there).
> 
> Yep, we can at least make a reasonable guarantee that all files belonging
> to a single QEMU process will always be within the same sub-directory.
> This allows the kmod to distinguish 2 files owned by separate VMs, from 2
> files owned by the same VM and do what's needed. I don't see why it would
> need to care about naming conventions beyond the layout.
> 
>> I guess what I'm trying to ask is if it was proper tmpfs, we would be
>> okay with it, wouldn't we?
> 
> If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
> should be fine; at most you would need an /etc/libvirt/qemu.conf change
> to explicitly point at the custom mount point if libvirt doesn't
> auto-detect the right one.
> 
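
To illustrate the layout guarantee quoted above (all files of one QEMU
process live in the same sub-directory), here is a minimal Python sketch.
It is not libvirt or Kove code: the function name group_by_vm and the
example mount point are hypothetical, and it only assumes the
<root>/<domain>/<file> layout described in this thread.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_vm(paths, backing_root):
    """Group memory-backing files by their per-domain sub-directory.

    Assumes the layout <backing_root>/<domain>/<file>; file names
    themselves are treated as opaque (no naming convention needed).
    """
    root = PurePosixPath(backing_root)
    groups = defaultdict(list)
    for p in paths:
        # relative_to() raises ValueError for paths outside backing_root
        rel = PurePosixPath(p).relative_to(root)
        if len(rel.parts) < 2:
            # File sits directly in the root, not in a domain directory
            continue
        # First component below the root identifies the VM
        groups[rel.parts[0]].append(rel.name)
    return dict(groups)
```

Two files then belong to the same VM exactly when they end up under the
same key, which is all a consumer of the layout would need to check.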

Zack, can you join the discussion and tell us if our design sounds
reasonable to you?

Michal