[PATCH 4/4] qemu_process.c: Propagate hugetlbfs mounts on reconnect

Michal Prívozník mprivozn at redhat.com
Fri Sep 23 14:43:13 UTC 2022


On 9/23/22 15:06, Martin Kletzander wrote:
> On Mon, Sep 12, 2022 at 03:46:41PM +0200, Michal Privoznik wrote:
>> When reconnecting to a running QEMU process, we construct the
>> per-domain path in all hugetlbfs mounts. This is a relict from
>> the past (v3.4.0-100-g5b24d25062) where we switched to a
>> per-domain path and we want to create those paths when libvirtd
>> restarts on upgrade.
>>
>> And with namespaces enabled there is one corner case where the
>> path is not created. In fact an error is reported and the
>> reconnect fails. Ideally, all mount events are propagated into
>> the QEMU's namespace. And they probably are, except when the
>> target path does not exist inside the namespace. Now, it's pretty
>> common for users to mount hugetlbfs under /dev (e.g.
>> /dev/hugepages), but if domain is started without hugepages (or
>> more specifically - private hugetlbfs path wasn't created on
>> domain startup), then the reconnect code tries to create it.
>> But it fails to do so, well, it fails to set seclabels on the
>> path because, because the path does not exist in the private
> 
> s/because, // at least =)
> 
>> namespace. And it doesn't exist because we specifically create
>> only a subset of all possible /dev nodes. Therefore, the mount
>> event, whilst propagated, is not successful and hence the
>> filesystem is not mounted. We have to do it ourselves.
>>
>> If hugetlbfs is mount anywhere else there's no problem and this
>> is effectively a dead code.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>> docs/kbase/qemu-passthrough-security.rst | 6 ------
>> src/qemu/qemu_process.c                  | 3 +++
>> 2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/docs/kbase/qemu-passthrough-security.rst
>> b/docs/kbase/qemu-passthrough-security.rst
>> index 106c3cc5b9..ef10d8af9b 100644
>> --- a/docs/kbase/qemu-passthrough-security.rst
>> +++ b/docs/kbase/qemu-passthrough-security.rst
>> @@ -172,9 +172,3 @@ command before any guest is started:
>> ::
>>
>>   # mount --make-rshared /
>> -
>> -Another requirement for dynamic mount point propagation is to  not place
>> -``hugetlbfs`` mount points under ``/dev`` because these won't be
>> propagated as
>> -corresponding directories do not exist in the private namespace. Or
>> just use
>> -``memfd`` memory backend instead which does not require ``hugetlbfs``
>> mount
>> -points.
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index cbfdd3bda5..b05ad059c3 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -3976,6 +3976,9 @@
>> qemuProcessBuildDestroyMemoryPathsImpl(virQEMUDriver *driver,
>>             return -1;
>>         }
>>
>> +        if (qemuDomainNamespaceSetupPath(vm, path, NULL) < 0)
>> +            return -1;
>> +
> 
> What's not visible here is that this branch is accessible only if the
> domain has any hugepage backing configured.  And it is only created on
> startup and reconnect.  That means if you start a VM without any
> hugepages mounted, then mount them somewhere under /dev and restart the
> daemon (which was always required) then attaching a hugepage-backed
> dimm, for example, will work without namespaces, but will fail with
> namespaces because we skipped the creation on restart.  I think just
> recreating the structure unconditionally would provide more benefit than
> doubly isolating qemu from hugepages (once with permissions and once
> with namespaces).

Actually, the path is created also on memory hotplug:

src/qemu/qemu_hotplug.c=2231=qemuDomainAttachMemory(virQEMUDriver *driver,
--
src/qemu/qemu_hotplug.c-2275-
src/qemu/qemu_hotplug.c:2276:    if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem, true) < 0)
src/qemu/qemu_hotplug.c-2277-        goto cleanup;

So the per-domain dir should be created in all three cases (startup, reconnect & hotplug).

> 
> Since this patch solves the issue of us killing perfectly fine qemu I
> think it's fine if that's a follow'up patch:
> 
> Reviewed-by: Martin Kletzander <mkletzan at redhat.com>

Pushed, thanks.

Michal



More information about the libvir-list mailing list