qemu:///embed and isolation from global components

Michal Privoznik mprivozn at redhat.com
Wed Mar 11 16:32:56 UTC 2020


On 3/10/20 4:42 PM, Andrea Bolognani wrote:
> On Mon, 2020-03-09 at 18:04 +0000, Daniel P. Berrangé wrote:
>> On Mon, Mar 09, 2020 at 06:09:13PM +0100, Andrea Bolognani wrote:
>>> On Fri, 2020-03-06 at 17:49 +0000, Daniel P. Berrangé wrote:
>>>> On Fri, Mar 06, 2020 at 06:24:15PM +0100, Andrea Bolognani wrote:
> [...]
>>> Aside: instead of a per-VM setting, would it make sense for this to
>>> be a connection-level setting? That way, even on traditional libvirt
>>> deployments, the hypervisor admin could eg. opt out of machinectl
>>> registration if they so desire; at the same time, you'd avoid the
>>> situation where most VMs are confined using CGroups but a few are
>>> not, which is probably not a desirable scenario.
>>
>> Yes, functionally it would work fine as a connection level setting
>> too, though this hides the behaviour from anyone looking at the
>> guest config. We've previously punted quite a few things to the
>> qemu.conf because we didn't want to go through process of representing
>> them in the domain XML. This was OK when the qemu.conf settings were
>> something done once at host deployment time.
>>
>> With the embedded driver, I think this is not so desirable, as it means
>> to get the configuration they want from a VM, they need to deal with
>> two distinct config files. The ideal would be that everything that
>> is commonly needed can be achieved solely in the domain XML, and
>> I think resource control backend is one such common tunable.
> 
> I don't have a strong opinion either way, and as far as my current
> use is concerned it doesn't bother me to have to deal with a second
> configuration file. The reason why I thought a per-VM setting might
> not be desirable is that applications would then be able to
> override it, and so maybe VMs created with virt-manager would be
> registered against machinectl but VMs created using GNOME Boxes
> would not, and if the sysadmin likes to use machinectl to get a
> comprehensive view of the system they'd no longer be guaranteed
> that. But if that's not the kind of scenario we think we should
> prevent, then a per-VM setting is fine by me :)

I still don't quite see the value in machinectl (maybe because I'm not
using systemd :-D), but anyway: it's a system-wide monitor of virtual
machines. Therefore it makes sense to register a domain started under
qemu:///embed there. I don't view embed mode as a way of starting VMs
secretly. It's a way of starting VMs privately, and that's a different
thing. Other users might learn that my app is running a VM (plain 'ps'
would give it away), but they cannot tamper with it in any way, e.g.
change its XML.

> 
> [...]
>>> Right now we're already doing
>>>
>>>    qemu-$domid-$domname
>>>
>>> where I'm not entirely sure how much $domid actually buys us.
>>
>> IIRC $domid was a hack because at one time we had problems with
>> systemd not cleaning up the transient scope when QEMU was killed.
>> This would prevent libvirt starting the guest again thereafter.
>> I can't remember now if this was a bug we fixed in systemd or
>> libvirt or both or neither.

It was introduced by commit bd773e74f0d1d1b9ebbfcaa645178316b4f2265c, and
the commit message links to this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=68370 which looks to have
been fixed in systemd.

> 
> I see! It would be neat if we could get rid of it, assuming of course
> it's no longer needed on the platforms we target.

I don't think it's that simple. machinectl imposes some limitations on
the name: either it has to be an FQDN or a simple name without any dots.
And according to our code the strlen() must be <= 64 (I don't know if
that comes from machinectl or is just our own limitation). Therefore, if
you have two domains whose names would clash after our processing, the
domain ID guarantees that unique strings are passed to machined.
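
To illustrate the clash, here is a rough Python sketch. It is not
libvirt's actual code: the function name make_machine_name, the exact
set of allowed characters and the truncation strategy are all
assumptions made up for this example. Only the qemu-$domid-$domname
shape, the "no dots" rule and the 64 character limit come from the
discussion above.

```python
import re

MAX_LEN = 64  # limit mentioned above; where it originates is unclear


def make_machine_name(domid: int, domname: str, max_len: int = MAX_LEN) -> str:
    """Hypothetical helper: build a dot-free, length-limited machine name."""
    prefix = f"qemu-{domid}-"
    # Assumption: keep only alphanumerics and hyphens, which also drops dots.
    cleaned = re.sub(r"[^A-Za-z0-9-]", "", domname)
    # Truncate to the limit, then avoid ending on a dangling hyphen.
    return (prefix + cleaned)[:max_len].rstrip("-")


# Two distinct domain names that become identical after cleaning.
a = make_machine_name(1, "web.server")
b = make_machine_name(2, "webserver")
print(a, b)
```

Without the ID both domains would end up as the same string, so the
second registration would collide with the first. With the ID in the
prefix the two names stay distinct even though the cleaned domain names
are equal.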

> 
> [...]
>>>> Of course you can turn off virtlogd via qemu.conf
>>>
>>> That's what I'm doing right now and it works well enough, but I'm
>>> afraid that requiring a bunch of setup will discourage developers
>>> from using the embedded driver. We should aim for a reasonable out
>>> of the box experience instead.
>>
>> Why do you need to turn it off though ?  It should already
>> "do the right thing" as the log files should appear under a
>> different location and not have any clash. Turning it off
>> immediately creates a denial of service CVE in your application.
> 
> I was getting the same SELinux denial that Michal reported a few
> days back: virtlogd wants to verify it's being connected to by a
> process running as root, but it's only allowed by the policy to
> look into libvirtd's /proc/$pid for this information. So, for the
> same reason virtqemud can't currently connect to virtlogd when
> SELinux is in enforcing mode, neither can my qemu:///embed-using
> application.
> 
> Besides that, there is the fact that a lot of people, mainly those
> coming from a containers background, are not happy with having extra
> daemons running. I'm not saying they would prefer being hit by a DoS
> than having virtlogd running :) but they really, really don't like
> daemons :)
> 
>> None the less, as above I think we need common things to be
>> controllable via the domain XML. So either we need to make a
>> tunable there for use of logd or not, or we need to implement
>> the FIFO idea to avoid need for logd at all.
> 
> It seems like the FIFO idea (though I'll admit I don't fully
> understand it) would be the best of both worlds.
> 
> [...]
>>>> If you don't want to use virtnetworkd, then you won't be
>>>> creating such an <interface> config in the first place.
>>>> The app will have the option to open an embedded secondary
>>>> driver if desired. Some of the drivers won't really make
>>>> sense as embedded things though, at least not without
>>>> extra work. ie an embedded network or nwfilter driver has
>>>> no value unless your app has moved into a new network
>>>> namespace, as otherwise it will just fight with the
>>>> global network driver.
>>>
>>> Again, I think our defaults for qemu:///embed should be consistent
>>> and conservative: instead of having developers opt out of using
>>> network:///system, they should have to opt in before global
>>> resources are involved.
>>
>> They already opt-in to using the network driver by virtue of
>> configuring their guest to request its use. We don't need to
>> opt-in twice.
> 
> For some applications that's definitely an option, but others like
> virt-qemu-run accept a fairly arbitrary configuration and having to
> figure out whether that would result in eg. virtnetworkd being used,
> and whether that's acceptable, is extra work as well as potential
> duplication of logic that already exists in libvirt...
> 
> Then again, something like virt-qemu-run is probably expected to
> expose basically the entire feature set of libvirt, as opposed to
> more targeted applications which will use qemu:///embed internally
> and only rely on a relatively small subset.
> 
>>> If we don't do that, I'm afraid developers will lose trust in the
>>> whole qemu:///embed idea. Speaking from my own experience, I was
>>> certainly surprised when I accidentally realized my qemu:///embed
>>> VMs were showing up in the output of machinectl, and now I'm kinda
>>> wondering how many other ways the application I'm working on, for
>>> which the use of libvirt is just an implementation detail, is poking
>>> at the system without my knowledge...
>>
>> First off, this misunderstanding highlights a need for better
>> documentation to set out what the embedded driver is and is not
>> expected to do.
> 
> We definitely need to document this very clearly if we want
> qemu:///embed to gain traction.
> 
>> At a high level the embedded QEMU driver
>>
>>   - Isolated from any other instance of the QEMU driver
> 
> Yup.
> 
>>   - Process context of app is inherited by QEMU (namespaces,
>>     cgroups, CPU pinning, security context, etc)
> 
> Optionally! The fact that libvirt can deal with these is a big
> selling point in some scenarios.
> 
>>   - All other functionality is unchanged.
>>
>> Your comments above are equating two distinct scenarios, one which
>> had a serious functional bug & violated the first two design goals,
>> and one which does not cause any functional issues at all.
> 
> I'm not equating the two, just reporting a bunch of behaviors that
> I ran into while trying to use qemu:///embed in my application and
> that I found to be surprising, in an attempt to figure out which
> ones are intentional and whether even those are necessarily
> something that we want to keep around in their current form.
> 
>> There's no question that we must fix the machined problem.

I will try to post patches for this.

Michal
