qemu:///embed and isolation from global components

Andrea Bolognani abologna at redhat.com
Tue Mar 10 15:42:57 UTC 2020


On Mon, 2020-03-09 at 18:04 +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 09, 2020 at 06:09:13PM +0100, Andrea Bolognani wrote:
> > On Fri, 2020-03-06 at 17:49 +0000, Daniel P. Berrangé wrote:
> > > On Fri, Mar 06, 2020 at 06:24:15PM +0100, Andrea Bolognani wrote:
[...]
> > Aside: instead of a per-VM setting, would it make sense for this to
> > be a connection-level setting? That way, even on traditional libvirt
> > deployments, the hypervisor admin could eg. opt out of machinectl
> > registration if they so desire; at the same time, you'd avoid the
> > situation where most VMs are confined using CGroups but a few are
> > not, which is probably not a desirable scenario.
> 
> Yes, functionally it would work fine as a connection level setting
> too, though this hides the behaviour from anyone looking at the
> guest config. We've previously punted quite a few things to the
> qemu.conf because we didn't want to go through the process of representing
> them in the domain XML. This was OK when the qemu.conf settings were
> something done once at host deployment time.
> 
> With the embedded driver, I think this is not so desirable, as it means
> that to get the configuration they want for a VM, they need to deal
> with two distinct config files. The ideal would be that everything that
> is commonly needed can be achieved solely in the domain XML, and
> I think resource control backend is one such common tunable.

I don't have a strong opinion either way, and as far as my current
use is concerned it doesn't bother me to have to deal with a second
configuration file. The reason I thought a per-VM setting might not
be desirable is that applications would then be able to override it:
maybe VMs created with virt-manager would be registered against
machinectl while VMs created using GNOME Boxes would not, and a
sysadmin who likes to use machinectl to get a comprehensive view of
the system would no longer be guaranteed one. But if that's not the
kind of scenario we think we should prevent, then a per-VM setting
is fine by me :)
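
Just to make it concrete, if we did go the per-VM route I imagine it
could look something like the existing <resource> element growing an
attribute along these lines - note that the register attribute below
is entirely made up, not something libvirt supports today:

  <!-- 'register' is hypothetical; the rest is existing domain XML -->
  <resource register='none'>
    <partition>/machine</partition>
  </resource>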

[...]
> > Right now we're already doing
> > 
> >   qemu-$domid-$domname
> > 
> > where I'm not entirely sure how much $domid actually buys us.
> 
> IIRC $domid was a hack because at one time we had problems with
> systemd not cleaning up the transient scope when QEMU was killed.
> This would prevent libvirt starting the guest again thereafter.
> I can't remember now if this was a bug we fixed in systemd or
> libvirt or both or neither.

I see! It would be neat if we could get rid of it, assuming of course
it's no longer needed on the platforms we target.
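
(For anyone following along, this is the naming we're talking about;
the output below is illustrative rather than copied from a real host:

  $ machinectl list
  MACHINE          CLASS SERVICE
  qemu-1-fedora31  vm    libvirt-qemu

where "1" is the domain ID and "fedora31" the domain name.)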

[...]
> > > Of course you can turn off virtlogd via qemu.conf
> > 
> > That's what I'm doing right now and it works well enough, but I'm
> > afraid that requiring a bunch of setup will discourage developers
> > from using the embedded driver. We should aim for a reasonable out
> > of the box experience instead.
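
(For reference, the opt-out I'm talking about is the existing
stdio_handler knob in qemu.conf:

  # Have QEMU write its stdout/stderr directly to a file instead of
  # going through virtlogd
  stdio_handler = "file"

with, as you point out below, the caveat that nothing then caps the
size of the resulting log file.)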
> 
> Why do you need to turn it off though ?  It should already
> "do the right thing" as the log files should appear under a
> different location and not have any clash. Turning it off
> immediately creates a denial of service CVE in your application.

I was getting the same SELinux denial that Michal reported a few
days back: virtlogd wants to verify it's being connected to by a
process running as root, but it's only allowed by the policy to
look into libvirtd's /proc/$pid for this information. So, for the
same reason virtqemud can't currently connect to virtlogd when
SELinux is in enforcing mode, neither can my qemu:///embed-using
application.

Besides that, there is the fact that a lot of people, mainly those
coming from a containers background, are not happy with having extra
daemons running. I'm not saying they would rather be hit by a DoS
than have virtlogd running :) but they really, really don't like
daemons :)

> None the less, as above I think we need common things to be
> controllable via the domain XML. So either we need to make a
> tunable there for use of logd or not, or we need to implement
> the FIFO idea to avoid the need for logd at all.

It seems like the FIFO idea (though I'll admit I don't fully
understand it) would be the best of both worlds.
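
For what it's worth, my rough mental model of it is that the driver
itself would service a pipe that QEMU writes its output to, so log
growth stays bounded without needing a separate daemon. Something
vaguely along these lines, where every name is made up purely for
illustration:

  import os
  import threading

  def start_log_pump(fifo_path, log_path, max_size=4 * 1024 * 1024):
      """Read QEMU's output from a FIFO, append it to a size-capped log."""
      os.mkfifo(fifo_path)

      def pump():
          # Opening the read side blocks until QEMU opens the write side
          with open(fifo_path, 'rb') as src, open(log_path, 'ab') as dst:
              while True:
                  chunk = src.read(4096)
                  if not chunk:    # writer gone, QEMU has exited
                      break
                  if dst.tell() + len(chunk) > max_size:
                      dst.seek(0)  # crude rollover so the log can't grow forever
                      dst.truncate()
                  dst.write(chunk)
                  dst.flush()

      threading.Thread(target=pump, daemon=True).start()

Please correct me if that's not at all what you have in mind.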

[...]
> > > If you don't want to use virtnetworkd, then you won't be
> > > creating such an <interface> config in the first place.
> > > The app will have the option to open an embedded secondary
> > > driver if desired. Some of the drivers won't really make
> > > sense as embedded things though, at least not without
> > > extra work, i.e. an embedded network or nwfilter driver has
> > > no value unless your app has moved into a new network
> > > namespace, as otherwise it will just fight with the
> > > global network driver.
> > 
> > Again, I think our defaults for qemu:///embed should be consistent
> > and conservative: instead of having developers opt out of using
> > network:///system, they should have to opt in before global
> > resources are involved.
> 
> They already opt-in to using the network driver by virtue of
> configuring their guest to request its use. We don't need to
> opt-in twice.

For some applications that's definitely an option, but others, like
virt-qemu-run, accept a fairly arbitrary configuration: having to
figure out whether that configuration would result in eg. virtnetworkd
being used, and whether that's acceptable, is extra work as well as a
potential duplication of logic that already exists in libvirt...
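
Just so it's clear what I mean by "result in virtnetworkd being used":
a configuration that contains something like

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
  </interface>

implies a connection to virtnetworkd, while one that only uses
eg. type='user' interfaces never triggers one.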

Then again, something like virt-qemu-run is probably expected to
expose basically the entire feature set of libvirt, as opposed to
more targeted applications which will use qemu:///embed internally
and only rely on a relatively small subset.
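
(The kind of invocation I have in mind being roughly

  $ virt-qemu-run --root=/var/tmp/my-app /path/to/guest.xml

where guest.xml can be pretty much any domain XML a user could come
up with - option spelling quoted from memory.)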

> > If we don't do that, I'm afraid developers will lose trust in the
> > whole qemu:///embed idea. Speaking from my own experience, I was
> > certainly surprised when I accidentally realized my qemu:///embed
> > VMs were showing up in the output of machinectl, and now I'm kinda
> > wondering how many other ways the application I'm working on, for
> > which the use of libvirt is just an implementation detail, is poking
> > at the system without my knowledge...
> 
> First off, this misunderstanding highlights a need for better
> documentation to set out what the embedded driver is and is not
> expected to do.

We definitely need to document this very clearly if we want
qemu:///embed to gain traction.

> At a high level the embedded QEMU driver
> 
>  - Isolated from any other instance of the QEMU driver

Yup.

>  - Process context of app is inherited by QEMU (namespaces,
>    cgroups, CPU pinning, security context, etc)

Optionally! The fact that libvirt can deal with these is a big
selling point in some scenarios.
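
To spell out what that looks like from the application's point of
view: the app opens its own private driver instance, and whatever
namespaces, cgroup placement, security context and so on the
application itself runs with is what QEMU ends up inheriting. A
minimal sketch, with a made-up root directory:

  import libvirt

  # Private QEMU driver instance; its state lives under the
  # application-owned root directory rather than the system locations
  conn = libvirt.open("qemu:///embed?root=/var/tmp/my-app")
  dom = conn.createXML(open("guest.xml").read(), 0)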

>  - All other functionality is unchanged.
> 
> Your comments above are equating two distinct scenarios, one which
> had a serious functional bug & violated the first two design goals,
> and one which does not cause any functional issues at all.

I'm not equating the two, just reporting a bunch of behaviors that
I ran into while trying to use qemu:///embed in my application and
that I found to be surprising, in an attempt to figure out which
ones are intentional and whether even those are necessarily
something that we want to keep around in their current form.

> There's no question that we must fix the machined problem.
> 
> There is nothing that needs fixing in the virtual network case as
> that behaviour is intentional and is not causing any ill effects.
> 
> The embedded driver is NOT intended to be completely isolated from
> everything on the host, whether it is systemd, or another host OS
> service, a libvirt secondary driver, or something else again.
> 
> In fact a libvirt secondary driver should really just be considered
> the same as any other host OS service. We just happen to use a
> familiar RPC API to talk to that secondary driver, whereas for
> a host OS service we'd use DBus or some 3rd party API/protocol.
> 
> We don't require a special config in the QEMU driver to permit
> use of other host OS services, we would simply do the right
> thing based on whatever domain XML the application provides. The
> same is true of use of libvirt secondary drivers. We must NEVER
> connect to a secondary driver, unless something in the domain XML
> requests that behaviour. The <interface type=network> is a valid
> reason to connect to the virtnetworkd daemon.

I mean, one could argue that namespaces and CGroups are OS services,
and whether or not those are used is decided not based on the per-VM
configuration but on the per-driver configuration :) But I'm not here
to play devil's advocate, and I mostly get your point.
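
Concretely, by per-driver configuration I mean the likes of these
qemu.conf settings, neither of which has a per-VM equivalent in the
domain XML today:

  # Whether QEMU gets a private mount namespace, and which cgroup
  # controllers libvirt manages - per driver, not per VM
  namespaces = [ "mount" ]
  cgroup_controllers = [ "cpu", "devices", "memory" ]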

Anyway, as I mentioned earlier, part of the issue is having daemons
running, which neither of the features mentioned above requires.

> Of course when we do connect to virtnetworkd, we MUST ensure that
> anything we do preserves isolation from other QEMU driver instances.
> 
> I would also note that the virtnetworkd daemon connected to, may
> or may not actually be the same as the host OS one. It is entirely
> possible that the application has opened the embedded QEMU driver
> from within a namespace, that results in a connection to the
> /var/run/libvirt/virtnetworkd being serviced by a completely isolated
> virtnetworkd running inside a different network namespace from the
> host OS virtnetworkd. This is of no concern to the QEMU driver
> though - it just needs to honour what is in the domain XML.

This kind of setup sounds extremely confusing.

Would you have multiple instances of virtnetworkd, one per network
namespace, all running at the same time? How would the application
pick the correct one to connect to?

-- 
Andrea Bolognani / Red Hat / Virtualization



