qemu:///embed and isolation from global components

Daniel P. Berrangé berrange at redhat.com
Mon Mar 9 18:04:55 UTC 2020


On Mon, Mar 09, 2020 at 06:09:13PM +0100, Andrea Bolognani wrote:
> On Fri, 2020-03-06 at 17:49 +0000, Daniel P. Berrangé wrote:
> > On Fri, Mar 06, 2020 at 06:24:15PM +0100, Andrea Bolognani wrote:
> > >   * it does, however, show up in the output of 'machinectl', with
> > >     class=vm and service=libvirt-qemu;
> > 
> > This is bad. It is one of the gaps we need to deal with.
> > 
> > A long while back I proposed a domain XML option for this:
> > 
> >   https://www.redhat.com/archives/libvir-list/2017-December/msg00729.html
> > 
> >     <resource register="none|direct|machined|systemd"/>
> > 
> > The "none" case meant inherit the cgroups placement of the parent
> > libvirtd (or now the app using the embedded driver), and would be
> > a reasonable default for the embedded case.
> 
> Agreed. In my case I'll want to use "direct" instead, but "none"
> would indeed be a sane default for the embedded driver.

Yes, "none" makes sense for the embedded driver, as QEMU is intended
to inherit the process execution context of the calling application.
IOW, we must inherit cgroups placement, and not create things under
/machine.slice by default.
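For illustration, the "inherit the parent's placement" behaviour can be
checked from outside by comparing cgroup paths. A minimal sketch (the
helper name is hypothetical, not libvirt code) that parses the
/proc/<pid>/cgroup format:

```python
def cgroup_path(proc_cgroup_text: str) -> str:
    """Extract the cgroup path from /proc/<pid>/cgroup content.
    On cgroups v2 there is a single '0::<path>' line; on v1 we
    just take the first hierarchy listed. Illustrative only --
    not libvirt's actual code."""
    # each line is 'hierarchy-id:controller-list:path'
    first = proc_cgroup_text.strip().splitlines()[0]
    return first.split(":", 2)[2]
```

With register="none", running this over /proc/<qemu-pid>/cgroup and
/proc/<app-pid>/cgroup should yield the same path, rather than something
under /machine.slice.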

> Aside: instead of a per-VM setting, would it make sense for this to
> be a connection-level setting? That way, even on traditional libvirt
> deployments, the hypervisor admin could eg. opt out of machinectl
> registration if they so desire; at the same time, you'd avoid the
> situation where most VMs are confined using CGroups but a few are
> not, which is probably not a desirable scenario.

Yes, functionally it would work fine as a connection-level setting
too, though this hides the behaviour from anyone looking at the
guest config. We've previously punted quite a few things to
qemu.conf because we didn't want to go through the process of
representing them in the domain XML. That was OK when the qemu.conf
settings were something done once at host deployment time.

With the embedded driver, I think this is not so desirable, as it
means that to get the configuration they want for a VM, applications
need to deal with two distinct config files. The ideal would be that
everything that is commonly needed can be achieved solely in the
domain XML, and I think the resource control backend is one such
common tunable.
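To make the two-file split concrete: today, routing QEMU's stdout/err
away from virtlogd is a driver-wide qemu.conf setting, living alongside
the per-VM domain XML (snippet below reflects the existing qemu.conf
option, shown here only to illustrate the split):

```
# qemu.conf (per-driver, not per-VM): handle QEMU's stdout/err
# as a plain file instead of routing it via virtlogd
stdio_handler = "file"
```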

> > For the other cases, we certainly need to do something to ensure
> > uniqueness. This is possibly as simple as including a fixed
> > prefix like "qemu-$PID" where $PID is the app embedding the
> > driver. That said, we support closing the app, and re-starting
> > using the same embedded driver directory, which would change
> > PID.
> 
> Right now we're already doing
> 
>   qemu-$domid-$domname
> 
> where I'm not entirely sure how much $domid actually buys us.

IIRC $domid was a hack because at one time we had problems with
systemd not cleaning up the transient scope when QEMU was killed.
This would prevent libvirt starting the guest again thereafter.
I can't remember now if this was a bug we fixed in systemd or
libvirt or both or neither.

> I think machine ids need to be unique per host, not per service,
> which is kind of a bummer because the obvious choice would be to
> generate a service name based on the embedded root... Michal already
> suggested another solution, perhaps that one is more viable.

Yes, they definitely need to be unique globally, but as agreed
above, we should not create cgroups by default at all - we must
inherit cgroups placement.
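For the cases where we do register, a sketch of the kind of naming
scheme discussed above (the helper name and escaping rules here are
illustrative only, not libvirt's actual code):

```python
import re

def machine_name(domid: int, domname: str, prefix: str = "qemu") -> str:
    """Build a systemd machine name along the lines of the
    'qemu-$domid-$domname' scheme. Machine names must be valid
    hostnames, so characters outside [a-zA-Z0-9-] are replaced,
    and systemd caps the length at 64 characters. Hypothetical
    helper, not libvirt's real escaping code."""
    safe = re.sub(r"[^a-zA-Z0-9-]", "-", domname)
    name = f"{prefix}-{domid}-{safe}"
    return name[:64]
```

Note this still doesn't solve global uniqueness across driver instances
by itself: two embedded roots can both start domid 1 with the same name,
which is why a per-instance prefix (or no registration at all) is needed.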


> > >   * virtlogd is automatically spawned, if not already active, to
> > >     take care of the domain's log file.
> > 
> > This is trickier. The use of virtlogd was to fix a security DoS
> > where malicious QEMU can write to serial console, or trigger
> > QEMU to write to stdout/err, such that it exhausts the host
> > filesystem.  IIUC, virtlogd should still end up writing to
> > the logfile path associated with the embedded  driver root
> > directory prefix, so there shouldn't be a filename clash
> > unless I screwed up.
> > 
> > Since introducing virtlogd, however, I did think of a different
> > strategy, which would be to connect a FIFO to QEMU as the log
> > file FD. The QEMU driver itself can own the other end of the FIFO
> > and do rollover.
> > 
> > Of course you can turn off virtlogd via qemu.conf
> 
> That's what I'm doing right now and it works well enough, but I'm
> afraid that requiring a bunch of setup will discourage developers
> from using the embedded driver. We should aim for a reasonable out
> of the box experience instead.

Why do you need to turn it off though?  It should already
"do the right thing" as the log files should appear under a
different location and not have any clash. Turning it off
immediately creates a denial of service CVE in your application.

Nonetheless, as above, I think we need common things to be
controllable via the domain XML. So either we need to add a
tunable there for whether virtlogd is used, or we need to
implement the FIFO idea to avoid the need for virtlogd at all.
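The FIFO idea, roughly: the driver creates a FIFO, hands the write end
to QEMU as its log FD, and drains the read end into the log file with a
size cap, so a runaway guest can't exhaust the filesystem. A minimal
Python sketch of just the rollover part (class and parameter names are
hypothetical, not libvirt's virtlogd implementation):

```python
import os

class RolloverLog:
    """Driver-side end of the log pipe: append data drained from
    the FIFO to the log file, rotating once the file would exceed
    max_size bytes. The FIFO plumbing itself (os.mkfifo, passing
    the write end to QEMU) is omitted here."""

    def __init__(self, path, max_size=4096, backups=2):
        self.path = path
        self.max_size = max_size
        self.backups = backups

    def _rotate(self):
        # shift log -> log.1 -> log.2, dropping the oldest backup
        for i in range(self.backups, 0, -1):
            src = self.path if i == 1 else f"{self.path}.{i - 1}"
            if os.path.exists(src):
                os.replace(src, f"{self.path}.{i}")

    def write(self, data: bytes):
        try:
            size = os.path.getsize(self.path)
        except FileNotFoundError:
            size = 0
        if size + len(data) > self.max_size:
            self._rotate()
        with open(self.path, "ab") as f:
            f.write(data)
```

The key property is that rollover happens in the driver process itself,
so no separate virtlogd daemon needs to be spawned.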

> > > The first one is expected, but the other two were a surprise, at
> > > least to me. Basically, what I would expect is that qemu:///embed
> > > would work in a completely isolated way, only interacting with
> > > system-wide components when that's explicitly requested.
> > 
> > The goal wasn't explicitly to avoid system-wide components,
> > but it certainly was intended to avoid clashing resources.
> > 
> > IOW, machined, virtlogd are both valid to use, as long as
> > the resource clashes are avoided. We should definitely have
> > a way to disable these too.
> 
> I'd argue that most users of the embedded driver would probably
> prefer it didn't interact with system-wide components: if that
> wasn't the case, they'd just use qemu:///system or qemu:///session
> instead.
> 
> Having a way to turn off those behaviors would certainly be a step
> in the right direction, but I think ultimately we want to be in a
> situation where developers opt in rather than out of them.
> 
> > > In other words, I would expect virtlogd not to be spawned, and the
> > > domain not to be registered with machinectl; at the same time, if
> > > the domain configuration is such that it contains for example
> > > 
> > >   <interface type='network'>
> > >     <source network='default'/>
> > >     <model type='virtio'/>
> > >   </interface>
> > > 
> > > then I would expect to see a failure unless a connection to
> > > network:///system had been explicitly and pre-emptively opened, and
> > > not the current behavior which apparently is to automatically open
> > > that connection and spawning virtnetworkd as a result.
> > 
> > The QEMU embedded driver is explicitly intended to be able to
> > interact with other global secondary drivers.
> > 
> > If you don't want to use virtnetworkd, then you won't be
> > creating such an <interface> config in the first place.
> > The app will have the option to open an embedded secondary
> > driver if desired. Some of the drivers won't really make
> > sense as embedded things though, at least not without
> > extra work, i.e. an embedded network or nwfilter driver has
> > no value unless your app has moved into a new network
> > namespace, as otherwise it will just fight with the
> > global network driver.
> 
> Again, I think our defaults for qemu:///embed should be consistent
> and conservative: instead of having developers opt out of using
> network:///system, they should have to opt in before global
> resources are involved.

They already opt in to using the network driver by virtue of
configuring their guest to request its use. We don't need them
to opt in twice.

> If we don't do that, I'm afraid developers will lose trust in the
> whole qemu:///embed idea. Speaking from my own experience, I was
> certainly surprised when I accidentally realized my qemu:///embed
> VMs were showing up in the output of machinectl, and now I'm kinda
> wondering how many other ways the application I'm working on, for
> which the use of libvirt is just an implementation detail, is poking
> at the system without my knowledge...

First off, this misunderstanding highlights a need for better
documentation setting out what the embedded driver is and is not
expected to do.

At a high level the embedded QEMU driver:

 - is isolated from any other instance of the QEMU driver;
 - lets QEMU inherit the process execution context of the app
   (namespaces, cgroups, CPU pinning, security context, etc);
 - leaves all other functionality unchanged.

Your comments above conflate two distinct scenarios: one which
had a serious functional bug and violated the first two design
goals, and one which does not cause any functional issues at all.

There's no question that we must fix the machined problem.

There is nothing that needs fixing in the virtual network case as
that behaviour is intentional and is not causing any ill effects.


The embedded driver is NOT intended to be completely isolated from
everything on the host, whether it is systemd, or another host OS
service, a libvirt secondary driver, or something else again.

In fact, a libvirt secondary driver should really just be considered
the same as any other host OS service. We just happen to use a
familiar RPC API to talk to that secondary driver, whereas for
a host OS service we'd use DBus or some third-party API/protocol.

We don't require a special config in the QEMU driver to permit
use of other host OS services; we simply do the right thing
based on whatever domain XML the application provides. The
same is true of the use of libvirt secondary drivers. We must
NEVER connect to a secondary driver unless something in the
domain XML requests that behaviour. An <interface type='network'>
is a valid reason to connect to the virtnetworkd daemon.

Of course, when we do connect to virtnetworkd, we MUST ensure that
anything we do preserves isolation from other QEMU driver instances.

I would also note that the virtnetworkd daemon connected to may
or may not actually be the same as the host OS one. It is entirely
possible that the application has opened the embedded QEMU driver
from within a namespace, such that the connection to
/var/run/libvirt/virtnetworkd is serviced by a completely isolated
virtnetworkd running inside a different network namespace from the
host OS virtnetworkd. This is of no concern to the QEMU driver
though - it just needs to honour what is in the domain XML.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



