qemu:///embed and isolation from global components

Andrea Bolognani abologna at redhat.com
Mon Mar 9 17:09:13 UTC 2020


On Fri, 2020-03-06 at 17:49 +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 06, 2020 at 06:24:15PM +0100, Andrea Bolognani wrote:
> >   * it does, however, show up in the output of 'machinectl', with
> >     class=vm and service=libvirt-qemu;
> 
> This is bad. It is one of the gaps we need to deal with.
> 
> A long while back I proposed a domain XML option for this:
> 
>   https://www.redhat.com/archives/libvir-list/2017-December/msg00729.html
> 
>     <resource register="none|direct|machined|systemd"/>
> 
> The "none" case meant inherit the cgroups placement of the parent
> libvirtd (or now the app using the embedded driver), and would be
> a reasonable default for the embedded case.

Agreed. In my case I'll want to use "direct" instead, but "none"
would indeed be a sane default for the embedded driver.

Aside: instead of a per-VM setting, would it make sense for this to
be a connection-level setting? That way, even on traditional libvirt
deployments, the hypervisor admin could, e.g., opt out of machinectl
registration if they so desire; at the same time, you'd avoid the
situation where most VMs are confined using cgroups but a few are
not, which is probably not a desirable scenario.

> For the other cases, we certainly need to do something to ensure
> uniqueness. This is possibly as simple as including a fixed
> prefix like "qemu-$PID" where $PID is the app embedding the
> driver. That said, we support closing the app, and re-starting
> using the same embedded driver directory, which would change
> PID.

Right now we're already doing

  qemu-$domid-$domname

where I'm not entirely sure how much $domid actually buys us.

I think machine ids need to be unique per host, not per service,
which is kind of a bummer because the obvious choice would be to
generate a service name based on the embedded root... Michal already
suggested another solution, perhaps that one is more viable.
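
To make the restart problem concrete, here is a minimal sketch of a
name derived from the embedded root rather than from $PID. It is a
hypothetical scheme, not what libvirt does today and not necessarily
what Michal proposed; the function name, the 8-character digest and
the 64-character limit are all assumptions made for illustration.

```python
import hashlib
import os

# Assumed upper bound on machine name length; check machined's actual rules.
MAX_MACHINE_NAME = 64


def embed_machine_name(root: str, domname: str) -> str:
    """Hypothetical naming scheme for domains of an embedded driver."""
    # Hash the normalized root path: unlike $PID, it stays the same when
    # the application is closed and restarted on the same root directory.
    digest = hashlib.sha256(os.path.abspath(root).encode("utf-8")).hexdigest()
    prefix = f"qemu-{digest[:8]}-"
    # Truncate the domain name so the result stays within the assumed limit.
    return prefix + domname[: MAX_MACHINE_NAME - len(prefix)]
```

Two applications using different roots get different prefixes, so
domains with the same name should no longer clash host-wide, short of
a collision in the truncated digest.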

Anyway, I think it's reasonable to expect that, when it comes to VMs
created via the embedded driver, the same way you'd not be able to
control them via virsh, you'd also not be able to do so via
machinectl, so I'm not too concerned about this once we flip the
default to "none" as discussed above.

> >   * virtlogd is automatically spawned, if not already active, to
> >     take care of the domain's log file.
> 
> This is trickier. The use of virtlogd was to fix a security DoS
> where malicious QEMU can write to serial console, or trigger
> QEMU to write to stdout/err, such that it exhausts the host
> filesystem.  IIUC, virtlogd should still end up writing to
> the logfile path associated with the embedded driver root
> directory prefix, so there shouldn't be a filename clash
> unless I screwed up.
> 
> Since introducing virtlogd, however, I did think of a different
> strategy, which would be to connect a FIFO to QEMU as the log
> file FD. The QEMU driver itself can own the other end of the FIFO
> and do rollover.
> 
> Of course you can turn off virtlogd via qemu.conf

That's what I'm doing right now and it works well enough, but I'm
afraid that requiring a bunch of setup will discourage developers
from using the embedded driver. We should aim for a reasonable out
of the box experience instead.
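
For reference, the setup I'm talking about looks roughly like the
sketch below. The stdio_handler setting is the documented qemu.conf
knob for choosing between virtlogd and plain files; the assumption
here is that the embedded driver reads its configuration from
etc/qemu.conf under the root directory, and prepare_embed_root is
just an illustrative helper name.

```python
import os
from urllib.parse import quote


def prepare_embed_root(root: str) -> str:
    """Write a qemu.conf that bypasses virtlogd and return the embed URI."""
    # Assumed location of the embedded driver's configuration file.
    conf_dir = os.path.join(root, "etc")
    os.makedirs(conf_dir, exist_ok=True)
    # Note: this overwrites any qemu.conf already present under the root.
    with open(os.path.join(conf_dir, "qemu.conf"), "w", encoding="utf-8") as f:
        # "file" makes QEMU write its log file directly instead of via virtlogd.
        f.write('stdio_handler = "file"\n')
    return "qemu:///embed?root=" + quote(os.path.abspath(root), safe="/")
```

The returned URI would then be passed to libvirt.open() from the
Python bindings. Keep in mind that writing the log directly brings
back the disk exhaustion concern virtlogd was introduced to address.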

> > The first one is expected, but the other two were a surprise, at
> > least to me. Basically, what I would expect is that qemu:///embed
> > would work in a completely isolated way, only interacting with
> > system-wide components when that's explicitly requested.
> 
> The goal wasn't explicitly to avoid system-wide components,
> but it certainly was intended to avoid clashing resources.
> 
> IOW, machined, virtlogd are both valid to use, as long as
> the resource clashes are avoided. We should definitely have
> a way to disable these too.

I'd argue that most users of the embedded driver would probably
prefer it didn't interact with system-wide components: if that
wasn't the case, they'd just use qemu:///system or qemu:///session
instead.

Having a way to turn off those behaviors would certainly be a step
in the right direction, but I think ultimately we want to be in a
situation where developers opt in rather than out of them.

> > In other words, I would expect virtlogd not to be spawned, and the
> > domain not to be registered with machinectl; at the same time, if
> > the domain configuration is such that it contains for example
> > 
> >   <interface type='network'>
> >     <source network='default'/>
> >     <model type='virtio'/>
> >   </interface>
> > 
> > then I would expect to see a failure unless a connection to
> > network:///system had been explicitly and pre-emptively opened, and
> > not the current behavior which apparently is to automatically open
> > that connection and spawn virtnetworkd as a result.
> 
> The QEMU embedded driver is explicitly intended to be able to
> interact with other global secondary drivers.
> 
> If you don't want to use virtnetworkd, then you won't be
> creating such an <interface> config in the first place.
> The app will have the option to open an embedded secondary
> driver if desired. Some of the drivers won't really make
> sense as embedded things though, at least not without
> extra work, i.e. an embedded network or nwfilter driver has
> no value unless your app has moved into a new network
> namespace, as otherwise it will just fight with the
> global network driver.

Again, I think our defaults for qemu:///embed should be consistent
and conservative: instead of having developers opt out of using
network:///system, they should have to opt in before global
resources are involved.
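
Until something like that exists, an application can at least check
for itself before defining a domain. The following is a rough sketch
of such a check, using only the Python standard library; the function
name is made up, and it only looks at the <interface type='network'>
case from the example above, not at every way a domain can reference
a global secondary driver.

```python
import xml.etree.ElementTree as ET


def global_network_refs(domain_xml: str) -> list[str]:
    """Return the libvirt networks referenced by <interface type='network'>."""
    root = ET.fromstring(domain_xml)
    names = []
    for iface in root.iter("interface"):
        if iface.get("type") == "network":
            source = iface.find("source")
            # Only report interfaces that actually name a network.
            if source is not None and source.get("network"):
                names.append(source.get("network"))
    return names
```

An application could refuse to start the domain when the returned
list is non-empty, unless the user has explicitly asked for
network:///system to be used.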

If we don't do that, I'm afraid developers will lose trust in the
whole qemu:///embed idea. Speaking from my own experience, I was
certainly surprised when I accidentally realized my qemu:///embed
VMs were showing up in the output of machinectl, and now I'm kinda
wondering how many other ways the application I'm working on, for
which the use of libvirt is just an implementation detail, is poking
at the system without my knowledge...

-- 
Andrea Bolognani / Red Hat / Virtualization



