[libvirt] [PATCH 0/5] Support embedding the QEMU driver in client apps directly

Daniel P. Berrangé berrange at redhat.com
Mon May 20 13:21:10 UTC 2019


On Mon, May 20, 2019 at 10:53:30AM +0200, Peter Krempa wrote:
> On Fri, May 17, 2019 at 17:49:31 +0100, Daniel Berrange wrote:
> > On Fri, May 17, 2019 at 06:45:08PM +0200, Peter Krempa wrote:
> > > On Fri, May 17, 2019 at 15:22:12 +0200, Peter Krempa wrote:
> > > > On Fri, May 17, 2019 at 13:24:52 +0100, Daniel Berrange wrote:
> > > > > This patch series proposes a new way to build a "libvirt QEMU shim"
> > > > > which supports more use cases than the previous approach from last year,
> > > > > as well as being vastly simpler to implement.
> > > > 
> > > > Few thoughts:
> > > 
> > > two more:
> > > 
> > > [...]
> > > 
> > > 9) Users may have different ideas regarding values in qemu.conf
> > > (such as default_tls_x509_cert_dir), so we probably need a way to
> > > provide that one separately as well.
> > 
> > I believe my patch should already deal with that as I prefix all the
> > dirs that the QEMU driver knows about.
> 
> Hmm, I'm not sure whether just prefixing every path used internally with
> the passed prefix is a good idea.
> 
> Ideally we should keep the directory contents opaque to the user so they
> don't ever try to mess with the files in the directory. That
> disqualifies using this mechanism for passing in the qemu config file.

I don't think the sub-dir is really much different from the normal
libvirt locations in that respect. We already recommend that
apps/admins not touch the /etc/libvirt/qemu files, nor the
/run/libvirt/qemu files, nor the /var/lib/libvirt/qemu files. At the
same time, though, there are some files which it is reasonable for
apps/admins to look at: the UNIX domain sockets for virtio-serial
devices in the local state sub-dir, the per-VM log files (though we
really should create a virStream API for accessing logs), and so on.
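
For instance, an app could talk to the guest over one of those
virtio-serial sockets with nothing more than a plain AF_UNIX
connect(). A minimal sketch, assuming a hypothetical socket path
under an embedded driver prefix (the real path depends on the root
prefix and the <channel> config):

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/un.h>

  int main(void)
  {
      /* Hypothetical path, for illustration only */
      const char *path =
          "/tmp/myapp-root/run/libvirt/qemu/channel/guest1.agent";
      struct sockaddr_un addr;
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);

      if (fd < 0) {
          perror("socket");
          return 1;
      }

      memset(&addr, 0, sizeof(addr));
      addr.sun_family = AF_UNIX;
      strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

      if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
          perror("connect");
          close(fd);
          return 1;
      }

      /* read()/write() on fd now talk to the guest's serial port */
      close(fd);
      return 0;
  }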

We don't really document this very well even today. If we improve
our docs in this area, I don't think apps need to treat the virtual
prefix dirs any differently from how we tell apps to treat libvirt's
files under /.

> This also reminds me that we need some locking for the directory so that
> there aren't two processes opening the same prefix and messing up the
> files internally. It may not even work in some cases, because the
> attempt to reopen the monitor sockets would probably fail. If it
> doesn't fail, we've got even more of a problem.

QEMU should reject multiple attempts to open the monitor socket - or
rather, the second attempt will simply hang and never succeed, since
QEMU won't call accept() until the first connection goes away. This
1-1 model is an intentional design limitation of chardevs in QEMU.

We should definitely look at doing locking though. Currently we implicitly
have the libvirtd daemon's own pidfile as the mutual exclusion mechanism.

We'll need to have a separate lockfile, independent of that, for the
QEMU driver. Probably just $prefix/var/run/libvirt/qemu/driver.lock is
good enough.
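
A minimal sketch of what acquiring that could look like, using a
plain fcntl() advisory lock (a hypothetical helper, not what the
patches actually do):

  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  /* Try to acquire an exclusive lock on the driver lockfile.
   * Returns the held fd on success, -1 if another process
   * already owns this embedded driver prefix. */
  static int acquire_driver_lock(const char *lockpath)
  {
      struct flock fl;
      int fd = open(lockpath, O_WRONLY | O_CREAT, 0600);

      if (fd < 0)
          return -1;

      memset(&fl, 0, sizeof(fl));
      fl.l_type = F_WRLCK;   /* exclusive write lock... */
      fl.l_whence = SEEK_SET;
      fl.l_start = 0;
      fl.l_len = 0;          /* ...covering the whole file */

      if (fcntl(fd, F_SETLK, &fl) < 0) {
          close(fd);         /* somebody else holds the prefix */
          return -1;
      }

      /* Keep fd open; the lock is dropped when the process exits */
      return fd;
  }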

> Also this would mean that a prefix of '/' would be equal to the system
> instance handled by libvirtd, so we also need interlocking with libvirt
> if we don't change the layout.

Yes.

> One disadvantage of this idea is that I can see users who will
> actually use this to replace libvirt's RPC with something else.
> This will be caused by the fact that there can be only one process
> handling each prefix, as extending the qemu driver to allow concurrent
> access would be a nightmare.

Yes, but in many ways we already have this situation.

libvirt-dbus is providing a new RPC mechanism to talk to libvirt. It happens
to then delegate to libvirt's own RPC, but from an app developer POV that
is invisible.

Large scale mgmt apps like OpenStack/oVirt/KubeVirt have in some sense
replaced libvirt RPC at the cross-host level - they only use libvirt
RPC within the scope of a single host, so most of the time there's only
a single client of libvirtd - VDSM, Nova, or KubeVirt's virt-handler.

Of course there is a critical distinction. Even if there's normally
only a single client of libvirtd, other clients are not prevented
from running if needed. So you can still run "virsh list" on a host
running OpenStack/oVirt and get "normal" results. KubeVirt has changed
this, since they run one libvirtd instance per VM, so any aggregated
APIs like "virsh list" are largely useless, only ever showing 1 VM.

The original shim concept was intended to try to fix that behaviour
by allowing the shim to register its existence back with the central
libvirtd. The feedback from KubeVirt devs though is that even if we
had that ability in the shim, they'd likely not use it.

If we consider just the single VM oriented APIs though, the inability
to use "virsh" commands with an embedded driver instance could be a
major pain point for debugging runtime problems, depending on the app.

For short lived VMs used for embedded purposes, such as with
libguestfs, I think the limitation would be tolerable.

For long lived VMs, though, the inability to use virsh to debug would
be pretty troubling, unless the app embedding the QEMU driver exposed
a comparable set of features for collecting debug info.

You'd also be unable to use QMP directly unless a second monitor was
added to every VM, since QEMU limits you to one active connection
per monitor chardev.
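
If an app really needed that, it could wire in an extra QMP monitor
itself, e.g. via libvirt's QEMU command line passthrough namespace.
A sketch, with a made-up socket path:

  <domain type='kvm'
          xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    <!-- ...rest of the domain definition... -->
    <qemu:commandline>
      <!-- second QMP monitor, independent of the one libvirt owns -->
      <qemu:arg value='-chardev'/>
      <qemu:arg value='socket,id=qmp-debug,path=/tmp/guest1-qmp.sock,server,nowait'/>
      <qemu:arg value='-mon'/>
      <qemu:arg value='chardev=qmp-debug,mode=control'/>
    </qemu:commandline>
  </domain>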

This debuggability issue is probably the biggest downside to this
embedded driver approach.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
