[RFC] Add basic driver for the Cloud-Hypervisor

Daniel P. Berrangé berrange at redhat.com
Tue Sep 22 08:11:09 UTC 2020


On Mon, Sep 21, 2020 at 12:05:48PM -0700, Douglas, William wrote:
> On Tue, Sep 15, 2020 at 5:53 AM Daniel P. Berrangé <berrange at redhat.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 11:24:32AM -0700, William Douglas wrote:
> 
> <snip>
> 
> > > +virCHMonitorPtr
> > > +virCHMonitorNew(virDomainObjPtr vm, const char *socketdir)
> > > +{
> > > +    virCHMonitorPtr ret = NULL;
> > > +    virCHMonitorPtr mon = NULL;
> > > +    virCommandPtr cmd = NULL;
> > > +    int pings = 0;
> > > +
> > > +    if (virCHMonitorInitialize() < 0)
> > > +        return NULL;
> > > +
> > > +    if (!(mon = virObjectLockableNew(virCHMonitorClass)))
> > > +        return NULL;
> > > +
> > > +    mon->socketpath = g_strdup_printf("%s/%s-socket", socketdir, vm->def->name);
> > > +
> > > +    /* prepare to launch Cloud-Hypervisor socket */
> > > +    if (!(cmd = chMonitorBuildSocketCmd(vm, mon->socketpath)))
> > > +        goto cleanup;
> > > +
> > > +    if (virFileMakePath(socketdir) < 0) {
> > > +        virReportSystemError(errno,
> > > +                             _("Cannot create socket directory '%s'"),
> > > +                             socketdir);
> > > +        goto cleanup;
> > > +    }
> > > +
> > > +    /* launch Cloud-Hypervisor socket */
> > > +    if (virCommandRunAsync(cmd, &mon->pid) < 0)
> > > +        goto cleanup;
> > > +
> > > +    /* get a curl handle */
> > > +    mon->handle = curl_easy_init();
> > > +
> > > +    /* try to ping VMM socket 5 times to make sure it is ready */
> > > +    while (pings < 5) {
> > > +        if (virCHMonitorPingVMM(mon) == 0)
> > > +            break;
> > > +        if (++pings == 5)
> > > +            goto cleanup;
> > > +
> > > +        g_usleep(100 * 1000);
> > > +    }
> >
> > This is highly undesirable. Is there no way to launch the CH process
> > such that the socket is guaranteed to be accepting requests by the
> > time it has forked into the background? Or is there a way to pass
> > in a pre-opened FD for the monitor socket?
> >
> > This kind of wait with timeout will cause startup failures due to
> > timeouts under load.
> 
> Currently the cloud-hypervisor process doesn't fork into the
> background, so forking is what I was initially thinking of adding to
> cloud-hypervisor. Would tracking the pid of the cloud-hypervisor
> running in the background be required at that point (assuming the
> socket path to control the daemon would be known and working, given
> virCommandRun returns successfully)?

It doesn't especially matter whether cloud-hypervisor forks into
the background itself, or whether libvirt forks it into the
background when launching it.

The important thing is simply to ensure that whichever startup
approach is used can be implemented in a race-free manner without
needing busy-waits or timeouts.

If cloud-hypervisor forks into the background, then it would
need to write a pidfile so libvirt can determine the pid that
ended up running. The act of libvirt waiting for the pidfile
to be written is potentially racy though, so that might not be
the best approach.

Based on what we learnt from QEMU, I think being able to pass
in a pre-created UNIX listener socket is the best approach. That
lets libvirt issue a connect() straight after forking the
cloud-hypervisor process. If cloud-hypervisor successfully
starts up, then the client socket becomes live and can be used.
If it fails to start up, then the client socket libvirt has
will get EOF and thus we'll see the startup failure. This
avoids any looping/sleeping/timeouts.
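
As a rough sketch of what that could look like on the libvirt side
(plain POSIX calls; how cloud-hypervisor would be told to use an
inherited FD is glossed over here):

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Pre-create the monitor socket in the parent, so the child only has
 * to start servicing an FD it inherits.  Illustrative only, not the
 * final libvirt helper. */
static int
chMonitorPrecreateSocket(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    strcpy(addr.sun_path, path);

    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;

    unlink(path);  /* drop any stale socket from a previous run */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}

The returned FD gets handed to the child (e.g. with virCommandPassFD(),
closing libvirt's copy of the listener once the child owns it), and
libvirt connect()s to the same path as soon as the child is forked;
if the child dies before it ever services the socket, that client end
sees EOF rather than sitting in a retry loop.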


> > What is the security model for the processes ? The libvirt driver
> > here is running as root and spawning CH as root. We don't really
> > want the VM process running as root though. It really needs to
> > be unprivileged from a DAC POV. Obviously that's quite a bit more
> > work for you to do, but it should be possible to share a lot of the
> > security mgmt infrastructure that we already have for QEMU and
> > LXC drivers. It can manage DAC permissions and apply SELinux or
> > AppArmor MAC labelling/profiles.
> 
> Currently cloud-hypervisor runs with cap_net_admin permitted and
> effective on it. The implementation will be updated to run under the
> user's session using a non-root driver (with its own daemon as you had
> mentioned earlier).
> 
> >
> > The other big question is around device addressing. If using PCI,
> > then PCI address assignment logic is critical to ensure consistent
> > guest ABI.
> >
> 
> I'll be adding device passthrough in the actual driver submission,
> but if you would like to see an updated RFC with that included I
> could do so.

No need to do that in an initial patch. It is just something I wanted
to raise as an item needed to make it sustainable for live migration
type use cases in particular.
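
To illustrate the kind of determinism that matters: addresses need to
come from a stable rule over the persistent config, not from whatever
happens to be free at runtime. A deliberately simplistic standalone
sketch (the real driver would want to reuse libvirt's existing PCI
address handling rather than anything like this, and the first free
slot being 3 is just an assumption for illustration):

#include <stddef.h>

typedef struct {
    unsigned int domain;
    unsigned int bus;
    unsigned int slot;      /* 0-31 on a conventional PCI bus */
    unsigned int function;
} demoPCIAddress;

/* Assign slots in definition order, so the same domain config always
 * yields the same PCI topology on source and destination hosts. */
static int
demoAssignPCIAddresses(demoPCIAddress *addrs, size_t ndevices)
{
    size_t i;

    for (i = 0; i < ndevices; i++) {
        size_t slot = 3 + i;

        if (slot > 31)       /* bus is full; a real impl would add a bus */
            return -1;

        addrs[i].domain = 0;
        addrs[i].bus = 0;
        addrs[i].slot = slot;
        addrs[i].function = 0;
    }

    return 0;
}

Once assignment is deterministic and recorded, the guest sees the same
device addresses every time it is started, which is what keeps the
guest ABI stable across save/restore and migration.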


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



