[RFC DOCUMENT 02/12] kubevirt-and-kvm: Add Components page

Wed Sep 16 16:46:57 UTC 2020

https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Components.md

# Components

This document describes the various components of the KubeVirt
architecture, how they fit together, and how they compare to the
traditional virtualization architecture (QEMU + libvirt).

## Traditional architecture

For the comparison to make sense, let's start by reviewing the
architecture used for traditional virtualization.

![libvirt architecture][Components-Libvirt]

(Image taken from the "[Look into libvirt][]" presentation by Osier
Yang, which is a bit old but still mostly accurate from a high-level
perspective.)

In particular, the `libvirtd` process runs with high privileges on
the host and is responsible for managing all VMs.

When asked to start a VM, the management process will

* Prepare the environment by performing a number of privileged
  operations upfront
* Set up CGroups
* Set up kernel namespaces
* Apply SELinux labels
* Configure network devices
* Open host files
* ...
* Start a non-privileged QEMU process in that environment

## Kubernetes

To understand how KubeVirt works, it's first necessary to have some
knowledge of Kubernetes.

In Kubernetes, every user workload runs inside [Pods][]. The pod is
the smallest unit of work that Kubernetes will schedule.

Some facts about pods:

* They consist of multiple containers
* The containers share a network namespace
* The containers have their own PID and mount namespace
* The containers have their own CGroups for CPU, memory, devices and
  so forth. They are controlled by k8s and can’t be modified from
  outside.
* Pods can be started with extended privileges (`CAP_NICE`,
  `CAP_NET_RAW`, root user, ...)
* The app in the pods can drop the privileges, but the pod can not
  drop them (`kubectl exec` gives you a shell with the full
  privileges).

Creating pods with elevated privileges is generally frowned upon, and
depending on the policy decided by the cluster administrator it might
be outright impossible.

## KubeVirt architecture

Let's now discuss how KubeVirt is structured.

![KubeVirt architecture][Components-Kubevirt]

The main components are:

* `virt-launcher`, a copy of which runs inside each pod besides QEMU
  and libvirt, is the unprivileged component responsible for
  receiving commands from other KubeVirt components and reporting
  back events such as VM crashes;
* `virt-handler` runs at the node level via a DaemonSet, and is the
  privileged component which takes care of the VM setup by reaching
  into the corresponding pod and modifying its namespaces;
* `virt-controller` runs at the cluster level and monitors the API
  server so that it can react to user requests and VM events;
* `virt-api`, also running at the cluster level, exposes a few
  additional APIs that only apply to VMs, such as the "console" and
  "vnc" actions.

When a KubeVirt VM is started:

* We request a Pod with certain privileges and resources from
  Kubernetes.
* The kubelet (the node daemon of kubernetes) prepares the
  environment with the help of a container runtime.
* A shim process (virt-launcher) is our main entrypoint in the pod,
  which starts libvirt
* Once our node-daemon (virt-handler) can reach our shim process, it
  does privileged setup from outside. It reaches into the namespaces
  and modifies their content as needed. We mostly have to modify the
  mount and network namespaces.
* Once the environment is prepared, virt-handler asks virt-launcher
  to start a VM via its libvirt component.

More information can be found in the [KubeVirt architecture][] page.

## Comparison

The two architectures are quite similar from the high-level point of
view: in both cases there are a number of privileged components which
take care of preparing an environment suitable for running an
unprivileged QEMU process in.

The difference, however, is that while libvirtd takes care of all
this setup itself, in the case of KubeVirt several smaller components
are involved: some of these components are privileged just as libvirtd
is, but others are not, and some of the tasks are not even performed
by KubeVirt itself but rather delegated to the existing Kubernetes
infrastructure.

## Use of libvirtd in KubeVirt

In the traditional virtualization scenario, `libvirtd` provides a
number of useful features on top of those available with plain QEMU,
including

* support for multiple clients connecting at the same time
* management of multiple VMs through a single entry point
* remote API access

KubeVirt interacts with libvirt under certain conditions that make
the features described above irrelevant:

* there's only one client talking to libvirt: `virt-handler`
* libvirt is only asked to manage a single VM
* client and libvirt are running in the same pod, no remote libvirt
  access

[Components-Kubevirt]: 
https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Images/Components-Kubevirt.png
[Components-Libvirt]: 
https://gitlab.com/abologna/kubevirt-and-kvm/-/blob/master/Images/Components-Libvirt.png
[KubeVirt architecture]: https://github.com/kubevirt/kubevirt/blob/master/docs/architecture.md
[Look into libvirt]: https://www.slideshare.net/ben_duyujie/look-into-libvirt-osier-yang
[Pods]: https://kubernetes.io/docs/concepts/workloads/pods/