[RFC DOCUMENT 02/12] kubevirt-and-kvm: Add Components page
abologna at redhat.com
Wed Sep 16 16:46:57 UTC 2020
This document describes the various components of the KubeVirt
architecture, how they fit together, and how they compare to the
traditional virtualization architecture (QEMU + libvirt).
## Traditional architecture
For the comparison to make sense, let's start by reviewing the
architecture used for traditional virtualization.
(Image taken from the "[Look into libvirt]" presentation by Osier
Yang, which is a bit old but still mostly accurate from a high-level
point of view.)
In particular, the `libvirtd` process runs with high privileges on
the host and is responsible for managing all VMs.
When asked to start a VM, the management process will:
* Prepare the environment by performing a number of privileged
  operations:
  * Set up CGroups
  * Set up kernel namespaces
  * Apply SELinux labels
  * Configure network devices
  * Open host files
* Start a non-privileged QEMU process in that environment
To understand how KubeVirt works, it's first necessary to have some
knowledge of Kubernetes.
In Kubernetes, every user workload runs inside [Pods]. The pod is
the smallest unit of work that Kubernetes will schedule.
Some facts about pods:
* They consist of one or more containers
* The containers share a network namespace
* The containers have their own PID and mount namespaces
* The containers have their own CGroups for CPU, memory, devices and
so forth. They are controlled by k8s and can’t be modified from
* Pods can be started with extended privileges (`CAP_SYS_NICE`,
  `CAP_NET_RAW`, root user, ...)
* The app in the pods can drop the privileges, but the pod cannot
  drop them (`kubectl exec` gives you a shell with the full
  privileges)
Creating pods with elevated privileges is generally frowned upon, and
depending on the policy decided by the cluster administrator it might
be outright impossible.
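You can check whether a process inside a pod actually holds capabilities such as `CAP_NET_RAW` or `CAP_SYS_NICE` from a `kubectl exec` shell. Read the `CapEff` (effective capabilities) line of `/proc/self/status`. The Go sketch below assumes a Linux container. The bit numbers 13 and 23 are the values defined in `<linux/capability.h>`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in <linux/capability.h>.
const (
	capNetRaw  = 13 // CAP_NET_RAW
	capSysNice = 23 // CAP_SYS_NICE
)

// hasCap reports whether bit capBit is set in a hexadecimal capability mask,
// in the format used by the Cap* lines of /proc/<pid>/status.
func hasCap(maskHex string, capBit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(maskHex), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(uint64(1)<<capBit) != 0, nil
}

// effectiveCaps returns the CapEff mask of the current process.
func effectiveCaps() (string, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "", err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no CapEff line in /proc/self/status")
}

func main() {
	mask, err := effectiveCaps()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	checks := []struct {
		name string
		bit  uint
	}{
		{"CAP_NET_RAW", capNetRaw},
		{"CAP_SYS_NICE", capSysNice},
	}
	for _, c := range checks {
		set, err := hasCap(mask, c.bit)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s effective: %v\n", c.name, set)
	}
}
```

For example, the mask `00000000a80425fb` corresponds to Docker's default capability set. It includes `CAP_NET_RAW` but not `CAP_SYS_NICE`.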
## KubeVirt architecture
Let's now discuss how KubeVirt is structured.
The main components are:
* `virt-launcher`, a copy of which runs inside each pod alongside QEMU
and libvirt, is the unprivileged component responsible for
receiving commands from other KubeVirt components and reporting
back events such as VM crashes;
* `virt-handler` runs at the node level via a DaemonSet, and is the
privileged component which takes care of the VM setup by reaching
into the corresponding pod and modifying its namespaces;
* `virt-controller` runs at the cluster level and monitors the API
server so that it can react to user requests and VM events;
* `virt-api`, also running at the cluster level, exposes a few
additional APIs that only apply to VMs, such as the "console" and
When a KubeVirt VM is started:
* We request a Pod with certain privileges and resources from
  Kubernetes.
* The kubelet (the node daemon of Kubernetes) prepares the
  environment with the help of a container runtime.
* A shim process (`virt-launcher`) is our main entry point in the pod;
  it starts libvirt.
* Once our node daemon (`virt-handler`) can reach our shim process, it
  performs privileged setup from outside: it reaches into the pod's
  namespaces and modifies their contents as needed. We mostly have to
  modify the mount and network namespaces.
* Once the environment is prepared, `virt-handler` asks `virt-launcher`
  to start a VM via its libvirt component.
More information can be found in the [KubeVirt architecture] page.
The two architectures are quite similar from a high-level point of
view: in both cases there are a number of privileged components which
take care of preparing an environment suitable for running an
unprivileged QEMU process in.
The difference, however, is that while libvirtd takes care of all
this setup itself, in the case of KubeVirt several smaller components
are involved: some of these components are privileged just as libvirtd
is, but others are not, and some of the tasks are not even performed
by KubeVirt itself but rather delegated to the existing Kubernetes
infrastructure.
## Use of libvirtd in KubeVirt
In the traditional virtualization scenario, `libvirtd` provides a
number of useful features on top of those available with plain QEMU,
including:
* support for multiple clients connecting at the same time
* management of multiple VMs through a single entry point
* remote API access
KubeVirt interacts with libvirt under certain conditions that make
the features described above irrelevant:
* there's only one client talking to libvirt: `virt-launcher`
* libvirt is only asked to manage a single VM
* client and libvirt are running in the same pod, so no remote libvirt
  access is needed
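In libvirt, the connection URI determines whether access is remote. Its general form is `driver[+transport]://[user@][host][:port]/path`. The Go sketch below uses only `net/url` to classify a URI as local or remote and to report any explicit transport. Treating every non-empty hostname as remote is a simplification made for illustration. Because client and libvirt share a pod in KubeVirt, only the local (hostless) case applies there.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// classifyLibvirtURI parses a URI of the form
// driver[+transport]://[user@][host][:port]/path[?params] and reports the
// explicit transport (empty if none) and whether a remote host is named.
// Simplification: any non-empty hostname counts as remote.
func classifyLibvirtURI(raw string) (transport string, remote bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	if _, t, ok := strings.Cut(u.Scheme, "+"); ok {
		transport = t
	}
	return transport, u.Hostname() != "", nil
}

func main() {
	for _, raw := range []string{
		"qemu:///system",
		"qemu+unix:///system",
		"qemu+ssh://root@host.example.com/system",
	} {
		transport, remote, err := classifyLibvirtURI(raw)
		if err != nil {
			fmt.Println(raw, "error:", err)
			continue
		}
		fmt.Printf("%s transport=%q remote=%v\n", raw, transport, remote)
	}
}
```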
[KubeVirt architecture]: https://github.com/kubevirt/kubevirt/blob/master/docs/architecture.md
[Look into libvirt]: https://www.slideshare.net/ben_duyujie/look-into-libvirt-osier-yang
[Pods]: https://kubernetes.io/docs/concepts/workloads/pods/