[libvirt PATCH] docs: add a kbase explaining security protections for QEMU passthrough
Kashyap Chamarthy
kchamart at redhat.com
Fri Feb 7 15:27:45 UTC 2020
On Thu, Feb 06, 2020 at 01:05:37PM +0000, Daniel P. Berrangé wrote:
The core content reads very well. A couple of minor nit-picks inline.
[...]
> diff --git a/docs/kbase/qemu-passthrough-security.rst b/docs/kbase/qemu-passthrough-security.rst
> new file mode 100644
> index 0000000000..7fb1f6fbdd
> --- /dev/null
> +++ b/docs/kbase/qemu-passthrough-security.rst
> @@ -0,0 +1,157 @@
[...]
> +XML document additions
> +======================
> +
> +To deal with the problem, libvirt introduced support for command line
Nit: s/command line/command-line/g (there are a few occurrences)
> +passthrough of QEMU arguments. This is achieved by supporting a custom
> +XML namespace, under which some QEMU driver specific elements are defined.
> +
> +The canonical place to declare the namespace is on the top level ``<domain>``
> +element. At the very end of the document, arbitrary command line arguments
> +can now be added, using the namespace prefix ``qemu:``
> +
> +::
If you can stomach the syntax change, you can put the :: at the end of
the sentence.
> +
> + <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> + <name>QEMUGuest1</name>
> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> + ...
> + <qemu:commandline>
> + <qemu:arg value='-newarg'/>
> + <qemu:arg value='parameter'/>
I'd guess you intentionally took a generic example, rather than a specific
QEMU command-line parameter, to illustrate the XML, in case the example
option gets deprecated, etc.
> + <qemu:env name='ID' value='wibble'/>
> + <qemu:env name='BAR'/>
> + </qemu:commandline>
> + </domain>
Is it worth calling out that the 'env' fragments are environment
variables? It isn't obvious to those who don't dwell on libvirt/QEMU
daily.
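To show what I mean, here is a minimal Python sketch (standard library
only, not libvirt code) that reads the example above and separates the
passthrough arguments from the environment variables. Treating a missing
'value' attribute as an empty string is my assumption for illustration;
the patch does not say how libvirt handles it.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example in the patch.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"

DOMAIN_XML = """
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>QEMUGuest1</name>
  <qemu:commandline>
    <qemu:arg value='-newarg'/>
    <qemu:arg value='parameter'/>
    <qemu:env name='ID' value='wibble'/>
    <qemu:env name='BAR'/>
  </qemu:commandline>
</domain>
"""


def passthrough(xml_text):
    """Return (args, env) found under <qemu:commandline>."""
    root = ET.fromstring(xml_text)
    ns = {"qemu": QEMU_NS}
    args = [a.get("value")
            for a in root.findall("qemu:commandline/qemu:arg", ns)]
    # Assumption: an <env> without a value is treated as an empty string.
    env = {e.get("name"): e.get("value", "")
           for e in root.findall("qemu:commandline/qemu:env", ns)}
    return args, env


ARGS, ENV = passthrough(DOMAIN_XML)
print(ARGS)
print(ENV)
```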
> +Note that when an argument takes a value eg ``-newarg parameter``, the argument
> +and the value must be passed as separate ``<qemu:arg>`` entries.
>
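The "separate entries" rule can be sketched like this: split the desired
command-line fragment into tokens and emit one <qemu:arg> per token. This
is only an illustration in Python; build_commandline() is a hypothetical
helper of mine, not a libvirt API.

```python
import shlex
import xml.etree.ElementTree as ET

# Namespace URI taken from the example in the patch.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"


def build_commandline(extra_args):
    """Build a <qemu:commandline> element with one <qemu:arg> per token."""
    # Ask ElementTree to serialize the namespace with the 'qemu' prefix.
    ET.register_namespace("qemu", QEMU_NS)
    cmdline = ET.Element(f"{{{QEMU_NS}}}commandline")
    # shlex.split() keeps quoted values together, like a shell would.
    for token in shlex.split(extra_args):
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", {"value": token})
    return cmdline


CMDLINE = build_commandline("-newarg parameter")
VALUES = [arg.get("value") for arg in CMDLINE]
SERIALIZED = ET.tostring(CMDLINE, encoding="unicode")
print(SERIALIZED)
```

So "-newarg parameter" ends up as two elements, never as a single
value='-newarg parameter' entry.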
> +
> +Instead of declaring the XML namespace on the top level ``<domain>`` it is also
> +possible to declare it at time of use, which is more convenient for humans
> +writing the XML documents manually. So the following example is functionally
> +identical:
> +
> +::
Here too, you can put the :: at the end of the sentence, saving one
colon :D
> +
> + <domain type='kvm'>
> + <name>QEMUGuest1</name>
> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
> + ...
> + <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
> + <arg value='-newarg'/>
> + <arg value='parameter'/>
> + <env name='ID' value='wibble'/>
> + <env name='BAR'/>
> + </commandline>
> + </domain>
> +
> +Note that when querying the XML from libvirt, it will have been translated into
> +the canonical syntax once more with the namespace on the top level element.
Here you might want to use the rST "note" admonition:
.. note:: When querying the XML from libvirt, it will have been
   translated into canonical syntax once more with the namespace
   on the top level element.
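For what it's worth, the "functionally identical" claim is easy to check
with any namespace-aware XML parser. A small Python sketch follows
(standard library only; the two documents are trimmed copies of the
examples above): both spellings expand to the same element names.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the examples in the patch.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"

# Namespace declared on the top level element, used with a prefix.
PREFIXED = """<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-newarg'/>
    <qemu:env name='BAR'/>
  </qemu:commandline>
</domain>"""

# Namespace declared at time of use, as a default namespace.
LOCAL = """<domain type='kvm'>
  <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
    <arg value='-newarg'/>
    <env name='BAR'/>
  </commandline>
</domain>"""


def expanded_names(xml_text):
    """Return the namespace-expanded tag names of commandline and children."""
    root = ET.fromstring(xml_text)
    cmdline = root.find(f"{{{QEMU_NS}}}commandline")
    return [cmdline.tag] + [child.tag for child in cmdline]


SAME = expanded_names(PREFIXED) == expanded_names(LOCAL)
print(SAME)
```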
> +
> +Security confinement / sandboxing
> +=================================
> +
> +When libvirt launches a QEMU process it makes use of a number of security
> +technologies to confine QEMU and thus protect the host from malicious VM
> +breakouts.
> +
> +When configuring security protection, however, libvirt generally needs to know
> +exactly which host resources the VM is permitted to access. It gets this
> +information from the domain XML document. This only works for elements in the
> +regular schema, the arguments used with command line passthrough are completely
> +opaque to libvirt.
> +
> +As a result, if command line passthrough is used to expose a file on the host
> +to QEMU, the security protections will activate and either kill QEMU or deny it
> +access.
> +
> +There are two strategies for dealing with this problem, either figure out what
> +steps are needed to grant QEMU access to the device, or disable the security
> +protections. The former is harder, but more secure, while the latter is simple.
> +
> +Granting access per VM
> +----------------------
> +
> +* SELinux - the file on the host needs an SELinux label that will grant access
> + to QEMU's ``svirt_t`` policy.
> +
> + - Read only access - use the ``virt_content_t`` label
Nit: s/Read only/Read-only/
> + - Shared, write access - use the ``svirt_image_t:s0`` label (ie no MCS
> + category appended)
> + - Exclusive, write access - use the ``svirt_image_t:s0:MCS`` label for the VM.
> + The MCS is auto-generated at boot time, so this may require re-configuring
> + the VM to have a fixed MCS label
> +
> +* DAC - the file on the host needs to be readable/writable to the ``qemu``
Nit: let's please expand acronyms on first use: "Discretionary Access
Control (DAC)". That said, DAC and ACL (below) might be common enough for
"Linux dwellers" that we don't have to be pedantic about it, whereas MCS
(Multi-Category Security) is familiar only to those who are
SELinux-aware.
So, your choice: I don't want to make you expand every acronym, only
the obscure ones. :-)
> + user or ``qemu`` group. This can be done by changing the file ownership to
> + ``qemu``, or relaxing the permissions to allow world read, or adding file
> + ACLs to allow access to ``qemu``.
> +
> +* Namespaces - a private ``mount`` namespace is used for QEMU by default
> + which populates a new ``/dev`` with only the device nodes needed by QEMU.
> + There is no way to augment the set of device nodes ahead of time.
> +
> +* Seccomp - libvirt launches QEMU with its built-in seccomp policy enabled with
> + ``obsolete=deny``, ``elevateprivileges=deny``, ``spawn=deny`` and
> + ``resourcecontrol=deny`` settings active. There is no way to change this
> + policy on a per VM basis
Missing full stop at the end here ...
> +
> +* Cgroups - a custom cgroup is created per VM and this will either use the
> + ``devices`` controller or a ``BPF`` rule to whitelist a set of device nodes.
> + There is no way to change this policy on a per VM basis.
> +
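As a side note on the DAC item, the classic owner/group/other check can be
sketched in a few lines of Python. This is a simplified illustration
only: it ignores file ACLs and the root override, dac_allows() is a
hypothetical helper, and the uid/gid value 107 used for "qemu" is made up
for the example (the real IDs depend on the host).

```python
import stat


def dac_allows(mode, file_uid, file_gid, uid, gids, write=False):
    """Classic permission-bit check; ignores ACLs and the root override."""
    if uid == file_uid:
        read_bit, write_bit = stat.S_IRUSR, stat.S_IWUSR
    elif file_gid in gids:
        read_bit, write_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        read_bit, write_bit = stat.S_IROTH, stat.S_IWOTH
    needed = read_bit | (write_bit if write else 0)
    return (mode & needed) == needed


# Hypothetical IDs for the qemu account, for illustration only.
QEMU_UID = 107
QEMU_GIDS = {107}

# File owned by root:qemu with mode 0640: group read, no group write.
CAN_READ = dac_allows(0o640, 0, 107, QEMU_UID, QEMU_GIDS)
CAN_WRITE = dac_allows(0o640, 0, 107, QEMU_UID, QEMU_GIDS, write=True)
print(CAN_READ, CAN_WRITE)
```

Changing the ownership to qemu, or relaxing the mode, flips the relevant
bits in this check; ACLs would need a separate lookup.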
> +Disabling security protection per VM
> +------------------------------------
> +
> +Some of the security protections can be disabled per-VM:
> +
> +* SELinux - in the domain XML the ``<seclabel>`` model can be changed to
> + ``none`` instead of ``selinux``, which will make the VM run unconfined.
> +
> +* DAC - in the domain XML an ``<seclabel>`` element with the ``dac`` model can
> + be added, configured with a user / group account of ``root`` to make QEMU run
> + with full privileges
... here,
> +* Namespaces - there is no way to disable this per VM
> +
> +* Seccomp - there is no way to disable this per VM
> +
> +* Cgroups - there is no way to disable this per VM
> +
> +Disabling security protection host-wide
> +---------------------------------------
> +
> +As a last resort it is possible to disable security protection host wide which
> +will affect all virtual machines. These settings are all made in
> +``/etc/libvirt/qemu.conf``
... and here.
> +
> +* SELinux - set ``security_default_confined = 0`` to make QEMU run unconfined by
> + default, while still allowing explicit opt-in to SELinux for VMs.
> +
> +* DAC - set ``user = root`` and ``group = root`` to make QEMU run as the root
> + account
> +
> +* SELinux, DAC - set ``security_driver = []`` to entirely disable both the
> + SELinux and DAC security drivers.
> +
> +* Namespaces - set ``namespaces = []`` to disable use of the ``mount``
> + namespaces, causing QEMU to see the normal fully populated ``/dev``
> +
> +* Seccomp - set ``seccomp_sandbox = 0`` to disable use of the Seccomp sandboxing
> + in QEMU
> +
> +* Cgroups - set ``cgroup_device_acl`` to include the desired device node, or
> + ``cgroup_controllers = [...]`` to exclude the ``devices`` controller.
I'll let you pick what you want to address, as this doc is an
improvement as-is, FWIW:
Reviewed-by: Kashyap Chamarthy <kchamart at redhat.com>
--
/kashyap