[libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd
Daniel J Walsh
dwalsh at redhat.com
Mon Jun 6 18:51:15 UTC 2011
On 06/06/2011 10:41 AM, Daniel P. Berrange wrote:
> What follows is a document outlining some thoughts I've been having
> on extending sVirt to allow confinement of applications which talk
> to libvirtd on the host, primarily focusing on use of SELinux, but
> also allowing a simple non-SELinux RBAC mechanism.
>
> Securing KVM virtualization hosts with MAC
> ==========================================
>
> This document looks at the task of securing KVM virtualization
> hosts using mandatory access control technologies, with focus
> on SELinux. At the time of writing there have been two phases
> of development, and this document makes proposals for a third
> phase.
>
> Phase 1: circa 2006
> -------------------
>
> Goal: Protect the host from a compromised virtual machine.
>
> The first phase of development had the modest goal of
> protecting the host from attack by a compromised virtual
> machine. To achieve this, the KVM processes are configured
> such that they will run under a confined security context
> ('virt_t' in the SELinux reference policy), which blocks
> access to any host resources not labelled ('virt_image_t')
> for use by virtual machines.
>
> The primary limitation of this initial implementation
> is that while the virtual host is secured, there is no
> protection between virtual machines. This can be considered
> a regression in isolation as compared to that offered by
> non-virtualized hosts. The second limitation is that the
> virtualization admin has to take care to ensure the host
> resources intended for use by the virtual machines are
> correctly labelled. This is a manual setup task unless
> the images are kept in a preset location (/var/lib/libvirt/images
> in the SELinux reference policy).
>
>
>
> Phase 2: March 2009
> -------------------
>
> Goal: Protect virtual machines from each other
>
> The second phase of development has the goal of providing
> isolation between virtual machines that is comparable to
> that achieved between physical machines. This piece of
> work is commonly referred to as "svirt". To achieve this,
> the KVM processes are each configured to run under a
> dedicated security context, which blocks access to any
> resources not explicitly assigned to that virtual machine.
> In the SELinux implementation, the base context "svirt_t"
> has a unique MCS category ("c240,c955") appended to form
> a unique security context "system_u:system_r:svirt_t:s0:c240,c955".
> For each host resource to be assigned to the virtual machine,
> the base context "svirt_image_t" is combined with the same
> MCS category to form a unique resource security context
> "system_u:object_r:svirt_image_t:s0:c240,c955".
>
> The assignment of virtual machine security contexts and
> labelling of resources can be done statically by the
> administrator / management application, or dynamically
> by the libvirtd daemon. The latter removes much of the
> administrator burden.
>
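As a rough illustration of the label composition described above, here is a minimal Python sketch. It is not libvirt's code: the function names, the in-memory set of used pairs, and the c0..c1023 category range are assumptions made for the example. Only the base types and the resulting context strings come from the text.

```python
import random

MAX_CATEGORY = 1023  # assumption: categories run from c0 to c1023


def contexts_for_pair(c1, c2):
    """Build the process and image contexts for one MCS category pair."""
    mcs = f"s0:c{c1},c{c2}"
    process = f"system_u:system_r:svirt_t:{mcs}"
    image = f"system_u:object_r:svirt_image_t:{mcs}"
    return process, image


def allocate_contexts(used_pairs, rng):
    """Pick a category pair not yet in used_pairs and record it.

    used_pairs is a plain set here; it stands in for whatever
    bookkeeping the daemon really does.
    """
    while True:
        pair = tuple(sorted(rng.sample(range(MAX_CATEGORY + 1), 2)))
        if pair not in used_pairs:
            used_pairs.add(pair)
            return contexts_for_pair(*pair)


if __name__ == "__main__":
    print(contexts_for_pair(240, 955))
    print(allocate_contexts(set(), random.Random()))
```

The process and its resources share the same category pair, which is what keeps one guest away from another guest's files.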
> The second phase has addressed the major guest security
> limitation of the first phase, and eased the burden placed
> on host administrators. Attention can now focus on the security
> of the host management software stack. Client applications
> communicate with the libvirtd daemon using a simple sockets
> based RPC protocol. Thus operations initiated by client
> applications which run under one security context are in
> fact invoked under the libvirtd daemon's security context.
> Since the libvirtd daemon is a highly privileged, almost
> unconfined process, this provides a means for applications
> to elevate their privileges.
>
> A second problem with the current model is seen when looking
> at guest migration between hosts. During migration, there
> are two QEMU processes running for the same virtual machine,
> one process on each host. The dynamic assignment of MCS
> values to form unique security contexts is done on a per host
> basis, so there is no guarantee that the VM on host A will be
> using (or be able to use) the same security context on the
> target host of migration. This is not necessarily a problem
> if the guest is using block devices, since block device inode
> labels are only visible to a single host. With a shared
> filesystem that supports SELinux labelling, like GFS2, both
> QEMU processes must run in the same security context to allow
> them both to access the associated files.
>
>
> Phase 3: June 2011
> ------------------
>
> Goal: Protect virtual machines from host applications
>
> The third phase of development has the primary goal of
> honouring the confinement of client applications talking
> to libvirtd, when performing operations on virtual machines
> and other managed objects (storage pools, host devices,
> virtual networks, secrets, etc). Every application connecting
> to libvirt has an associated security context. Every object
> managed by libvirtd will have an associated security context.
> When an operation is invoked via a libvirt API the client
> application security context will be checked against the
> target object context, before proceeding. Thus applications
> will not be able to make use of a libvirtd connection to
> perform operations that are otherwise blocked.
>
> The secondary goal is to add further flexibility and safety
> to the way MCS categories are assigned, and files are relabelled.
> Instead of maintaining a local database of assigned labels, there
> must be some shared storage where label usage can be recorded.
> At its simplest this can be an NFS share, with one file per MCS
> category and locking with fcntl(). An alternative would be to
> acquire leases using a lock manager such as sanlock. In addition,
> the guest configuration will be enhanced such that a guest can
> be assigned a statically chosen security context, but still make
> use of dynamic relabelling of resources. Finally the existing
> boolean mode of 'static' vs 'dynamic' label generation will be
> turned into a tri-state, introducing a 'hybrid' mode where the
> client supplies a custom base context, and the MCS part is still
> auto-generated.
>
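The "one file per MCS category with fcntl() locking" idea could look roughly like the sketch below. The directory layout, the file naming (one file per category pair) and the function names are assumptions for illustration; nothing here is an existing libvirt interface.

```python
import fcntl
import os


def try_claim_category_pair(lock_dir, c1, c2):
    """Try to take an exclusive fcntl lock on the file for this pair.

    Returns an open file descriptor on success, or None if another
    process already holds the lock. The caller keeps the descriptor
    open for as long as the label is in use.
    """
    path = os.path.join(lock_dir, f"c{c1},c{c2}")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def release_category_pair(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Note that fcntl locks are owned by the process, so two claims made from the same process do not conflict with each other; a daemon would still need its own in-memory record of the labels it has handed out.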
>
> Usage scenarios
> ---------------
>
> To aid in development a couple of relevant core use cases
> or usage scenarios have been identified:
>
> 1. A virtual machine monitoring application
>
> For this example, consider the simple monitoring application
> 'virt-top'. This application displays a list of all virtual
> machines on the host and their associated resource utilization
> (CPU, disk, network). This application has no need to be able
> to stop/start/define virtual machines, nor do any operation
> related to host devices, storage, or networking. Traditionally
> this application is written to use a read only libvirt connection.
>
> With enhanced access control from libvirtd, the policy would define
> a new security context 'virt_top_t' for the 'virt-top' application.
> This policy would allow 'list', 'read', 'readstats' on the 'domain'
> object type.
>
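To make the virt-top case concrete, the rule above amounts to a lookup like the following sketch. The table layout and function name are invented for the example, and 'start' is a hypothetical operation name used only to show a denial; real policy would live in SELinux policy, not in a Python dict.

```python
# Allowed operations per (client type, object class).
POLICY = {
    "virt_top_t": {
        "domain": {"list", "read", "readstats"},
    },
}


def is_allowed(client_type, object_class, operation):
    """Return True if the policy table grants this operation."""
    allowed = POLICY.get(client_type, {}).get(object_class, set())
    return operation in allowed


if __name__ == "__main__":
    print(is_allowed("virt_top_t", "domain", "readstats"))
    print(is_allowed("virt_top_t", "domain", "start"))
```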
>
> 2. A multi-guest, multi-user MLS enabled host
>
> For this example, consider a virtualization host with MLS policy
> that is running multiple virtual machines, for a variety of
> different users. A user with the security level "restricted"
> must not be allowed to control virtual machines with a security
> level of "confidential". Conversely a user with security level
> "secret" must not be allowed to create virtual machines with a
> security level of "unclassified".
>
> With enhanced access control from libvirtd, getpeercon() would
> provide the security context of the client application (user).
> The client context would be used to perform an AVC when any API
> operation is invoked, thus ensuring that the client's MLS
> label is honoured in access control checks. The effect would be
> that when a 'restricted' user asked for a list of virtual machines
> only virtual machines at level 'restricted' or below would be
> returned. Or when a "secret" user asked to start a guest with
> a security level of 'unclassified', the operation would be denied.
>
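A small sketch of the MLS behaviour described in this scenario follows. Two things are assumptions: the ordering of the four level names, and the rule that control operations such as start require equal levels. That rule is simply one reading that is consistent with both denials in the text. In practice the decision would come from the policy via an AVC check, not from code like this.

```python
# Assumed ordering, lowest first.
LEVELS = ["unclassified", "restricted", "confidential", "secret"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def may_list(client_level, vm_level):
    """Listing is allowed at or below the client's level."""
    return RANK[client_level] >= RANK[vm_level]


def may_start(client_level, vm_level):
    """Assumption: starting a guest requires matching levels."""
    return RANK[client_level] == RANK[vm_level]


def visible_domains(client_level, domains):
    """Filter a {name: level} mapping down to what the client may see."""
    return [name for name, level in domains.items()
            if may_list(client_level, level)]
```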
>
> 3. Identity transitions from trusted agents
>
> For this example, consider a trusted agent such as libvirt-qpid,
> or libvirt-snmp, which translates the libvirt API from its native
> model, into an alternate access model. In such an example, the
> agent talking to libvirtd will have authenticated itself. The
> peer identity that libvirtd sees, however, is that of the agent,
> not the ultimate (end-user) client. In such a case it will be desirable
> to allow a trusted agent to transition to a different identity when
> performing operations.
>
> An end user running under context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023"
> may talk to the libvirt-qpid agent which runs under the context
> "system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". The libvirt-qpid
> connects to libvirtd which sees 'virt_qpid_t' as the client type.
> The policy is written to allow transitions from 'virt_qpid_t' to
> the 'virt_top_t' type, so when the virt-top client connects to
> libvirt-qpid, it changes its identity to 'virt_top_t'. From that
> point onwards, all AVC checks honour the privileges of the ultimate
> end user application, rather than the libvirt-qpid intermediary.
> The same mechanism also ensures that the client application MLS
> level is transferred via the libvirt-qpid agent to libvirtd.
>
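The transition rule in this scenario can be pictured as the sketch below. The table and function name are made up for illustration; the only facts taken from the text are that 'virt_qpid_t' may transition to 'virt_top_t' and that later checks use the resulting type.

```python
# (peer type, requested type) pairs that policy permits.
ALLOWED_TRANSITIONS = {("virt_qpid_t", "virt_top_t")}


def effective_type(peer_type, requested_type=None):
    """Return the type that later access checks should use.

    peer_type is what libvirtd sees on the socket; requested_type is
    the identity the agent asks to assume on behalf of its client.
    """
    if requested_type is None or requested_type == peer_type:
        return peer_type
    if (peer_type, requested_type) in ALLOWED_TRANSITIONS:
        return requested_type
    raise PermissionError(
        f"transition {peer_type} -> {requested_type} not allowed")
```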
>
> Anticipated Development tasks
> -----------------------------
>
> 1. Extend the domain XML to add a third attribute to the <seclabel>
> element relabel="yes|no", to control whether libvirtd will
> automatically label resources assigned to a guest. If the
> existing 'mode' attribute is "dynamic", then relabelling will
> default to enabled, while if it is 'static', then relabelling
> will default to disabled. Also change 'mode' to allow a new
> 'hybrid' value.
>
> 2. Determine how to maintain/identify security labels for other
> managed objects, including virStoragePoolPtr, virStorageVolPtr,
> virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr,
> and host level APIs without any explicit managed object.
>
> 3. Extend XML for non-domain objects to implant security labels
> as identified in step 2.
>
> 4. Create an internal virIdentity struct to store the identity
> of the client. This will include at least the x509 distinguished
> name, the SASL username, the SELinux context (getpeercon())
> and UNIX username/group (SCM_CREDENTIALS).
>
> 5. Create a new public API to allow a client application to
> supply a new identity, allowing them to pass a new x509
> distinguished name, SASL username, SELinux context and
> UNIX username/group.
>
> 6. Extend the libvirtd daemon such that the current identity
> is stored in a thread local whenever invoking a public
> API operation.
>
> 7. Extend the QEMU driver such that a suitable identity is
> set when performing autonomous background operations
> such as domain auto-start and core dump, in a non-API
> thread.
>
> 8. Create a set of internal access control helper APIs in
> $libvirt/src/accesscontrol/. There will be one API for each
> managed object, taking an object pointer, and an operation
> identifier (from an enum).
>
> 9. Create a simple impl of the access control APIs which defines
> roles for groups of user identities, and grants privileges to
> each role based on the operation names. This allows for simple
> testing of internal infrastructure, and an RBAC mechanism for
> users who lack SELinux in their OS.
>
> 10. Implant access control checks into the main codepaths of every
> driver method implementation in the QEMU driver.
>
> 11. Change the SELinux reference policy to define the new security
> types and access vectors for the libvirt objects & associated
> API calls.
>
> 12. Create a SELinux impl of the access control APIs which invokes
> avc_has_perm() using the client's SELinux context. This is
> intended to be the primary RBAC mechanism for Fedora/RHEL
> virtualization hosts.
>
> 13. Write policy to confine targeted applications like virt-top,
> virt-mem.
>
> 14. Extend libvirt-snmp, libvirt-cim, libvirt-qpid to pass through
> the client identity to libvirtd.
>
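Tasks 4, 6, 8 and 9 above fit together roughly as in this Python sketch. It is only a model of the idea: libvirt itself is C, and every name here (Identity, DomainOp, the role tables, check_domain_access) is hypothetical. It shows an identity record, a per-thread "current identity", and a role-based check keyed on an operation enum.

```python
import enum
import threading
from dataclasses import dataclass
from typing import Optional


class DomainOp(enum.Enum):
    LIST = "list"
    READ = "read"
    START = "start"


@dataclass(frozen=True)
class Identity:
    x509_dname: Optional[str] = None
    sasl_username: Optional[str] = None
    selinux_context: Optional[str] = None
    unix_username: Optional[str] = None
    unix_group: Optional[str] = None


_current = threading.local()


def set_current_identity(identity):
    """Record the caller's identity for the current thread only."""
    _current.identity = identity


def get_current_identity():
    return getattr(_current, "identity", None)


# Example role tables (made-up users and roles).
ROLE_MEMBERS = {"monitor": {"alice"}, "operator": {"bob"}}
ROLE_PRIVILEGES = {
    "monitor": {DomainOp.LIST, DomainOp.READ},
    "operator": {DomainOp.LIST, DomainOp.READ, DomainOp.START},
}


def check_domain_access(domain_name, op):
    """Simple RBAC check: decided by role and operation alone.

    domain_name is accepted to mirror the object + operation shape of
    the helper API, but this simple backend does not inspect it.
    """
    identity = get_current_identity()
    if identity is None:
        return False
    for role, members in ROLE_MEMBERS.items():
        if identity.unix_username in members and op in ROLE_PRIVILEGES[role]:
            return True
    return False
```

A thread that never had an identity set (for example a background autostart thread, task 7) is denied here, which is why such threads would need an explicit identity.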
>
> Technical Notes / Issues
> ------------------------
>
> 1. Adding new SELinux security classes / access vectors
>
> The SELinux security classes are defined in /usr/include/selinux/flask.h
> and access vectors in /usr/include/selinux/av_permissions.h. Both of these
> files are automatically generated by a script in the selinux reference policy code
> '$serefpolicy/policy/flask/flask.py'. The master data files are in the
> same directory, 'access_vectors' and 'security_classes'. Once generated,
> the headers need to be manually copied into the libselinux package
> sources.
>
You do not need to do this anymore. libselinux does not care about the
access vectors; they are named in your application.
>
> APIs are added to libvirt on a very frequent basis. What is the process
> for applying access control to them if the SELinux policy does not yet
> have a suitable access vector / security class defined ? Do we need a
> generic 'admin' access vector we can use as catch all, until more
> specific vectors can be defined for the new APIs. Desirable to avoid
> having to lock-step upgrade libvirt with selinux policy for all additions
> to the libvirt public API.
>
Well one benefit would be unconfined_t, although I am not sure it would
have access.
>
> 2. Security contexts for libvirt managed objects
>
> virDomainPtr: Already embedded in XML, unless using dynamic labelling
> in which case context is assigned at startup.
>
> virNetworkPtr: No existing security context, nor any object on disk
> that could be used. Follow example of domains and embed
> <seclabel> in the XML. Assign unique MCS category per
> network and ensure that daemons launched per network
> (dnsmasq, radvd) inherit the MCS category.
>
> virSecretPtr: No existing security context. Secrets may be associated
> with disk paths for VMs. Could copy the security context
> of the guests and apply it to the secret, or have a
> dedicated type svirt_secret_t and just copy the MCS
> category. Hard to make it work for guests with dynamic
> MCS assignment.
>
> virStoragePoolPtr: No existing security context. Some pool types have
> objects existing on the host filesystem eg SCSI
> HBAs have a directory in sysfs, filesystem dirs
> have a directory somewhere, LVM has directory
> for the volume group in /dev. Other pool types have
> no object on disk anywhere convenient. eg Sheepdog.
> Other pool types only have an object on disk when
> the pool is active (eg iSCSI, NFS). So there is
> nothing to use for API checks when the pool is
> inactive.
>
> Likely have to ignore whatever associated resource
> is on disk and just store a security context in the
> XML config as with virDomainPtr/virNetworkPtr.
>
>
> virStorageVolPtr: Currently reports the SELinux security label associated
> with the file on disk. Not all pool types necessarily
> have volumes with a corresponding file on disk (eg
> Sheepdog).
>
> virNodeDevicePtr: No existing security context. Most data comes from udev
> or HAL databases, though ultimately much is available
> in sysfs.
>
> When detaching PCI devices from host drivers, files
> in sysfs are used. When creating/deleting NPIV adapters
> sysfs is used. Thus could use sysfs file labels for AVC
> checks ?
>
> virConnectPtr: All host level APIs for which there is no other object
> aside from the nebulous concept of the 'host'. APIs are
> all readonly, eg query host capabilities, query free
> memory, CPU stats, etc. What if we gain APIs to make
> write calls ?
>
>
> virInterfacePtr: No existing security context. Currently using netcf to
> get data from /etc/sysconfig/network-scripts/ifcfg-XXX
> files, but can't assume those file names since that is
> Fedora/RHEL specific. Might not even use netcf if it
> talks directly to network manager. Does netcf need to
> expose a security label based on the ifcfg-XXX file ?
>
>
> 3. Security labelling config modes
>
> When creating a guest the following XML snippets can be used.
>
> a. Default type, dynamic MCS, automatic relabelling
>
> <seclabel type='selinux' mode='dynamic' relabel='yes'/>
>
>
> b. Custom type, dynamic MCS, automatic relabelling
>
> <seclabel type='selinux' mode='hybrid' relabel='yes'>
> <label>system_u:system_r:mysvirt_t</label>
> <imagelabel>system_u:object_r:mysvirt_image_t</imagelabel>
> </seclabel>
>
Yes this would be cool, although I am not sure you need an image label,
since the MCS separation would still work on svirt_image_t. Would make
policy writing easier and selection easier if you did not change the
type of the image file.
I would at least allow for the admin to not specify an image label.
>
> c. Default type, dynamic MCS, no relabelling
>
> <seclabel type='selinux' mode='dynamic' relabel='no'/>
>
> Does this mode make any sense, since admin doesn't know
> MCS category upfront ? Possibly only useful if the guest
> only has readonly disks.
>
But you don't relabel readonly disks, correct? They are a shared
resource. I would say this would not be used.
>
> d. Custom type, dynamic MCS, no relabelling
>
> <seclabel type='selinux' mode='hybrid' relabel='no'>
> <label>system_u:system_r:mysvirt_t</label>
> </seclabel>
>
> Same question about whether it makes sense
>
I don't think this makes sense.
>
> e. Custom type, static MCS, auto relabelling
>
> <seclabel type='selinux' mode='static' relabel='yes'>
> <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
> <imagelabel>system_u:system_r:mysvirt_image_t:s0:c123,c456</imagelabel>
> </seclabel>
>
>
This is fine, not sure it is legal in MLS world. Although I guess we
could change the label to SystemHigh when not in use.
> f. Custom type, static MCS, no relabelling
>
> <seclabel type='selinux' mode='static' relabel='no'>
> <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
> </seclabel>
>
>
We have this now, this is static labeling.
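To summarize modes a. to f., here is a sketch of how the proposed <seclabel> attributes might be interpreted. The relabel defaults for 'dynamic' and 'static' come from the task list above. Treating 'hybrid' like 'dynamic' for the default, and requiring a <label> for 'hybrid' and 'static', are assumptions of this example. parse_seclabel is a made-up helper, not a libvirt function.

```python
import xml.etree.ElementTree as ET

VALID_MODES = ("dynamic", "hybrid", "static")


def parse_seclabel(xml_text):
    """Parse one <seclabel> element into a plain dict."""
    elem = ET.fromstring(xml_text)
    mode = elem.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown seclabel mode: {mode!r}")

    relabel_attr = elem.get("relabel")
    if relabel_attr is None:
        # dynamic -> relabel on, static -> relabel off (from the RFC);
        # hybrid -> on is an assumption.
        relabel = mode != "static"
    elif relabel_attr in ("yes", "no"):
        relabel = relabel_attr == "yes"
    else:
        raise ValueError(f"bad relabel value: {relabel_attr!r}")

    label = elem.findtext("label")
    if mode in ("hybrid", "static") and label is None:
        raise ValueError(f"mode {mode!r} needs a <label>")

    return {
        "mode": mode,
        "relabel": relabel,
        "label": label,
        # Optional: may be None when the admin does not specify one.
        "imagelabel": elem.findtext("imagelabel"),
    }
```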
> 4. Time at which to apply checks / source context
>
> It would be desirable to restrict the ability to use automatic file
> relabelling within the policy. If a client application defines a
> guest with the 'relabel=yes' attribute set, at what time should this
> usage be validated ?
>
> Validate at the time the guest is defined ? This ensures the app
> defining the guest is suitably privileged, but the file labels
> might be changed by the time the guest starts.
>
> Validate at the time the guest is started ? This minimises the
> window between access check being performed, and libvirtd actually
> performing the relabel operation. The app starting the guest might
> be different from the one defining the guest though ?
>
> Check at both define + start time ?
>
>
Probably most sane.
> What source security context should we use when performing autostart
> of virtual machines ? Normally when starting a VM, the check would be
> performed using the context of the client invoking the start API, but
> there is no such client when autostart occurs.
>
libselinux default.
> Should we instead perform a 'start' operation check whenever the
> 'autostart' flag is turned on by a client ? Or check the autostart
> operation against some generic source context ?
>
>
I think we leave this in the default_context file.
One last thing to think about: since libvirt can now be run under the
user's context in certain situations, libvirt should examine the range
of MLS/MCS labels associated with it and make sure that it only
assigns MCS labels within this range.
For example if I am a user running as
staff_t:s0-s0:c500
libvirt should only pick random labels between 0-500.
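A sketch of that range restriction is below. It assumes the category range appears at the end of the context in the "cLOW.cHIGH" form (as in "s0-s0:c0.c1023"), which is a simplification of real MLS range syntax; both function names are hypothetical.

```python
import random
import re


def category_bounds(context):
    """Extract (low, high) from a context ending in ':cLOW.cHIGH'."""
    match = re.search(r":c(\d+)\.c(\d+)$", context)
    if match is None:
        raise ValueError(f"no category range found in {context!r}")
    return int(match.group(1)), int(match.group(2))


def pick_category_pair(context, rng):
    """Pick two distinct categories inside the caller's own range."""
    low, high = category_bounds(context)
    if high - low < 1:
        raise ValueError("range too small to pick two categories")
    c1, c2 = sorted(rng.sample(range(low, high + 1), 2))
    return c1, c2


if __name__ == "__main__":
    ctx = "staff_u:staff_r:staff_t:s0-s0:c0.c500"
    print(pick_category_pair(ctx, random.Random()))
```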