[libvirt] RFC: extending sVirt to confine host apps which talk to libvirtd

Mon Jun 6 18:51:15 UTC 2011

On 06/06/2011 10:41 AM, Daniel P. Berrange wrote:
> What follows is a document outlining some thoughts I've been having
> on extending sVirt to allow confinement of applications which talk
> to libvirtd on the host, primarily focusing on use of SELinux, but
> also allowing a simple non-SELinux RBAC mechanism.
> 
>    Securing KVM virtualization hosts with MAC
>    ==========================================
> 
> This document looks at the task of securing KVM virtualization
> hosts using mandatory access control technologies, with focus
> on SELinux. At the time of writing there have been two phases
> of development, and this document makes proposals for a third
> phase.
> 
> Phase 1: circa 2006
> -------------------
> 
> Goal: Protect the host from a compromised virtual machine.
> 
> The first phase of development had the modest goal of
> protecting the host from attack by a compromised virtual
> machine. To achieve this, the KVM processes are configured
> such that they will run under a confined security context
> ('virt_t' in the SELinux reference policy), which blocks
> access to any host resources not labelled ('virt_image_t')
> for use by virtual machines.
> 
> The primary limitation of this initial implementation
> is that while the virtual host is secured, there is no
> protection between virtual machines. This can be considered
> a regression in isolation as compared to that offered by
> non-virtualized hosts. The second limitation is that the
> virtualization admin has to take care to ensure the host
> resources intended for use by the virtual machines are
> correctly labelled. This is a manual setup task unless
> the images are kept in a preset location (/var/lib/libvirt/images
> in the SELinux reference policy).
> 
> 
> 
> Phase 2: March 2009
> -------------------
> 
> Goal: Protect virtual machines from each other
> 
> The second phase of development has the goal of providing
> isolation between virtual machines that is comparable to
> that achieved between physical machines. This piece of
> work is commonly referred to as "svirt". To achieve this,
> the KVM processes are each configured to run under a
> dedicated security context, which blocks access to any
> resources not explicitly assigned to that virtual machine.
> In the SELinux implementation, the base context "svirt_t"
> has a unique MCS category ("c240,c955") appended to form
> a unique security context "system_u:system_r:svirt_t:s0:c240,c955".
> For each host resource to be assigned to the virtual machine,
> the base context "svirt_image_t" is combined with the same
> MCS category to form a unique resource security context
> "system_u:object_r:svirt_image_t:s0:c240,c955".
> 
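As a rough illustration of the label construction described above, here is a minimal Python sketch. The helper names (make_svirt_labels, random_category_pair) are made up for this example, the s0 level and the 0-1023 category span are assumptions, and this is not libvirt's actual code.

```python
import random


def make_svirt_labels(c1, c2, level="s0"):
    """Combine the base svirt types with one MCS category pair.

    Returns (process_context, image_context), both carrying the same
    MCS part so the guest can only touch its own resources.
    """
    if c1 == c2:
        raise ValueError("the two MCS categories must differ")
    lo, hi = sorted((c1, c2))
    mcs = f"{level}:c{lo},c{hi}"
    return (
        f"system_u:system_r:svirt_t:{mcs}",
        f"system_u:object_r:svirt_image_t:{mcs}",
    )


def random_category_pair(max_cat=1023, rng=random):
    """Pick two distinct categories in 0..max_cat (assumed span)."""
    lo, hi = sorted(rng.sample(range(max_cat + 1), 2))
    return lo, hi
```

Calling make_svirt_labels(240, 955) gives the two contexts quoted in the text. Uniqueness across guests still has to be tracked by whoever hands out the pairs.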
> The assignment of virtual machine security contexts and
> labelling of resources can be done statically by the
> administrator / management application, or dynamically
> by the libvirtd daemon. The latter removes much of the
> administrator burden.
> 
> The second phase has addressed the major guest security
> limitation of the first phase, and eased the burden placed
> on host administrators. Attention can now focus on the security
> of the host management software stack. Client applications
> communicate with the libvirtd daemon using a simple sockets
> based RPC protocol. Thus operations initiated by client
> applications which run under one security context are in
> fact invoked under the libvirtd daemon's security context.
> Since the libvirtd daemon is a highly privileged, almost
> unconfined process, this provides a means for applications
> to elevate their privileges.
> 
> A second problem with the current model is seen when looking
> at guest migration between hosts. During migration, there
> are two QEMU processes running for the same virtual machine,
> one process on each host. The dynamic assignment of MCS
> values to form unique security contexts is done on a per host
> basis, so there is no guarantee that the VM on host A will be
> using (or be able to use) the same security context on the
> target host of migration. This is not necessarily a problem
> if the guest is using block devices, since block device inode
> labels are only visible to a single host. With a shared
> filesystem that supports SELinux labelling, like GFS2, both
> QEMU processes must run in the same security context to allow
> them both to access the associated files.
> 
> 
> Phase 3: June 2011
> ------------------
> 
> Goal: Protect virtual machines from host applications
> 
> The third phase of development has the primary goal of
> honouring the confinement of client applications talking
> to libvirtd, when performing operations on virtual machines
> and other managed objects (storage pools, host devices,
> virtual networks, secrets, etc). Every application connecting
> to libvirt has an associated security context. Every object
> managed by libvirtd will have an associated security context.
> When an operation is invoked via a libvirt API the client
> application security context will be checked against the
> target object context, before proceeding. Thus applications
> will not be able to make use of a libvirtd connection to
> perform operations that are otherwise blocked.
> 
> The secondary goal is to add further flexibility and safety
> to the way MCS categories are assigned, and files are relabelled.
> Instead of maintaining a local database of assigned labels, there
> must be some shared storage where label usage can be recorded.
> At its simplest this can be an NFS share, with one file per MCS
> category and locking with fcntl(). An alternative would be to
> acquire leases using a lock manager such as sanlock. In addition,
> the guest configuration will be enhanced such that a guest can
> be assigned a statically chosen security context, but still make
> use of dynamic relabelling of resources. Finally the existing
> boolean mode of 'static' vs 'dynamic' label generation will be
> turned into a tri-state, introducing a 'hybrid' mode where the
> client supplies a custom base context, and the MCS part is still
> auto-generated.
> 
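The shared-storage idea above could be prototyped along these lines. This is only a sketch under assumptions: it uses one lock file per MCS category pair (the text says one file per category), the directory layout and the name claim_category_pair are hypothetical, and it relies on fcntl()-style locks working on the shared filesystem.

```python
import fcntl
import os


def claim_category_pair(lock_dir, pairs):
    """Try each candidate pair in turn.

    Returns ((c1, c2), fd) for the first pair whose lock file could be
    locked, or None if every candidate is held by another process. The
    caller must keep fd open for as long as the guest uses the label.
    """
    for c1, c2 in pairs:
        path = os.path.join(lock_dir, f"c{c1},c{c2}")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # Non-blocking exclusive fcntl() lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Held by some other process (possibly on another host).
            os.close(fd)
            continue
        return (c1, c2), fd
    return None
```

Note that fcntl() locks are owned by the process, so a second claim for the same pair from the same process would not be refused. A daemon using this approach would still need its own in-memory record of the pairs it has already handed out.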
> 
> Usage scenarios
> ---------------
> 
> To aid in development a couple of relevant core use cases
> or usage scenarios have been identified:
> 
> 1. A virtual machine monitoring application
> 
> For this example, consider the simple monitoring application
> 'virt-top'. This application displays a list of all virtual
> machines on the host and their associated resource utilization
> (CPU, disk, network). This application has no need to be able
> to stop/start/define virtual machines, nor do any operation
> related to host devices, storage, or networking. Traditionally
> this application is written to use a read only libvirt connection.
> 
> With enhanced access control from libvirtd, the policy would define
> a new security context 'virt_top_t' for the 'virt-top' application.
> This policy would allow 'list', 'read', 'readstats' on the 'domain'
> object type.
> 
> 
> 2. A multi-guest, multi-user MLS enabled host
> 
> For this example, consider a virtualization host with MLS policy
> that is running multiple virtual machines, for a variety of
> different users. A user with the security level "restricted"
> must not be allowed to control virtual machines with a security
> level of "confidential". Conversely a user with security level
> "secret" must not be allowed to create virtual machines with a
> security level of "unclassified".
> 
> With enhanced access control from libvirtd, getpeercon() would
> provide the security context of the client application (user).
> The client context would be used to perform an AVC check when any API
> operation is invoked, thus ensuring that the client's MLS
> label is honoured in access control checks. The effect would be
> that when a 'restricted' user asked for a list of virtual machines
> only virtual machines at level 'restricted' or below would be
> returned. Or when a "secret" user asked to start a guest with
> a security level of 'unclassified', the operation would be denied.
> 
> 
> 3. Identity transitions from trusted agents
> 
> For this example, consider a trusted agent such as libvirt-qpid,
> or libvirt-snmp, which translates the libvirt API from its native
> model, into an alternate access model. In such an example, the
> agent talking to libvirtd will have authenticated itself. The
> peer identity that libvirtd sees, however, is that of the agent,
> not the ultimate (end-user) client. In such a case it will be desirable
> to allow a trusted agent to transition to a different identity when
> performing operations.
> 
> An end user running under context "unconfined_u:unconfined_r:virt_top_t:s0-s0:c0.c1023"
> may talk to the libvirt-qpid agent which runs under the context
> "system_u:system_r:virt_qpid_t:s0-s0:c0.c1023". The libvirt-qpid
> connects to libvirtd which sees 'virt_qpid_t' as the client type.
> The policy is written to allow transitions from 'virt_qpid_t' to
> the 'virt_top_t' type, so when the virt-top client connects to
> libvirt-qpid, it changes its identity to 'virt_top_t'. From that
> point onwards, all AVC checks honour the privileges of the ultimate
> end user application, rather than the libvirt-qpid intermediary.
> The same mechanism also ensures that the client application MLS
> level is transferred via the libvirt-qpid agent to libvirtd.
> 
> 
> Anticipated Development tasks
> -----------------------------
> 
> 1. Extend the domain XML to add a third attribute to the <seclabel>
>    element relabel="yes|no", to control whether libvirtd will
>    automatically label resources assigned to a guest. If the
>    existing 'mode' attribute is "dynamic", then relabelling will
>    default to enabled, while if it is 'static', then relabelling
>    will default to disabled. Also change 'mode' to allow a new
>    'hybrid' value.
> 
> 2. Determine how to maintain/identify security labels for other
>    managed objects, including virStoragePoolPtr, virStorageVolPtr,
>    virSecretPtr, virNetworkPtr, virInterfacePtr, virNodeDevicePtr,
>    and host level APIs without any explicit managed object.
> 
> 3. Extend XML for non-domain objects to implant security labels
>    as identified in step 2.
> 
> 4. Create an internal virIdentity struct to store the identity
>    of the client. This will include at least the x509 distinguished
>    name, the SASL username, the SELinux context (getpeercon())
>    and UNIX username/group (SCM_CREDENTIALS).
> 
> 5. Create a new public API to allow a client application to
>    supply a new identity, allowing them to pass a new x509
>    distinguished name, SASL username, SELinux context and
>    UNIX username/group.
> 
> 6. Extend the libvirtd daemon such that the current identity
>    is stored in a thread local whenever invoking a public
>    API operation.
> 
> 7. Extend the QEMU driver such that a suitable identity is
>    set when performing autonomous background operations
>    such as domain auto-start and core dump, in a non-API
>    thread.
> 
> 8. Create a set of internal access control helper APIs in
>    $libvirt/src/accesscontrol/. There will be one API for each
>    managed object, taking an object pointer, and an operation
>    identifier (from an enum).
> 
> 9. Create a simple impl of the access control APIs which defines
>    roles for groups of user identities, and grants privileges to
>    each role based on the operation names. This allows for simple
>    testing of internal infrastructure, and an RBAC mechanism for
>    users who lack SELinux in their OS.
> 
> 10. Implant access control checks into the main codepaths of every
>     driver method implementation in the QEMU driver.
> 
> 11. Change the SELinux reference policy to define the new security
>     types and access vectors for the libvirt objects & associated
>     API calls.
> 
> 12. Create a SELinux impl of the access control APIs which invokes
>     avc_has_perm() using the client's SELinux context. This is
>     intended to be the primary RBAC mechanism for Fedora/RHEL
>     virtualization hosts.
> 
> 13. Write policy to confine targeted applications like virt-top,
>     virt-mem.
> 
> 14. Extend libvirt-snmp, libvirt-cim, libvirt-qpid to pass through
>     the client identity to libvirtd.
> 
> 
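To make task 9 (the simple non-SELinux RBAC implementation) a little more concrete, a toy version might look like the following. Everything here is illustrative: the role names, the user names and the check_access helper are invented for the example, and only the 'list', 'read' and 'readstats' operations on the 'domain' object come from the text above.

```python
# Privileges granted to each role, as (object type, operation) pairs.
ROLE_PRIVILEGES = {
    "monitor": {
        ("domain", "list"),
        ("domain", "read"),
        ("domain", "readstats"),
    },
    # 'start' is a hypothetical operation name for illustration.
    "operator": {
        ("domain", "list"),
        ("domain", "read"),
        ("domain", "readstats"),
        ("domain", "start"),
    },
}

# Mapping from client identity (e.g. UNIX or SASL username) to roles.
USER_ROLES = {
    "virt-top-user": {"monitor"},
    "alice": {"operator"},
}


def check_access(user, obj_type, operation):
    """Return True if any role held by 'user' grants the operation."""
    for role in USER_ROLES.get(user, ()):
        if (obj_type, operation) in ROLE_PRIVILEGES.get(role, ()):
            return True
    return False
```

Unknown users get no roles and are therefore denied everything, which seems like the safer default for a catch-all mechanism.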
> Technical Notes / Issues
> ------------------------
> 
> 1. Adding new SELinux security classes / access vectors
> 
> The selinux security classes are defined in /usr/include/selinux/flask.h
> and access vectors in /usr/include/selinux/av_permissions.h. Both of these
> files are automatically generated by a script in the selinux reference policy code
> '$serefpolicy/policy/flask/flask.py'. The master data files are in the
> same directory, 'access_vectors' and 'security_classes'. Once generated,
> the headers need to be manually copied into the libselinux package
> sources.
> 
You do not need to do this anymore. libselinux does not care about the
access vectors; they are named in your application.
> 
> APIs are added to libvirt on a very frequent basis. What is the process
> for applying access control to them if the SELinux policy does not yet
> have a suitable access vector / security class defined ? Do we need a
> generic 'admin' access vector we can use as catch all, until more
> specific vectors can be defined for the new APIs. Desirable to avoid
> having to lock-step upgrade libvirt with selinux policy for all additions
> to the libvirt public API.
> 
Well one benefit would be unconfined_t, although I am not sure it would
have access.
> 
> 2. Security contexts for libvirt managed objects
> 
> virDomainPtr: Already embedded in XML, unless using dynamic labelling
>               in which case context is assigned at startup.
> 
> virNetworkPtr: No existing security context, nor any object on disk
>                that could be used. Follow example of domains and embed
>                <seclabel> in the XML. Assign unique MCS category per
>                network and ensure that daemons launched per network
>                (dnsmasq, radvd) inherit the MCS category.
> 
> virSecretPtr: No existing security context. Secrets may be associated
>               with disk paths for VMs. Could copy the security context
>               of the guests and apply it to the secret, or have a
>               dedicated type svirt_secret_t and just copy the MCS
>               category. Hard to make it work for guests with dynamic
>               MCS assignment.
> 
> virStoragePoolPtr: No existing security context. Some pool types have
>                    objects existing on the host filesystem eg SCSI
>                    HBAs have a directory in sysfs, filesystem dirs
>                    have a directory somewhere, LVM has directory
>                    for the volume group in /dev. Other pool types have
>                    no object on disk anywhere convenient. eg Sheepdog.
>                    Other pool types only have an object on disk when
>                    the pool is active (eg iSCSI, NFS). So there is
>                    nothing to use for API checks when the pool is
>                    inactive.
> 
>                    Likely have to ignore whatever associated resource
>                    is on disk and just store a security context in the
>                    XML config as with virDomainPtr/virNetworkPtr.
> 
> 
> virStorageVolPtr: Currently reports the SELinux security label associated
>                   with the file on disk. Not all pool types necessarily
>                   have volumes with a corresponding file on disk (eg
>                   Sheepdog).
> 
> virNodeDevicePtr: No existing security context. Most data comes from udev
>                   or HAL databases, though ultimately much is available
>                   in sysfs.
> 
>                   When detaching PCI devices from host drivers, files
>                   in sysfs are used. When creating/deleting NPIV adapters
>                   sysfs is used. Thus could use sysfs file labels for AVC
>                   checks ?
> 
> virConnectPtr: All host level APIs for which there is no other object
>                aside from the nebulous concept of the 'host'. APIs are
>                all readonly, eg query host capabilities, query free
>                memory, CPU stats, etc. What if we gain APIs to make
>                write calls ?
> 
> 
> virInterfacePtr: No existing security context. Currently using netcf to
>                  get data from /etc/sysconfig/network-scripts/ifcfg-XXX
>                  files, but can't assume those file names since that is
>                  Fedora/RHEL specific. Might not even use netcf if it
>                  talks directly to network manager. Does netcf need to
>                  expose a security label based on the ifcfg-XXX file ?
> 
> 
> 3. Security labelling config modes
> 
> When creating a guest the following XML snippets can be used.
> 
>   a. Default type, dynamic MCS, automatic relabelling
> 
>      <seclabel type='selinux' mode='dynamic' relabel='yes'/>
> 
> 
>   b. Custom type, dynamic MCS, automatic relabelling
> 
>      <seclabel type='selinux' mode='hybrid' relabel='yes'>
>         <label>system_u:system_r:mysvirt_t</label>
>         <imagelabel>system_u:object_r:mysvirt_image_t</imagelabel>
>      </seclabel>
> 
Yes this would be cool, although I am not sure you need an image label,
since the MCS separation would still work on svirt_image_t.  Would make
policy writing easier and selection easier if you did not change the
type of the image file.

I would at least allow for the admin to not specify an image label.

> 
>   c. Default type, dynamic MCS, no relabelling
> 
>      <seclabel type='selinux' mode='dynamic' relabel='no'/>
> 
>      Does this mode make any sense, since admin doesn't know
>      MCS category upfront ? Possibly only useful if the guest
>      only has readonly disks.
> 
But you don't relabel readonly disks, correct, since they are a shared
resource? I would say this would not be used.
> 
>   d. Custom type, dynamic MCS, no relabelling
> 
>      <seclabel type='selinux' mode='hybrid' relabel='no'>
>         <label>system_u:system_r:mysvirt_t</label>
>      </seclabel>
> 
>      Same question about whether it makes sense
> 
I don't think this makes sense.

> 
>   e. Custom type, static MCS, auto relabelling
> 
>      <seclabel type='selinux' mode='static' relabel='yes'>
>         <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
>         <imagelabel>system_u:object_r:mysvirt_image_t:s0:c123,c456</imagelabel>
>      </seclabel>
> 
> 
This is fine, though I am not sure it is legal in the MLS world. Although
I guess we could change the label to SystemHigh when not in use.

>   f. Custom type, static MCS, no relabelling
> 
>      <seclabel type='selinux' mode='static' relabel='no'>
>         <label>system_u:system_r:mysvirt_t:s0:c123,c456</label>
>      </seclabel>
> 
> 
We have this now, this is static labeling.
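For reference, the defaulting rules proposed for the <seclabel> element could be sketched as below. This is only an illustration of the proposal, not existing libvirt behaviour: parse_seclabel is a made-up helper, treating a missing 'mode' as "dynamic" is an assumption, and so is letting 'hybrid' default to relabel='yes' like 'dynamic' (the proposal only states defaults for 'dynamic' and 'static').

```python
import xml.etree.ElementTree as ET

VALID_MODES = {"dynamic", "hybrid", "static"}


def parse_seclabel(xml_text):
    """Parse one <seclabel> element and apply the proposed defaults."""
    elem = ET.fromstring(xml_text)
    mode = elem.get("mode", "dynamic")  # assumed default
    if mode not in VALID_MODES:
        raise ValueError(f"unknown seclabel mode: {mode}")

    relabel = elem.get("relabel")
    if relabel is None:
        # static => no relabelling; dynamic (and, by assumption,
        # hybrid) => relabelling enabled.
        relabel = "no" if mode == "static" else "yes"
    if relabel not in ("yes", "no"):
        raise ValueError(f"relabel must be yes or no, got: {relabel}")

    label = elem.findtext("label")
    if mode != "dynamic" and label is None:
        raise ValueError(f"mode '{mode}' needs a <label>")

    return {
        "mode": mode,
        "relabel": relabel == "yes",
        "label": label,
        "imagelabel": elem.findtext("imagelabel"),  # may be None
    }
```

The image label is left optional here, in line with the comment above about letting the admin omit it.
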
> 4. Time at which to apply checks / source context
> 
> It would be desirable to restrict the ability to use automatic file
> relabelling within the policy. If a client application defines a
> guest with the 'relabel=yes' attribute set, at what time should this
> usage be validated ?
> 
> Validate at the time the guest is defined ? This ensures the app
> defining the guest is suitably privileged, but the file labels
> might be changed by the time the guest starts.
> 
> Validate at the time the guest is started ? This minimises the
> window between access check being performed, and libvirtd actually
> performing the relabel operation. The app starting the guest might
> be different from the one defining the guest though ?
> 
> Check at both define + start time ?
> 
> 
Probably most sane.
> What source security context should we use when performing autostart
> of virtual machines ? Normally when starting a VM, the check would be
> performed using the context of the client invoking the start API, but
> there is no such client when autostart occurs.
> 
libselinux default.
> Should we instead perform a 'start' operation check whenever the
> 'autostart' flag is turned on by a client ?  Or check the autostart
> operation against some generic source context ?
> 
> 
I think we leave this in the default_context file.

One last thing to think about is since libvirt can now be run under the
users context, in certain situations, libvirt should examine the range
of MLS/MCS labels associated with it and make sure that it can only
assign MCS labels within this range.

For example if I am a user running as

staff_t:s0-s0:c500

libvirt should only pick random MCS categories in the range c0 to c500.
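
A rough sketch of that restriction, assuming the user's allowed categories are available in the usual "c0.c500" range form. The function names are made up for this example, and how libvirt would actually obtain the range from its own context is left out.

```python
import random
import re


def parse_category_range(cat_range):
    """Parse an MCS category range such as 'c0.c500' into (0, 500)."""
    m = re.fullmatch(r"c(\d+)\.c(\d+)", cat_range)
    if m is None:
        raise ValueError(f"not a category range: {cat_range}")
    lo, hi = int(m.group(1)), int(m.group(2))
    if lo >= hi:
        raise ValueError("range must contain at least two categories")
    return lo, hi


def pick_pair_within(cat_range, rng=random):
    """Pick two distinct categories inside the caller's own range."""
    lo, hi = parse_category_range(cat_range)
    c1, c2 = sorted(rng.sample(range(lo, hi + 1), 2))
    return c1, c2
```

With pick_pair_within("c0.c500") the generated guest label can never carry a category outside what the user was allowed to begin with.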
