Re: [libvirt] [ANNOUNCE][RFC] sVirt: Integrating SELinux and Linux-based virtualization

On Mon, Aug 11, 2008 at 12:17:48PM +1000, James Morris wrote:
> 4.  Design Considerations
>     4.1  Consensus in preliminary discussion appears to be that adding
>          MAC to libvirt will be the most effective approach.  Support
>          may then be extended to virsh, virt-manager, oVirt etc.

I can see a couple of immediate items to address in the libvirt space

 - Need to decide how to ensure the VM is run with the correct 
   security label instead of the default virt_t.

   Cannot assume that all VMs have disks configured. Some VMs may
   be PXE boot, and use an NFS/iSCSI root filesystem - this is not
   visible to the host. The implication is that we can't rely on the
   labelling of disk files to infer the VM's security context.

   This suggests the domain XML format needs to allow for a security
   context to be specified at the time the VM is defined/created in libvirt.
   libvirt would have to take steps to make sure the VM is started with
   this defined context. An approach of including the context in the XML
   would also allow easy extension to the Xen XSM framework in future,
   where you specify a context at the time of VM creation, which is passed
   to the hypervisor.
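
   A minimal sketch of what carrying the context in the domain XML could
   look like. The <seclabel> element name and the label string are
   assumptions made up for illustration; nothing in this thread defines
   them.

```python
import xml.etree.ElementTree as ET


def add_security_context(domain_xml: str, label: str) -> str:
    """Return the domain XML with a single (hypothetical) <seclabel> child."""
    root = ET.fromstring(domain_xml)
    # Drop any existing element so the definition carries exactly one label.
    for old in root.findall("seclabel"):
        root.remove(old)
    ET.SubElement(root, "seclabel").text = label
    return ET.tostring(root, encoding="unicode")


# A diskless PXE-boot guest: there is no <disk> to infer a label from,
# so the label has to come from the definition itself.
DISKLESS = (
    "<domain type='kvm'><name>pxe-guest</name>"
    "<os><boot dev='network'/></os></domain>"
)
# Hypothetical label value, for illustration only.
labelled = add_security_context(DISKLESS, "system_u:system_r:virt_isolated_t:s0")
```

   The point of the sketch is only that the label travels with the VM
   definition, so it is available even when the guest has no disks.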

 - The storage XML format can already report what label a storage
   volume currently has. In addition we need to be able to set the label.

   A few complications...

      - We may need to set it in several places - ie a VM may be assigned
        a disk based on a stable path such as 


        Which is a symlink to the real (unstable) device name


        Clearly we need to set the label on the real device, but may also
        need to change the symlink too?

      - We can't add the new label to the SELinux policy directly, 
        because the label needs to be on the unstable device name 
        /dev/sdaXXX which can change across host OS reboots.

        Do we instead add the info to the udev rules, so that when /dev is
        populated at boot time by udev the device nodes get the desired
        initial labelling? Or do we manually chcon() the device
        at the time we boot the VM?

      - Some storage types don't allow per-file labelling - eg NFS.
        In those scenarios the storage pool is assigned a label and
        all volumes inherit it. So, if two VMs are using NFS files
        and need different labelling, they need to use different
        directories on the NFS server, so that we can have separate
        mount points with appropriate labelling for each.
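
   A rough sketch of the "chcon() at VM boot" option for the symlink case
   above, assuming a Linux host. The helper names are hypothetical;
   os.setxattr() on the "security.selinux" attribute stands in for chcon
   here, and needs suitable privileges to actually succeed.

```python
import os
from typing import List

SELINUX_XATTR = "security.selinux"  # xattr in which Linux stores the context


def paths_to_label(stable_path: str) -> List[str]:
    """Return the real device node first, then the symlink if there is one."""
    paths = [os.path.realpath(stable_path)]
    if os.path.islink(stable_path):
        paths.append(stable_path)
    return paths


def set_label(path: str, label: str) -> None:
    """Set the context on `path` itself, without following a symlink.

    Linux-only and privileged; shown for illustration, not called here.
    """
    os.setxattr(path, SELINUX_XATTR, label.encode(), follow_symlinks=False)
```

   Resolving the stable path at boot time sidesteps the problem of the
   real device name changing across host reboots, at the cost of doing
   the relabel on every VM start.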

>     4.2  Initially, sVirt should "just work" as a means to isolate VMs,
>          with minimal administrative interaction.  e.g. an option is
>          added to virt-manager which allows a VM to be designated as
>          "isolated", and from then on, it is automatically run in a
>          separate security context, with policy etc. being generated
>          and managed by libvirt.
>     4.3  We need to consider very carefully exactly how VMs will be
>          launched and controlled e.g. overall security ultimately must
>          be enforced by the host kernel, even though VM launch will be
>          initially controlled by host userspace.
>     4.4  We need to consider overall management of labeling both
>          locally and in distributed environments (e.g. oVirt), as well
>          as situations where VMs may be migrated between systems,
>          backed up etc.

We need to define who/what is responsible for ensuring that all hosts
in the cluster have the same policy loaded. Typically libvirt only
aims to provide the mechanism, and not constrain what you do with it.
So perhaps libvirt needs to merely be able to report on what policy
version is loaded as part of host capabilities information.

oVirt (or FreeIPA?) would be responsible for using this info, and also
ensuring that all hosts have the same policy if desired/required.
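
To illustrate the split of responsibilities: if each host merely reports
a policy version, the management layer can do the comparison itself. A
minimal sketch, assuming the reported versions have already been
collected into a dict (host names and version strings are made up):

```python
from collections import defaultdict
from typing import Dict, List


def group_hosts_by_policy(reported: Dict[str, str]) -> Dict[str, List[str]]:
    """Group host names by the policy version each one reports."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, version in sorted(reported.items()):
        groups[version].append(host)
    return dict(groups)


def policy_is_consistent(reported: Dict[str, str]) -> bool:
    """True when every host reports the same policy version."""
    return len(set(reported.values())) <= 1


# Hypothetical data a management tool might have gathered from each host.
reported = {"node1": "v24", "node2": "v24", "node3": "v23"}
groups = group_hosts_by_policy(reported)
```

Whether a mismatch is an error is a policy decision for the management
tool, which is why it sits outside libvirt in this sketch.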

>          One possible approach may be to represent the security label
>          as the UUID of the guest and then translate that to locally
>          meaningful values as needed.

This implies there needs to be some lookup table of UUID -> security
label mappings on every host in the cluster. This needs to be updated 
whenever a new VM is created, which is a fairly significant data sync
task someone/thing needs to take care of. Would be doable for oVirt or
FreeIPA, since they have a network-wide view. virt-manager though has
an individual, host-centric view of things - it doesn't consider the broader
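
A sketch of the per-host lookup table this would imply. The class and
the label string are hypothetical; the interesting case is the lookup
that returns nothing because this host has not been synced yet.

```python
import uuid
from typing import Dict, Optional


class LabelTable:
    """Per-host table mapping a guest UUID to a locally meaningful label."""

    def __init__(self) -> None:
        self._labels: Dict[uuid.UUID, str] = {}

    def register(self, guest_uuid: str, label: str) -> None:
        # uuid.UUID() normalises case, so lookups do not depend on formatting.
        self._labels[uuid.UUID(guest_uuid)] = label

    def lookup(self, guest_uuid: str) -> Optional[str]:
        """Return the local label, or None if this host never got the entry."""
        return self._labels.get(uuid.UUID(guest_uuid))


table = LabelTable()
# Hypothetical guest UUID and label, for illustration only.
table.register("4DEA22B3-1D52-D8F3-2516-782E98AB3FA0", "svirt_guest_1")
```

Every host needs its own copy of such a table, which is exactly the
sync burden described above.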

>     4.5  MAC controls/policy will need to be considered for any control
>          planes (e.g. /dev/kvm).

I should probably point out that there are in fact two ways in which
KVM/QEMU can be used on a host:

  - The 'system' instance. There is one of these per host, and it
    currently runs as a privileged user (ie root).

  - The 'session' instance. There is one of these per user, per host
    and it runs as the unprivileged user.

The session instances can only utilize KVM acceleration if the host admin
has given them appropriate group/ACL membership to access /dev/kvm. Likewise
they can only access physical devices if they have the necessary group/ACL
membership for the device. Network access is SLIRP based unless the admin
has pre-created TUN devices & given them access.
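
For illustration, the /dev/kvm condition boils down to an ordinary
permission check made as the calling user. A minimal sketch (the
function names are made up; 'kvm' and 'qemu' are the existing libvirt
domain types for accelerated and unaccelerated guests):

```python
import os


def can_use_kvm(dev_path: str = "/dev/kvm") -> bool:
    """True if the calling user may open the KVM device node read/write."""
    return os.access(dev_path, os.R_OK | os.W_OK)


def domain_type(dev_path: str = "/dev/kvm") -> str:
    """Pick the accelerated domain type only when the device is usable."""
    return "kvm" if can_use_kvm(dev_path) else "qemu"
```

Any MAC policy for /dev/kvm would sit on top of this check, not replace
it: the group/ACL test still has to pass first.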

I imagine that for this work we'll primarily target the 'system' instance
and anything that happens to work for the 'session' instances can just be
considered a free bonus.

>     4.10 {lib}semanage needs performance optimization work to reduce
>          impact on the virt toolchain.

Specifically in libvirt we need to avoid a dependency on python. For oVirt
we have a requirement that the operating system for a 'managed node' (ie
the host running VMs) can be built into a Live CD / PXE bootable image
that is < 64 MB in size. So any new dependencies from libvirt are very
sensitive in terms of on-disk footprint.

