[libvirt] RFC: sVirt disk isolation with network based storage

Daniel J Walsh dwalsh at redhat.com
Thu Aug 21 14:48:05 UTC 2014


On 08/20/2014 11:17 AM, Daniel P. Berrange wrote:
> As everyone knows sVirt is our nice solution to isolating guest resources
> from other (malicious) guests through SELinux labelling of the appropriate
> files / device nodes. This has been pretty effective since we introduced
> it to libvirt.
>
> In the last year or two, particularly in the cloud arena, there has been
> a big shift towards use of network based storage. Initially we were relying
> on kernel drivers / FUSE layers that exposed this network storage as devices
> or nodes in the host filesystem, so sVirt still stood a chance of being
> useful if the devices / FUSE layer supported labelling.
>
> Now though QEMU has native support for talking to gluster, ceph/rbd,
> iscsi and even nfs servers. This support is increasingly used in preference
> to using the kernel drivers / FUSE layers since it provides a simpler and
> thus (in theory) better performing I/O path for the network storage and
> does not require any privileged setup tasks on the host ahead of time.
>
> The problem is that I believe this is blowing a decent sized hole in our
> sVirt isolation story.
>
> eg when we launch QEMU with an argument like this:
>
>   -drive 'file=rbd:pool/image:auth_supported=none:\
>     mon_host=mon1.example.org\:6321\;mon2.example.org\:6322\;\
>     mon3.example.org\:6322,if=virtio,format=raw' 
>
> We are trusting QEMU to only ever access the disk volume 'pool/image'.
> There are, in all likelihood, many 100's or 1000's of disk images on the
> server it is connecting to and nothing is stopping QEMU from accessing
> any of them AFAICT.
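
For reference, the corresponding libvirt domain XML for that command line
would be roughly the following (the target device name is just for
illustration):

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw'/>
    <source protocol='rbd' name='pool/image'>
      <host name='mon1.example.org' port='6321'/>
      <host name='mon2.example.org' port='6322'/>
      <host name='mon3.example.org' port='6322'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>

Nothing in there carries a per-volume credential or label that libvirt
could hand to the Ceph cluster for this particular image.
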
>
> There is no currently implemented mechanism by which the sVirt label
> that QEMU runs under is made available to the remote RBD server to use
> for enforcement, nor any way in which libvirt could tell the RBD server
> which label was applied for which disk. The same seems to apply for
> Gluster, iSCSI, and NFS too when accessed directly from a network client
> inside the QEMU process.
>
> As it stands the only approach I see for isolating each virtual machine's
> disk(s) from other virtual machines is to make use of user authentication
> with these services. eg each virtual machine would need to have its own
> dedicated user account on the RBD/Gluster/iSCSI/NFS server, and the disk
> volumes for the VM would have to be made accessible solely to that user
> account. Assuming such user account / disk mapping exists in the servers
> today that can be made to work but it is an incredibly awful solution
> to deal with when VMs are being dynamically created & deleted very
> frequently.
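
To make that concrete, with Ceph it would mean doing something like the
following for every VM that is created (pool, user and image names here
are purely illustrative), and undoing it again when the VM is deleted:

  # dedicated pool and cephx identity for a single VM
  ceph osd pool create vm001-pool 64
  ceph auth get-or-create client.vm001 mon 'allow r' osd 'allow rwx pool=vm001-pool'
  rbd create vm001-pool/disk0 --size 20480

That is a lot of churn on the storage cluster just to get per-VM
isolation, which is why it does not look like a viable general answer.
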
>
> Today apps like OpenStack just have a single RBD username and password
> for everything they do. Any virtual machines running with RBD storage
> on OpenStack thus have no sVirt protection for their disk images AFAICT.
> To protect images OpenStack would have to dynamically create & delete
> new user accounts on the RBD server & setup disk access for them. I
> don't see that kind of approach being viable.
>
> IIUC, there is some mechanism at the IP stack level where the kernel
> can take the SELinux label of the process that establishes the network
> connection and pass it across to the server. If there was a way in the
> RBD API for libvirt to label the volumes, then potentially we could
> have a system where the RBD server did sVirt enforcement, based on the
> instructions from libvirt & the label of the client process. 
>
> Thoughts on what to do about this ?  Network based storage, where the
> network client is inside each QEMU server, is here to stay so I don't
> think we can ignore the problem long term.
>
> Regards,
> Daniel
I think we should set up a meeting to discuss this and figure out our options.

We need a mechanism for libvirt to send the labels of the process and
images to the remote server, and then we need an enforcement mechanism
that only allows the process label to interact with the file image.
SELinux could do this if each VM has a separate process running on the
server interacting with the image.  Otherwise the server needs to do
some kind of enforcement on its own.
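
As a very rough sketch of what that per-process model might look like on
the storage server (the helper binary, image path, type names and
category pair are all hypothetical here, and would have to match whatever
libvirt assigned to the guest):

  # label the image with the guest's MCS categories
  chcon system_u:object_r:svirt_image_t:s0:c10,c20 /srv/images/vm001.img

  # run the per-VM I/O helper confined with the same categories;
  # a helper running with different categories (e.g. s0:c30,c40)
  # would be denied access to this image by ordinary MCS enforcement
  runcon system_u:system_r:svirt_t:s0:c10,c20 /usr/libexec/vm-io-helper /srv/images/vm001.img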

We could use some form of labeled networking to transmit the MCS label
of qemu to the server, or we would need to extend the protocol to send
the label down.

There are two ways to handle labeled networking.  The most common
labeling standard, CIPSO, only sends the MCS portion of the label.  The
second form can send the entire label of the process, but it is seldom
used and requires labeled IPsec.
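
For the CIPSO route the per-host setup is fairly small; on each end
something like the following (DOI and address are illustrative) gets the
sender's MLS/MCS label carried in the IP options:

  # define a CIPSO pass-through DOI and map traffic to it
  netlabelctl cipsov4 add pass doi:16 tags:1
  netlabelctl map del default
  netlabelctl map add default address:192.0.2.0/24 protocol:cipsov4,16

The receiving side can then read the client's security context off the
connected socket (getpeercon()/SO_PEERSEC) and use it in its own checks,
though as noted above CIPSO only carries the MCS portion rather than the
full context.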



