[libvirt] RFC: sVirt disk isolation with network based storage

Daniel J Walsh dwalsh at redhat.com
Wed Aug 20 17:24:16 UTC 2014


Adding Paul Moore since he is the labelled networking expert.

On 08/20/2014 11:17 AM, Daniel P. Berrange wrote:
> As everyone knows, sVirt is our nice solution for isolating guest resources
> from other (malicious) guests through SELinux labelling of the appropriate
> files / device nodes. This has been pretty effective since we introduced
> it to libvirt.
>
> In the last year or two, particularly in the cloud arena, there has been
> a big shift towards use of network based storage. Initially we were relying
> on kernel drivers / FUSE layers that exposed this network storage as devices
> or nodes in the host filesystem, so sVirt still stood a chance of being
> useful if the devices / FUSE layer supported labelling.
>
> Now though QEMU has native support for talking to gluster, ceph/rbd,
> iscsi and even nfs servers. This support is increasingly used in preference
> to using the kernel drivers / FUSE layers since it provides a simpler and
> thus (in theory) better performing I/O path for the network storage and
> does not require any privileged setup tasks on the host ahead of time.
>
> The problem is that I believe this is blowing a decent-sized hole in our
> sVirt isolation story.
>
> eg when we launch QEMU with an argument like this:
>
>   -drive 'file=rbd:pool/image:auth_supported=none:\
>     mon_host=mon1.example.org\:6321\;mon2.example.org\:6322\;\
>     mon3.example.org\:6322,if=virtio,format=raw' 
>
> We are trusting QEMU to only ever access the disk volume 'pool/image'.
> There are, in all likelihood, hundreds or thousands of disk images on the
> server it is connecting to, and nothing is stopping QEMU from accessing
> any of them AFAICT.
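>
> Just to make the gap concrete, here is a rough librados/librbd sketch
> (identity, pool and image names are made up purely for illustration,
> error handling omitted) of what any process holding the same shared
> credentials as QEMU can do -- opening a volume that belongs to some
> other guest works exactly as well as opening its own:
>
>   #include <rados/librados.h>
>   #include <rbd/librbd.h>
>
>   int main(void)
>   {
>       rados_t cluster;
>       rados_ioctx_t io;
>       rbd_image_t image;
>
>       /* Connect with the one shared identity QEMU itself uses */
>       rados_create(&cluster, "admin");
>       rados_conf_read_file(cluster, NULL);   /* usual ceph.conf + keyring */
>       rados_connect(cluster);
>       rados_ioctx_create(cluster, "pool", &io);
>
>       /* The cephx credentials cover the whole pool, so opening a
>        * volume that belongs to a completely different guest works
>        * just as well as opening our own.                           */
>       rbd_open(io, "some-other-guests-image", &image, NULL);
>
>       rbd_close(image);
>       rados_ioctx_destroy(io);
>       rados_shutdown(cluster);
>       return 0;
>   }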
>
> There is no currently implemented mechanism by which the sVirt label
> that QEMU runs under is made available to the remote RBD server to use
> for enforcement, nor any way in which libvirt could tell the RBD server
> which label was applied for which disk. The same seems to apply for
> Gluster, iSCSI, and NFS too when accessed directly from a network client
> inside the QEMU process.
>
> As it stands, the only approach I see for isolating each virtual machine's
> disk(s) from other virtual machines is to make use of user authentication
> with these services. eg each virtual machine would need to have its own
> dedicated user account on the RBD/Gluster/iSCSI/NFS server, and the disk
> volumes for the VM would have to be made accessible solely to that user
> account. Assuming such a user account / disk mapping exists in the servers
> today, that can be made to work, but it is an incredibly awful solution
> to deal with when VMs are being dynamically created & deleted very
> frequently.
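>
> In librados terms the only client-side change that scheme implies is
> connecting with a per-VM identity instead of the shared one, roughly
> (identity name and keyring path invented for illustration):
>
>   rados_t cluster;
>
>   /* One cephx identity per VM ("client.vm-1234"), whose OSD caps on
>    * the server are limited to that VM's volumes; the isolation is
>    * enforced by the server, not by anything in this code.           */
>   rados_create(&cluster, "vm-1234");
>   rados_conf_read_file(cluster, NULL);
>   rados_conf_set(cluster, "keyring", "/etc/ceph/vm-1234.keyring");
>   rados_connect(cluster);
>
> The pain is not that connection step, but the churn of creating and
> tearing down those identities and their capability grants every time
> a VM appears or disappears.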
>
> Today apps like OpenStack just have a single RBD username and password
> for everything they do. Any virtual machines running with RBD storage
> on OpenStack thus have no sVirt protection for their disk images AFAICT.
> To protect images OpenStack would have to dynamically create & delete
> new user accounts on the RBD server & set up disk access for them. I
> don't see that kind of approach being viable.
>
> IIUC, there is some mechanism at the IP stack level where the kernel
> can take the SELinux label of the process that establishes the network
> connection and pass it across to the server. If there was a way in the
> RBD API for libvirt to label the volumes, then potentially we could
> have a system where the RBD server did sVirt enforcement, based on the
> instructions from libvirt & the label of the client process. 
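>
> If that worked end to end, the enforcement on the server side could,
> at least in principle, boil down to comparing the peer's label with
> whatever label libvirt registered for the volume. A very rough sketch
> of the shape of that check (hypothetical, and a real implementation
> would consult SELinux policy rather than doing a string compare):
>
>   #include <selinux/selinux.h>
>   #include <string.h>
>
>   /* Called by the (hypothetical) storage daemon after accept():
>    * only honour requests for a volume if the client's context
>    * matches the label libvirt registered for that volume.        */
>   int client_may_use_volume(int client_fd, const char *volume_label)
>   {
>       char *peer = NULL;
>       int ok;
>
>       /* The client's SELinux context, as carried across the
>        * connection by the kernel (labelled IPsec / NetLabel).    */
>       if (getpeercon(client_fd, &peer) < 0)
>           return 0;
>
>       ok = strcmp(peer, volume_label) == 0;
>       freecon(peer);
>       return ok;
>   }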
>
> Thoughts on what to do about this?  Network based storage, where the
> network client is inside each QEMU process, is here to stay, so I don't
> think we can ignore the problem long term.
>
> Regards,
> Daniel



