[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Corey Bryant
coreyb at linux.vnet.ibm.com
Tue Jun 26 13:52:51 UTC 2012
On 06/26/2012 05:10 AM, Daniel P. Berrange wrote:
> On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
>> libvirt's sVirt security driver provides SELinux MAC isolation for
>> QEMU guest processes and their corresponding image files. In other
>> words, sVirt uses SELinux to prevent a QEMU process from opening
>> files that do not belong to it.
>>
>> sVirt provides this support by labeling guests and resources with
>> security labels that are stored in file system extended attributes.
>> Some file systems, such as NFS, do not support the extended
>> attribute security namespace, and therefore cannot support sVirt
>> isolation.
>>
>> A solution to this problem is to provide fd passing support, where
>> libvirt opens files and passes file descriptors to QEMU. This,
>> along with SELinux policy to prevent QEMU from opening files, can
>> provide image file isolation for NFS files stored on the same NFS
>> mount.
>>
>> This patch series adds the pass-fd QMP monitor command, which allows
>> an fd to be passed via SCM_RIGHTS, and returns the received file
>> descriptor. Support is also added to the block layer to allow QEMU
>> to dup the fd when the filename is of the /dev/fd/X format. This
>> is useful if MAC policy prevents QEMU from opening specific types
>> of files.
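
For readers less familiar with SCM_RIGHTS, here is a rough sketch of the mechanism described in the cover letter above. It is not code from the series. It uses Python's socket.send_fds()/socket.recv_fds() (Python 3.9+, Unix only), a socket pair standing in for the monitor socket, and a temporary file standing in for an image on NFS. The function name pass_fd_demo is made up for illustration.

```python
import os
import socket
import tempfile


def pass_fd_demo() -> bytes:
    """Pass an open file over a Unix socket, then read it through a dup."""
    # The socket pair stands in for the libvirt <-> QEMU monitor socket.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as image:
            image.write(b"disk image bytes")
            image.flush()
            # Sender side: the fd travels as SCM_RIGHTS ancillary data.
            socket.send_fds(sender, [b"pass-fd"], [image.fileno()])
            # Receiver side: the kernel installs a new fd for the same file.
            _msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
        received_fd = fds[0]
        # Mirror the block layer's dup of the passed fd, then drop the original.
        dup_fd = os.dup(received_fd)
        os.close(received_fd)
        try:
            # The receiving side never calls open() on a path.
            return os.pread(dup_fd, 64, 0)
        finally:
            os.close(dup_fd)
    finally:
        sender.close()
        receiver.close()
```

The receiving side only ever touches descriptors it was handed, which is the property that lets MAC policy deny it open() on the image files themselves.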
>
> I was thinking about some of the sources of complexity when using
> FD passing from libvirt and wanted to raise one idea for discussion
> before we continue.
>
> With this proposed series, we have usage akin to:
>
>    1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's
>       view of the FD
>    2. drive_add file=/dev/fd/N
>    3. if failure:
>         close_fd "/dev/fd/N"
>
> My problem is that none of this FD passing is "transactional".
> If libvirtd crashes or otherwise fails between steps 1 & 2,
> a FD is left open in QEMU. If libvirtd gets the failure
> detection wrong in step 2 (e.g. sees an I/O failure on the monitor,
> but from QEMU's POV drive_add succeeded), we could end up
> telling QEMU to close an FD that it is still using for a
> drive. Likewise if libvirtd fails/crashes between steps 2 & 3
> we might not clean up after failure.
I see what you're saying, but if libvirt crashes it seems like there are
bigger issues going on than a leaked fd. If libvirt fails, then it
should call closefd to prevent leakage.
I don't know if it really buys that much of an advantage though. I
think one major advantage to having separate commands is that other
commands can use an fd passed by pass-fd, not just drive_add.
>
> These aren't new problems with pass_fd - they existed with
> getfd too of course.
>
> If we were designing this interface with no regard for the
> historical practice in QEMU, then I feel like we would not
> even bother to have either 'pass_fd' or 'getfd'. We would
> pass the FD(s) directly with the "drive_add" command.
>
> Given that we have decided that attaching special semantics
> to filenames matching "/dev/fd/N" is OK, then I feel we could
> go one better, and allow the FD to be passed with the "drive_add"
> (or other) commands directly. All we need do is define slightly
> different semantics for "/dev/fd/N". Instead of it meaning
> "use the process FD numbered N", we can define it to mean
> "use the n'th FD set in the current context". The "context"
> would be populated with all FDs received with the current
> monitor command.
>
> So now from a client's POV you'd have a flow like
>
> * drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
>
> And in QEMU you'd have something like
>
>    * handle_monitor_command
>        - recvmsg all FDs, and stash them in a thread local "FDContext"
>          context
>        - invoke monitor command handler
>            - Sees file=/dev/fd/N
>            - Fetch /dev/fd/N from "FDContext"
>            - If success remove /dev/fd/N from "FDContext"
>        - close() all FDs left in "FDContext"
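
The handler flow quoted above could be sketched roughly as follows. This is only an illustration of the idea, not QEMU code: the FDContext class, its get()/remove() methods, the drive_add stand-in, and the "fail" argument are all hypothetical names invented here, and "/dev/fd/N" is read as an index into the fds received with the command.

```python
import os


def fd_is_open(fd: int) -> bool:
    """Return True if fd currently refers to an open file."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


class FDContext:
    """Hypothetical holder for the fds received with one monitor command."""

    def __init__(self, fds):
        self._fds = list(fds)

    def get(self, index: int) -> int:
        """Look up the index'th fd; the context still owns it."""
        fd = self._fds[index]
        if fd is None:
            raise LookupError(f"/dev/fd/{index} was already taken")
        return fd

    def remove(self, index: int) -> None:
        """The handler succeeded and now owns the fd; stop tracking it."""
        self._fds[index] = None

    def close_remaining(self) -> None:
        """Close every fd that no handler took ownership of."""
        for i, fd in enumerate(self._fds):
            if fd is not None:
                os.close(fd)
                self._fds[i] = None


def drive_add(args, ctx: FDContext) -> int:
    """Stand-in handler: sees file=/dev/fd/N and fetches it from the context."""
    index = int(args["file"].rsplit("/", 1)[1])
    fd = ctx.get(index)
    if args.get("fail"):
        # Simulated failure: the fd stays in the context and gets closed.
        raise RuntimeError("drive_add failed")
    ctx.remove(index)
    return fd


def handle_monitor_command(handler, args, fds):
    """Run one command; whatever is left in the context is always closed."""
    ctx = FDContext(fds)
    try:
        return handler(args, ctx)
    finally:
        ctx.close_remaining()
```

Because the cleanup sits in a finally block, it runs whether the handler returns or raises, so the client never has to issue a separate close command.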
>
> The key point with this is that because the FDs are directly
> associated with a monitor command, QEMU can /guarantee/ that
> FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the
set of fds?
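
To make the question concrete, here is a rough sketch of one case. It assumes the client puts the whole set into a single sendmsg() call, so the fds arrive as ancillary data on that one message, and it assumes the receiver closes everything if the command text turns out to be incomplete. Neither assumption comes from the series. receive_command and crash_demo are made-up names, and "ends with a newline" is a deliberately simplistic test for a complete command.

```python
import os
import socket


def fd_is_open(fd: int) -> bool:
    """Return True if fd currently refers to an open file."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def receive_command(conn: socket.socket, max_fds: int = 16):
    """Read one message; close its fds if the command text is incomplete.

    Returns (command_or_None, fds). When the command is incomplete, the
    returned fds have already been closed.
    """
    data, fds, _flags, _addr = socket.recv_fds(conn, 4096, max_fds)
    if not data.endswith(b"\n"):
        for fd in fds:
            os.close(fd)
        return None, fds
    return data, fds


def crash_demo():
    """Simulate a client that dies after sending fds with a partial command."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    # One sendmsg() carries both fds plus a truncated command (no newline).
    socket.send_fds(client, [b'{"execute": "drive_add"'], [r, w])
    # The client goes away: its socket and its own copies of the fds close.
    client.close()
    os.close(r)
    os.close(w)
    command, fds = receive_command(server)
    server.close()
    return command, fds
```

If the set were instead spread over several sendmsg() calls, the receiver would need to track partially received sets across reads, which this sketch does not attempt.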
--
Regards,
Corey