[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Tue Jun 26 13:52:51 UTC 2012

On 06/26/2012 05:10 AM, Daniel P. Berrange wrote:
> On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
>> libvirt's sVirt security driver provides SELinux MAC isolation for
>> QEMU guest processes and their corresponding image files.  In other
>> words, sVirt uses SELinux to prevent a QEMU process from opening
>> files that do not belong to it.
>>
>> sVirt provides this support by labeling guests and resources with
>> security labels that are stored in file system extended attributes.
>> Some file systems, such as NFS, do not support the extended
>> attribute security namespace, and therefore cannot support sVirt
>> isolation.
>>
>> A solution to this problem is to provide fd passing support, where
>> libvirt opens files and passes file descriptors to QEMU.  This,
>> along with SELinux policy to prevent QEMU from opening files, can
>> provide image file isolation for NFS files stored on the same NFS
>> mount.
>>
>> This patch series adds the pass-fd QMP monitor command, which allows
>> an fd to be passed via SCM_RIGHTS, and returns the received file
>> descriptor.  Support is also added to the block layer to allow QEMU
>> to dup the fd when the filename is of the /dev/fd/X format.  This
>> is useful if MAC policy prevents QEMU from opening specific types
>> of files.
>
> I was thinking about some of the sources of complexity when using
> FD passing from libvirt and wanted to raise one idea for discussion
> before we continue.
>
> With this proposed series, we have usage akin to:
>
>    1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's
>       view of the FD
>    2. drive_add file=/dev/fd/N
>    3. if failure:
>         close_fd "/dev/fd/N"
>
> My problem is that none of this FD passing is "transactional".
> If libvirtd crashes or otherwise fails between steps 1 & 2,
> a FD is left open in QEMU.  If libvirtd gets the failure
> detection wrong in step 2 (e.g. sees an I/O failure on the monitor,
> but from QEMU's pov drive_add succeeded), we could end up
> telling QEMU to close an FD that it is still using for a
> drive. Likewise if libvirtd fails/crashes between steps 2 & 3
> we might not clean up after failure.

I see what you're saying, but if libvirt crashes it seems like there are 
bigger issues going on than a leaked fd.  If libvirt fails, then it 
should call closefd to prevent leakage.

I don't know if making fd passing transactional really buys that much
of an advantage though.  I think one major advantage to having
separate commands is that other commands can use an fd passed by
pass-fd, not just drive_add.

>
> These aren't new problems with pass_fd - they existed with
> getfd too of course.
>
> If we were designing this interface with no regard for the
> historical practice in QEMU, then I feel like we would not
> even bother to have either 'pass_fd' or 'getfd'. We would
> pass the FD(s) directly with the "drive_add" command.
>
> Given that we have decided that attaching special semantics
> to filenames matching "/dev/fd/N" is OK, then I feel we could
> go one better, and allow the FD to be passed with the "drive_add"
> (or other) commands directly. All we need do is define slightly
> different semantics for "/dev/fd/N". Instead of it meaning
> "use the process FD numbered N", we can define it to mean
> "use the n'th FD set in the current context". The "context"
> would be populated with all FDs received with the current
> monitor command.
>
> So now from a client's POV you'd have a flow like
>
>     * drive_add "file=/dev/fd/N"  FDSET={N}

IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
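
For reference, a whole set of fds can travel in a single SCM_RIGHTS
control message, so a per-fd loop is not strictly required on the wire.
A minimal sketch in Python (not QEMU or libvirt code; send_fds, recv_fds
and demo are hypothetical names, and the "drive_add" payload is only a
placeholder string, not real QMP):

```python
import array
import os
import socket


def send_fds(sock, payload, fds):
    """Send payload plus every fd in fds as one SCM_RIGHTS control message."""
    packed = array.array("i", fds)
    return sock.sendmsg([payload],
                        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, packed)])


def recv_fds(sock, bufsize, maxfds):
    """Receive a payload and up to maxfds descriptors sent along with it."""
    packed = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * packed.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial int left behind by truncation.
            usable = len(cdata) - (len(cdata) % packed.itemsize)
            packed.frombytes(cdata[:usable])
    return data, list(packed)


def demo():
    """Pass the write ends of two pipes across a socketpair and use them."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    send_fds(client, b"drive_add", [w1, w2])
    data, fds = recv_fds(server, 1024, 4)
    # The received numbers differ from w1/w2 but refer to the same pipes.
    os.write(fds[0], b"a")
    os.write(fds[1], b"b")
    out = (os.read(r1, 1), os.read(r2, 1))
    for fd in [r1, w1, r2, w2] + fds:
        os.close(fd)
    client.close()
    server.close()
    return data, len(fds), out
```

The receiver gets fresh fd numbers for the same open files, which is why
the client can only refer to them by position in the set.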

>
> And in QEMU you'd have something like
>
>     * handle_monitor_command
>          - recvmsg all FDs, and stash them in a thread local "FDContext"
>            context
>          - invoke monitor command handler
>                - Sees file=/dev/fd/N
>                - Fetch /dev/fd/N from "FDContext"
>                - If success remove /dev/fd/N from "FDContext"
>          - close() all FDs left in "FDContext"
>
> The key point with this is that because the FDs are directly
> associated with a monitor command, QEMU can /guarantee/ that
> FDs are never leaked, regardless of client behaviour.

Wouldn't this leak fds if libvirt crashed part way through sending the 
set of fds?
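
To make sure I'm reading the proposed flow correctly, here is a rough
sketch of it in Python.  This is only my interpretation, not QEMU code:
FDContext, get, claim, close_remaining and handle_monitor_command are
hypothetical names, and the fds are assumed to have already been
received in full.

```python
import os


class FDContext:
    """Descriptors that arrived with one monitor command (hypothetical)."""

    def __init__(self, fds):
        self._fds = list(fds)

    def get(self, index):
        """Look up the index'th fd; the context still owns it."""
        fd = self._fds[index]
        if fd is None:
            raise LookupError("fd %d already claimed" % index)
        return fd

    def claim(self, index):
        """Remove the index'th fd after the handler has used it successfully."""
        fd = self.get(index)
        self._fds[index] = None
        return fd

    def close_remaining(self):
        """Close every fd that no handler claimed."""
        for i, fd in enumerate(self._fds):
            if fd is not None:
                os.close(fd)
                self._fds[i] = None


def handle_monitor_command(handler, fds):
    """Run handler with a per-command context, then close unclaimed fds."""
    ctx = FDContext(fds)
    try:
        return handler(ctx)
    finally:
        # Runs on success and on failure, so unclaimed fds never outlive
        # the command.
        ctx.close_remaining()
```

With this shape, anything received but not claimed is closed when the
command finishes, whether the handler succeeded or raised.  What the
sketch does not model is the receive step itself, which is the part my
question is about.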

-- 
Regards,
Corey