[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Corey Bryant coreyb at linux.vnet.ibm.com
Mon Jul 2 22:02:15 UTC 2012

On 06/26/2012 06:54 PM, Eric Blake wrote:
> On 06/26/2012 04:28 PM, Corey Bryant wrote:
>>>>> With this proposed series, we have usage akin to:
>>>>>     1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's
>>>>>        view of the FD
>>>>>     2. drive_add file=/dev/fd/N
>>>>>     3. if failure:
>>>>>          close_fd "/dev/fd/N"
>>>> In fact, there are more steps:
>>>> 4. use it successfully
>>>> 5. close_fd "/dev/fd/N"
>>>> I think it would well be possible that qemu just closes the fd when it's
>>>> not used internally any more.
>>> pass-fd could have a flag indicating qemu to do that.
>> It seems like libvirt would be in a better position to understand when a
>> file is no longer in use, and then it can call close_fd.  No?  Of course
>> the the only fd that needs to be closed is the originally passed fd. The
>> dup'd fd's are closed by QEMU.
> The more we discuss this, the more I'm convinced that commands like
> 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR
> will need to have an optional argument that says the name of the file to
> reopen.  That is, I've seen three proposals:
> Proposal One: enhance every command to take an fd:name protocol.
> PRO: Successful use of a name removes the fd from the 'getfd' list.
> CON: Lots of commands to change.
> CON: The command sequence is not atomic.
> CON: Block layer does not have a file name tied to the fd, so the reopen
> operation also needs to learn the fd:name protocol, but since we're
> already touching the command it's not much more work.
> 1. getfd FDSET={M}, ties new fd to "name"
> 2. drive_add fd=name - looks up the fd by name from the list
> 3a. on success, fd is no longer in the list, drive_add consumed it
> 3b. on failure, libvirt must call 'closefd name' to avoid a leak
> 4. getfd FDSET={M2}, ties new fd to "newname"
> 5. block-commit fd=newname - looks up fd by name from the list
> 6a. on success, fd is no longer in the list, block-commit consumed it
> 6b. on failure, libvirt must call 'closefd newname' to avoid a leak
> Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then
> update that name to the new fd in advance of any operation that needs to
> do a reopen.
> PRO: All other commands work without impact by using qemu_open(), by
> getting a clone of the permanent name.
> CON: The permanent name must remain open as long as qemu might need to
> reopen it, and reopening needs the pass-fd force option.
> CON: The command sequence is not atomic.
> 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N")
> showing QEMU's permanent name of the FD
> 2. drive_add file=<permanent name> means that qemu_open() will clone the
> fd, and drive_add is now using yet another FD while N remains open;
> meanwhile, the block layer knows that the drive is tied to the permanent
> name
> 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and
> still returns /dev/fd/N
> 4. block-commit - looks up the block-device name (/dev/fd/N), which maps
> back to the permanent fd, and gets a copy of the newly passed M2
> 5. on completion (success or failure), libvirt must call 'closefd name'
> to avoid a leak
> Proposal Three: Use thread-local fds passed alongside any command,
> rather than passing via a separate command
> PRO: All commands can now atomically handle fd passing
> PRO: Commands that only need a one-shot open can now use fd
> CON: Commands that need to reopen still need modification to allow a
> name override during the reopen
> 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd
> is used as the block file, on failure it is atomically closed, so there
> is no leak either way. However, the block file has no permanent name.
> 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file
> override must be passed again (since we have no guarantee that the
> /dev/fdlist/1 of _this_ command matches the command-local naming used in
> the previous command), but the fd is again used atomically
> Under proposal 3, there is no need to dup fd's, merely only to check
> that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd
> received via FDSET.  And the fact that things are made atomic means that
> libvirt no longer has to worry about calling closefd, nor does it have
> to worry about being interrupted between two monitor commands and not
> knowing what state qemu is in.  But it does mean teaching every possible
> command that wants to do a reopen to learn how to use fd overrides
> rather than to reuse the permanent name that was originally in place on
> the first open.

Here's another option that Kevin and I discussed today on IRC.  I've 
modified a few minor details since the discussion. And Kevin please 
correct me if anything is wrong.

Proposal Four: Pass a set of fds via 'pass-fds'.  The group of fds 
should all refer to the same file, but may have different access flags 
(ie. O_RDWR, O_RDONLY).  qemu_open can then dup the fd that has the 
matching access mode flags.
     { 'command': 'pass-fds',
       'data': {'fdsetname': 'str', '*close': 'bool'},
       'returns': 'string' }
     { 'command': 'close-fds',
       'data': {'fdsetname': 'str' }
      @fdsetname - the file descriptor set name
      @close - *if true QEMU closes the monitor fds when they expire.
               if false, fds remain on the monitor until close-fds
PRO: Supports reopen
PRO: All other commands work without impact by using qemu_open()
PRO: No fd leakage if close==true specified
CON: Determining when to close fds when close==true may be tricky
1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to
the passed set of fds.
2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the
list that has access flags matching the qemu_open() action flags.
3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first
fd from the list that has access flags matching the qemu_open() action 
4. The monitor fds are closed:
    A) *If @close == true, qemu closes the set of fds when the timer
    B) If @close == false, libvirt must issue close-fds command to close
       the set of fds

*How to solve this has yet to be determined.  Kevin mentioned 
potentially using a bottom-half or a timer in qemu_close().


More information about the libvir-list mailing list