[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Corey Bryant coreyb at linux.vnet.ibm.com
Tue Jun 26 18:40:03 UTC 2012



On 06/26/2012 11:37 AM, Corey Bryant wrote:
>
>
> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>>> So now from a client's POV you'd have a flow like
>>>>>
>>>>>     * drive_add "file=/dev/fd/N"  FDSET={N}
>>>>
>>>> IIUC then drive_add would loop and pass each fd in the set via
>>>> SCM_RIGHTS?
>>>
>>> Yes, you'd probably use the JSON to tell QEMU exactly
>>> how many FDs you're intending to pass with the command,
>>> and then once the command is received, you'd have N*SCM_RIGHTS
>>> messages sent/received.
>>>
>>>>>
>>>>> And in QEMU you'd have something like
>>>>>
>>>>>     * handle_monitor_command
>>>>>          - recvmsg all FDs, and stash them in a thread local
>>>>> "FDContext"
>>>>>            context
>>>>>          - invoke monitor command handler
>>>>>                - Sees file=/dev/fd/N
>>>>>                - Fetch /dev/fd/N from "FDContext"
>>>>>                - If success remove /dev/fd/N from "FDContext"
>>>>>          - close() all FDs left in "FDContext"
>>>>>
>>>>> The key point with this is that because the FDs are directly
>>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>>> FDs are never leaked, regardless of client behaviour.
>>>>
>>>> Wouldn't this leak fds if libvirt crashed part way through sending
>>>> the set of fds?
>>>
>>> No, because you've scoped the FDs to the current monitor instance,
>>> and the current command being processed you know to close all FDs
>>> when the associated command has finished, as well as closing them
>>> all if you see EOF on the monitor socket while in the middle of
>>> receiving a command.
>>
>> Here is a quick proof of concept (ie untested) patch to demonstrate
>> what I mean. It relies on Cory's patch which converts everything
>> to use qemu_open. It is also still valuable to make the change
>> to qemu_open() to support "/dev/fd/N" for passing FDs during
>> QEMU initial startup for CLI args.
>>
>> IMHO, what I propose here is preferrable for QMP clients that
>> our current plan of requiring use of 3 monitor commands (passfd,
>> XXXXX, closefd).
>
> Thanks for the PoC.
>
> Two other required updates that I can think of would be:
>
> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and
> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
>

Nevermind my comment.  I see that your PoC supports passing nfds for any 
QMP command.

I'm curious what Kevin's thoughts are on this and the overall approach.

> 2) Support re-opening files with different access modes (O_RDONLY,
> O_WRONLY, or O_RDWR).  This would be similar to the force option for
> pass-fd.
>

I'm still not quite sure how we'd go about this.  We need a way to 
determine the existing QEMU fd that is to be re-associated with the new 
fd, when using a /dev/fdlist/0 filename.  In this new approach, 0 
corresponds to an index, not an fd.  The prior approach of using the 
/dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy.

>
>>
>>  From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001
>> From: "Daniel P. Berrange" <berrange at redhat.com>
>> Date: Tue, 26 Jun 2012 15:48:13 +0100
>> Subject: [PATCH] Support passing FDs with the monitor command that
>> uses them
>>
>> Add support for a syntax "/dev/fdlist/N", where 'N' refers to
>> the N'th FD that was passed along with the currently processing
>> command in this thread.
>>
>> The QMP client sends
>>
>>          {
>>            "execute": "drive_add",
>>            "arguments": {
>>                 "filename": "/dev/fdlist/0",
>>                 "backingfilename": "/dev/fdlist/1" },
>>            "nfds": 2
>>          }
>>
>> followed by "nfds" * SCM_RIGHTS packets
>>
>> The FDs received along with the "drive_add" command are placed
>> in a thread-local list, then the QMP handler function is invoked.
>> The qemu_open() function will resolve any path "/dev/fdlist/0"
>> to use an FD stored in the  thread-local list, and dup() it.
>> One the QMP handler function is finished, all FDs in the thread
>> local list are closed.
>>
>> THe reason for choice of "/dev/fdlist/N", is to allow the
>> "/dev/fd/N" syntax to be used for FDs that are passed to QEMU
>> at startup time, which will be in the global FD number namespace,
>> not the thread-local index.
>>
>> The reason for using a thread local is to ensure the FDs are
>> only made available to the currently executing monitor command
>> handler in this thread, and not any other threads.
>>
>> The advantage over using a separate 'passfd' command, is that
>> this guarentees all FDs are closed in QEMU once the QMP command
>> handler is finished. There is no risk of leaks being triggered
>> by the client either intentionally, or via it crashing.
>>
>> NB: this is a proof of concept demonstration of the overall
>> architectural design. The patch would need more work before
>> it was suitable for merging
>>
>>   - Pick a better place/file for the fdlist APIs
>>   - Do proper error reporting in various places
>>   - Check for possibility that qemu_chr_fe_get_msgfd
>>     can block or return EAGAIN.
>>
>> diff --git a/monitor.c b/monitor.c
>> index f6107ba..a3a4bd4 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser
>> *parser, QList *tokens)
>>       const mon_cmd_t *cmd;
>>       const char *cmd_name;
>>       Monitor *mon = cur_mon;
>> +    int nfds;
>> +    size_t i;
>>
>>       args = input = NULL;
>>
>> @@ -4535,6 +4537,17 @@ static void
>> handle_qmp_command(JSONMessageParser *parser, QList *tokens)
>>           goto err_out;
>>       }
>>
>> +    nfds = qdict_get_int(input, "nfds");
>> +    for (i = 0 ; i < nfds ; i++) {
>> +    int fd = qemu_chr_fe_get_msgfd(mon->chr);
>> +    if (fd == -1) {
>> +        qerror_report(QERR_FD_NOT_SUPPLIED);
>> +        goto err_out;
>> +    }
>> +
>> +    qemu_fdlist_add(fd);
>> +    }
>> +
>>       if (handler_is_async(cmd)) {
>>           err = qmp_async_cmd_handler(mon, cmd, args);
>>           if (err) {
>> @@ -4552,6 +4565,7 @@ err_out:
>>   out:
>>       QDECREF(input);
>>       QDECREF(args);
>> +    qemu_fdlist_closeall();
>>   }
>>
>>   /**
>> diff --git a/osdep.c b/osdep.c
>> index 03817f0..0849cb6 100644
>> --- a/osdep.c
>> +++ b/osdep.c
>> @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>   #endif
>>   }
>>
>> +typedef struct QEMUFDList {
>> +    int *fds;
>> +    size_t nfds;
>> +} QEMUFDList;
>> +
>> +static pthread_key_t fdlist_key;
>> +
>> +static void qemu_fdlist_cleanup(void *data)
>> +{
>> +    size_t i;
>> +    QEMUFDList *fdlist = data;
>> +
>> +    if (!fdlist)
>> +    return;
>> +
>> +    for (i = 0 ; i < fdlist->nfds ; i++) {
>> +    close(fdlist->fds[i]);
>> +    }
>> +}
>> +
>> +int qemu_fdlist_init(void)
>> +{
>> +    if (pthread_key_create(&fdlist_key,
>> +               qemu_fdlist_cleanup) < 0)
>> +    return -1;
>> +
>> +    return 0;
>> +}
>> +
>> +static QEMUFDList *qemu_fdlist_for_thread(void)
>> +{
>> +    QEMUFDList *fdlist = pthread_getspecific(fdlist_key);
>> +
>> +    if (fdlist == NULL) {
>> +    fdlist = g_new0(QEMUFDList, 1);
>> +    pthread_setspecific(fdlist_key, fdlist);
>> +    }
>> +
>> +    return fdlist;
>> +}
>> +
>> +void qemu_fdlist_add(int fd)
>> +{
>> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> +
>> +    fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1);
>> +    fdlist->fds[fdlist->nfds++] = fd;
>> +}
>> +
>> +int qemu_fdlist_dup(int idx)
>> +{
>> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> +
>> +    if (idx >= fdlist->nfds) {
>> +    errno = EINVAL;
>> +    return -1;
>> +    }
>> +
>> +    return dup(fdlist->fds[idx]);
>> +}
>> +
>> +void qemu_fdlist_closeall(void)
>> +{
>> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> +    size_t i;
>> +
>> +    for (i = 0 ; i < fdlist->nfds ; i++) {
>> +    close(fdlist->fds[i]);
>> +    }
>> +    g_free(fdlist->fds);
>> +    fdlist->fds = NULL;
>> +    fdlist->nfds = 0;
>> +}
>> +
>> +
>>
>>   /*
>>    * Opens a file with FD_CLOEXEC set
>> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...)
>>   {
>>       int ret;
>>       int mode = 0;
>> +    const char *p;
>> +
>> +    /* Attempt dup of fd for pre-opened file */
>> +    if (strstart(name, "/dev/fdlist/", &p)) {
>> +        int idx;
>> +    int fd;
>> +
>> +        idx = qemu_parse_fd(p);
>> +        if (idx == -1) {
>> +            return -1;
>> +        }
>> +
>> +    fd = qemu_fdlist_dup(idx);
>> +    if (fd == -1) {
>> +        return -1;
>> +    }
>> +
>> +    /* XXX verify flags match */
>> +    return fd;
>> +    }
>>
>>       if (flags & O_CREAT) {
>>           va_list ap;
>> diff --git a/qemu-common.h b/qemu-common.h
>> index 8f87e41..b6b7203 100644
>> --- a/qemu-common.h
>> +++ b/qemu-common.h
>> @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf,
>> size_t count, int flags)
>>   ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
>>       QEMU_WARN_UNUSED_RESULT;
>>
>> +int qemu_fdlist_init(void);
>> +void qemu_fdlist_add(int fd);
>> +int qemu_fdlist_dup(int idx);
>> +void qemu_fdlist_closeall(void);
>> +
>>   #ifndef _WIN32
>>   int qemu_eventfd(int pipefd[2]);
>>   int qemu_pipe(int pipefd[2]);
>> diff --git a/vl.c b/vl.c
>> index 1329c30..e515c84 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp)
>>       QLIST_INIT (&vm_change_state_head);
>>       os_setup_early_signal_handling();
>>
>> +    if (qemu_fdlist_init() < 0) {
>> +    perror("fdlist");
>> +    exit(EXIT_FAILURE);
>> +    }
>> +
>>       module_call_init(MODULE_INIT_MACHINE);
>>       machine = find_default_machine();
>>       cpu_model = NULL;
>>
>
>

-- 
Regards,
Corey





More information about the libvir-list mailing list