[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Corey Bryant coreyb at linux.vnet.ibm.com
Tue Jun 26 15:37:29 UTC 2012



On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>> So now from a client's POV you'd have a flow like
>>>>
>>>>     * drive_add "file=/dev/fd/N"  FDSET={N}
>>>
>>> IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
>>
>> Yes, you'd probably use the JSON to tell QEMU exactly
>> how many FDs you're intending to pass with the command,
>> and then once the command is received, you'd have N*SCM_RIGHTS
>> messages sent/received.
>>
>>>>
>>>> And in QEMU you'd have something like
>>>>
>>>>     * handle_monitor_command
>>>>          - recvmsg all FDs, and stash them in a thread local "FDContext"
>>>>            context
>>>>          - invoke monitor command handler
>>>>                - Sees file=/dev/fd/N
>>>>                - Fetch /dev/fd/N from "FDContext"
>>>>                - If success remove /dev/fd/N from "FDContext"
>>>>          - close() all FDs left in "FDContext"
>>>>
>>>> The key point with this is that because the FDs are directly
>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>> FDs are never leaked, regardless of client behaviour.
>>>
>>> Wouldn't this leak fds if libvirt crashed part way through sending
>>> the set of fds?
>>
>> No, because you've scoped the FDs to the current monitor instance,
>> and the current command being processed you know to close all FDs
>> when the associated command has finished, as well as closing them
>> all if you see EOF on the monitor socket while in the middle of
>> receiving a command.
>
> Here is a quick proof of concept (ie untested) patch to demonstrate
> what I mean. It relies on Cory's patch which converts everything
> to use qemu_open. It is also still valuable to make the change
> to qemu_open() to support "/dev/fd/N" for passing FDs during
> QEMU initial startup for CLI args.
>
> IMHO, what I propose here is preferrable for QMP clients that
> our current plan of requiring use of 3 monitor commands (passfd,
> XXXXX, closefd).

Thanks for the PoC.

Two other required updates that I can think of would be:

1) Update change, block_stream, block_reopen, snapshot_blkdev, and 
perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.

2) Support re-opening files with different access modes (O_RDONLY, 
O_WRONLY, or O_RDWR).  This would be similar to the force option for 
pass-fd.

-- 
Regards,
Corey

>
>  From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001
> From: "Daniel P. Berrange" <berrange at redhat.com>
> Date: Tue, 26 Jun 2012 15:48:13 +0100
> Subject: [PATCH] Support passing FDs with the monitor command that uses them
>
> Add support for a syntax "/dev/fdlist/N", where 'N' refers to
> the N'th FD that was passed along with the currently processing
> command in this thread.
>
> The QMP client sends
>
>          {
>            "execute": "drive_add",
>            "arguments": {
>                 "filename": "/dev/fdlist/0",
>                 "backingfilename": "/dev/fdlist/1" },
>            "nfds": 2
>          }
>
> followed by "nfds" * SCM_RIGHTS packets
>
> The FDs received along with the "drive_add" command are placed
> in a thread-local list, then the QMP handler function is invoked.
> The qemu_open() function will resolve any path "/dev/fdlist/0"
> to use an FD stored in the  thread-local list, and dup() it.
> One the QMP handler function is finished, all FDs in the thread
> local list are closed.
>
> THe reason for choice of "/dev/fdlist/N", is to allow the
> "/dev/fd/N" syntax to be used for FDs that are passed to QEMU
> at startup time, which will be in the global FD number namespace,
> not the thread-local index.
>
> The reason for using a thread local is to ensure the FDs are
> only made available to the currently executing monitor command
> handler in this thread, and not any other threads.
>
> The advantage over using a separate 'passfd' command, is that
> this guarentees all FDs are closed in QEMU once the QMP command
> handler is finished. There is no risk of leaks being triggered
> by the client either intentionally, or via it crashing.
>
> NB: this is a proof of concept demonstration of the overall
> architectural design. The patch would need more work before
> it was suitable for merging
>
>   - Pick a better place/file for the fdlist APIs
>   - Do proper error reporting in various places
>   - Check for possibility that qemu_chr_fe_get_msgfd
>     can block or return EAGAIN.
>
> diff --git a/monitor.c b/monitor.c
> index f6107ba..a3a4bd4 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens)
>       const mon_cmd_t *cmd;
>       const char *cmd_name;
>       Monitor *mon = cur_mon;
> +    int nfds;
> +    size_t i;
>
>       args = input = NULL;
>
> @@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens)
>           goto err_out;
>       }
>
> +    nfds = qdict_get_int(input, "nfds");
> +    for (i = 0 ; i < nfds ; i++) {
> +	int fd = qemu_chr_fe_get_msgfd(mon->chr);
> +	if (fd == -1) {
> +	    qerror_report(QERR_FD_NOT_SUPPLIED);
> +	    goto err_out;
> +	}
> +
> +	qemu_fdlist_add(fd);
> +    }
> +
>       if (handler_is_async(cmd)) {
>           err = qmp_async_cmd_handler(mon, cmd, args);
>           if (err) {
> @@ -4552,6 +4565,7 @@ err_out:
>   out:
>       QDECREF(input);
>       QDECREF(args);
> +    qemu_fdlist_closeall();
>   }
>
>   /**
> diff --git a/osdep.c b/osdep.c
> index 03817f0..0849cb6 100644
> --- a/osdep.c
> +++ b/osdep.c
> @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice)
>   #endif
>   }
>
> +typedef struct QEMUFDList {
> +    int *fds;
> +    size_t nfds;
> +} QEMUFDList;
> +
> +static pthread_key_t fdlist_key;
> +
> +static void qemu_fdlist_cleanup(void *data)
> +{
> +    size_t i;
> +    QEMUFDList *fdlist = data;
> +
> +    if (!fdlist)
> +	return;
> +
> +    for (i = 0 ; i < fdlist->nfds ; i++) {
> +	close(fdlist->fds[i]);
> +    }
> +}
> +
> +int qemu_fdlist_init(void)
> +{
> +    if (pthread_key_create(&fdlist_key,
> +			   qemu_fdlist_cleanup) < 0)
> +	return -1;
> +
> +    return 0;
> +}
> +
> +static QEMUFDList *qemu_fdlist_for_thread(void)
> +{
> +    QEMUFDList *fdlist = pthread_getspecific(fdlist_key);
> +
> +    if (fdlist == NULL) {
> +	fdlist = g_new0(QEMUFDList, 1);
> +	pthread_setspecific(fdlist_key, fdlist);
> +    }
> +
> +    return fdlist;
> +}
> +
> +void qemu_fdlist_add(int fd)
> +{
> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
> +
> +    fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1);
> +    fdlist->fds[fdlist->nfds++] = fd;
> +}
> +
> +int qemu_fdlist_dup(int idx)
> +{
> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
> +
> +    if (idx >= fdlist->nfds) {
> +	errno = EINVAL;
> +	return -1;
> +    }
> +
> +    return dup(fdlist->fds[idx]);
> +}
> +
> +void qemu_fdlist_closeall(void)
> +{
> +    QEMUFDList *fdlist = qemu_fdlist_for_thread();
> +    size_t i;
> +
> +    for (i = 0 ; i < fdlist->nfds ; i++) {
> +	close(fdlist->fds[i]);
> +    }
> +    g_free(fdlist->fds);
> +    fdlist->fds = NULL;
> +    fdlist->nfds = 0;
> +}
> +
> +
>
>   /*
>    * Opens a file with FD_CLOEXEC set
> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...)
>   {
>       int ret;
>       int mode = 0;
> +    const char *p;
> +
> +    /* Attempt dup of fd for pre-opened file */
> +    if (strstart(name, "/dev/fdlist/", &p)) {
> +        int idx;
> +	int fd;
> +
> +        idx = qemu_parse_fd(p);
> +        if (idx == -1) {
> +            return -1;
> +        }
> +
> +	fd = qemu_fdlist_dup(idx);
> +	if (fd == -1) {
> +	    return -1;
> +	}
> +
> +	/* XXX verify flags match */
> +	return fd;
> +    }
>
>       if (flags & O_CREAT) {
>           va_list ap;
> diff --git a/qemu-common.h b/qemu-common.h
> index 8f87e41..b6b7203 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
>   ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
>       QEMU_WARN_UNUSED_RESULT;
>
> +int qemu_fdlist_init(void);
> +void qemu_fdlist_add(int fd);
> +int qemu_fdlist_dup(int idx);
> +void qemu_fdlist_closeall(void);
> +
>   #ifndef _WIN32
>   int qemu_eventfd(int pipefd[2]);
>   int qemu_pipe(int pipefd[2]);
> diff --git a/vl.c b/vl.c
> index 1329c30..e515c84 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp)
>       QLIST_INIT (&vm_change_state_head);
>       os_setup_early_signal_handling();
>
> +    if (qemu_fdlist_init() < 0) {
> +	perror("fdlist");
> +	exit(EXIT_FAILURE);
> +    }
> +
>       module_call_init(MODULE_INIT_MACHINE);
>       machine = find_default_machine();
>       cpu_model = NULL;
>




More information about the libvir-list mailing list