[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Corey Bryant
coreyb at linux.vnet.ibm.com
Tue Jun 26 18:40:03 UTC 2012
On 06/26/2012 11:37 AM, Corey Bryant wrote:
>
>
> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>>> So now from a client's POV you'd have a flow like
>>>>>
>>>>> * drive_add "file=/dev/fd/N" FDSET={N}
>>>>
>>>> IIUC then drive_add would loop and pass each fd in the set via
>>>> SCM_RIGHTS?
>>>
>>> Yes, you'd probably use the JSON to tell QEMU exactly
>>> how many FDs you're intending to pass with the command,
>>> and then once the command is received, you'd have N*SCM_RIGHTS
>>> messages sent/received.
>>>
>>>>>
>>>>> And in QEMU you'd have something like
>>>>>
>>>>> * handle_monitor_command
>>>>> - recvmsg all FDs, and stash them in a thread local
>>>>> "FDContext"
>>>>> context
>>>>> - invoke monitor command handler
>>>>> - Sees file=/dev/fd/N
>>>>> - Fetch /dev/fd/N from "FDContext"
>>>>> - If success remove /dev/fd/N from "FDContext"
>>>>> - close() all FDs left in "FDContext"
>>>>>
>>>>> The key point with this is that because the FDs are directly
>>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>>> FDs are never leaked, regardless of client behaviour.
>>>>
>>>> Wouldn't this leak fds if libvirt crashed part way through sending
>>>> the set of fds?
>>>
>>> No, because you've scoped the FDs to the current monitor instance,
>>> and the current command being processed you know to close all FDs
>>> when the associated command has finished, as well as closing them
>>> all if you see EOF on the monitor socket while in the middle of
>>> receiving a command.
>>
>> Here is a quick proof of concept (ie untested) patch to demonstrate
>> what I mean. It relies on Cory's patch which converts everything
>> to use qemu_open. It is also still valuable to make the change
>> to qemu_open() to support "/dev/fd/N" for passing FDs during
>> QEMU initial startup for CLI args.
>>
>> IMHO, what I propose here is preferrable for QMP clients that
>> our current plan of requiring use of 3 monitor commands (passfd,
>> XXXXX, closefd).
>
> Thanks for the PoC.
>
> Two other required updates that I can think of would be:
>
> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and
> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
>
Nevermind my comment. I see that your PoC supports passing nfds for any
QMP command.
I'm curious what Kevin's thoughts are on this and the overall approach.
> 2) Support re-opening files with different access modes (O_RDONLY,
> O_WRONLY, or O_RDWR). This would be similar to the force option for
> pass-fd.
>
I'm still not quite sure how we'd go about this. We need a way to
determine the existing QEMU fd that is to be re-associated with the new
fd, when using a /dev/fdlist/0 filename. In this new approach, 0
corresponds to an index, not an fd. The prior approach of using the
/dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy.
>
>>
>> From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001
>> From: "Daniel P. Berrange" <berrange at redhat.com>
>> Date: Tue, 26 Jun 2012 15:48:13 +0100
>> Subject: [PATCH] Support passing FDs with the monitor command that
>> uses them
>>
>> Add support for a syntax "/dev/fdlist/N", where 'N' refers to
>> the N'th FD that was passed along with the currently processing
>> command in this thread.
>>
>> The QMP client sends
>>
>> {
>> "execute": "drive_add",
>> "arguments": {
>> "filename": "/dev/fdlist/0",
>> "backingfilename": "/dev/fdlist/1" },
>> "nfds": 2
>> }
>>
>> followed by "nfds" * SCM_RIGHTS packets
>>
>> The FDs received along with the "drive_add" command are placed
>> in a thread-local list, then the QMP handler function is invoked.
>> The qemu_open() function will resolve any path "/dev/fdlist/0"
>> to use an FD stored in the thread-local list, and dup() it.
>> One the QMP handler function is finished, all FDs in the thread
>> local list are closed.
>>
>> THe reason for choice of "/dev/fdlist/N", is to allow the
>> "/dev/fd/N" syntax to be used for FDs that are passed to QEMU
>> at startup time, which will be in the global FD number namespace,
>> not the thread-local index.
>>
>> The reason for using a thread local is to ensure the FDs are
>> only made available to the currently executing monitor command
>> handler in this thread, and not any other threads.
>>
>> The advantage over using a separate 'passfd' command, is that
>> this guarentees all FDs are closed in QEMU once the QMP command
>> handler is finished. There is no risk of leaks being triggered
>> by the client either intentionally, or via it crashing.
>>
>> NB: this is a proof of concept demonstration of the overall
>> architectural design. The patch would need more work before
>> it was suitable for merging
>>
>> - Pick a better place/file for the fdlist APIs
>> - Do proper error reporting in various places
>> - Check for possibility that qemu_chr_fe_get_msgfd
>> can block or return EAGAIN.
>>
>> diff --git a/monitor.c b/monitor.c
>> index f6107ba..a3a4bd4 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser
>> *parser, QList *tokens)
>> const mon_cmd_t *cmd;
>> const char *cmd_name;
>> Monitor *mon = cur_mon;
>> + int nfds;
>> + size_t i;
>>
>> args = input = NULL;
>>
>> @@ -4535,6 +4537,17 @@ static void
>> handle_qmp_command(JSONMessageParser *parser, QList *tokens)
>> goto err_out;
>> }
>>
>> + nfds = qdict_get_int(input, "nfds");
>> + for (i = 0 ; i < nfds ; i++) {
>> + int fd = qemu_chr_fe_get_msgfd(mon->chr);
>> + if (fd == -1) {
>> + qerror_report(QERR_FD_NOT_SUPPLIED);
>> + goto err_out;
>> + }
>> +
>> + qemu_fdlist_add(fd);
>> + }
>> +
>> if (handler_is_async(cmd)) {
>> err = qmp_async_cmd_handler(mon, cmd, args);
>> if (err) {
>> @@ -4552,6 +4565,7 @@ err_out:
>> out:
>> QDECREF(input);
>> QDECREF(args);
>> + qemu_fdlist_closeall();
>> }
>>
>> /**
>> diff --git a/osdep.c b/osdep.c
>> index 03817f0..0849cb6 100644
>> --- a/osdep.c
>> +++ b/osdep.c
>> @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice)
>> #endif
>> }
>>
>> +typedef struct QEMUFDList {
>> + int *fds;
>> + size_t nfds;
>> +} QEMUFDList;
>> +
>> +static pthread_key_t fdlist_key;
>> +
>> +static void qemu_fdlist_cleanup(void *data)
>> +{
>> + size_t i;
>> + QEMUFDList *fdlist = data;
>> +
>> + if (!fdlist)
>> + return;
>> +
>> + for (i = 0 ; i < fdlist->nfds ; i++) {
>> + close(fdlist->fds[i]);
>> + }
>> +}
>> +
>> +int qemu_fdlist_init(void)
>> +{
>> + if (pthread_key_create(&fdlist_key,
>> + qemu_fdlist_cleanup) < 0)
>> + return -1;
>> +
>> + return 0;
>> +}
>> +
>> +static QEMUFDList *qemu_fdlist_for_thread(void)
>> +{
>> + QEMUFDList *fdlist = pthread_getspecific(fdlist_key);
>> +
>> + if (fdlist == NULL) {
>> + fdlist = g_new0(QEMUFDList, 1);
>> + pthread_setspecific(fdlist_key, fdlist);
>> + }
>> +
>> + return fdlist;
>> +}
>> +
>> +void qemu_fdlist_add(int fd)
>> +{
>> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> +
>> + fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1);
>> + fdlist->fds[fdlist->nfds++] = fd;
>> +}
>> +
>> +int qemu_fdlist_dup(int idx)
>> +{
>> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> +
>> + if (idx >= fdlist->nfds) {
>> + errno = EINVAL;
>> + return -1;
>> + }
>> +
>> + return dup(fdlist->fds[idx]);
>> +}
>> +
>> +void qemu_fdlist_closeall(void)
>> +{
>> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
>> + size_t i;
>> +
>> + for (i = 0 ; i < fdlist->nfds ; i++) {
>> + close(fdlist->fds[i]);
>> + }
>> + g_free(fdlist->fds);
>> + fdlist->fds = NULL;
>> + fdlist->nfds = 0;
>> +}
>> +
>> +
>>
>> /*
>> * Opens a file with FD_CLOEXEC set
>> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...)
>> {
>> int ret;
>> int mode = 0;
>> + const char *p;
>> +
>> + /* Attempt dup of fd for pre-opened file */
>> + if (strstart(name, "/dev/fdlist/", &p)) {
>> + int idx;
>> + int fd;
>> +
>> + idx = qemu_parse_fd(p);
>> + if (idx == -1) {
>> + return -1;
>> + }
>> +
>> + fd = qemu_fdlist_dup(idx);
>> + if (fd == -1) {
>> + return -1;
>> + }
>> +
>> + /* XXX verify flags match */
>> + return fd;
>> + }
>>
>> if (flags & O_CREAT) {
>> va_list ap;
>> diff --git a/qemu-common.h b/qemu-common.h
>> index 8f87e41..b6b7203 100644
>> --- a/qemu-common.h
>> +++ b/qemu-common.h
>> @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf,
>> size_t count, int flags)
>> ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
>> QEMU_WARN_UNUSED_RESULT;
>>
>> +int qemu_fdlist_init(void);
>> +void qemu_fdlist_add(int fd);
>> +int qemu_fdlist_dup(int idx);
>> +void qemu_fdlist_closeall(void);
>> +
>> #ifndef _WIN32
>> int qemu_eventfd(int pipefd[2]);
>> int qemu_pipe(int pipefd[2]);
>> diff --git a/vl.c b/vl.c
>> index 1329c30..e515c84 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp)
>> QLIST_INIT (&vm_change_state_head);
>> os_setup_early_signal_handling();
>>
>> + if (qemu_fdlist_init() < 0) {
>> + perror("fdlist");
>> + exit(EXIT_FAILURE);
>> + }
>> +
>> module_call_init(MODULE_INIT_MACHINE);
>> machine = find_default_machine();
>> cpu_model = NULL;
>>
>
>
--
Regards,
Corey
More information about the libvir-list
mailing list