[libvirt] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets

Kevin Wolf kwolf at redhat.com
Tue Jul 24 12:07:57 UTC 2012


Am 23.07.2012 15:08, schrieb Corey Bryant:
> When qemu_open is passed a filename of the "/dev/fdset/nnn"
> format (where nnn is the fdset ID), an fd with matching access
> mode flags will be searched for within the specified monitor
> fd set.  If the fd is found, a dup of the fd will be returned
> from qemu_open.
> 
> Each fd set has a reference count.  The purpose of the reference
> count is to determine if an fd set contains file descriptors that
> have open dup() references that have not yet been closed.  It is
> incremented on qemu_open and decremented on qemu_close.  It is
> not until the refcount is zero that file desriptors in an fd set
> can be closed.  If an fd set has dup() references open, then we
> must keep the other fds in the fd set open in case a reopen
> of the file occurs that requires an fd with a different access
> mode.
> 
> Signed-off-by: Corey Bryant <coreyb at linux.vnet.ibm.com>
> 
> v2:
>  -Get rid of file_open and move dup code to qemu_open
>   (kwolf at redhat.com)
>  -Use strtol wrapper instead of atoi (kwolf at redhat.com)
> 
> v3:
>  -Add note about fd leakage (eblake at redhat.com)
> 
> v4
>  -Moved patch to be later in series (lcapitulino at redhat.com)
>  -Update qemu_open to check access mode flags and set flags that
>   can be set (eblake at redhat.com, kwolf at redhat.com)
> 
> v5:
>  -This patch was overhauled quite a bit in this version, with
>   the addition of fd set and refcount support.
>  -Use qemu_set_cloexec() on dup'd fd (eblake at redhat.com)
>  -Modify flags set by fcntl on dup'd fd (eblake at redhat.com)
>  -Reduce syscalls when setting flags for dup'd fd (eblake at redhat.com)
>  -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake at redhat.com)
> ---
>  block/raw-posix.c |   24 +++++-----
>  block/raw-win32.c |    2 +-
>  block/vmdk.c      |    4 +-
>  block/vpc.c       |    2 +-
>  block/vvfat.c     |   12 ++---
>  cutils.c          |    5 ++
>  monitor.c         |   85 +++++++++++++++++++++++++++++++++
>  monitor.h         |    4 ++
>  osdep.c           |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qemu-common.h     |    3 +-
>  qemu-tool.c       |   12 +++++
>  11 files changed, 267 insertions(+), 24 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index a172de3..5d0a801 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>  out_free_buf:
>      qemu_vfree(s->aligned_buf);
>  out_close:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>      return -errno;
>  }

Hm, not a nice interface where qemu_close() needs the filename and
(worse) could be given a wrong filename. Maybe it would be better to
maintain a list of fd -> fdset mappings in qemu_open/close?

But if we decided to keep it like this, please use the right interface
from the beginning in patch 5 instead of updating it here.

> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>      }
>  }
>  
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount++;
> +            break;
> +        }
> +    }
> +}
> +
> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount--;
> +            if (mon_fdset->refcount == 0) {
> +                monitor_fdset_cleanup(mon_fdset);
> +            }
> +            break;
> +        }
> +    }
> +}

These two functions are almost the same. Would a
monitor_fdset_update_refcount(mon, fdset_id, value) make sense? These
functions could then be kept as thin wrappers around it, or they could
even be dropped completely.

> +
> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
> +{
> +    mon_fdset_t *mon_fdset;
> +    mon_fdset_fd_t *mon_fdset_fd;
> +    int mon_fd_flags;
> +
> +    if (!mon) {
> +        errno = ENOENT;
> +        return -1;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id != fdset_id) {
> +            continue;
> +        }
> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
> +            if (mon_fdset_fd->removed) {
> +                continue;
> +            }
> +
> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
> +            if (mon_fd_flags == -1) {
> +                return -1;
> +            }
> +
> +            switch (flags & O_ACCMODE) {
> +            case O_RDWR:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_RDONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_WRONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            }

I think you mean:

  if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
      return mon_fdset_fd->fd;
  }

What about other flags that cannot be set with fcntl(), like O_SYNC on
older kernels or possibly non-Linux? (The block layer doesn't use it any
more, but I think we want to keep the function generally useful)

> +        }
> +        errno = EACCES;
> +        return -1;
> +    }
> +    errno = ENOENT;
> +    return -1;
> +}
> +
>  /* mon_cmds and info_cmds would be sorted at runtime */
>  static mon_cmd_t mon_cmds[] = {
>  #include "hmp-commands.h"

> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>  #endif
>  }
>  
> +/*
> + * Dups an fd and sets the flags
> + */
> +static int qemu_dup(int fd, int flags)
> +{
> +    int i;
> +    int ret;
> +    int serrno;
> +    int dup_flags;
> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
> +                          O_NONBLOCK, 0 };
> +
> +    if (flags & O_CLOEXEC) {
> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
> +        if (ret == -1 && errno == EINVAL) {
> +            ret = dup(fd);
> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
> +                goto fail;
> +            }
> +        }
> +    } else {
> +        ret = dup(fd);
> +    }
> +
> +    if (ret == -1) {
> +        goto fail;
> +    }
> +
> +    dup_flags = fcntl(ret, F_GETFL);
> +    if (dup_flags == -1) {
> +        goto fail;
> +    }
> +
> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
> +        errno = EINVAL;
> +        goto fail;
> +    }

It's worth trying to set it before failing, newer kernels can do it. But
as I said above, if you can fail here, it makes sense to consider O_SYNC
when selecting the right file descriptor from the fdset.

> +    /* Set/unset flags that we can with fcntl */
> +    i = 0;
> +    while (setfl_flags[i] != 0) {
> +        if (flags & setfl_flags[i]) {
> +            dup_flags = (dup_flags | setfl_flags[i]);
> +        } else {
> +            dup_flags = (dup_flags & ~setfl_flags[i]);
> +        }
> +        i++;
> +    }

What about this instead of the loop:

  int setfl_flags = O_APPEND | O_ASYNC | ... ;

  dup_flags &= ~setfl_flags;
  dup_flags |= (flags & setfl_flags);


> +
> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
> +        goto fail;
> +    }
> +
> +    /* Truncate the file in the cases that open() would truncate it */
> +    if (flags & O_TRUNC ||
> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
> +        if (ftruncate(ret, 0) == -1) {
> +            goto fail;
> +        }
> +    }

O_CREAT | O_EXCL kind of loses its meaning here, but okay, it's hard to
do better with file descriptors.

> +
> +    qemu_set_cloexec(ret);

Wait... If O_CLOEXEC is set, you set the flag immediately and if it
isn't you set it at the end of the function? What's the intended use
case for not using O_CLOEXEC then?

Kevin




More information about the libvir-list mailing list