[libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol

Corey Bryant bryntcor at us.ibm.com
Thu Jun 16 14:48:51 UTC 2011



On 06/15/2011 03:12 PM, Blue Swirl wrote:
> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant <bryntcor at us.ibm.com> wrote:
>> >  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>> >  corresponding resources (image files). sVirt provides this support
>> >  by labeling guests and resources with security labels that are stored
>> >  in file system extended attributes. Some file systems, such as NFS, do
>> >  not support the extended attribute security namespace, which is needed
>> >  for image file isolation when using the sVirt SELinux security driver
>> >  in libvirt.
>> >
>> >  The proposed solution entails a combination of Qemu, libvirt, and
>> >  SELinux patches that work together to isolate multiple guests' images
>> >  when they're stored in the same NFS mount. This results in an
>> >  environment where sVirt isolation and NFS image file isolation can both
>> >  be provided.
>> >
>> >  This patch contains the Qemu code to support this solution. I would
>> >  like to solicit input from the libvirt community prior to starting
>> >  the libvirt patch.
>> >
>> >  Currently, Qemu opens an image file in addition to performing the
>> >  necessary read and write operations. The proposed solution will move
>> >  the open out of Qemu and into libvirt. Once libvirt opens an image
>> >  file for the guest, it will pass the file descriptor to Qemu via a
>> >  new fd: protocol.
>> >
>> >  If the image file resides in an NFS mount, the following SELinux policy
>> >  changes will provide image isolation:
>> >
>> >    - A new SELinux boolean (e.g. virt_read_write_nfs) is created that
>> >      grants Qemu (svirt_t) only SELinux read and write permissions
>> >      on nfs_t files
>> >
>> >    - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>> >      (virtd_t) file descriptors
>> >
>> >  Following is a sample invocation of Qemu using the fd: protocol on
>> >  the command line:
>> >
>> >      qemu -drive file=fd:4,format=qcow2
>> >
>> >  The fd: protocol is also supported by the drive_add monitor command.
>> >  This requires that the specified file descriptor is passed to the
>> >  monitor alongside a prior getfd monitor command.
>> >
>> >  There are some additional features provided by certain image types
>> >  where Qemu reopens the image file. All of these scenarios will be
>> >  unsupported for the fd: protocol, at least for this patch:
>> >
>> >    - The -snapshot command line option
>> >    - The savevm monitor command
>> >    - The snapshot_blkdev monitor command
>> >    - Starting Qemu with a backing file
> There's also native CDROM device. Did you consider adding an explicit
> reopen method to block layer?
Thanks. Yes, it looks like I overlooked CDROM reopens.

I'm not sure I understand the purpose of the reopen function.
Would the goal be to funnel all block layer reopens through a single
function, so that in the future a privileged layer of Qemu, or libvirt,
could perform the open?
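
To make sure we are talking about the same thing, here is roughly what I
picture. This is only a sketch under my own assumptions: struct image,
image_open_fn, open_locally and image_reopen are hypothetical names, not
existing block layer code. The point is that every reopen goes through one
function, and the opener behind it can later be swapped for something that
asks a privileged layer for the descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical: whoever performs the open on behalf of the block layer. */
typedef int (*image_open_fn)(const char *path, int flags, void *opaque);

struct image {
    int fd;               /* currently open descriptor, or -1 */
    const char *path;     /* name handed to the opener */
    image_open_fn opener; /* Qemu itself today, a privileged helper later */
    void *opaque;         /* opener-specific state */
};

/* Default opener: the process opens the file itself. */
static int open_locally(const char *path, int flags, void *opaque)
{
    int fd;

    (void)opaque;
    fd = open(path, flags);
    return fd < 0 ? -errno : fd;
}

/*
 * Single choke point for reopens. The new descriptor is obtained first and
 * the old one is closed only on success, so a failed reopen leaves the
 * image usable with its previous descriptor.
 */
static int image_reopen(struct image *img, int flags)
{
    int fd = img->opener(img->path, flags, img->opaque);

    if (fd < 0) {
        return fd;
    }
    if (img->fd >= 0) {
        close(img->fd);
    }
    img->fd = fd;
    return 0;
}
```

With something like this, the CDROM case would call image_reopen() instead
of calling open() directly.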


>
>> >  The thought is that this support can be added in the future, but is
>> >  not required for the initial fd: support.
>> >
>> >  This patch was tested with the following formats: raw, cow, qcow,
>> >  qcow2, qed, and vmdk, using the fd: protocol from the command line
>> >  and the monitor. Tests were also run to verify that existing file name
>> >  support and qemu-img did not regress. Invalid file descriptors,
>> >  fd: without format, snapshots, and backing files were also tested.
>> >
>> >  Signed-off-by: Corey Bryant <coreyb at linux.vnet.ibm.com>
>> >  ---
>> >    block.c           |   16 ++++++++++
>> >    block.h           |    1 +
>> >    block/cow.c       |    5 +++
>> >    block/qcow.c      |    5 +++
>> >    block/qcow2.c     |    5 +++
>> >    block/qed.c       |    4 ++
>> >    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>> >    block/vmdk.c      |    5 +++
>> >    block_int.h       |    1 +
>> >    blockdev.c        |   10 ++++++
>> >    monitor.c         |    5 +++
>> >    monitor.h         |    1 +
>> >    qemu-options.hx   |    8 +++--
>> >    qemu-tool.c       |    5 +++
>> >    14 files changed, 140 insertions(+), 12 deletions(-)
>> >
>> >  diff --git a/block.c b/block.c
>> >  index 24a25d5..500db84 100644
>> >  --- a/block.c
>> >  +++ b/block.c
>> >  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >           char tmp_filename[PATH_MAX];
>> >           char backing_filename[PATH_MAX];
>> >
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >  +
>> >           /* if snapshot, we create a temporary backing file and open it
>> >              instead of opening 'filename' directly */
>> >
>> >  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >
>> >       /* Find the right image format driver */
>> >       if (!drv) {
>> >  +        /* format must be specified for fd: protocol */
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >           ret = find_image_format(filename, &drv);
>> >       }
>> >
>> >  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>> >       return bs->enable_write_cache;
>> >    }
>> >
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>> >  +{
>> >  +    return bs->fd_protocol;
>> >  +}
>> >  +
>> >    /* XXX: no longer used */
>> >    void bdrv_set_change_cb(BlockDriverState *bs,
>> >                           void (*change_cb)(void *opaque, int reason),
>> >  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>> >       BlockDriver *drv = bs->drv;
>> >       if (!drv)
>> >           return -ENOMEDIUM;
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        return -ENOTSUP;
>> >  +    }
>> >       if (drv->bdrv_snapshot_create)
>> >           return drv->bdrv_snapshot_create(bs, sn_info);
>> >       if (bs->file)
>> >  diff --git a/block.h b/block.h
>> >  index da7d39c..dd46d52 100644
>> >  --- a/block.h
>> >  +++ b/block.h
>> >  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>> >    int bdrv_is_read_only(BlockDriverState *bs);
>> >    int bdrv_is_sg(BlockDriverState *bs);
>> >    int bdrv_enable_write_cache(BlockDriverState *bs);
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>> >    int bdrv_is_inserted(BlockDriverState *bs);
>> >    int bdrv_media_changed(BlockDriverState *bs);
>> >    int bdrv_is_locked(BlockDriverState *bs);
>> >  diff --git a/block/cow.c b/block/cow.c
>> >  index 4cf543c..e17f8e7 100644
>> >  --- a/block/cow.c
>> >  +++ b/block/cow.c
>> >  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>> >       pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>> >               cow_header.backing_file);
>> >
>> >  +    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
>> >  +        /* backing file currently not supported by fd: protocol */
>> >  +        goto fail;
>> >  +    }
>> >  +
>> >       bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
>> >       s->cow_sectors_offset = (bitmap_size + 511) & ~511;
>> >       return 0;
>> >  diff --git a/block/qcow.c b/block/qcow.c
>> >  index a26c886..1e46bdb 100644
>> >  --- a/block/qcow.c
>> >  +++ b/block/qcow.c
>> >  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>> >           if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>> >               goto fail;
>> >           bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            goto fail;
>> >  +        }
>> >       }
>> >       return 0;
>> >
>> >  diff --git a/block/qcow2.c b/block/qcow2.c
>> >  index 8451ded..8b9c160 100644
>> >  --- a/block/qcow2.c
>> >  +++ b/block/qcow2.c
>> >  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> >               goto fail;
>> >           }
>> >           bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
>> >  +            ret = -ENOTSUP;
>> >  +            goto fail;
>> >  +        }
>> >       }
>> >       if (qcow2_read_snapshots(bs) < 0) {
>> >           ret = -EINVAL;
>> >  diff --git a/block/qed.c b/block/qed.c
>> >  index 3970379..5028897 100644
>> >  --- a/block/qed.c
>> >  +++ b/block/qed.c
>> >  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>> >               return ret;
>> >           }
>> >
>> >  +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >  +
>> >           if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
>> >               pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>> >           }
>> >  diff --git a/block/raw-posix.c b/block/raw-posix.c
>> >  index 4cd7d7a..c72de3d 100644
>> >  --- a/block/raw-posix.c
>> >  +++ b/block/raw-posix.c
>> >  @@ -28,6 +28,7 @@
>> >    #include "block_int.h"
>> >    #include "module.h"
>> >    #include "block/raw-posix-aio.h"
>> >  +#include "monitor.h"
>> >
>> >    #ifdef CONFIG_COCOA
>> >    #include <paths.h>
>> >  @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >                              int bdrv_flags, int open_flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >  -    int fd, ret;
>> >  +    int fd = -1;
>> >  +    int ret;
>> >
>> >       ret = raw_normalize_devicepath(&filename);
>> >       if (ret != 0) {
>> >  @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >       if (!(bdrv_flags & BDRV_O_CACHE_WB))
>> >           s->open_flags |= O_DSYNC;
>> >
>> >  -    s->fd = -1;
>> >  -    fd = qemu_open(filename, s->open_flags, 0644);
>> >  -    if (fd < 0) {
>> >  -        ret = -errno;
>> >  -        if (ret == -EROFS)
>> >  -            ret = -EACCES;
>> >  -        return ret;
>> >  +    if (s->fd == -1) {
>> >  +        fd = qemu_open(filename, s->open_flags, 0644);
>> >  +        if (fd < 0) {
>> >  +            ret = -errno;
>> >  +            if (ret == -EROFS) {
>> >  +                ret = -EACCES;
>> >  +            }
>> >  +            return ret;
>> >  +        }
>> >  +        s->fd = fd;
>> >       }
>> >  -    s->fd = fd;
>> >       s->aligned_buf = NULL;
>> >
>> >       if ((bdrv_flags & BDRV_O_NOCACHE)) {
>> >  @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FILE;
>> >       return raw_open_common(bs, filename, flags, 0);
>> >    }
>> >  @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>> >       .create_options = raw_create_options,
>> >    };
>> >
>> >  +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>> >  +{
>> >  +    BDRVRawState *s = bs->opaque;
>> >  +    const char *fd_str;
>> >  +    int fd;
>> >  +
>> >  +    /* extract the file descriptor - fail if it's not fd: */
>> >  +    if (!strstart(filename, "fd:", &fd_str)) {
>> >  +        return -EINVAL;
>> >  +    }
>> >  +
>> >  +    if (!qemu_isdigit(fd_str[0])) {
>> >  +        /* get fd from monitor */
>> >  +        fd = qemu_get_fd(fd_str);
>> >  +        if (fd == -1) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    } else {
>> >  +        char *endptr = NULL;
>> >  +
>> >  +        fd = strtol(fd_str, &endptr, 10);
>> >  +        if (*endptr || (fd == 0 && fd_str == endptr)) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    }
>> >  +
>> >  +    s->fd = fd;
>> >  +    s->type = FTYPE_FILE;
>> >  +
>> >  +    return raw_open_common(bs, filename, flags, 0);
>> >  +}
>> >  +
>> >  +static BlockDriver bdrv_file_fd = {
>> >  +    .format_name = "file",
>> >  +    .protocol_name = "fd",
>> >  +    .instance_size = sizeof(BDRVRawState),
>> >  +    .bdrv_probe = NULL, /* no probe for protocols */
>> >  +    .bdrv_file_open = raw_open_fd,
>> >  +    .bdrv_read = raw_read,
>> >  +    .bdrv_write = raw_write,
>> >  +    .bdrv_close = raw_close,
>> >  +    .bdrv_flush = raw_flush,
>> >  +    .bdrv_discard = raw_discard,
>> >  +
>> >  +    .bdrv_aio_readv = raw_aio_readv,
>> >  +    .bdrv_aio_writev = raw_aio_writev,
>> >  +    .bdrv_aio_flush = raw_aio_flush,
>> >  +
>> >  +    .bdrv_truncate = raw_truncate,
>> >  +    .bdrv_getlength = raw_getlength,
>> >  +
>> >  +    .create_options = raw_create_options,
>> >  +};
>> >  +
>> >    /***********************************************/
>> >    /* host device */
>> >
>> >  @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>> >       }
>> >    #endif
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FILE;
>> >    #if defined(__linux__)
>> >       {
>> >  @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>> >       BDRVRawState *s = bs->opaque;
>> >       int ret;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FD;
>> >
>> >       /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>> >  @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_CD;
>> >
>> >       /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>> >  @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>> >        * Register all the drivers.  Note that order is important, the driver
>> >        * registered last will get probed first.
>> >        */
>> >  +    bdrv_register(&bdrv_file_fd);
>> >       bdrv_register(&bdrv_file);
>> >       bdrv_register(&bdrv_host_device);
>> >    #ifdef __linux__
>> >  diff --git a/block/vmdk.c b/block/vmdk.c
>> >  index 922b23d..2ea808e 100644
>> >  --- a/block/vmdk.c
>> >  +++ b/block/vmdk.c
>> >  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>> >               return -1;
>> >
>> >           pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>> >  +
>> >  +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            return -1;
>> >  +        }
>> >       }
>> >
>> >       return 0;
>> >  diff --git a/block_int.h b/block_int.h
>> >  index fa91337..a305ee2 100644
>> >  --- a/block_int.h
>> >  +++ b/block_int.h
>> >  @@ -152,6 +152,7 @@ struct BlockDriverState {
>> >       int encrypted; /* if true, the media is encrypted */
>> >       int valid_key; /* if true, a valid encryption key has been set */
>> >       int sg;        /* if true, the device is a /dev/sg* */
>> >  +    int fd_protocol; /* if true, the fd: protocol was specified */
> bool?
>
I was following the convention of the surrounding fields here, but I
agree that bool would be better. Better still, these could all be
reduced to bit flags. For this patch, though, I'll stay consistent with
the existing int fields.
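
As a rough idea of what the bit flags alternative could look like (purely
a sketch: struct bs_state, the BS_* names and both helpers are hypothetical
and do not exist in block_int.h):

```c
#include <stdbool.h>

/* Hypothetical flag names, one bit per int field quoted above. */
enum {
    BS_ENCRYPTED   = 1u << 0, /* the media is encrypted */
    BS_VALID_KEY   = 1u << 1, /* a valid encryption key has been set */
    BS_SG          = 1u << 2, /* the device is a /dev/sg* */
    BS_FD_PROTOCOL = 1u << 3, /* the fd: protocol was specified */
};

/* Hypothetical stand-in for the relevant part of BlockDriverState. */
struct bs_state {
    unsigned int flags;
};

/* Set or clear a single flag. */
static inline void bs_set_flag(struct bs_state *bs, unsigned int flag, bool on)
{
    if (on) {
        bs->flags |= flag;
    } else {
        bs->flags &= ~flag;
    }
}

/* Test a single flag. */
static inline bool bs_test_flag(const struct bs_state *bs, unsigned int flag)
{
    return (bs->flags & flag) != 0;
}
```

That would be a separate cleanup touching every user of those fields, which
is why I'd rather not fold it into this patch.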


>> >       /* event callback when inserting/removing */
>> >       void (*change_cb)(void *opaque, int reason);
>> >       void *change_opaque;
>> >  diff --git a/blockdev.c b/blockdev.c
>> >  index e81e0ab..a536c20 100644
>> >  --- a/blockdev.c
>> >  +++ b/blockdev.c
>> >  @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>> >
>> >       bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>> >
>> >  +    if (strncmp(file, "fd:", 3) == 0) {
>> >  +        dinfo->bdrv->fd_protocol = 1;
>> >  +    }
>> >  +
>> >       ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>> >       if (ret < 0) {
>> >           error_report("could not open disk image %s: %s",
>> >  @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> >           goto out;
>> >       }
>> >
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        qerror_report(QERR_UNSUPPORTED);
>> >  +        ret = -1;
>> >  +        goto out;
>> >  +    }
>> >  +
>> >       pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>> >
>> >       old_drv = bs->drv;
>> >  diff --git a/monitor.c b/monitor.c
>> >  index 59a3e76..ea60be2 100644
>> >  --- a/monitor.c
>> >  +++ b/monitor.c
>> >  @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
>> >       return -1;
>> >    }
>> >
>> >  +int qemu_get_fd(const char *fdname)
>> >  +{
>> >  +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>> >  +}
>> >  +
>> >    static const mon_cmd_t mon_cmds[] = {
>> >    #include "hmp-commands.h"
>> >       { NULL, NULL, },
>> >  diff --git a/monitor.h b/monitor.h
>> >  index 4f2d328..de5b987 100644
>> >  --- a/monitor.h
>> >  +++ b/monitor.h
>> >  @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>> >                                   void *opaque);
>> >
>> >    int monitor_get_fd(Monitor *mon, const char *fdname);
>> >  +int qemu_get_fd(const char *fdname);
>> >
>> >    void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>> >       GCC_FMT_ATTR(2, 0);
>> >  diff --git a/qemu-options.hx b/qemu-options.hx
>> >  index 1d5ad8b..f9b66f4 100644
>> >  --- a/qemu-options.hx
>> >  +++ b/qemu-options.hx
>> >  @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}).
>> >    ETEXI
>> >
>> >    DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>> >  -    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> >  +    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> >       "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>> >       "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>> >       "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>> >  @@ -129,10 +129,12 @@ STEXI
>> >    Define a new drive. Valid options are:
>> >
>> >    @table @option
>> >  -@item file=@var{file}
>> >  +@item file=[fd:]@var{file}
>> >    This option defines which disk image (@pxref{disk_images}) to use with
>> >    this drive. If the filename contains comma, you must double it
>> >  -(for instance, "file=my,,file" to use file "my,file").
>> >  +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
>> >  +specifies the file descriptor of an already open disk image.
>> >  +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>> >    @item if=@var{interface}
>> >    This option defines on which type on interface the drive is connected.
>> >    Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>> >  diff --git a/qemu-tool.c b/qemu-tool.c
>> >  index 41e5c41..8fe6b8c 100644
>> >  --- a/qemu-tool.c
>> >  +++ b/qemu-tool.c
>> >  @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>> >    {
>> >       return 0;
>> >    }
>> >  +
>> >  +int qemu_get_fd(const char *fdname)
>> >  +{
>> >  +    return -1;
>> >  +}
>> >  --
>> >  1.7.1
>> >
>> >
>> >
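
For anyone on the libvirt side who hasn't used getfd: the monitor receives
the descriptor as SCM_RIGHTS ancillary data on a UNIX domain socket. Below
is a minimal sketch of that mechanism in isolation. send_fd and recv_fd are
hypothetical helper names, not functions from Qemu or libvirt, and the
monitor command text that would accompany the descriptor is left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized for exactly one descriptor, suitably aligned. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one descriptor over a connected AF_UNIX
 * socket. Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F'; /* one byte of ordinary data carries the message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor. Returns the new
 * descriptor in the receiving process, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The number seen by the receiver generally differs from the sender's, which
is why the monitor path refers to the descriptor by the name given to getfd
rather than by number.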

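One thing I noticed while re-reading the numeric branch of raw_open_fd
above: strtol returns a long that is assigned straight to an int, and the
"fd == 0 && fd_str == endptr" test cannot fire once qemu_isdigit(fd_str[0])
has passed. A stricter parse might look like the sketch below.
parse_fd_number is a hypothetical name and the choice of error codes is my
assumption, not something the patch currently does.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the numeric part of "fd:<n>".
 * Returns 0 and stores the descriptor in *fd_out on success,
 * -EINVAL if the string does not start with a digit,
 * -EBADF if it has trailing characters or does not fit in an int.
 */
static int parse_fd_number(const char *str, int *fd_out)
{
    char *endptr = NULL;
    long val;

    /* Rejects the empty string, signs and leading whitespace. */
    if (str[0] < '0' || str[0] > '9') {
        return -EINVAL;
    }

    errno = 0;
    val = strtol(str, &endptr, 10);
    if (errno == ERANGE || val > INT_MAX) {
        return -EBADF; /* out of range for a descriptor */
    }
    if (*endptr != '\0') {
        return -EBADF; /* trailing characters, e.g. "4x" */
    }

    *fd_out = (int)val;
    return 0;
}
```
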

Regards,
Corey Bryant



