[libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
Corey Bryant
bryntcor at us.ibm.com
Thu Jun 16 14:48:51 UTC 2011
On 06/15/2011 03:12 PM, Blue Swirl wrote:
> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor at us.ibm.com> wrote:
>> > sVirt provides SELinux MAC isolation for Qemu guest processes and their
>> > corresponding resources (image files). sVirt provides this support
>> > by labeling guests and resources with security labels that are stored
>> > in file system extended attributes. Some file systems, such as NFS, do
>> > not support the extended attribute security namespace, which is needed
>> > for image file isolation when using the sVirt SELinux security driver
>> > in libvirt.
>> >
>> > The proposed solution entails a combination of Qemu, libvirt, and
>> > SELinux patches that work together to isolate multiple guests' images
>> > when they're stored in the same NFS mount. This results in an
>> > environment where sVirt isolation and NFS image file isolation can both
>> > be provided.
>> >
>> > This patch contains the Qemu code to support this solution. I would
>> > like to solicit input from the libvirt community prior to starting
>> > the libvirt patch.
>> >
>> > Currently, Qemu opens an image file in addition to performing the
>> > necessary read and write operations. The proposed solution will move
>> > the open out of Qemu and into libvirt. Once libvirt opens an image
>> > file for the guest, it will pass the file descriptor to Qemu via a
>> > new fd: protocol.
>> >
>> > If the image file resides in an NFS mount, the following SELinux policy
>> > changes will provide image isolation:
>> >
>> > - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>> > allow Qemu (svirt_t) to only have SELinux read and write
>> > permissions on nfs_t files
>> >
>> > - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>> > (virtd_t) file descriptors
>> >
>> > Following is a sample invocation of Qemu using the fd: protocol on
>> > the command line:
>> >
>> > qemu -drive file=fd:4,format=qcow2
>> >
>> > The fd: protocol is also supported by the drive_add monitor command.
>> > This requires that the specified file descriptor is passed to the
>> > monitor alongside a prior getfd monitor command.
>> >
>> > There are some additional features provided by certain image types
>> > where Qemu reopens the image file. All of these scenarios will be
>> > unsupported for the fd: protocol, at least for this patch:
>> >
>> > - The -snapshot command line option
>> > - The savevm monitor command
>> > - The snapshot_blkdev monitor command
>> > - Starting Qemu with a backing file
> There's also native CDROM device. Did you consider adding an explicit
> reopen method to block layer?
Thanks. Yes it looks like I overlooked CDROM reopens.
I'm not sure that I'm clear on the purpose of the reopen function.
Would the goal be to funnel all block layer reopens through a single
function, enabling potential future support where a privileged layer of
Qemu, or libvirt, performs the open?
>
>> > The thought is that this support can be added in the future, but is
>> > not required for the initial fd: support.
>> >
>> > This patch was tested with the following formats: raw, cow, qcow,
>> > qcow2, qed, and vmdk, using the fd: protocol from the command line
>> > and the monitor. Tests were also run to verify existing file name
>> > support and qemu-img were not regressed. Non-valid file descriptors,
>> > fd: without format, snapshot and backing files were also tested.
>> >
>> > Signed-off-by: Corey Bryant<coreyb at linux.vnet.ibm.com>
>> > ---
>> > block.c | 16 ++++++++++
>> > block.h | 1 +
>> > block/cow.c | 5 +++
>> > block/qcow.c | 5 +++
>> > block/qcow2.c | 5 +++
>> > block/qed.c | 4 ++
>> > block/raw-posix.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++------
>> > block/vmdk.c | 5 +++
>> > block_int.h | 1 +
>> > blockdev.c | 10 ++++++
>> > monitor.c | 5 +++
>> > monitor.h | 1 +
>> > qemu-options.hx | 8 +++--
>> > qemu-tool.c | 5 +++
>> > 14 files changed, 140 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/block.c b/block.c
>> > index 24a25d5..500db84 100644
>> > --- a/block.c
>> > +++ b/block.c
>> > @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> > char tmp_filename[PATH_MAX];
>> > char backing_filename[PATH_MAX];
>> >
>> > + if (bdrv_is_fd_protocol(bs)) {
>> > + return -ENOTSUP;
>> > + }
>> > +
>> > /* if snapshot, we create a temporary backing file and open it
>> > instead of opening 'filename' directly */
>> >
>> > @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >
>> > /* Find the right image format driver */
>> > if (!drv) {
>> > + /* format must be specified for fd: protocol */
>> > + if (bdrv_is_fd_protocol(bs)) {
>> > + return -ENOTSUP;
>> > + }
>> > ret = find_image_format(filename,&drv);
>> > }
>> >
>> > @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>> > return bs->enable_write_cache;
>> > }
>> >
>> > +int bdrv_is_fd_protocol(BlockDriverState *bs)
>> > +{
>> > + return bs->fd_protocol;
>> > +}
>> > +
>> > /* XXX: no longer used */
>> > void bdrv_set_change_cb(BlockDriverState *bs,
>> > void (*change_cb)(void *opaque, int reason),
>> > @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>> > BlockDriver *drv = bs->drv;
>> > if (!drv)
>> > return -ENOMEDIUM;
>> > + if (bdrv_is_fd_protocol(bs)) {
>> > + return -ENOTSUP;
>> > + }
>> > if (drv->bdrv_snapshot_create)
>> > return drv->bdrv_snapshot_create(bs, sn_info);
>> > if (bs->file)
>> > diff --git a/block.h b/block.h
>> > index da7d39c..dd46d52 100644
>> > --- a/block.h
>> > +++ b/block.h
>> > @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>> > int bdrv_is_read_only(BlockDriverState *bs);
>> > int bdrv_is_sg(BlockDriverState *bs);
>> > int bdrv_enable_write_cache(BlockDriverState *bs);
>> > +int bdrv_is_fd_protocol(BlockDriverState *bs);
>> > int bdrv_is_inserted(BlockDriverState *bs);
>> > int bdrv_media_changed(BlockDriverState *bs);
>> > int bdrv_is_locked(BlockDriverState *bs);
>> > diff --git a/block/cow.c b/block/cow.c
>> > index 4cf543c..e17f8e7 100644
>> > --- a/block/cow.c
>> > +++ b/block/cow.c
>> > @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>> > pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>> > cow_header.backing_file);
>> >
>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) {
>> > + /* backing file currently not supported by fd: protocol */
>> > + goto fail;
>> > + }
>> > +
>> > bitmap_size = ((bs->total_sectors + 7)>> 3) + sizeof(cow_header);
>> > s->cow_sectors_offset = (bitmap_size + 511)& ~511;
>> > return 0;
>> > diff --git a/block/qcow.c b/block/qcow.c
>> > index a26c886..1e46bdb 100644
>> > --- a/block/qcow.c
>> > +++ b/block/qcow.c
>> > @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>> > if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>> > goto fail;
>> > bs->backing_file[len] = '\0';
>> > +
>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) {
>> > + /* backing file currently not supported by fd: protocol */
>> > + goto fail;
>> > + }
>> > }
>> > return 0;
>> >
>> > diff --git a/block/qcow2.c b/block/qcow2.c
>> > index 8451ded..8b9c160 100644
>> > --- a/block/qcow2.c
>> > +++ b/block/qcow2.c
>> > @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> > goto fail;
>> > }
>> > bs->backing_file[len] = '\0';
>> > +
>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) {
>> > + ret = -ENOTSUP;
>> > + goto fail;
>> > + }
>> > }
>> > if (qcow2_read_snapshots(bs)< 0) {
>> > ret = -EINVAL;
>> > diff --git a/block/qed.c b/block/qed.c
>> > index 3970379..5028897 100644
>> > --- a/block/qed.c
>> > +++ b/block/qed.c
>> > @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>> > return ret;
>> > }
>> >
>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) {
>> > + return -ENOTSUP;
>> > + }
>> > +
>> > if (s->header.features& QED_F_BACKING_FORMAT_NO_PROBE) {
>> > pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>> > }
>> > diff --git a/block/raw-posix.c b/block/raw-posix.c
>> > index 4cd7d7a..c72de3d 100644
>> > --- a/block/raw-posix.c
>> > +++ b/block/raw-posix.c
>> > @@ -28,6 +28,7 @@
>> > #include "block_int.h"
>> > #include "module.h"
>> > #include "block/raw-posix-aio.h"
>> > +#include "monitor.h"
>> >
>> > #ifdef CONFIG_COCOA
>> > #include<paths.h>
>> > @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> > int bdrv_flags, int open_flags)
>> > {
>> > BDRVRawState *s = bs->opaque;
>> > - int fd, ret;
>> > + int fd = -1;
>> > + int ret;
>> >
>> > ret = raw_normalize_devicepath(&filename);
>> > if (ret != 0) {
>> > @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> > if (!(bdrv_flags& BDRV_O_CACHE_WB))
>> > s->open_flags |= O_DSYNC;
>> >
>> > - s->fd = -1;
>> > - fd = qemu_open(filename, s->open_flags, 0644);
>> > - if (fd< 0) {
>> > - ret = -errno;
>> > - if (ret == -EROFS)
>> > - ret = -EACCES;
>> > - return ret;
>> > + if (s->fd == -1) {
>> > + fd = qemu_open(filename, s->open_flags, 0644);
>> > + if (fd< 0) {
>> > + ret = -errno;
>> > + if (ret == -EROFS) {
>> > + ret = -EACCES;
>> > + }
>> > + return ret;
>> > + }
>> > + s->fd = fd;
>> > }
>> > - s->fd = fd;
>> > s->aligned_buf = NULL;
>> >
>> > if ((bdrv_flags& BDRV_O_NOCACHE)) {
>> > @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>> > {
>> > BDRVRawState *s = bs->opaque;
>> >
>> > + s->fd = -1;
>> > s->type = FTYPE_FILE;
>> > return raw_open_common(bs, filename, flags, 0);
>> > }
>> > @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>> > .create_options = raw_create_options,
>> > };
>> >
>> > +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>> > +{
>> > + BDRVRawState *s = bs->opaque;
>> > + const char *fd_str;
>> > + int fd;
>> > +
>> > + /* extract the file descriptor - fail if it's not fd: */
>> > + if (!strstart(filename, "fd:",&fd_str)) {
>> > + return -EINVAL;
>> > + }
>> > +
>> > + if (!qemu_isdigit(fd_str[0])) {
>> > + /* get fd from monitor */
>> > + fd = qemu_get_fd(fd_str);
>> > + if (fd == -1) {
>> > + return -EBADF;
>> > + }
>> > + } else {
>> > + char *endptr = NULL;
>> > +
>> > + fd = strtol(fd_str,&endptr, 10);
>> > + if (*endptr || (fd == 0&& fd_str == endptr)) {
>> > + return -EBADF;
>> > + }
>> > + }
>> > +
>> > + s->fd = fd;
>> > + s->type = FTYPE_FILE;
>> > +
>> > + return raw_open_common(bs, filename, flags, 0);
>> > +}
>> > +
>> > +static BlockDriver bdrv_file_fd = {
>> > + .format_name = "file",
>> > + .protocol_name = "fd",
>> > + .instance_size = sizeof(BDRVRawState),
>> > + .bdrv_probe = NULL, /* no probe for protocols */
>> > + .bdrv_file_open = raw_open_fd,
>> > + .bdrv_read = raw_read,
>> > + .bdrv_write = raw_write,
>> > + .bdrv_close = raw_close,
>> > + .bdrv_flush = raw_flush,
>> > + .bdrv_discard = raw_discard,
>> > +
>> > + .bdrv_aio_readv = raw_aio_readv,
>> > + .bdrv_aio_writev = raw_aio_writev,
>> > + .bdrv_aio_flush = raw_aio_flush,
>> > +
>> > + .bdrv_truncate = raw_truncate,
>> > + .bdrv_getlength = raw_getlength,
>> > +
>> > + .create_options = raw_create_options,
>> > +};
>> > +
>> > /***********************************************/
>> > /* host device */
>> >
>> > @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>> > }
>> > #endif
>> >
>> > + s->fd = -1;
>> > s->type = FTYPE_FILE;
>> > #if defined(__linux__)
>> > {
>> > @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>> > BDRVRawState *s = bs->opaque;
>> > int ret;
>> >
>> > + s->fd = -1;
>> > s->type = FTYPE_FD;
>> >
>> > /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>> > @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>> > {
>> > BDRVRawState *s = bs->opaque;
>> >
>> > + s->fd = -1;
>> > s->type = FTYPE_CD;
>> >
>> > /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>> > @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>> > * Register all the drivers. Note that order is important, the driver
>> > * registered last will get probed first.
>> > */
>> > + bdrv_register(&bdrv_file_fd);
>> > bdrv_register(&bdrv_file);
>> > bdrv_register(&bdrv_host_device);
>> > #ifdef __linux__
>> > diff --git a/block/vmdk.c b/block/vmdk.c
>> > index 922b23d..2ea808e 100644
>> > --- a/block/vmdk.c
>> > +++ b/block/vmdk.c
>> > @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>> > return -1;
>> >
>> > pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>> > +
>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) {
>> > + /* backing file currently not supported by fd: protocol */
>> > + return -1;
>> > + }
>> > }
>> >
>> > return 0;
>> > diff --git a/block_int.h b/block_int.h
>> > index fa91337..a305ee2 100644
>> > --- a/block_int.h
>> > +++ b/block_int.h
>> > @@ -152,6 +152,7 @@ struct BlockDriverState {
>> > int encrypted; /* if true, the media is encrypted */
>> > int valid_key; /* if true, a valid encryption key has been set */
>> > int sg; /* if true, the device is a /dev/sg* */
>> > + int fd_protocol; /* if true, the fd: protocol was specified */
> bool?
>
I was following suit here, but I agree that bool would be better. Or
better, these could all be reduced to bit flags. My thought is that
I'll stick with following suit here though.
>> > /* event callback when inserting/removing */
>> > void (*change_cb)(void *opaque, int reason);
>> > void *change_opaque;
>> > diff --git a/blockdev.c b/blockdev.c
>> > index e81e0ab..a536c20 100644
>> > --- a/blockdev.c
>> > +++ b/blockdev.c
>> > @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>> >
>> > bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>> >
>> > + if (strncmp(file, "fd:", 3) == 0) {
>> > + dinfo->bdrv->fd_protocol = 1;
>> > + }
>> > +
>> > ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>> > if (ret< 0) {
>> > error_report("could not open disk image %s: %s",
>> > @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> > goto out;
>> > }
>> >
>> > + if (bdrv_is_fd_protocol(bs)) {
>> > + qerror_report(QERR_UNSUPPORTED);
>> > + ret = -1;
>> > + goto out;
>> > + }
>> > +
>> > pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>> >
>> > old_drv = bs->drv;
>> > diff --git a/monitor.c b/monitor.c
>> > index 59a3e76..ea60be2 100644
>> > --- a/monitor.c
>> > +++ b/monitor.c
>> > @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
>> > return -1;
>> > }
>> >
>> > +int qemu_get_fd(const char *fdname)
>> > +{
>> > + return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>> > +}
>> > +
>> > static const mon_cmd_t mon_cmds[] = {
>> > #include "hmp-commands.h"
>> > { NULL, NULL, },
>> > diff --git a/monitor.h b/monitor.h
>> > index 4f2d328..de5b987 100644
>> > --- a/monitor.h
>> > +++ b/monitor.h
>> > @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>> > void *opaque);
>> >
>> > int monitor_get_fd(Monitor *mon, const char *fdname);
>> > +int qemu_get_fd(const char *fdname);
>> >
>> > void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>> > GCC_FMT_ATTR(2, 0);
>> > diff --git a/qemu-options.hx b/qemu-options.hx
>> > index 1d5ad8b..f9b66f4 100644
>> > --- a/qemu-options.hx
>> > +++ b/qemu-options.hx
>> > @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}).
>> > ETEXI
>> >
>> > DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>> > - "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> > + "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> > " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>> > " [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>> > " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>> > @@ -129,10 +129,12 @@ STEXI
>> > Define a new drive. Valid options are:
>> >
>> > @table @option
>> > - at item file=@var{file}
>> > + at item file=[fd:]@var{file}
>> > This option defines which disk image (@pxref{disk_images}) to use with
>> > this drive. If the filename contains comma, you must double it
>> > -(for instance, "file=my,,file" to use file "my,file").
>> > +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
>> > +specifies the file descriptor of an already open disk image.
>> > + at option{format=}@var{format} is required by @option{fd:}@var{file}.
>> > @item if=@var{interface}
>> > This option defines on which type on interface the drive is connected.
>> > Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>> > diff --git a/qemu-tool.c b/qemu-tool.c
>> > index 41e5c41..8fe6b8c 100644
>> > --- a/qemu-tool.c
>> > +++ b/qemu-tool.c
>> > @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>> > {
>> > return 0;
>> > }
>> > +
>> > +int qemu_get_fd(const char *fdname)
>> > +{
>> > + return -1;
>> > +}
>> > --
>> > 1.7.1
>> >
>> >
>> >
Regards,
Corey Bryant
More information about the libvir-list
mailing list