[dm-devel] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath

Mike Snitzer snitzer at redhat.com
Fri Apr 30 20:15:43 UTC 2021


On Thu, Apr 29 2021 at  2:02am -0400,
Hannes Reinecke <hare at suse.de> wrote:

> On 4/28/21 9:54 PM, Mike Snitzer wrote:
> >On Thu, Apr 22 2021 at  4:21P -0400,
> >mwilck at suse.com <mwilck at suse.com> wrote:
> >
> >>From: Martin Wilck <mwilck at suse.com>
> >>
> >>In virtual deployments, SCSI passthrough over dm-multipath devices is a
> >>common setup. The qemu "pr-helper" was specifically invented for it. I
> >>believe that this is the most important real-world scenario for sending
> >>SG_IO ioctls to device-mapper devices.
> >>
> >>In this configuration, guests send SCSI IO to the hypervisor in the form of
> >>SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
> >>ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
> >>switch paths in dm_blk_ioctl() code path"), no path switching was done at
> >>all. Worse though, if an SG_IO call fails because of a path error,
> >>dm-multipath doesn't retry the IO on a another path; rather, the failure is
> >>passed back to the guest, and paths are not marked as faulty.  This is in
> >>stark contrast with regular block IO of guests on dm-multipath devices, and
> >>certainly comes as a surprise to users who switch to SCSI passthrough in
> >>qemu. In general, users of dm-multipath devices would probably expect failover
> >>to work at least in a basic way.
> >>
> >>This patch fixes this by taking a special code path for SG_IO on request-
> >>based device mapper targets. Rather then just choosing a single path,
> >>sending the IO to it, and failing to the caller if the IO on the path
> >>failed, it retries the same IO on another path for certain error codes,
> >>using the same logic as blk_path_error() to determine if a retry would make
> >>sense for the given error code. Moreover, it sends a message to the
> >>multipath target to mark the path as failed.
> >>
> >>I am aware that this looks sort of hack-ish. I'm submitting it here as an
> >>RFC, asking for guidance how to reach a clean implementation that would be
> >>acceptable in mainline. I believe that it fixes an important problem.
> >>Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
> >>which is non-functional without it.
> >>
> >>One problem remains open: if all paths in a multipath map are failed,
> >>normal multipath IO may switch to queueing mode (depending on
> >>configuration). This isn't possible for SG_IO, as SG_IO requests can't
> >>easily be queued like regular block I/O. Thus in the "no path" case, the
> >>guest will still see an error. If anybody can conceive of a solution for
> >>that, I'd be interested.
> >>
> >>Signed-off-by: Martin Wilck <mwilck at suse.com>
> >>---
> >>  block/scsi_ioctl.c     |   5 +-
> >>  drivers/md/dm.c        | 134 ++++++++++++++++++++++++++++++++++++++++-
> >>  include/linux/blkdev.h |   2 +
> >>  3 files changed, 137 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> >>index 6599bac0a78c..bcc60552f7b1 100644
> >>--- a/block/scsi_ioctl.c
> >>+++ b/block/scsi_ioctl.c
> >>@@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
> >>  	return ret;
> >>  }
> >>-static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>-		struct sg_io_hdr *hdr, fmode_t mode)
> >>+int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>+	  struct sg_io_hdr *hdr, fmode_t mode)
> >>  {
> >>  	unsigned long start_time;
> >>  	ssize_t ret = 0;
> >>@@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
> >>  	blk_put_request(rq);
> >>  	return ret;
> >>  }
> >>+EXPORT_SYMBOL_GPL(sg_io);
> >>  /**
> >>   * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
> >>diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >>index 50b693d776d6..443ac5e5406c 100644
> >>--- a/drivers/md/dm.c
> >>+++ b/drivers/md/dm.c
> >>@@ -29,6 +29,8 @@
> >>  #include <linux/part_stat.h>
> >>  #include <linux/blk-crypto.h>
> >>  #include <linux/keyslot-manager.h>
> >>+#include <scsi/sg.h>
> >>+#include <scsi/scsi.h>
> >>  #define DM_MSG_PREFIX "core"
> >
> >Ngh... not loving this at all.  But here is a patch (that I didn't
> >compile test) that should be folded in, still have to think about how
> >this could be made cleaner in general though.
> >
> >Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
> >
> I fully share your sentiments; this just feels so awkward, having to
> redo the logic in scsi_cmd_ioctl().
> 
> My original idea to that was to _use_ scsi_cmd_ioctl() directly from
> dm_blk_ioctl:
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 50b693d776d6..007ff981f888 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -567,6 +567,9 @@ static int dm_blk_ioctl(struct block_device
> *bdev, fmode_t mode,
>         struct mapped_device *md = bdev->bd_disk->private_data;
>         int r, srcu_idx;
> 
> +       if (cmd == SG_IO)
> +               return scsi_cmd_blk_ioctl(bdev, mode, cmd, arg);
> +
>         r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
>         if (r < 0)
>                 goto out;
> 
> 
> But that crashes horribly as sg_io() expects the request pdu to be
> of type 'scsi_request', which it definitely isn't here.
> 
> We _could_ prepend struct dm_rq_target_io with a struct
> scsi_request, seeing that basically _all_ dm-rq installations are on
> SCSI devices:

If calling sg_io() is an issue then how does Martin's latest patchset,
that exports and calls sg_io direct from DM, work!?

And _no_ I have no interest in embedding struct scsi_request in
dm_rq_target_io.

Mike

> 
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 13b4385f4d5a..aa7856621f83 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -16,6 +16,7 @@
>   * One of these is allocated per request.
>   */
>  struct dm_rq_target_io {
> +       struct scsi_request scsi_req;
>         struct mapped_device *md;
>         struct dm_target *ti;
>         struct request *orig, *clone;
> 
> Then the above should work, but we would increase the size of
> dm_rq_target_io by quite a lot. Although, given the current size of
> it, question is whether it matters...
> 
> Hmm?
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare at suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
> 




More information about the dm-devel mailing list