[dm-devel] dm: dm_blk_ioctl(): implement failover for SG_IO on dm-multipath

Hannes Reinecke hare at suse.de
Thu Apr 29 06:02:12 UTC 2021


On 4/28/21 9:54 PM, Mike Snitzer wrote:
> On Thu, Apr 22 2021 at  4:21P -0400,
> mwilck at suse.com <mwilck at suse.com> wrote:
> 
>> From: Martin Wilck <mwilck at suse.com>
>>
>> In virtual deployments, SCSI passthrough over dm-multipath devices is a
>> common setup. The qemu "pr-helper" was specifically invented for it. I
>> believe that this is the most important real-world scenario for sending
>> SG_IO ioctls to device-mapper devices.
>>
>> In this configuration, guests send SCSI IO to the hypervisor in the form of
>> SG_IO ioctls issued by qemu. But on the device-mapper level, these SCSI
>> ioctls aren't treated like regular IO. Until commit 2361ae595352 ("dm mpath:
>> switch paths in dm_blk_ioctl() code path"), no path switching was done at
>> all. Worse though, if an SG_IO call fails because of a path error,
>> dm-multipath doesn't retry the IO on another path; rather, the failure is
>> passed back to the guest, and paths are not marked as faulty.  This is in
>> stark contrast with regular block IO of guests on dm-multipath devices, and
>> certainly comes as a surprise to users who switch to SCSI passthrough in
>> qemu. In general, users of dm-multipath devices would probably expect failover
>> to work at least in a basic way.
>>
>> This patch fixes this by taking a special code path for SG_IO on request-
>> based device mapper targets. Rather than just choosing a single path,
>> sending the IO to it, and returning any failure straight to the caller,
>> it retries the same IO on another path for certain error codes,
>> using the same logic as blk_path_error() to determine if a retry would make
>> sense for the given error code. Moreover, it sends a message to the
>> multipath target to mark the path as failed.
>>
>> I am aware that this looks sort of hack-ish. I'm submitting it here as an
>> RFC, asking for guidance on how to reach a clean implementation that would be
>> acceptable in mainline. I believe that it fixes an important problem.
>> Actually, it fixes the qemu SCSI-passthrough use case on dm-multipath,
>> which is non-functional without it.
>>
>> One problem remains open: if all paths in a multipath map are failed,
>> normal multipath IO may switch to queueing mode (depending on
>> configuration). This isn't possible for SG_IO, as SG_IO requests can't
>> easily be queued like regular block I/O. Thus in the "no path" case, the
>> guest will still see an error. If anybody can conceive of a solution for
>> that, I'd be interested.
>>
>> Signed-off-by: Martin Wilck <mwilck at suse.com>
>> ---
>>   block/scsi_ioctl.c     |   5 +-
>>   drivers/md/dm.c        | 134 ++++++++++++++++++++++++++++++++++++++++-
>>   include/linux/blkdev.h |   2 +
>>   3 files changed, 137 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
>> index 6599bac0a78c..bcc60552f7b1 100644
>> --- a/block/scsi_ioctl.c
>> +++ b/block/scsi_ioctl.c
>> @@ -279,8 +279,8 @@ static int blk_complete_sghdr_rq(struct request *rq, struct sg_io_hdr *hdr,
>>   	return ret;
>>   }
>>   
>> -static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>> -		struct sg_io_hdr *hdr, fmode_t mode)
>> +int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>> +	  struct sg_io_hdr *hdr, fmode_t mode)
>>   {
>>   	unsigned long start_time;
>>   	ssize_t ret = 0;
>> @@ -369,6 +369,7 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
>>   	blk_put_request(rq);
>>   	return ret;
>>   }
>> +EXPORT_SYMBOL_GPL(sg_io);
>>   
>>   /**
>>    * sg_scsi_ioctl  --  handle deprecated SCSI_IOCTL_SEND_COMMAND ioctl
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 50b693d776d6..443ac5e5406c 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -29,6 +29,8 @@
>>   #include <linux/part_stat.h>
>>   #include <linux/blk-crypto.h>
>>   #include <linux/keyslot-manager.h>
>> +#include <scsi/sg.h>
>> +#include <scsi/scsi.h>
>>   
>>   #define DM_MSG_PREFIX "core"
>>   
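
(The bulk of Martin's dm.c hunk is snipped above; the behaviour he
describes boils down to a retry loop like the sketch below.
dm_next_sg_io_path() and dm_fail_sg_io_path() are made-up names here,
standing in for the real plumbing into dm-mpath, so treat this as a
sketch rather than the actual patch body:

static int dm_sg_io_with_failover(struct block_device *bdev, fmode_t mode,
				  struct sg_io_hdr *hdr)
{
	struct block_device *path;
	int ret;

	/* Hypothetical iterator over the map's usable paths. */
	while ((path = dm_next_sg_io_path(bdev)) != NULL) {
		ret = sg_io(bdev_get_queue(path), path->bd_disk, hdr, mode);
		if (ret)
			return ret;	/* local failure, a retry won't help */

		/* Success, or an error a retry can't fix, cf. blk_path_error(). */
		if (hdr->host_status == DID_OK)
			return 0;

		/*
		 * Transport-level failure: tell the multipath target to
		 * fail this path, then retry on the next one.
		 */
		dm_fail_sg_io_path(bdev, path);
	}

	/* All paths failed; SG_IO can't be queued, so the caller sees this. */
	return -ENODEV;
}
)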
> 
> Ngh... not loving this at all.  But here is a patch (that I didn't
> compile test) that should be folded in, still have to think about how
> this could be made cleaner in general though.
> 
> Also, dm_sg_io_ioctl should probably be in dm-rq.c (maybe even dm-mpath.c!?)
> 
I fully share your sentiments; this just feels so awkward, having to 
duplicate the logic of scsi_cmd_ioctl().

My original idea here was to _use_ scsi_cmd_ioctl() directly from 
dm_blk_ioctl():

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 50b693d776d6..007ff981f888 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -567,6 +567,9 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
         struct mapped_device *md = bdev->bd_disk->private_data;
         int r, srcu_idx;

+       if (cmd == SG_IO)
+               return scsi_cmd_blk_ioctl(bdev, mode, cmd, arg);
+
         r = dm_prepare_ioctl(md, &srcu_idx, &bdev);
         if (r < 0)
                 goto out;


But that crashes horribly, as sg_io() expects the request PDU to be of 
type 'struct scsi_request', which it definitely isn't here.
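
To spell out the crash: sg_io() grabs the request PDU via scsi_req(),
which is nothing but a cast (this is include/scsi/scsi_request.h):

static inline struct scsi_request *scsi_req(struct request *rq)
{
	return blk_mq_rq_to_pdu(rq);
}

On the dm queue that PDU is a struct dm_rq_target_io, so sg_io() ends up
scribbling its cmd and sense fields over dm's per-request state.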

We _could_ prepend struct dm_rq_target_io with a struct scsi_request, 
seeing that basically _all_ dm-rq installations are on SCSI devices:

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 13b4385f4d5a..aa7856621f83 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -16,6 +16,7 @@
   * One of these is allocated per request.
   */
  struct dm_rq_target_io {
+       struct scsi_request scsi_req;
         struct mapped_device *md;
         struct dm_target *ti;
         struct request *orig, *clone;

Then the above should work, but we would increase the size of 
dm_rq_target_io by quite a lot. Although, given its current size, the 
question is whether it matters...
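
One constraint worth noting: the new member would have to stay the
_first_ member, since scsi_req() casts the PDU directly. A compile-time
check along these lines (hypothetical, not part of any patch) would keep
that honest:

/* scsi_req() returns the PDU itself, so the scsi_request must sit
 * at offset 0 of struct dm_rq_target_io for the cast to be valid. */
static_assert(offsetof(struct dm_rq_target_io, scsi_req) == 0);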

Hmm?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
