[dm-devel] [PATCH] dm-mpath: Work with blk multi-queue drivers

Hannes Reinecke hare at suse.de
Wed Sep 24 09:02:30 UTC 2014


On 09/23/2014 07:03 PM, Keith Busch wrote:
> I'm working with multipathing nvme devices using the blk-mq version of
> the nvme driver, but dm-mpath only works with the older request-based
> drivers. This patch proposes to enable dm-mpath to work with both types
> of request queues and is successful with my dual-ported nvme drives.
> 
> I think there may still be fix-ups to do around submission-side error
> handling, but I think it's at a decent stopping point to solicit feedback
> before I pursue taking it further. I hear there may be some resistance
> to adding blk-mq support to dm-mpath anyway, but it seems too easy to
> add support to not at least try. :)
> 
> To work, this has dm allocate requests from the request_queue for
> the device-mapper type rather than allocating one on its own, so the
> cloned request is properly allocated and initialized for the device's
> request_queue. The original request's 'special' now points to the
> dm_rq_target_io rather than at the cloned request, because the clone
> is allocated later by the block layer rather than by dm, and then all
> the other back-referencing to the original seems to work out. The block
> layer then inserts the cloned request using the appropriate function for
> the request_queue type rather than just calling q->request_fn().
> 
> Compile-tested on 3.17-rc6; runtime-tested on Matias Bjorling's
> linux-collab nvmemq_review using 3.16.
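
[For illustration, a minimal sketch of the scheme described above,
written against the ~3.17 block-layer API: allocate the clone from the
underlying device's queue, then insert it by queue type. alloc_clone()
and dispatch_clone() are hypothetical helpers, not code from the patch,
and blk_get_request()'s error-return convention varies across kernel
versions.]

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/elevator.h>

/* Hypothetical sketch, not from the patch: the clone comes from the
 * underlying device's queue, so the block layer initializes it for
 * that queue's type (legacy request_fn or blk-mq). */
static struct request *alloc_clone(struct request_queue *q,
				   struct request *rq)
{
	/* blk_get_request() dispatches to blk_mq_alloc_request() when
	 * q->mq_ops is set, so this works for either queue type. */
	return blk_get_request(q, rq_data_dir(rq), GFP_ATOMIC);
}

static void dispatch_clone(struct request_queue *q, struct request *clone)
{
	if (q->mq_ops) {
		/* blk-mq: insert into a hardware context and run it. */
		blk_mq_insert_request(clone, false, true, false);
	} else {
		/* Legacy: add to the elevator and kick q->request_fn()
		 * under the queue lock. */
		unsigned long flags;

		spin_lock_irqsave(q->queue_lock, flags);
		__elv_add_request(q, clone, ELEVATOR_INSERT_BACK);
		__blk_run_queue(q);
		spin_unlock_irqrestore(q->queue_lock, flags);
	}
}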
> 
The resistance wasn't so much against enabling multipath for blk-mq;
it was about _how_ multipath should be modelled on top of blk-mq.

With a straightforward enablement we actually get two layers of I/O
scheduling: once in multipathing to select between the individual
queues, and once in blk-mq to select the correct hardware context.
So we end up with a four-tiered hierarchy:

m priority groups -> n pg_paths/request_queues -> o cpus -> p hctx

Giving us a full m * n * p set of combinations (hctx are tagged per
CPU) for where the I/Os might be sent; with, say, 2 priority groups
of 4 paths each and 8 hardware contexts per path, that is already
2 * 4 * 8 = 64 possible destinations for a single I/O.
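
[To make the stacking concrete, a hypothetical illustration of the two
decisions using ~3.17 names; choose_path() stands in for dm-mpath's
internal path-selector call, and struct multipath / struct pgpath are
dm-mpath-private types, so this is a sketch, not buildable code.]

struct pgpath *pgpath;
struct request_queue *q;
struct blk_mq_hw_ctx *hctx;

/* Layer 1: multipath picks one of the n paths in the currently
 * active priority group (out of m groups). */
pgpath = choose_path(m, nr_bytes);	/* hypothetical selector call */
q = bdev_get_queue(pgpath->path.dev->bdev);

/* Layer 2: blk-mq maps the submitting CPU onto one of that queue's
 * p hardware contexts. */
hctx = blk_mq_map_queue(q, raw_smp_processor_id());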

Performance-wise it might be beneficial to tie a hardware context
to a given path, effectively removing I/O scheduling from
blk-mq. But this would require a substantial update to the
current blk-mq design (blocked paths, dynamic reconfiguration).

However, this looks like a good starting point.
I'll give it a go and see how far I get with it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
