[dm-devel] [RFC 0/3] Add support of iopoll for dm device

JeffleXu jefflexu at linux.alibaba.com
Thu Oct 22 05:28:01 UTC 2020


On 10/22/20 4:39 AM, Mike Snitzer wrote:

> What you've _done_ could serve as a stop-gap but I'd really rather we
> get it properly designed from the start.

Indeed I totally agree with you that the design should be done nicely at 
the very beginning. And this

is indeed the purpose of this RFC patch.


>> This patch set adds support of iopoll for dm device.
>>
>> This is only an RFC patch. I'm really looking forward getting your
>> feedbacks on if you're interested in supporting iopoll for dm device,
>> or if there's a better design to implement that.
>>
>> Thanks.
>>
>>
>> [Purpose]
>> IO polling is an important mode of io_uring. Currently only mq devices
>> support iopoll. As for dm devices, only dm-multipath is request-base,
>> while others are all bio-based and have no support for iopoll.
>> Supporting iopoll for dm devices can be of great meaning when the
>> device seen by application is dm device such as dm-linear/dm-stripe,
>> in which case iopoll is not usable for io_uring.
> I appreciate you taking the initiative on this; polling support is on my
> TODO so your work serves as a nice reminder to pursue this more
> urgently.

It's a good news that iopoll for DM is meaningful.


> but we cannot avoid properly mapping a cookie to each
> split bio.  Otherwise you resort to inefficiently polling everything.

Yes. At the very beginning  I tried to build the mapping a cookie to 
each bio, but I failed with several

blocking design issues. By the way maybe we could clarify these design 
issues here, if you'd like.


>
> Seems your attempt to have the cookie point to a dm_io object was likely
> too coarse-grained (when bios are split they get their own dm_io on
> recursive re-entry to DM core from ->submit_bio); but isn't having a
> list of cookies still too imprecise for polling purposes?  You could
> easily have a list that translates to many blk-mq queues.  Possibly
> better than your current approach of polling everything -- but not
> much.

To make the discussion more specific, assume that dm0 is mapped to 
dm1/2/3, while dm1 mapped to

nvme1, dm2 mapped to dm2, etc..

                     dm0

dm1             dm2            dm3

nvme1        nvme2        nvme3


Then the returned cookie of dm0 could be pointer pointing to dm_io 
object of dm0.

struct dm_io {  // the returned cookie points to dm_io object
	...
+	struct list_head cookies;
};

struct dm_target_io {
	...
	/*
	 * The corresponding polling hw queue if submitted to mq device (such as nvme1/2/3),
	 * NULL if submitted to dm device (such as dm1/2/3)
	 */
+	struct blk_mq_hw_ctx *hctx;
+	struct list_head      node;  // add to @cookies list
};

The @cookies list of dm_io object could maintain all dm_target_io objects
of all **none-dm** devices, that is, all hw queues that we should poll on.


returned  ->  @cookies list	
cookie	      of dm_io object of dm0
		   |
		   +--> dm_target_io	 ->  dm_target_io     ->  dm_target_io
			object of nvme1      object of nvme2	  object of nvme3

When polling returned cookie of dm0, actually we're polling @cookies 
list. Once one of the dm_target_io

completed (e.g. nvme2), it should be removed from the @cookies list., 
and thus we should only focus on

hw queues that have not completed.



>> [Design Notes]
>>
>> cookie
>> ------
>> Let's start from cookie. Cookie is one important concept in iopoll. It
>> is used to identify one specific request in one specific hardware queue.
>> The concept of cookie is initially designed as a per-bio concept, and
>> thus it doesn't work well when bio-split involved. When bio is split,
>> the returned cookie is indeed the cookie of one of the split bio, and
>> the following polling on this returned cookie can only guarantee the
>> completion of this specific split bio, while the other split bios may
>> be still uncompleted. Bio-split is also resolved for dm device, though
>> in another form, in which case the original bio submitted to the dm
>> device may be split into multiple bios submitted to the underlying
>> devices.
>>
>> In previous discussion, Lei Ming has suggested that iopoll should be
>> disabled for bio-split. This works for the normal bio-split (done in
>> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be
>> usable for dm devices if this also applies for dm device.
>>
>> So come back to the design of the cookie for dm devices. At the very
>> beginning I want to refactor the design of cookie, from 'unsigned int'
>> type to the pointer type for dm device, so that cookie can point to
>> something, e.g. a list containing all cookies of one original bio,
>> something like this:
>>
>> struct dm_io { // the returned cookie points to dm_io
>> 	...
>> 	struct list_head cookies;
>> };
>>
>> In this design, we can fetch all cookies of one original bio, but the
>> implementation can be difficult and buggy. For example, the
>> 'struct dm_io' structure may be already freed when the returned cookie
>> is used in blk_poll(). Then what if maintain a refcount in struct dm_io
>> so that 'struct dm_io' structure can not be freed until blk_poll()
>> called? Then the 'struct dm_io' structure will never be freed if the
>> IO polling is not used at all.
> I'd have to look closer at the race in the code you wrote (though you
> didn't share it);

I worried that dm_target_io/dm_io objects could have been freed 
before/when we are polling on them,

and thus could cause use-after-free when accessing @cookies list in 
dm_target_io. It could happen

when there are multiple polling instance. io_uring has implemented 
per-instance polling thread. If

there are two bios submitted to dm0, please consider the following race 
sequence:


1. race 1: dm_target_io/dm_io objects could have been freed ****when**** 
we are polling on them

```*
*

*thread1 polling on bio1                    thread2 polling on bio2*

     fetch dm_io object by the cookie

reaps completions in nvme 1/2/3,

completes bio1 and frees dm_io object of bio1 by the way

     use-after-free when accessing dm_io object

```

Maybe we should get a refcount of dm_io of bio1 when start polling, but 
I'm not sure if the design is

elegant or not.



2. race 2: dm_target_io/dm_io objects could have been freed 
****before**** we are polling on them

```

*thread1 polling on bio1                    thread2 polling on bio2*

reaps completions in nvme 1/2/3,

clone_endio

     dec_pending

         free_io(md, io);  // free dm_io object

     __blkdev_direct_IO

         READ_ONCE(dio->waiter)  // dio->waiter is still none-NULL

         bio_endio(io->orig_bio) //call bi_end_io() of original bio, 
that is blkdev_bio_end_io

             WRITE_ONCE(dio->waiter, NULL);

         blk_poll

             fetch dm_io object by the cookie  // use-after-free!

```


Thanks.

Jeffle.




More information about the dm-devel mailing list