[dm-devel] [RFC 0/3] Add support of iopoll for dm device

Jeffle Xu jefflexu at linux.alibaba.com
Tue Oct 20 06:54:17 UTC 2020


This patch set adds iopoll support for dm devices.

This is only an RFC. I'm really looking forward to your feedback on
whether you're interested in supporting iopoll for dm devices, or
whether there's a better design to implement it.

Thanks.


[Purpose]
IO polling is an important mode of io_uring, but currently only mq
devices support iopoll. Among dm devices, only dm-multipath is
request-based; the others are all bio-based and have no iopoll support.
Supporting iopoll for dm devices matters when the device seen by the
application is a dm device such as dm-linear or dm-stripe, in which
case iopoll is currently unusable from io_uring.


[Design Notes]

cookie
------
Let's start with the cookie. The cookie is an important concept in
iopoll: it identifies one specific request on one specific hardware
queue. The cookie was initially designed as a per-bio concept, and thus
it doesn't work well once bio splitting is involved. When a bio is
split, the returned cookie is actually the cookie of just one of the
split bios, so polling on that cookie only guarantees completion of
that particular split bio, while the other split bios may still be
outstanding. Bio splitting also occurs for dm devices, though in
another form: the original bio submitted to the dm device may be split
into multiple bios submitted to the underlying devices.
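For reference, the cookie packs the hardware queue number and the driver
tag into one 'unsigned int'; slightly abridged from
include/linux/blk_types.h:

```
#define BLK_QC_T_NONE		-1U
#define BLK_QC_T_SHIFT		16
#define BLK_QC_T_INTERNAL	(1U << 31)

static inline unsigned int blk_qc_t_to_queue_num(blk_qc_t cookie)
{
	return (cookie & ~BLK_QC_T_INTERNAL) >> BLK_QC_T_SHIFT;
}

static inline unsigned int blk_qc_t_to_tag(blk_qc_t cookie)
{
	return cookie & ((1u << BLK_QC_T_SHIFT) - 1);
}
```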

In a previous discussion, Ming Lei suggested that iopoll be disabled
when a bio is split. That works for the normal bio split (done in
blk_mq_submit_bio->__blk_queue_split), but if the same rule applied to
dm devices, iopoll would never be usable on them.

So back to the design of the cookie for dm devices. At the very
beginning I wanted to refactor the cookie from 'unsigned int' to a
pointer type for dm devices, so that the cookie could point to
something, e.g. a list containing all cookies of one original bio,
something like this:

struct dm_io { /* the returned cookie points to this dm_io */
	...
	struct list_head cookies; /* cookies of all split bios */
};

With this design we could fetch all cookies of one original bio, but
the implementation would be difficult and buggy. For example, the
'struct dm_io' structure may already have been freed by the time the
returned cookie is used in blk_poll(). Then what if we maintained a
refcount in 'struct dm_io' so that it cannot be freed until blk_poll()
is called? Then the 'struct dm_io' structure would never be freed if
IO polling is not used at all.
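The lifetime hazard, sketched (pointer_cookie_hazard() is hypothetical
code illustrating the abandoned design, not part of the patch set):

```
/* With a pointer cookie, the dm_io it points to is freed when the bio
 * completes, which can happen before the submitter ever polls. */
static void pointer_cookie_hazard(struct request_queue *q, struct bio *bio)
{
	blk_qc_t cookie = submit_bio(bio);	/* cookie points into a dm_io */

	/* window: the bio completes via IRQ and dm frees the dm_io */

	blk_poll(q, cookie, true);		/* use-after-free */
}
```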

So in the end I gave up on refactoring the cookie and maintaining all
cookies of one original bio. The initial cookie design is retained: the
returned cookie is still the cookie of one of the split bios, and thus
it is simply not used when polling dm devices. Polling a dm device
instead iterates over, and polls, all hardware queues (in polling mode)
of all underlying target devices.
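Roughly, the per-device half of that loop looks like the sketch below.
dm_poll_one_dev() is an illustrative name, and iterating the dm table's
underlying devices (e.g. via each target's iterate_devices callback) is
elided; only the per-queue polling is shown:

```
/* Sketch: poll all hw queues of one underlying mq device that were
 * set up in polling mode. */
static int dm_poll_one_dev(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	int i, found = 0;

	queue_for_each_hw_ctx(q, hctx, i) {
		if (hctx->type != HCTX_TYPE_POLL)
			continue;
		found += q->mq_ops->poll(hctx);	/* reap completions */
	}
	return found;
}
```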

This design is simple and easy to implement. But I'm not sure whether
performance becomes an issue when one dm device maps to many target
devices or the dm stack grows deep.

REQ_NOWAIT
----------
A polling bio sets REQ_NOWAIT in bio->bi_opf, so the device needs to be
capable of handling REQ_NOWAIT. Commit 021a24460dc2 ("block: add
QUEUE_FLAG_NOWAIT") and commit 6abc49468eea ("dm: add support for
REQ_NOWAIT and enable it for linear target") added this support for dm
devices; currently only dm-linear supports it.
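For reference, commit 6abc49468eea works along these lines: a target
advertises nowait capability via a feature flag, and dm sets
QUEUE_FLAG_NOWAIT on the mapped device only when every target in the
table has it. Sketched for dm-linear, with the other target_type fields
elided:

```
#include <linux/device-mapper.h>

static struct target_type linear_target = {
	.name     = "linear",
	.features = DM_TARGET_NOWAIT,	/* bios with REQ_NOWAIT are handled */
	/* .ctr/.dtr/.map etc. elided */
};
```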

hybrid polling
--------------
When hybrid polling executes, the cookie is used to fetch the request
and check whether it has completed. In the dm design described above,
the returned cookie is still the cookie of just one of the split bios,
so there is no way to check whether all the split bios have completed.
Hybrid polling is therefore not supported for dm devices in the
current implementation.


[Performance]
I tested 4k random read on a dm-linear device mapped to a single nvme
device.

SSD: INTEL SSDPEDME800G4
dm-linear: dmsetup create testdev --table "0 209715200 linear /dev/nvme0n1 0"
(209715200 sectors = 100GiB, mapped linearly onto /dev/nvme0n1 starting
at sector 0)

fio configs:
```
ioengine=io_uring
iodepth=128
numjobs=1
thread
rw=randread
direct=1
registerfiles=1
#hipri=1 with iopoll enabled, hipri=0 with iopoll disabled
bs=4k
size=100G
runtime=60
time_based
group_reporting
randrepeat=0

[device]
filename=/dev/mapper/testdev
```

The test results show ~40% higher iops and bandwidth with iopoll
enabled.

test | iops | bw | avg lat
---- | ---- | ---- | ----
iopoll-disabled | 244k | 953MiB/s  | 524us
iopoll-enabled  | 345k | 1349MiB/s | 370us

[TODO]
The store methods of the "io_poll"/"io_poll_delay" sysfs attributes
have not yet been adapted for dm devices.


Jeffle Xu (3):
  block/mq: add iterator for polling hw queues
  block: add back ->poll_fn in request queue
  dm: add support for IO polling

 block/blk-mq.c         | 30 ++++++++++++++++++++++++------
 drivers/md/dm-core.h   |  1 +
 drivers/md/dm-table.c  | 30 ++++++++++++++++++++++++++++++
 drivers/md/dm.c        | 40 ++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  6 ++++++
 include/linux/blkdev.h |  3 +++
 6 files changed, 104 insertions(+), 6 deletions(-)

-- 
2.27.0



