[dm-devel] [PATCH v6 2/2] dm: support bio polling

Ming Lei ming.lei at redhat.com
Thu Mar 10 04:00:01 UTC 2022


On Wed, Mar 09, 2022 at 09:11:26AM -0700, Jens Axboe wrote:
> On 3/8/22 6:13 PM, Ming Lei wrote:
> > On Tue, Mar 08, 2022 at 06:02:50PM -0700, Jens Axboe wrote:
> >> On 3/7/22 11:53 AM, Mike Snitzer wrote:
> >>> From: Ming Lei <ming.lei at redhat.com>
> >>>
> >>> Support bio(REQ_POLLED) polling in the following approach:
> >>>
> >>> 1) only support io polling on normal READ/WRITE, and other abnormal IOs
> >>> still fallback to IRQ mode, so the target io is exactly inside the dm
> >>> io.
> >>>
> >>> 2) hold one refcnt on io->io_count after submitting this dm bio with
> >>> REQ_POLLED
> >>>
> >>> 3) support dm native bio splitting, any dm io instance associated with
> >>> current bio will be added into one list which head is bio->bi_private
> >>> which will be recovered before ending this bio
> >>>
> >>> 4) implement .poll_bio() callback, call bio_poll() on the single target
> >>> bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
> >>> dm_io_dec_pending() after the target io is done in .poll_bio()
> >>>
> >>> 5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
> >>> which is based on Jeffle's previous patch.
> >>
> >> It's not the prettiest thing in the world with the overlay on bi_private,
> >> but at least it's nicely documented now.
> >>
> >> I would encourage you to actually test this on fast storage, should make
> >> a nice difference. I can run this on a gen2 optane, it's 10x the IOPS
> >> of what it was tested on and should help better highlight where it
> >> makes a difference.
> >>
> >> If either of you would like that, then send me a fool proof recipe for
> >> what should be setup so I have a poll capable dm device.
> > 
> > Follows steps for setup dm stripe over two nvmes, then run io_uring on
> > the dm stripe dev.
> 
> Thanks! Much easier when I don't have to figure it out... Setup:

Jens, thanks for running the test!

> 
> CPU: 12900K
> Drives: 2x P5800X gen2 optane (~5M IOPS each at 512b)
> 
> Baseline kernel:
> 
> sudo taskset -c 10 t/io_uring -d128 -b512 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 /dev/dm-0
> Added file /dev/dm-0 (submitter 0)
> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> submitter=0, tid=1004
> IOPS=2794K, BW=1364MiB/s, IOS/call=31/30, inflight=(124)
> IOPS=2793K, BW=1363MiB/s, IOS/call=31/31, inflight=(62)
> IOPS=2789K, BW=1362MiB/s, IOS/call=31/30, inflight=(124)
> IOPS=2779K, BW=1357MiB/s, IOS/call=31/31, inflight=(124)
> IOPS=2780K, BW=1357MiB/s, IOS/call=31/31, inflight=(62)
> IOPS=2779K, BW=1357MiB/s, IOS/call=31/31, inflight=(62)
> ^CExiting on signal
> Maximum IOPS=2794K
> 
> generating about 500K ints/sec,

On average ~5.6 IOs complete per interrupt, so it looks like irq
coalescing is working.
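
For reference, that figure is just the reported IOPS divided by the
interrupt rate. A rough sketch of the arithmetic, using the rounded
numbers quoted above (the helper name is made up for illustration):

```python
def ios_per_interrupt(iops: float, ints_per_sec: float) -> float:
    """Average number of IO completions handled per interrupt."""
    return iops / ints_per_sec


# 512b baseline run: ~2794K IOPS at about 500K ints/sec (both rounded)
print(round(ios_per_interrupt(2794e3, 500e3), 1))
```

Anything well above 1 here suggests several completions are being
reaped per interrupt.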

> and using 4k blocks:
> 
> sudo taskset -c 10 t/io_uring -d128 -b4096 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 /dev/dm-0
> Added file /dev/dm-0 (submitter 0)
> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> submitter=0, tid=967
> IOPS=1683K, BW=6575MiB/s, IOS/call=24/24, inflight=(93)
> IOPS=1685K, BW=6584MiB/s, IOS/call=24/24, inflight=(124)
> IOPS=1686K, BW=6588MiB/s, IOS/call=24/24, inflight=(124)
> IOPS=1684K, BW=6581MiB/s, IOS/call=24/24, inflight=(93)
> IOPS=1686K, BW=6589MiB/s, IOS/call=24/24, inflight=(124)
> IOPS=1687K, BW=6593MiB/s, IOS/call=24/24, inflight=(128)
> IOPS=1687K, BW=6590MiB/s, IOS/call=24/24, inflight=(93)
> ^CExiting on signal
> Maximum IOPS=1687K
> 
> which ends up being bw limited for me, because the devices aren't linked
> gen4. That's about 1.4M ints/sec.

It looks like each interrupt completes only about one IO with 4k bs, so
there is no irq coalescing any more. I guess the interrupts may not be
running on CPU 10.

> 
> With the patched kernel, same test:
> 
> sudo taskset -c 10 t/io_uring -d128 -b512 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 /dev/dm-0
> Added file /dev/dm-0 (submitter 0)
> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> submitter=0, tid=989
> IOPS=4151K, BW=2026MiB/s, IOS/call=16/15, inflight=(128)
> IOPS=4159K, BW=2031MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=4193K, BW=2047MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=4191K, BW=2046MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=4202K, BW=2052MiB/s, IOS/call=15/15, inflight=(128)
> ^CExiting on signal
> Maximum IOPS=4202K
> 
> with basically zero interrupts, and 4k:
> 
> sudo taskset -c 10 t/io_uring -d128 -b4096 -s31 -c16 -p1 -F1 -B1 -n1 -R1 -X1 /dev/dm-0
> Added file /dev/dm-0 (submitter 0)
> polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> submitter=0, tid=1015
> IOPS=1706K, BW=6666MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
> IOPS=1704K, BW=6658MiB/s, IOS/call=15/15, inflight=(128)
> ^CExiting on signal
> Maximum IOPS=1706K

The improvement on 4k looks small; is that caused by the PCIe bandwidth
limit? What is the IOPS when running the same t/io_uring test directly
on a single optane?
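
To put numbers on the comparison, here is a quick sketch of the
arithmetic, taking the "Maximum IOPS" lines from the four runs above and
assuming the BW column is simply IOPS * block size expressed in MiB/s
(the function names are only for illustration):

```python
MIB = 1024 * 1024


def bandwidth_mib_s(iops: float, block_size: int) -> float:
    """Bandwidth in MiB/s implied by an IOPS figure and a block size in bytes."""
    return iops * block_size / MIB


def gain_percent(baseline_iops: float, patched_iops: float) -> float:
    """Relative IOPS improvement of the patched kernel over baseline, in percent."""
    return (patched_iops - baseline_iops) / baseline_iops * 100


# Maximum IOPS reported above: baseline vs. patched kernel
print(round(gain_percent(2794e3, 4202e3), 1))  # 512b runs
print(round(gain_percent(1687e3, 1706e3), 1))  # 4k runs
print(round(bandwidth_mib_s(1706e3, 4096)))    # implied 4k bandwidth, MiB/s
```

That works out to roughly a 50% gain at 512b but only about 1% at 4k,
where both kernels sit near 6.6K MiB/s, which is why I suspect the link
rather than the CPU.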



Thanks, 
Ming

