[dm-devel] [PATCH 0/4] Fix order when split bio and send remaining back to itself
Danny Shih
dannyshih at synology.com
Thu Dec 31 08:28:55 UTC 2020
Mike Snitzer writes:
>> submit_bio_noacct_add_head() in block device layer when we want to
>> split bio and send remaining back to itself.
> Ordering aside, you cannot split more than once. So your proposed fix
> to insert at head isn't valid because you're still implicitly allocating
> more than one bio from the bioset which could cause deadlock in a low
> memory situation.
>
> I had to deal with a comparable issue with DM core not too long ago, see
> this commit:
>
> commit ee1dfad5325ff1cfb2239e564cd411b3bfe8667a
> Author: Mike Snitzer <snitzer at redhat.com>
> Date: Mon Sep 14 13:04:19 2020 -0400
>
> dm: fix bio splitting and its bio completion order for regular IO
>
> dm_queue_split() is removed because __split_and_process_bio() _must_
> handle splitting bios to ensure proper bio submission and completion
> ordering as a bio is split.
>
> Otherwise, multiple recursive calls to ->submit_bio will cause multiple
> split bios to be allocated from the same ->bio_split mempool at the same
> time. This would result in deadlock in low memory conditions because no
> progress could be made (only one bio is available in ->bio_split
> mempool).
>
> This fix has been verified to still fix the loss of performance, due
> to excess splitting, that commit 120c9257f5f1 provided.
>
> Fixes: 120c9257f5f1 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
> Cc: stable at vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Mike Snitzer <snitzer at redhat.com>
>
> Basically you cannot split the same bio more than once without
> recursing. Your elaborate documentation shows things going wrong quite
> early in step 3. That additional split and recursing back to MD
> shouldn't happen before the first bio split completes.
>
> Seems the proper fix is to disallow max_sectors_kb to be imposed, via
> blk_queue_split(), if MD has further splitting constraints, via
> chunk_sectors, that negate max_sectors_kb anyway.
>
> Mike
Hi Mike,
I think you're right that a driver should not split the same bio more
than once without recursing when using the same mempool.
If a driver splits a bio only once, the out-of-order issue no longer exists.
(Therefore, this problem won't occur on DM devices.)
But MD devices do their own splitting from private biosets
(mddev->bio_set or conf->bio_split), which are not the same bioset
that blk_queue_split() uses (i.e. q->bio_split). The deadlock
you mentioned might not happen to them.
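
To make the bioset point concrete, here is a rough userspace sketch, not kernel code. All toy_* names are hypothetical, and the reserve of one element per pool is an assumption taken from the wording of the quoted commit message. A real mempool_alloc() can also be satisfied from ordinary memory and sleeps instead of returning failure, so this only models the low-memory case.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a mempool-backed bioset: only the reserve is modelled. */
struct toy_pool {
	int reserved;	/* elements still available in the reserve */
};

/* Returns true if an element was taken; false stands for "would have to wait". */
static bool toy_alloc(struct toy_pool *pool)
{
	assert(pool->reserved >= 0);
	if (pool->reserved == 0)
		return false;
	pool->reserved--;
	return true;
}

/* Two splits drawn from one pool before the first split bio is freed. */
static bool toy_nested_same_pool(void)
{
	struct toy_pool q_split = { 1 };	/* assumed reserve of one */
	bool first = toy_alloc(&q_split);
	bool second = toy_alloc(&q_split);	/* nothing left: would wait forever */

	return first && second;
}

/* The same two splits, each drawn from its own pool. */
static bool toy_nested_separate_pools(void)
{
	struct toy_pool q_split = { 1 };	/* stands in for q->bio_split */
	struct toy_pool md_split = { 1 };	/* stands in for a private MD bioset */
	bool first = toy_alloc(&q_split);
	bool second = toy_alloc(&md_split);

	return first && second;
}
```

In this model toy_nested_same_pool() cannot get its second element, while toy_nested_separate_pools() can, which is why I think the deadlock might not apply when the two splits come from different biosets.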
I think there are two cases:
1. If MD devices someday want to switch to q->bio_split without
hitting this out-of-order issue, making them split only once would be
a solution.
2. If MD devices should keep splitting the bio twice, so that the limits
in blk_queue_split() and each raid level's limits (raid0, raid5,
raid1, ...) are handled separately, I will try to find another
solution for this case.
After reconsidering the problem, I see that my proposal is not suitable.
Suppose a bio is split into an A part and a B part:
+------|------+
|  A   |  B   |
+------|------+
I think a driver should make sure the A part is always handled before
the B part.
Inserting a bio at the head of current->bio_list and submitting bios at
the same time while handling the A part could make the bios generated
from the A part be handled before the B part. But this breaks the order
among the bios generated from the A part.
(Maybe I should find a way to put the B part at the head of
bio_list_on_stack[1] while submitting it...)
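
A toy userspace model of the ordering effect, for illustration only. It assumes the B part is already queued and that handling the A part submits three child bios one after another. It does not model bio_list_on_stack[] or the actual patch, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX 8

/* Toy replacement for a bio list: an array of names in dispatch order. */
struct toy_bio_list {
	const char *name[TOY_MAX];
	int len;
};

static void toy_add_tail(struct toy_bio_list *list, const char *name)
{
	assert(list->len < TOY_MAX);
	list->name[list->len++] = name;
}

static void toy_add_head(struct toy_bio_list *list, const char *name)
{
	assert(list->len < TOY_MAX);
	/* Shift the existing entries back by one slot, then take slot 0. */
	memmove(&list->name[1], &list->name[0],
		(size_t)list->len * sizeof(list->name[0]));
	list->name[0] = name;
	list->len++;
}

/*
 * B is queued first, then A1, A2, A3 are submitted in that order, either
 * at the head or at the tail. The resulting dispatch order is written to
 * out as a space-separated string.
 */
static void toy_dispatch_order(int at_head, char *out, size_t n)
{
	static const char *children[] = { "A1", "A2", "A3" };
	struct toy_bio_list list = { .len = 0 };
	size_t used = 0;

	if (n == 0)
		return;

	toy_add_tail(&list, "B");
	for (int i = 0; i < 3; i++) {
		if (at_head)
			toy_add_head(&list, children[i]);
		else
			toy_add_tail(&list, children[i]);
	}

	out[0] = '\0';
	for (int i = 0; i < list.len && used < n; i++)
		used += (size_t)snprintf(out + used, n - used, "%s%s",
					 i ? " " : "", list.name[i]);
}
```

With at_head set, the model yields "A3 A2 A1 B": the children of A do get ahead of B, but in reverse order. With tail insertion it yields "B A1 A2 A3": the children keep their order but B goes first.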
Thanks for your comments.
I will try to figure out a better way to fix it in the next version.
Best regards,
Danny Shih