[dm-devel] [PATCH 0/4] Fix order when split bio and send remaining back to itself

Danny Shih dannyshih at synology.com
Thu Dec 31 08:28:55 UTC 2020


Mike Snitzer writes:
>> submit_bio_noacct_add_head() in block device layer when we want to
>> split bio and send remaining back to itself.
> Ordering aside, you cannot split more than once.  So your proposed fix
> to insert at head isn't valid because you're still implicitly allocating
> more than one bio from the bioset which could cause deadlock in a low
> memory situation.
>
> I had to deal with a comparable issue with DM core not too long ago, see
> this commit:
>
> commit ee1dfad5325ff1cfb2239e564cd411b3bfe8667a
> Author: Mike Snitzer <snitzer at redhat.com>
> Date:   Mon Sep 14 13:04:19 2020 -0400
>
>      dm: fix bio splitting and its bio completion order for regular IO
>
>      dm_queue_split() is removed because __split_and_process_bio() _must_
>      handle splitting bios to ensure proper bio submission and completion
>      ordering as a bio is split.
>
>      Otherwise, multiple recursive calls to ->submit_bio will cause multiple
>      split bios to be allocated from the same ->bio_split mempool at the same
>      time. This would result in deadlock in low memory conditions because no
>      progress could be made (only one bio is available in ->bio_split
>      mempool).
>
>      This fix has been verified to still fix the loss of performance, due
>      to excess splitting, that commit 120c9257f5f1 provided.
>
>      Fixes: 120c9257f5f1 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
>      Cc: stable at vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
>      Reported-by: Ming Lei <ming.lei at redhat.com>
>      Signed-off-by: Mike Snitzer <snitzer at redhat.com>
>
> Basically you cannot split the same bio more than once without
> recursing.  Your elaborate documentation shows things going wrong quite
> early in step 3.  That additional split and recursing back to MD
> shouldn't happen before the first bio split completes.
>
> Seems the proper fix is to disallow max_sectors_kb to be imposed, via
> blk_queue_split(), if MD has further splitting constraints, via
> chunk_sectors, that negate max_sectors_kb anyway.
>
> Mike


Hi Mike,

I think you're right that a driver should not split the same bio more
than once without recursing when using the same mempool.

If a driver splits a bio only once, the out-of-order issue no longer exists.
(Therefore, this problem won't occur on DM devices.)

But MD devices do their own splitting with private biosets
(mddev->bio_set or conf->bio_split), which are not the bioset used in
blk_queue_split() (i.e. q->bio_split). So the deadlock you mentioned
might not happen to them.
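
To make the bioset argument concrete, here is a toy model in plain C (not
kernel code). It assumes a reserve of one element per pool, as described in
the quoted commit message, and it ignores the normal allocator path that a
real mempool tries first. All names (toy_pool, q_split, md_split) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a bioset's reserved mempool: only `reserve` elements
 * are guaranteed when the system is out of memory. */
struct toy_pool {
    int reserve;  /* guaranteed elements */
    int in_use;   /* elements currently handed out */
};

/* Returns true if the allocation can be served from the reserve.
 * false models the point where a real allocation would have to wait
 * for an element to be returned to the pool. */
bool toy_pool_alloc(struct toy_pool *p)
{
    assert(p->in_use >= 0 && p->in_use <= p->reserve);
    if (p->in_use == p->reserve)
        return false;
    p->in_use++;
    return true;
}

/* Hands one element back to the pool. */
void toy_pool_free(struct toy_pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/* Two splits in flight at once, both drawn from one pool of size 1. */
bool toy_two_splits_same_pool(void)
{
    struct toy_pool q_split = { .reserve = 1, .in_use = 0 };
    bool first = toy_pool_alloc(&q_split);
    bool second = toy_pool_alloc(&q_split); /* first not yet freed */
    return first && second;
}

/* The same two splits, but the second one uses a private pool. */
bool toy_two_splits_private_pool(void)
{
    struct toy_pool q_split = { .reserve = 1, .in_use = 0 };
    struct toy_pool md_split = { .reserve = 1, .in_use = 0 };
    bool first = toy_pool_alloc(&q_split);
    bool second = toy_pool_alloc(&md_split);
    return first && second;
}
```

In this model the second split from the shared pool cannot be served, while
the split from the private pool can. A private pool would still have the same
problem if one driver took two elements from it at the same time.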

I think there are two solutions:

1. In case MD devices want to switch to q->bio_split someday without
    this out-of-order issue, making them split only once would be a
    solution.

2. If MD devices should still split the bio twice, so that we can handle
    the limits in blk_queue_split() and each raid level's limits (raid0,
    raid5, raid1, ...) separately, I will try to find another solution.

    After reconsidering the problem, I don't think my proposal is suitable.

    Suppose a bio is split into part A and part B:

    +------|------+
    |   A  |   B  |
    +------|------+

    I think a driver should make sure part A is always handled before
    part B. Inserting a bio at the head of current->bio_list while also
    submitting bios during the handling of part A could make the bios
    generated from part A be handled before part B. However, this breaks
    the order of the bios generated from part A.

    (Maybe I should find a way to put part B at the head of
    bio_list_on_stack[1] when submitting it...)
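
The ordering concerns above can be sketched with a toy queue in plain C. This
is only a model of a pending-bio list, not the kernel's current->bio_list
code, and the labels (A2 for the remainder of a second split of part A,
a1..a3 for bios generated while handling part A) are assumptions chosen for
illustration. It reflects one reading of the problem: tail insertion lets
part B overtake the remainder of part A, while head insertion fixes that but
reverses bios that are submitted one after another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BIOS 8

/* Toy stand-in for a pending-bio list: an ordered queue of labels. */
struct toy_bio_list {
    const char *items[MAX_BIOS];
    size_t count;
};

/* Queue a bio behind everything already pending. */
static void toy_add_tail(struct toy_bio_list *l, const char *bio)
{
    assert(l->count < MAX_BIOS);
    l->items[l->count++] = bio;
}

/* Queue a bio in front of everything already pending. */
static void toy_add_head(struct toy_bio_list *l, const char *bio)
{
    assert(l->count < MAX_BIOS);
    memmove(&l->items[1], &l->items[0], l->count * sizeof(l->items[0]));
    l->items[0] = bio;
    l->count++;
}

/* Write the pending order as a comma-separated string, e.g. "B,A2". */
static void toy_format(const struct toy_bio_list *l, char *out, size_t len)
{
    size_t i;

    assert(len > 0);
    out[0] = '\0';
    for (i = 0; i < l->count; i++) {
        if (i > 0)
            strncat(out, ",", len - strlen(out) - 1);
        strncat(out, l->items[i], len - strlen(out) - 1);
    }
}

/* Tail insertion: B is queued by the first split, A2 by the second. */
void toy_tail_double_split(char *out, size_t len)
{
    struct toy_bio_list l = { .count = 0 };

    toy_add_tail(&l, "B");
    toy_add_tail(&l, "A2");
    toy_format(&l, out, len);
}

/* Head insertion: the same two remainders now come out as A2 then B. */
void toy_head_double_split(char *out, size_t len)
{
    struct toy_bio_list l = { .count = 0 };

    toy_add_head(&l, "B");
    toy_add_head(&l, "A2");
    toy_format(&l, out, len);
}

/* Head insertion of a1, a2, a3 in that order while B is pending. */
void toy_head_children(char *out, size_t len)
{
    struct toy_bio_list l = { .count = 0 };

    toy_add_head(&l, "B");
    toy_add_head(&l, "a1");
    toy_add_head(&l, "a2");
    toy_add_head(&l, "a3");
    toy_format(&l, out, len);
}
```

With tail insertion the pending order is B before A2. With head insertion A2
comes before B, but a1..a3 end up in reverse submission order, which is why
only part B would ideally be moved to the head.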

Thanks for your comments.
I will try to figure out a better way to fix it in the next version.

Best regards,
Danny Shih