[dm-devel] [PATCH 5.20 1/4] block: add bio_rewind() API

Kent Overstreet kent.overstreet at gmail.com
Fri Jul 1 21:09:09 UTC 2022


On Fri, Jul 01, 2022 at 11:58:35AM +0800, Ming Lei wrote:
> Why do we need to pay the cost(4 bytes added to bio and the updating
> in absolutely fast path) if rewind isn't used? So far, only dm requeue
> needs it, and it is one very unusual event.
> 
> Assuming fixed bio end sector should cover most of cases, especially
> if bio_rewind is only for dm or driver.
> 
> I'd suggest to not take this way until turning out bio_rewind() is not
> enough for new requirement or usages.

Well, you're proposing add this to the block layer, not just dm, so we should be
looking at potential users and not just this one use case.

Check out block/blk-crypto-fallback.c -> blk_crypto_fallback_decrypt_bio: it's
using __bio_for_each_segment, which is how I found it. It's also using a
bio_crypt_ctx they've stashed in their per-bio bio_fallback_crypt_ctx, in
blk_crypto_fallback_bio_prep, in addition to saving a copy of bio->bi_iter.

bio_crypt_ctx is 40 bytes, struct bvec_iter is currently 20 bytes, so we'd be
replacing 60 bytes of per-bio state with 8 bytes, if we went with bio_pos. And
this code path is used by block/blk-crypto.c as well, not just
blk-crypto-fallback.c.

drivers/md/dm-integrity.c: Again, we're saving a bvec_iter (using
dm_bio_record()) so that after the bio completes we can walk through the bio as
it was when we submitted it. Here, we're also interested in bi_integrity, but
what dm_bio_record() is doing looks highly suspect - we're only saving a copy of
the bio->bi_integrity pointer, but bio_integrity_payload contains another
bvec_iter and that's probably what should be getting saved.

So for this code, switching bio_(get|set)_pos would likely either be eliminating
the need for a tricky workaround I haven't spotted yet, or perhaps fixing an
actual bug - and it'd be saving 20 bytes of state in dm_bio_details.

Side note: according to the comment in dm_bio_details, what that code is trying
to do is actually something rather interesting - fully restore the state of a
bio so it can be resubmitted to another device after an error. This is probably
something that's worth promoting to block/bio.c, so it can be kept in sync with
e.g. bio_init() and bio_reset, and because we have other code in the kernel
that's doing similar stuff and might want to make use of it if it was standard
(btrfs, bcachefs, md-multipath, perhaps aoe).

drivers/block/aoe/aoecmd.c: here we're also saving bvec_iter, but this code is
doing a different trick that it's able to do because it's not submitting the bio
to other block drivers, the terminal handling is in its own code. It's not
modifying bio->bi_iter at all, which is neat because it makes resubmitting after
an error trivial.

This code is smart, and I would consider it a vote in favor of the bio_iter
approach (but see below, I'm not actually advocating we do that). I do something
similar in bcachefs, my read retry path: the original bio may have been
fragmented, and a retry may be for only a fragment of an original bio. Having a
separate bvec_iter means that the retry path can submit a retry for only a
fragment of the original bio, without having to mutate it or race with other
threads that are doing things with other parts of the original bio. It's not
strictly necessary functionality - I could've also solved this by not freeing
the fragment and retrying that - but it kept things simpler in other ways (the
retry may itself end up being fragmented if the extents we're reading from
changed, which means in the normal read path fragment bch_read_bios only ever
point to toplevel bch_read_bios, but retrying fragment bch_read_bios would mean
when doing retries we'd have fragments pointing to fragments pointing to
fragments, and memory allocations bounded only by the maximum number of retries
we could do).

So the struct bio_iter approach - segregating all the things we mutate when
iterating over a bio into a sub-type, i.e. what I was originally doing with
bvec_iter - has some nice and useful properties, but OTOH with integrity and now
bio_crypt_ctx it doesn't look super practical anymore - bio_iter would always
have to contain whatever it is we need for iterating over crypt and integrity if
those features are #ifdef'd in, whereas now an individual bio only pays the
memory overhead for those features if they're in use.

OTOH, looking at actual code, bio_(get|set)_pos gets us 90% of the elegance of a
separate iterator type, and it works for what all the code I've looked at is
trying to do (except for bcachefs, but I have no interest in blk crypt or
integrity so I'm fine with the current situation).

That's just what I found with a bit of perusing - there's other things I'd want
to look at if I wanted to be thorough (anything with multipath in the name for
sure, I haven't looked at all at the nvme multipath code yet).



More information about the dm-devel mailing list