[dm-devel] [PATCH 00/17] btrfs: split bio at btrfs_map_bio() time
Qu Wenruo
wqu at suse.com
Wed Dec 1 08:57:35 UTC 2021
On 2021/12/1 13:17, Qu Wenruo wrote:
> [BACKGROUND]
>
> Currently btrfs never uses bio_split() to split its bios against RAID
> stripe boundaries.
>
> Instead, btrfs checks the stripe boundary every time it allocates a
> new bio, and ensures the new bio never crosses stripe boundaries.
>
> [PROBLEMS]
>
> Although this behavior works fine, it goes against the common practice
> used in stacked drivers, and makes the effort to convert to iomap harder.
>
> There is also a hidden burden: every time we allocate a new bio, we use
> BIO_MAX_VECS slots, but since we know the boundaries, for RAID0/RAID10
> a bio can only fit at most 16 pages (fixed 64K stripe size and 4K page
> size), wasting most of the 256 slots we allocated.
>
> [CHALLENGES]
>
> This patchset attempts to improve the situation by moving the bio
> split to btrfs_map_bio() time, so the upper layers no longer need to
> bother with splitting bios against RAID stripe or even chunk
> boundaries.
>
> But there are several challenges:
>
> - Conflicts in various endio functions
> We want to keep the existing endio granularity, instead of chained
> endio, thus we must make the involved endio functions handle split
> bios.
>
> Although most endio functions already do their work independent of
> the bio size, they do not yet fully handle split bios.
>
> This patchset converts them to use the saved bi_iter and only iterate
> the split range instead of the whole bio.
> This change involves 3 types of IO:
>
> * Buffered IO
> Including both data and metadata
> * Direct IO
> * Compressed IO
>
> Their endio functions need different levels of updates to handle
> split bios.
>
> Furthermore, there is another endio function, end_workqueue_bio(),
> which can't handle split bios at all, thus we change the timing so
> that btrfs_bio_wq_end_io() is only called after the bio has been
> split.
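>
> For reference, the resulting ordering is roughly as follows (a
> simplified sketch of the idea, not the patchset's exact code):
>
> 	/* inside btrfs_map_bio(), conceptually */
> 	split = btrfs_bio_split(bio, ...);
> 	/*
> 	 * submit_stripe_bio() attaches the workqueue endio via
> 	 * btrfs_bio_wq_end_io() only now, on the already-split bio,
> 	 * so end_workqueue_bio() never sees a bio split afterwards.
> 	 */
> 	submit_stripe_bio(..., split, ...);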
>
> - Checksum verification
> Currently we rely on btrfs_bio::csum to contain the checksums for the
> whole bio.
> If a bio gets split, csum no longer points to the correct location
> for the split bio.
>
> This is solved by introducing btrfs_bio::offset_to_original, and
> using that new member to calculate where we should read the csum
> from.
>
> The parent bio still has btrfs_bio::csum for the whole bio, thus it
> can still free it correctly.
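>
> As an illustration only (the helper name and details here are mine,
> not from the patchset), the csum lookup for a split bio could look
> like:
>
> 	/* sketch: locate a split bio's csums inside the parent's array */
> 	static u8 *split_bio_csum_ptr(struct btrfs_fs_info *fs_info,
> 				      struct btrfs_bio *bbio)
> 	{
> 		/* sectors between the original bio's start and ours */
> 		u32 sectors = bbio->offset_to_original >>
> 			      fs_info->sectorsize_bits;
>
> 		/* the csum array covers the whole original bio */
> 		return bbio->csum + sectors * fs_info->csum_size;
> 	}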
>
> - Independent endio for each split bio
> Unlike stacked drivers, for RAID10 btrfs needs to try its best to
> read every sector, to handle the following case (X means bad, either
> unable to read or failed checksum verification, V means good):
>
> Dev 1 (missing) | D1 (X) |
> Dev 2 (OK)      | D1 (V) |
> Dev 3 (OK)      | D2 (V) |
> Dev 4 (OK)      | D2 (X) |
>
> In the above RAID10 case, dev1 is missing, and although dev4 is fine,
> its D2 sector is corrupted (by bit rot or whatever).
>
> If we used bio_chain(), the read bio covering both D1 and D2 would be
> split, and since D1 is missing, the whole D1 and D2 read would be
> marked as an error, thus we would try to read from dev2 and dev4.
>
> But D2 on dev4 has a csum mismatch; we can only rebuild D1 and D2
> correctly from dev2:D1 and dev3:D2.
>
> This patchset resolves this by saving bi_iter into btrfs_bio::iter,
> and using that at endio time to iterate only the split part of a bio.
> Other than this, the existing read/write page endio functions can
> handle split bios properly without problems.
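>
> The save itself is trivial (btrfs_bio_save_iter() is patch 02; the
> body below is my paraphrase, not the exact code):
>
> 	/* snapshot bi_iter before submission, since it advances during IO */
> 	static void btrfs_bio_save_iter(struct btrfs_bio *bbio)
> 	{
> 		bbio->iter = bbio->bio.bi_iter;
> 	}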
>
> - Bad RAID56 naming/functionality
> Quite a few RAID56 call sites rely on specific behavior of
> __btrfs_map_block(), like returning @map_length as stripe_len rather
> than the real mapped length.
>
> This is handled by some small cleanups specific to RAID56.
>
> [NEED FEEDBACK]
> In this refactor, btrfs ends up with a lot of call sites like:
>
> 	btrfs_bio_save_iter();	/* save bi_iter into btrfs_bio::iter */
> 	__bio_for_each_segment(bvec, bio, iter, btrfs_bio->iter) {
> 		/* do endio work for each bvec */
> 	}
>
> And it manually implements endio functions which do some of the work
> of __bio_chain_endio() but with extra btrfs-specific handling.
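>
> Each such endio roughly follows this pattern (a sketch only, with the
> parent/pending accounting simplified, not code from the patchset):
>
> 	static void some_split_endio(struct bio *bio)
> 	{
> 		struct btrfs_bio *bbio = btrfs_bio(bio);
> 		struct bio *parent = bio->bi_private;
> 		struct bio_vec bvec;
> 		struct bvec_iter iter;
>
> 		/* btrfs specific work, limited to the split range */
> 		__bio_for_each_segment(bvec, bio, iter, bbio->iter) {
> 			/* verify csums / unlock pages / etc. */
> 		}
>
> 		/* the __bio_chain_endio()-like part */
> 		if (bio->bi_status && !parent->bi_status)
> 			parent->bi_status = bio->bi_status;
> 		bio_put(bio);
> 		/* a real version must also track pending split bios */
> 		bio_endio(parent);
> 	}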
>
> I'm wondering whether the block layer would be fine with providing
> some *enhanced* bio chaining facilities?
>
> [CHANGELOG]
> RFC->v1:
> - Better patch split
> Patches 01~06 are refactors/cleanups/preparations.
> Patches 07~13 do the conversion and can handle both the old and new
> bio split timings.
> Patches 14~16 convert the bio split call sites one by one to the new
> facility.
> The final patch is just a small cleanup.
>
> - Various bug fixes
> During the full fstests run, various stupid bugs were exposed and
> fixed.
The latest version can be fetched from this branch:
https://github.com/adam900710/linux/tree/refactor_chunk_map
That branch already has the fix for the btrfs/187 crash, which was
caused by a missing modification to scrub_stripe_index_and_offset().
Thanks,
Qu
>
> Qu Wenruo (17):
> btrfs: update a stale comment on btrfs_submit_bio_hook()
> btrfs: save bio::bi_iter into btrfs_bio::iter before submitting
> btrfs: use correct bio size for error message in btrfs_end_dio_bio()
> btrfs: refactor btrfs_map_bio()
> btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
> btrfs: replace btrfs_dio_private::refs with
> btrfs_dio_private::pending_bytes
> btrfs: introduce btrfs_bio_split() helper
> btrfs: make data buffered read path to handle split bio properly
> btrfs: make data buffered write endio function to be split bio
> compatible
> btrfs: make metadata write endio functions to be split bio compatible
> btrfs: make dec_and_test_compressed_bio() to be split bio compatible
> btrfs: return proper mapped length for RAID56 profiles in
> __btrfs_map_block()
> btrfs: allow btrfs_map_bio() to split bio according to chunk stripe
> boundaries
> btrfs: remove buffered IO stripe boundary calculation
> btrfs: remove stripe boundary calculation for compressed IO
> btrfs: remove the stripe boundary calculation for direct IO
> btrfs: unexport btrfs_get_io_geometry()
>
> fs/btrfs/btrfs_inode.h | 10 +-
> fs/btrfs/compression.c | 70 +++-----------
> fs/btrfs/disk-io.c | 9 +-
> fs/btrfs/extent_io.c | 189 +++++++++++++++++++++++++------------
> fs/btrfs/extent_io.h | 2 +
> fs/btrfs/inode.c | 210 ++++++++++++++++-------------------------
> fs/btrfs/raid56.c | 14 ++-
> fs/btrfs/raid56.h | 2 +-
> fs/btrfs/scrub.c | 4 +-
> fs/btrfs/volumes.c | 157 ++++++++++++++++++++++--------
> fs/btrfs/volumes.h | 75 +++++++++++++--
> 11 files changed, 435 insertions(+), 307 deletions(-)
>