[dm-devel] [announce] thin-provisioning-tools v1.0.0-rc1
Eric Wheeler
dm-devel at lists.ewheeler.net
Fri Mar 3 21:21:13 UTC 2023
On Thu, 2 Mar 2023, Joe Thornber wrote:
> Hi Eric,
>
> On Wed, Mar 1, 2023 at 10:26 PM Eric Wheeler <dm-devel at lists.ewheeler.net> wrote:
>
> Hurrah! I've been looking forward to this for a long time...
>
>
> ...So if you have any commentary on the future of dm-thin with respect
> to metadata range support, or dm-thin performance in general, I would
> be very curious about your roadmap and plans.
>
>
> The plan over the next few months is roughly:
>
> - Get people using the new Rust tools. They are _so_ much faster than
> the old C++ ones. [available now]
> - Push upstream a set of patches I've been working on to boost thin
> concurrency performance. These are nearing completion and are
> available here for those who are interested:
> https://github.com/jthornber/linux/tree/2023-02-28-thin-concurrency-7.
> These are making a huge difference to performance in my testing, e.g.,
> fio with 16 jobs running concurrently gets several times the throughput.
> [Upstream in the next month hopefully]
It would be nice to get people testing the new improvements: do you think
they can make it into the 6.3 merge window that is currently open? (A rough
fio invocation that should show the difference is sketched below.)
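For anyone else who wants to try the branch, something along these lines is
the kind of workload I'd use to reproduce the concurrency win (device path
and job parameters are my guesses, not Joe's actual test):

    # random 4k writes, 16 concurrent jobs, against a thin volume
    # (destroys data on the volume, of course)
    fio --name=thin-concurrency --filename=/dev/mapper/vg-thinvol \
        --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
        --iodepth=32 --numjobs=16 --time_based --runtime=60 \
        --group_reporting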
> - Change thinp metadata to store ranges rather than individual mappings.
> This will reduce the amount of space the metadata consumes, and have
> the knock-on effect of boosting performance slightly (less metadata
> means faster lookups). However I consider this a half-way house, in
> that I'm only going to change the metadata and not start using ranges
> within the core target (I'm not moving away from fixed block sizes).
> [Next 3 months]
Good idea. If I remember right, thin_dump's XML output already expresses
contiguous runs as ranges; an example is below.
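Here's what that looks like today (field values invented for illustration;
run against an inactive pool, or use -m with a metadata snapshot on a live
one):

    # thin_dump /dev/mapper/vg-pool_tmeta
    <superblock uuid="" time="1" transaction="2" data_block_size="128"
                nr_data_blocks="262144">
      <device dev_id="1" mapped_blocks="1040" transaction="0"
              creation_time="0" snap_time="1">
        <range_mapping origin_begin="0" data_begin="0" length="1024" time="0"/>
        <single_mapping origin_block="1024" data_block="2048" time="1"/>
      </device>
    </superblock>

So the on-disk btree storing one entry per block is the remaining piece.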
> I don't envisage significant changes to dm-thin or dm-cache after this.
Seems reasonable.
> Longer term I think we're nearing a crunch point where we drastically
> change how we do things. Since I wrote device-mapper in 2001, the speed
> of devices has increased so much that I think dm is no longer doing a
> good job:
>
> - The layering approach introduces inefficiencies with each layer.
> Sure it may only be a 5% hit to add another linear mapping into the
> stack. But those 5%'s add up.
> - dm targets only see individual bios rather than the whole request
> queue. This prevents a lot of really useful optimisations. Think how
> much smarter dm-cache and dm-thin could be if they could look at the
> whole queue.
> - The targets are getting too complicated. I think dm-thin is around 8k
> lines of code, though it shares most of that with dm-cache. I
> understand the dedup target from the vdo guys weighs in at 64k lines.
> Kernel development is fantastically expensive (or slow, depending on
> how you want to look at it). I did a lot of development work on
> thinp v2, and it was looking a lot like a filesystem shoe-horned into
> the block layer. I can see why bcache turned into bcachefs.
Did thinp v2 get dropped, or just turn into the patchset above?
> - Code within the block layer is memory constrained. We can't make
> arbitrarily sized allocations within targets; instead we have to use
> mempools of fixed-size objects (frowned upon these days), or declare
> up front how much memory we need to service a bio (forcing us to
> assume the worst case).
> This stuff isn't hard, just tedious and makes coding sophisticated targets pretty joyless.
>
> So my plan going forwards is to keep the fast path of these targets in
> kernel (e.g., a write to a provisioned, unsnapshotted region), but take
> the slow paths out to userland.
Seems reasonable.
> I think io_uring and ublk have shown us that this is viable. That way
> a snapshot copy-on-write or a dm-cache data migration, which are very
> slow operations, can be done with ordinary userland code.
It would be nice to minimize CoW latency somehow if going through userspace
increases it by a notable amount. CoW on spinning disks is definitely
slow, but NVMe drives are pretty quick to copy a 64k chunk.
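On viability: the ublksrv demo targets already run an entire block device
from a userspace daemon, e.g. (flags from memory, see
https://github.com/ming1/ubdsrv for the current syntax):

    ublk add -t loop -f /tmp/backing.img   # creates /dev/ublkb0, I/O served in userspace
    ublk list                              # show running ublk devices
    ublk del -n 0                          # tear down device 0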
> For the fast paths, layering will be removed by having userland give
> the kernel instructions to execute for specific regions of the virtual
> device (i.e., remap to here).
Maybe you just answered my question of latency?
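If I follow, this generalizes what dm tables already do: userland loads
per-region rules and the kernel just executes the remap. For comparison,
a two-segment table as dmsetup loads it today (device numbers made up):

    # dmsetup table vg-stacked
    0 2097152 linear 8:16 0
    2097152 2097152 linear 8:32 0

The difference, as I read it, is that the new scheme would update such rules
at runtime, per region, without a full table reload.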
> The kernel driver will have nothing specific to thin/cache etc. I'm not
> sure how many of the current dm-targets would fit into this model, but
> I'm sure thin provisioning, caching, linear, and stripe can.
To be clear, linear and stripe would stay in the kernel?
-Eric
>
> - Joe
>
> Thanks again for all your great work on this.
>
> -Eric
>
> > [note: _data_ sharing was always maintained, this is purely about metadata space usage]
> >
> > # thin_metadata_pack/unpack
> >
> > These are a couple of new tools intended for support work. They compress
> > thin metadata, typically to a tenth of the size (much better than you'd
> > get with generic compressors). This makes it easier to pass damaged
> > metadata around for inspection.
> >
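(For anyone who hasn't tried them, usage is simple -- the device and file
names here are just examples:)

    thin_metadata_pack   -i /dev/mapper/vg-pool_tmeta -o pool.pack
    thin_metadata_unpack -i pool.pack -o pool-tmeta.bin
    # then inspect pool-tmeta.bin with thin_check / thin_dump as usual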
> > # blk-archive
> >
> > The blk-archive tools were initially part of this thin-provisioning-tools
> > package, but have now been split off into their own project:
> >
> > https://github.com/jthornber/blk-archive
> >
> > They allow efficient archiving of thin devices (data deduplication
> > and compression), which will be of interest to those of you who are
> > holding large numbers of snapshots in thin pools as a poor man's backup.
> >
> > In particular:
> >
> > - Thin snapshots can be used to archive live data.
> > - It avoids reading unprovisioned areas of thin devices.
> > - It can calculate deltas between thin devices to minimise
> > how much data is read and deduped (incremental backups).
> > - Restoring to a thin device tries to maximise data sharing
> > within the thin pool (a big win if you're restoring snapshots).
> >
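From my memory of the README, the workflow is roughly as follows; this is a
sketch, so check the repo above for the current subcommands and flags:

    blk-archive create -a /backup/archive                  # one-time archive setup
    blk-archive pack   -a /backup/archive /dev/mapper/vg-thinsnap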