<div dir="ltr"><div>Hi Eric,</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 1, 2023 at 10:26 PM Eric Wheeler <<a href="mailto:dm-devel@lists.ewheeler.net">dm-devel@lists.ewheeler.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
> Hurrah! I've been looking forward to this for a long time...
>
> ...So if you have any commentary on the future of dm-thin with respect
> to metadata range support, or dm-thin performance in general, I would
> be very curious about your roadmap and your plans.

The plan over the next few months is roughly:

- Get people using the new Rust tools. They are _so_ much faster than the
  old C++ ones. [available now]
- Push upstream a set of patches I've been working on to boost thin
  concurrency performance. These are nearing completion and are available
  here for those who are interested:
  https://github.com/jthornber/linux/tree/2023-02-28-thin-concurrency-7
  They are making a huge difference to performance in my testing; e.g., fio
  with 16 jobs running concurrently gets several times the throughput.
  [Upstream in the next month, hopefully]
- Change thinp metadata to store ranges rather than individual mappings.
  This will reduce the amount of space the metadata consumes, and have the
  knock-on effect of boosting performance slightly (less metadata means
  faster lookups); there's a sketch of the idea below. However, I consider
  this a half-way house, in that I'm only going to change the metadata and
  not start using ranges within the core target (I'm not moving away from
  fixed block sizes). [Next 3 months]

I don't envisage significant changes to dm-thin or dm-cache after this.

Longer term, I think we're nearing a crunch point where we drastically
change how we do things. Since I wrote device-mapper in 2001, the speed of
devices has increased so much that I think dm is no longer doing a good job:

- The layering approach introduces inefficiencies with each layer. Sure, it
  may only be a 5% hit to add another linear mapping into the stack, but
  those 5%s add up.
- dm targets only see individual bios rather than the whole request queue.
  This prevents a lot of really useful optimisations. Think how much smarter
  dm-cache and dm-thin could be if they could look at the whole queue.
- The targets are getting too complicated. I think dm-thin is around 8k
  lines of code, though it shares most of that with dm-cache. I understand
  the dedup target from the vdo guys weighs in at 64k lines. Kernel
  development is fantastically expensive (or slow, depending on how you want
  to look at it). I did a lot of development work on thinp v2, and it was
  looking a lot like a filesystem shoe-horned into the block layer. I can
  see why bcache turned into bcachefs.
- Code within the block layer is memory constrained. We can't make
  arbitrarily sized allocations within targets; instead we have to use
  mempools of fixed-size objects (frowned upon these days), or declare up
  front how much memory we need to service a bio (forcing us to assume the
  worst case). This stuff isn't hard, just tedious, and it makes coding
  sophisticated targets pretty joyless.

So my plan going forwards is to keep the fast path of these targets in
kernel (e.g., a write to a provisioned, unsnapshotted region), but take the
slow paths out to userland. I think io_uring and ublk have shown us that
this is viable. That way a snapshot copy-on-write, or a dm-cache data
migration, both very slow operations, can be done with ordinary userland
code.
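To make the metadata-ranges item in the plan above concrete (the sketch
promised earlier), here's a minimal illustration in Rust. The types are
invented purely for illustration and are not thinp's actual on-disk format;
the point is just that a contiguous run of mappings collapses from many
entries to one:

// Hypothetical illustration only, not the real thinp metadata format.

/// One entry per mapped block (roughly what the metadata stores today).
#[derive(Debug, Clone, Copy)]
struct SingleMapping {
    virt_block: u64,
    data_block: u64,
}

/// One entry per contiguous run of mappings.
#[derive(Debug, Clone, Copy)]
struct RangeMapping {
    virt_begin: u64,
    data_begin: u64,
    len: u64,
}

/// Collapse sorted single mappings into ranges wherever both the
/// virtual and data blocks are contiguous.
fn to_ranges(mappings: &[SingleMapping]) -> Vec<RangeMapping> {
    let mut ranges: Vec<RangeMapping> = Vec::new();
    for m in mappings {
        if let Some(last) = ranges.last_mut() {
            if m.virt_block == last.virt_begin + last.len
                && m.data_block == last.data_begin + last.len
            {
                last.len += 1;
                continue;
            }
        }
        ranges.push(RangeMapping {
            virt_begin: m.virt_block,
            data_begin: m.data_block,
            len: 1,
        });
    }
    ranges
}

fn main() {
    // A freshly provisioned device tends to be laid out contiguously:
    // 1000 single-block entries become one range entry.
    let mappings: Vec<SingleMapping> = (0..1000)
        .map(|i| SingleMapping { virt_block: i, data_block: 5000 + i })
        .collect();
    println!("{} entries -> {} ranges", mappings.len(), to_ranges(&mappings).len());
}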
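And here's an equally rough sketch of that fast-path/slow-path split, just
to show the shape of the idea. Every name is made up, and a real
implementation would live in the kernel plus a ublk-style daemon rather than
one userspace process: the kernel side is nothing more than a table lookup,
and misses get punted to userland, which does the expensive work and then
installs a remap so subsequent IO stays on the fast path.

// Hypothetical sketch; all names are invented.
use std::collections::BTreeMap;

/// What the kernel is allowed to do on the fast path.
#[derive(Debug, Clone, Copy)]
enum Instruction {
    /// Remap this region to an offset on the data device.
    Remap { data_offset: u64 },
}

/// Stand-in for the kernel-side region -> instruction table.
struct FastPathTable {
    region_size: u64,
    instructions: BTreeMap<u64, Instruction>, // keyed by region index
}

impl FastPathTable {
    /// Fast path: a dumb table lookup, nothing target-specific.
    fn map(&self, sector: u64) -> Option<Instruction> {
        self.instructions.get(&(sector / self.region_size)).copied()
    }
}

/// Stand-in for the userland daemon servicing slow-path IO.
fn slow_path(table: &mut FastPathTable, sector: u64) -> Instruction {
    let region = sector / table.region_size;
    // ... provision a block, or copy-on-write a shared one ...
    let insn = Instruction::Remap { data_offset: region * table.region_size };
    table.instructions.insert(region, insn); // future IO stays in kernel
    insn
}

fn main() {
    let mut table = FastPathTable { region_size: 128, instructions: BTreeMap::new() };
    // First write to a region misses the table and takes the slow route...
    assert!(table.map(300).is_none());
    slow_path(&mut table, 300);
    // ...after which the kernel services it without leaving the fast path.
    println!("{:?}", table.map(300));
}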
For the fast paths, layering will be removed by having userland give the
kernel instructions to execute for specific regions of the virtual device
(i.e. remap to here). The kernel driver will have nothing specific to
thin/cache etc. I'm not sure how many of the current dm targets would fit
into this model, but I'm sure thin provisioning, caching, linear, and stripe
can.

- Joe

> Thanks again for all your great work on this.
>
> -Eric
>
> > [note: _data_ sharing was always maintained, this is purely about metadata space usage]
> >
> > # thin_metadata_pack/unpack
> >
> > These are a couple of new tools that are used for support. They compress
> > thin metadata, typically to a tenth of the size (much better than you'd
> > get with generic compressors). This makes it easier to pass damaged
> > metadata around for inspection.
> >
> > # blk-archive
> >
> > The blk-archive tools were initially part of this thin-provisioning-tools
> > package, but have now been split off into their own project:
> >
> > https://github.com/jthornber/blk-archive
> >
> > They allow efficient archiving of thin devices (data deduplication and
> > compression), which will be of interest to those of you who are holding
> > large numbers of snapshots in thin pools as a poor man's backup.
> >
> > In particular:
> >
> > - Thin snapshots can be used to archive live data.
> > - It avoids reading unprovisioned areas of thin devices.
> > - It can calculate deltas between thin devices to minimise how much
> >   data is read and deduped (incremental backups).
> > - Restoring to a thin device tries to maximise data sharing within the
> >   thin pool (a big win if you're restoring snapshots).