[dm-devel] Potential enhancements to dm-thin v2

Mon Apr 11 08:16:02 UTC 2022

On 11. 04. 22 at 0:03, Demi Marie Obenour wrote:
> For quite a while, I have wanted to write a tool to manage thin volumes
> that is not based on LVM.  The main thing holding me back is that the
> current dm-thin interface is extremely error-prone.  The only per-thin
> metadata stored by the kernel is a 24-bit thin ID, and userspace must
> take great care to keep that ID in sync with its own metadata.  Failure
> to do so results in data loss, data corruption, or even security
> vulnerabilities.  Furthermore, having to suspend a thin volume before
> one can take a snapshot of it creates a critical section during which
> userspace must be very careful, as I/O or a crash can lead to deadlock.
> I believe both of these problems can be solved without overly
> complicating the kernel implementation.

Hi

These things come from the initial design of the whole DM world, where 
complexity is split between kernel & user-space. Projects like btrfs and ZFS 
decided to go the other way and create a monolithic 'all-in-one' solution, 
which avoids some problems related to communication between kernel & 
user-space - but at the price of having pretty complicated kernel code that is 
very hard to develop & debug.

So let me explain one of the reasons why we have this logic with suspend. It 
is this basic principle:

write new lvm metadata ->  suspend (with all table preloads) ->  commit  new 
lvm2 metadata -> resume

With this we ensure that user space maintains the only valid 'view' of the 
metadata.
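
The ordering above can be illustrated with a small toy model. This is only a 
sketch and not lvm2 code: the classes ToyDevice and ToyMetadataStore and the 
function apply_change are hypothetical names made up for this illustration, 
and everything happens in memory. The only device-mapper behaviour it assumes 
is that a preloaded (inactive) table becomes the live one on resume.

```python
# Toy model of: write -> suspend (with table preloads) -> commit -> resume.
# NOT lvm2 code; all names below are hypothetical and purely illustrative.


class ToyDevice:
    """Stands in for a DM device with an active and a preloaded table."""

    def __init__(self, table):
        self.active_table = table
        self.inactive_table = None
        self.suspended = False

    def preload(self, table):
        # A preloaded table sits in the inactive slot until resume.
        self.inactive_table = table

    def suspend(self):
        self.suspended = True

    def resume(self):
        # On resume the preloaded table, if any, becomes the active one.
        if self.inactive_table is not None:
            self.active_table = self.inactive_table
            self.inactive_table = None
        self.suspended = False


class ToyMetadataStore:
    """Stands in for user-space metadata: one written copy, one committed copy."""

    def __init__(self, committed):
        self.committed = committed
        self.written = None

    def write(self, new_metadata):
        # Written but not yet committed: the old metadata is still the valid view.
        self.written = new_metadata

    def commit(self):
        self.committed = self.written
        self.written = None


def apply_change(store, device, new_metadata, new_table, log):
    """Run the four steps in order, recording each one in `log`."""
    store.write(new_metadata)
    log.append("write")
    device.preload(new_table)
    device.suspend()
    log.append("suspend")
    store.commit()
    log.append("commit")
    device.resume()
    log.append("resume")
```

At every point in this sequence the user-space store holds exactly one 
committed view, and the device only switches tables between suspend and 
resume.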

Your proposal actually breaks this sequence and would move things to a state 
of 'guess which state we are in now'. IMHO this presents much more risk than 
the virtual problem with suspend from user-space, which is only a problem if 
you are using the suspended device as 'swap' or 'rootfs' - so there are very 
easy ways to orchestrate your LVs to avoid such problems.

Basically, you essentially want to move the whole management into the kernel 
for some not so great speed gains (relative to the rest of the running 
system). You can certainly do that by writing your own kernel module to manage 
your rather unique software problem.

But IMHO the creation and removal of thousands of devices in a very short 
period of time rather suggests there is something sub-optimal in your original 
software design, as I'm really having a hard time imagining why you would need 
this.

If you wish to operate lots of devices, simply keep them created and ready, 
and eventually blkdiscard them for reuse as the next device.
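
As a rough sketch of that reuse idea: the DevicePool class below is a 
hypothetical helper, not part of lvm2 or any existing tool, and the device 
paths in the usage note are made up. It assumes the devices were created ahead 
of time and only hands them out and takes them back, discarding the contents 
on release by calling the util-linux `blkdiscard` command (which needs root 
and a real block device).

```python
import subprocess
from collections import deque


def blkdiscard(path):
    # Discard all blocks of the device using the util-linux `blkdiscard` tool.
    subprocess.run(["blkdiscard", path], check=True)


class DevicePool:
    """Hypothetical pool of pre-created devices that get reused, not recreated."""

    def __init__(self, paths, discard=blkdiscard):
        self._free = deque(paths)   # devices ready to be handed out
        self._in_use = set()        # devices currently handed out
        self._discard = discard     # injectable, so it can be faked in tests

    def acquire(self):
        # Raises IndexError when no pre-created device is left.
        path = self._free.popleft()
        self._in_use.add(path)
        return path

    def release(self, path):
        # Raises KeyError if the device was not handed out by this pool.
        self._in_use.remove(path)
        # Drop the old contents before the device is offered again.
        self._discard(path)
        self._free.append(path)
```

Usage would look like `pool = DevicePool(["/dev/vg/thin0", "/dev/vg/thin1"])`, 
then `dev = pool.acquire()` and later `pool.release(dev)`; no device is 
created or removed in the hot path.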

I'm also unsure where any special need to instantiate that many snapshots 
would come from. If there is some valid & logical purpose, lvm2 could maybe 
have an extended user space API to create multiple snapshots at once (e.g. 
create 10 snapshots with name-%d of a single thinLV).
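
For illustration only, this is roughly what a caller has to loop over today 
without such a batched API. The function snapshot_commands is a hypothetical 
helper, and the VG and LV names in the usage note are made up; it only builds 
the per-snapshot `lvcreate --snapshot --name ...` command lines from a 
name-%d style pattern and does not run them.

```python
def snapshot_commands(vg, origin, pattern, count):
    """Build one lvcreate command line per snapshot of vg/origin.

    `pattern` is a printf-style name such as "name-%d". Nothing is executed;
    the caller decides how (and whether) to run the returned commands.
    """
    return [
        ["lvcreate", "--snapshot", "--name", pattern % i, f"{vg}/{origin}"]
        for i in range(count)
    ]
```

Calling `snapshot_commands("vg", "thinlv", "name-%d", 10)` yields ten separate 
commands, which is the per-snapshot round trip a batched interface would fold 
into a single operation.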

Not to mention that operating that many thin volumes from a single thin-pool 
is also nowhere close to the high performance goal you are trying to reach...

Regards

Zdenek