[linux-lvm] about the lying nature of thin
pattonme at yahoo.com
Tue May 3 13:43:57 UTC 2016
> I didn't know thin (or LVM) doesn't maintain maps of used blocks.
Right, so you're ignorant of basics like how the various subsystems work. Like I said, go find a text on OS and filesystem design. Hell, read the EXT and LVM code or even just the design docs.
> The recent DISCARD improvements apparently just signal some special case
> (?) but SSDs DO maintain maps or it wouldn't even work (?).
Again, read up on the inner workings of SSDs. To oversimplify, SSDs have their own "LVM". It is really no different from what a hardware RAID controller does, though admittedly most RAID controllers don't do anything particularly advanced.
> I don't know, it would seem that having a map of used extents in a thin
> pool is in some way deeply important in being able to allocate unused
Clearly you are in need of much more studying. Out of all of its defined extents, LVM knows exactly which ones are free and which ones have been assigned to an LV - aka written to. Which individual blocks (aka ranges of bytes) inside those extents hold FS-managed data it neither knows nor cares.
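To make the extent-versus-block distinction concrete, here is a toy sketch in Python. It is not LVM's real data structure: the class name ExtentPool, the 4 MiB extent size and the method names are all made up for illustration. The pool records only which extents are mapped to an LV, never what is inside them.

```python
# Toy model only: hypothetical names, not LVM's actual metadata format.
EXTENT_SIZE = 4 * 1024 * 1024  # assumed extent size in bytes, for illustration


class ExtentPool:
    def __init__(self, total_extents):
        self.free = set(range(total_extents))  # physical extents not yet handed out
        self.mapping = {}  # (lv_name, logical_extent) -> physical extent

    def write(self, lv, offset, length):
        """Map every logical extent the write touches; contents are not tracked."""
        first = offset // EXTENT_SIZE
        last = (offset + length - 1) // EXTENT_SIZE
        for logical in range(first, last + 1):
            if (lv, logical) not in self.mapping:
                if not self.free:
                    raise IOError("pool exhausted")
                self.mapping[(lv, logical)] = self.free.pop()

    def extents_used(self, lv):
        """Count extents mapped to this LV, regardless of how full they are."""
        return sum(1 for name, _ in self.mapping if name == lv)
```

In this model a one-byte write consumes a whole extent, and the pool has no way to tell that the rest of that extent holds no filesystem data.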
> I guess continuous polling would be deeply disrespectful of the hardware
> and software resources.
Not to mention instantaneously invalid. So you poll LVM: "what is your allocation map and do you have any free extents?" You get the results. Then the FS, having been assured there is free space, issues writes. But oh no, during the round-trip some other LV has grabbed the extent you had intended to use! IO=FAIL.
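The poll-then-write race can be shown with a small deterministic Python simulation. Everything here is hypothetical (CountingPool, racy_write, the interloper callback); it only models the ordering problem and says nothing about how LVM is actually implemented.

```python
# Toy model of a check-then-act race: hypothetical names throughout.
class CountingPool:
    def __init__(self, free_extents):
        self.free_extents = free_extents

    def poll_free(self):
        """The answer is only true at the instant it is given."""
        return self.free_extents

    def allocate(self, lv):
        if self.free_extents == 0:
            raise IOError(f"{lv}: no free extent")
        self.free_extents -= 1


def racy_write(pool, lv, interloper=None):
    """Poll first, write second; `interloper` runs in the gap between the two."""
    if pool.poll_free() == 0:
        return "refused"
    if interloper is not None:
        interloper()  # stands in for another LV acting during the round-trip
    try:
        pool.allocate(lv)
    except IOError:
        return "IO=FAIL"
    return "ok"
```

With one free extent, racy_write(pool, "lvA", lambda: pool.allocate("lvB")) returns "IO=FAIL" even though the poll said space was available.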
The ONLY way for a FS to "reserve" a set of blocks (aka extent) for itself is to write to it - but mind, the FS has NO IDEA whether it needs to make a reservation in the first place, nor whether this IO just so happens to fit inside the already-allocated range while the next IO at offset +1 will require a new extent to be allocated from the THINP.
I haven't checked, but it's perfectly possible for LVM THINP to respond to FS-issued DISCARD notices and thus build an allocation map of an extent, and, should an extent become fully empty, to return it to the thin pool, only to have to allocate a new extent if any IO hits the same block range in the future. This kind of extent churn is probably not very useful unless your workload is in the habit of writing tons of data, freeing it, waiting a reasonable amount of time and potentially doing it again. SSDs resort to it because they must - it's the nature of the silicon device itself.
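A rough Python sketch of that hypothetical scheme, to show where the churn comes from. This models the idea as described above, not verified LVM thin behaviour; DiscardAwarePool, BLOCKS_PER_EXTENT and the allocations counter are invented for the example.

```python
# Toy model of discard-driven extent reclaim: hypothetical, not LVM's behaviour.
BLOCKS_PER_EXTENT = 8  # assumed, for illustration


class DiscardAwarePool:
    def __init__(self, total_extents):
        self.free = total_extents
        self.live = {}        # logical extent -> set of block indexes holding data
        self.allocations = 0  # how many times an extent was taken from the pool

    def write(self, block):
        extent, index = divmod(block, BLOCKS_PER_EXTENT)
        if extent not in self.live:
            if self.free == 0:
                raise IOError("pool exhausted")
            self.free -= 1
            self.allocations += 1
            self.live[extent] = set()
        self.live[extent].add(index)

    def discard(self, block):
        extent, index = divmod(block, BLOCKS_PER_EXTENT)
        blocks = self.live.get(extent)
        if blocks is None:
            return  # discard on an unmapped extent is a no-op
        blocks.discard(index)
        if not blocks:  # extent fully empty: hand it back to the pool
            del self.live[extent]
            self.free += 1
```

Writing blocks 0 and 1, discarding both, then writing block 0 again leaves allocations at 2: the same logical range was allocated twice, which is the churn in question.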
> It would say to a filesystem: these regions are currently unavailable.
> You would even get more flags:
> - this region is entirely unavailable
> - this region is now more expensive to allocate to
> - this region is the preferred place
All of this "inside knowledge" and "coordination" you so desperately seem to want is called integration. And again, it's spelled BTRFS and ZFS, et al.
> In the theoretical system I proposed it would be a constant
Yeah, have fun with that theoretical system.
Xen, dude seriously. Go do a LOT more reading.