[linux-lvm] Reserve space for specific thin logical volumes

Tue Sep 12 15:03:01 UTC 2017

> I need to take a step back: my main use for thinp is virtual machine 
> backing store
...
> Rather, I had to use a single, big thin volumes with XFS on top.
...
>I used  ZFS as volume manager, with the intent to place an XFS filesystem on top

Good grief, you had integration (ZFS) and then you broke it. Whether you use ZFS as a block device or as a filesystem is just semantics. While you're at it, dig into libvirt and see if you can fix its silliness.

> provisioned blocks. Rather, I all for something as "if free space  is lower than 30%, disable new snapshot *creation*"

Say you allowed a snapshot to be created when free space was at 31%. And 100 milliseconds later two more requests for a snapshot came in, and both succeeded. But 2 seconds later just one of your snapshot writers decided to write until it ran off the end of available space. What have you gained?
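To make that scenario concrete, here is a toy model of a creation-time gate. It is only a sketch: ToyPool and its methods are made-up names, nothing in it calls LVM, and it assumes a thin snapshot allocates nothing until it is written to.

```python
# Toy model only: a pool with a free-space gate on snapshot creation.
# All names here are hypothetical; nothing below talks to LVM.

class ToyPool:
    def __init__(self, total_blocks, used_blocks, min_free_fraction):
        self.total = total_blocks
        self.used = used_blocks
        self.min_free = min_free_fraction
        self.snapshots = []

    def free_fraction(self):
        return (self.total - self.used) / self.total

    def create_snapshot(self, name):
        # The gate: checked once, at creation time only.
        if self.free_fraction() < self.min_free:
            return False
        self.snapshots.append(name)  # assumed: creation allocates no blocks
        return True

    def write(self, blocks):
        # Allocation happens here, long after the gate was checked.
        granted = min(blocks, self.total - self.used)
        self.used += granted
        return granted  # fewer than requested means the pool is full


pool = ToyPool(total_blocks=1000, used_blocks=690, min_free_fraction=0.30)
created = [pool.create_snapshot(n) for n in ("snap1", "snap2", "snap3")]
granted = pool.write(500)  # one writer just keeps going
print(created, granted, pool.free_fraction())
```

All three creations pass the 30% gate because none of them consumes space yet, and the single writer still fills the pool afterwards. The gate limits when snapshots are created, not how much is written later.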

> is that, by cleaver using of the refreservation property, I can engineer 

You're not being nearly clever enough. You're using the wrong set of tools and making unsupported assumptions about future writes.

> Committed (fsynced) writes are safe

Fsync'd where? Inside your client VM? The hell they're safe. Your hypervisor is under no obligation to treat a write request issued to XFS as if it were synchronous.
Is XFS at the hypervisor being mounted 'sync'? Even that is not nearly enough. Can you also prove that there is a direct 1:1 map between the client VM's aggregate of fsync-driven blocks plus general writes and what gets de-staged when it is handed off to the hypervisor's XFS, with the same atomicity? Furthermore, when your client VM's kernel ACKs the fsync, it does so without having any idea whether the write actually made it. It *thought* it had done all it was supposed to do. Now the user software as well as the VM kernel are being actively misled!

You're going about this completely wrong.

You have to push the "did my write actually succeed or not, and how do I recover" question to inside the client VM. Your client VM gets issued a block device that is either iSCSI (can be the same host) or 'bare metal' LVM on the hypervisor. That's the ONLY way to make sure the I/Os don't get jumbled and errors map exactly. Otherwise, for application scribble, the client VM mounts an NFS share that can be thinLV+XFS at the fileserver. Or buy a proper enterprise storage array (they are dirt-cheap used, off maintenance) where people far smarter than you solved this problem decades ago.
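As a rough illustration of what "inside the client VM" means for an application, here is a minimal sketch that treats a write as done only after fsync returns without error. The helper name durable_write is made up for this example. It is only as trustworthy as the layers underneath: it assumes the block device handed to the VM reports errors honestly, which is the whole point of the setups above.

```python
import errno
import os


def durable_write(path, data):
    """Write data and fsync it; return None on success or an errno name.

    Hypothetical helper for illustration. The caller decides how to
    recover (retry, alert, fail the transaction) based on the result.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush to the device; errors surface here too.
        os.fsync(fd)
        return None
    except OSError as exc:
        # e.g. 'ENOSPC' or 'EIO', if the lower layers pass the error up.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

If the stack below swallows or reorders errors, this check reports success for data that never landed, and no amount of care in the application fixes that.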

> really want to prevent full thin pools even in the face of failed

And yet you have demonstrated no ability to do so. Or at least you have a very naive notion of what happens when multiple, simultaneous actors are involved. It sounds like some of your preferred toolset is letting you down. Roll up your sleeves and fix it. Why you give a damn about what filesystem is 'default' in any particular distribution is beyond me. Use the combination that actually works, not "if only this or that were changed it could/might work."

> to design system where some types of problems can not  simply happen.

And yet you persist in using the dumbest combo available: thin + XFS. No offense to LVM Thin, it works great WHEN used correctly. To channel Apple, "you're holding it wrong".