[linux-lvm] thin handling of available space

Tue May 3 13:01:30 UTC 2016

On Mon, 5/2/16, Mark Mielke <mark.mielke at gmail.com> wrote:

<quote>
 very small use case in reality. I think large service
 providers would use Ceph or EMC or NetApp, or some such
 technology to provision large amounts of storage per
 customer, and LVM would be used more at the level of a
 single customer, or a single machine.
</quote>

Ceph?!? yeah I don't think so.

If you thin-provision an EMC/Netapp volume and the block device runs out of blocks (aka Raid Group is full) all volumes on it will drop OFFLINE. They don't even go RO. Poof, they disappear. Why? Because there is no guarantee that every NFS client, every iSCSI client, every FC client is going to do the right thing. The only reliable means of telling everyone "shit just broke" is for the asset to disappear.

All in-flight writes to the volume that the array ACK'd are still good even if they haven't been de-staged to the intended device thanks to NVRAM and the array's journal device.

<quote>
 In these cases, I
 would expect that LVM thin volumes should not be used across
 multiple customers without understanding the exact type of
 churn expected, to understand what the maximum allocation
 that would be required.
</quote>

sure, but that spells responsible sysadmin. Xen's post implied he didn't want to be bothered to manage his block layer  that magically the FS' job was to work closely with the block layer to suss out when it was safe to keep accepting writes. There's an answer to "works closely with block layer" - it's spelled BTRFS and ZFS.

LVM has no obligation to protect careless sysadmins doing dangerous things from themselves. There is nothing wrong with using THIN every which way you want just as long as you understand and handle the eventuality of extent exhaustion. Even thin snaps go invalid if it needs to track a change and can't allocate space for the 'copy'.

Responsible usage has nothing to do with single vs multiple customers. Though Xen broached the 'hosting' example and in the cut-rate hosting business over-provisioning is rampant. It's not a problem unless the syadmin drops the ball.

> Amazon would make sure to have enough storage to meet my requirement if I need them.

Yes, because Amazon is a RESPONSIBLE sysadmin and has put in place tools to manage the fact they are thin-provisoning and to make damn sure they can cash the checks they are writing.

> the nature of the block device, such as "how much space
> do you *really* have left?"

So you're going to write and then backport "second guess the block layer" code to all filesystems in common use and god knows how many versions back? Of course not. Just try to get on the EXT developer mailing list and ask them to write "block layer second-guessing code (aka branch on device flag=thin)" because THINP will cause problems for the FS when it runs out of extents. To which the obvious and correct response will be "Don't use THINP if you're not prepared to handle it's pre-requisites."

> you and the other people. You think the block storage should
> be as transparent as possible, as if the storage was not
> thin. Others, including me, think that this theory is
> impractical

Then by all means go ahead and retrofit all known filesystems with the extra logic. ALL of the filesystems were written with the understanding that the block layer is telling the truth and that any "white lie" was benign in so much that it would be made good and thus could be assumed to be "truth" for practical purpose.