[linux-lvm] thin handling of available space

Tue May 3 12:00:45 UTC 2016

> written as required. If the file system has particular areas
> of importance that need to be writable to prevent file
> system failure, perhaps the file system should have a way of
> communicating this to the volume layer. The naive approach
> here might be to preallocate these critical blocks before
>  proceeding with any updates to these blocks, such that the
> failure situations can all be "safe" situations,
> where ENOSPC can be returned without a danger of the file
> system locking up or going read-only.

why all of a sudden does each and every FS have to have this added code to second guess the block layer? The quickest solution is to mount the FS in sync mode. Go ahead and pay the performance piper. It's still not likely to be bullet proof but it's a sure step closer.

What you're saying is that when mounting a block device the layer needs to expose a "thin-mode" attribute (or the sysdmin sets such a flag via tune2fs). Something analogous to mke2fs can "detect" LVM raid mode geometry (does that actually work reliably?). 

Then there has to be code in every FS block de-stage path:
IF thin {
  tickle block layer to allocate the block (aka write zeros to it? - what about pre-existing data, is there a "fake write" BIO call that does everything but actually write data to a block but would otherwise trigger LVM thin's extent allocation logic?)
   IF success, destage dirty block to block layer ELSE
   inform userland of ENOSPC
}

In a fully journal'd FS (metadata AND data) the journal could be 'pinned' and likewise the main metadata areas if for no other reason they are zero'd at onset and or constantly being written to. Once written to, LVM thin isn't going to go back and yank away an allocated extent.

This at least should maintain FS integrity albeit you may end up in a situation where the journal can never get properly de-staged, so you're stuck on any further writes and need to force RO.

> just want a sanely behaving LVM + XFS...)

IMO if the system admin made a conscious decision to use thin AND overprovision (thin by itself is not dangerous), it's up to HIM to actively manage his block layer. Even on million dollar SANs the expectation is that the engineer will do his job and not drop the mic and walk away. Maybe the "easiest" implementation would be a MD layer job that the admin can tailor to fail all allocation requests once extent count drops below a number and thus forcing all FS mounted on the thinpool to go into RO mode.

But in any event it won't prevent irate users from demanding why the space they appear to have isn't actually there.