[linux-lvm] LVM thin pool advice

Wed Feb 15 09:25:49 UTC 2017

David Shaw wrote on 15-02-2017 1:33:

> Is there some way to cap the amount of data that the snapshot can
> allocate from the pool?  Also, is there some way to allocate enough
> metadata space that it can't run out?  By way of analogy, using the
> old snapshot system, if the COW is sufficiently large (larger than the
> volume being snapshotted), it cannot overflow because even if every
> block of the original volume is dirtied, the COW can handle all of it.
>  Is there some similar way to size the metadata space of a thin pool
> such that overflow is "impossible"?

Personally I do not know the current state of affairs, but the response
I have often got here is that there is no such mechanism and it is up to
the administrator to guard against it.

Maybe it is a bit harsh to put it like this; my apologies.

I would very much like to be proven wrong here.

The problem is that although the LVM monitor (I think) does respond, or
can be configured to respond, to a thin pool filling up, it does so as a
kind of daemon, a watchdog; it is not an in-system guard.
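
To illustrate what I mean by a watchdog that sits outside the system, here
is a minimal sketch in Python. It is not how the LVM monitor itself works.
It assumes that polling `lvs` for the `data_percent` and `metadata_percent`
report fields is good enough; the 80% limits and all function names are my
own invention.

```python
import subprocess

# Hypothetical limits for this sketch, not LVM defaults.
DATA_LIMIT = 80.0
META_LIMIT = 80.0


def read_lvs(vg):
    """Run lvs for one volume group and return its raw text output."""
    cmd = ["lvs", "--noheadings", "--separator", "|",
           "-o", "lv_name,data_percent,metadata_percent", vg]
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout


def parse_lvs(output):
    """Turn 'name|data%|meta%' lines into {name: (data, meta)}.

    Lines without both figures (plain LVs, thin volumes) are skipped,
    so only thin pools remain.
    """
    usage = {}
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3 or not fields[1] or not fields[2]:
            continue
        usage[fields[0]] = (float(fields[1]), float(fields[2]))
    return usage


def pools_over_limit(usage, data_limit=DATA_LIMIT, meta_limit=META_LIMIT):
    """Return the names of pools whose data or metadata usage is too high."""
    return sorted(name for name, (data, meta) in usage.items()
                  if data >= data_limit or meta >= meta_limit)
```

Something like `pools_over_limit(parse_lvs(read_lvs("vg0")))`, run from a
timer, only tells you after the fact (`vg0` is a made-up name). A burst of
writes between two polls can still fill the pool, which is exactly why I
call it a watchdog and not a guard.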

Typically what I've found in the past is that a fill-up will just hang 
your system.

I am probably wrong about some things, so I would rather let the
developers answer.

But as you have found, the snapshot of a thin volume is always created
with the same virtual size as the origin volume. That means that unless
you have double the space available, the pool can fill up and your system
can crash.
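
The "double the space" figure is just worst-case arithmetic. A minimal
sketch, assuming every block of the origin is written and every snapshot
later diverges completely; it ignores chunk rounding and metadata, and the
function name is made up:

```python
GIB = 1024 ** 3


def worst_case_pool_bytes(origin_virtual_bytes, snapshots=1):
    """Upper bound on pool data usage for one origin plus its snapshots.

    Assumes the origin is fully written and each snapshot ends up sharing
    no blocks with it, so each one costs a full copy of the virtual size.
    """
    return origin_virtual_bytes * (1 + snapshots)
```

For a 10 GiB origin with one snapshot, `worst_case_pool_bytes(10 * GIB)`
gives 20 GiB. Anything smaller than that is overcommitted in principle.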

I once ventured -- but I am just a bystander, right -- that a proper
solution would have to involve inter-layer communication between
filesystems and block devices, but that is outside the problem here. The
problem as far as I can see is that the behaviour when the thin pool
fills up is very unexpected.

Zdenek once pointed out that the allocator does not have a full map of 
what is available. For efficiency reasons, it goes "in search" of the 
next block to allocate. (Next extent).

It does so in response to a filesystem request (a write, presumably).
The filesystem knows of no limits in the thin pool and expects the space
to be there. The block layer (in this case LVM) can respond with failure
or success, but I do not know how this is handled or what happens when
the thin pool is full and no new blocks can be allocated.

However, I expect your system to freeze when the snapshot needs more
space than is available. I think the intended behaviour is for the
snapshot to be dropped, but I doubt this happens.

After all, the snapshot might be mounted, etc?...

It seems to me the first thing to do is to create safety margins, but 
then... I do not develop this thing right now :p.

I think what is required is advance allocation, where each (individual)
volume allocates a pre-defined number of blocks in advance. Then any
out-of-space message from the thin volume manager would affect the
pre-allocation and not the actual allocation for the filesystem.

You create a bit of a buffer, in time. Once the individual volume's
allocator knows the thin pool is in trouble, but it still has
pre-allocated extents available to itself, it can already start
informing the filesystem -- ideally -- that there is mayhem coming.
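
A toy sketch of the pre-allocation idea in Python. To be clear, this is
not how dm-thin or LVM allocate anything; `Pool`, `VolumeAllocator` and
`reserve_target` are all hypothetical names for the sake of the argument.

```python
class Pool:
    """Toy stand-in for the central thin pool: a count of free extents."""

    def __init__(self, free_extents):
        self.free = free_extents

    def take(self, wanted):
        """Hand out up to `wanted` extents; may grant fewer, or none."""
        granted = min(wanted, self.free)
        self.free -= granted
        return granted


class VolumeAllocator:
    """Per-volume allocator that tries to keep a reserve in hand."""

    def __init__(self, pool, reserve_target):
        self.pool = pool
        self.reserve_target = reserve_target
        self.reserve = pool.take(reserve_target)
        self.warning = self.reserve < reserve_target

    def allocate(self):
        """Serve one extent from the reserve, then try to top it up.

        Returns False only once the reserve itself is exhausted.
        """
        if self.reserve == 0:
            return False
        self.reserve -= 1
        self.reserve += self.pool.take(self.reserve_target - self.reserve)
        # A short refill means the pool is in trouble, although this
        # volume can still serve writes from what it already holds.
        self.warning = self.reserve < self.reserve_target
        return True
```

The point is that `warning` turns true while `allocate()` still succeeds.
That gap is the buffer in time: the filesystem could be told to back off
before a write actually fails.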

It also means that a snapshot could recognise problems ahead of time
and be told that it needs to start failing if a certain minimum of free
space cannot be found.

But all of this also requires that the central thin volume manager
knows ahead of time, or in any case at any given moment, how many
extents are available. If this is done concurrently and there are many
such allocators operating, all of them would need to work from
synchronized numbers of available space. Particularly when space is
running out, I feel there should be some sort of emergency mode where
restrictions start to apply.
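
What I mean by synchronized numbers plus an emergency mode, as a rough
Python sketch. Again this is purely hypothetical and not the real
allocator: a single lock guards the free count, and below an invented
`low_water` mark snapshot allocations are refused so that the remaining
extents go to ordinary volumes.

```python
import threading


class SharedPool:
    """Free-extent counter that all allocators consult under one lock."""

    def __init__(self, free_extents, low_water):
        self._lock = threading.Lock()
        self._free = free_extents
        self._low_water = low_water

    def allocate(self, is_snapshot=False):
        """Take one extent, or return False if that is not allowed."""
        with self._lock:
            if self._free == 0:
                return False
            # Emergency mode: at or below the low-water mark, snapshots
            # are refused so origin volumes can keep writing a bit longer.
            if is_snapshot and self._free <= self._low_water:
                return False
            self._free -= 1
            return True

    def free_extents(self):
        """Return the current number of free extents."""
        with self._lock:
            return self._free
```

A single lock is the crudest way to keep the numbers consistent, and I
would guess it is also the kind of cost the developers want to avoid for
efficiency reasons. But it shows that a snapshot can be made to fail
first, in a controlled way, instead of the whole pool running dry.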

It is just unacceptable to me that the system will crash when space runs
out. In case of a depleted thin pool, any snapshot should really be
discarded by default, I feel. Otherwise the entire thin pool should be
frozen right away. But why the system should crash on this is beyond me.

My apologies for this perhaps petulant message. I just think it should
not be understated how important it is that a system does not crash,
and I was just pointing out that in the past the message has often been
that it is _your_ job to create safety.

But this is slightly impossible. This would indicate... well whatever.

The failure case of a filled-up thin pool should not be relegated to the 
shadows.

I hope to be proven wrong here, and good luck with your endeavour. I
would suggest that a thin pool is very sexy ;-). But thus far there are
no safeguards.

Please be advised that I do not know whether the limits you ask about
currently exist. I have just been told here that the thin snapshot is of
equal size to the origin volume and that there is nothing you can do
about it.

Regards.