[linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

Zdenek Kabelac zkabelac at redhat.com
Wed Feb 28 21:43:26 UTC 2018


On 28.2.2018 at 20:07, Gionatan Danti wrote:
> Hi all,
> 
> On 28-02-2018 at 10:26, Zdenek Kabelac wrote:
>> Overprovisioning on the DEVICE level simply IS NOT equivalent to a full
>> filesystem like you would like to see all the time here, and it has
>> already been explained to you many times that filesystems are simply not
>> there yet - fixes are ongoing but it will take time, and it's
>> really pointless to exercise this on 2-3 year old kernels...
> 
> this was really beaten to death in the past months/threads. I generally 
> agree with Zdenek.
> 
> To recap (Zdenek, correct me if I am wrong): the main problem is that, on a 
> full pool, async writes will more-or-less silently fail (with errors shown 
> in dmesg, but nothing more). Another possible source of problems is that, 
> even on a full pool, *some* writes will complete correctly (the ones on 
> already-allocated chunks).

By default, a full pool starts to 'error' all 'writes' after 60 seconds.
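
For reference, a small sketch of how to inspect and tune this, assuming a
pool named vg/pool (the 60 s queue is the dm-thin 'no_space_timeout'
module parameter):

    # How long a full pool queues writes before erroring them
    # (default 60 seconds, 0 = queue forever):
    cat /sys/module/dm_thin_pool/parameters/no_space_timeout

    # Or skip the queueing entirely and error out immediately when full:
    lvchange --errorwhenfull y vg/pool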

> 
> In the past it was argued that putting the entire pool in read-only mode 
> (where *all* writes fail, but reads are permitted to complete) would be a 
> better fail-safe mechanism; however, it was stated that no current dm 
> target permits that.

Yep - I'd probably like to see a slightly different mechanism - one where
all ongoing writes would fail - so far, some 'writes' will pass (those to
already provisioned areas) and some will fail (those to unprovisioned ones).

The main problem is that, after a reboot, this 'missing/unprovisioned' space
may present some old data...

> 
> Two (good) solutions were given, both relying on scripting (see the 
> "thin_command" option in lvm.conf):
> - fsfreeze on a nearly full pool (ie: >=98%);
> - replace the dm thinp target with the error target (using dmsetup).

Yep - this all can happen via 'monitoring'.
The key is to do it early, before disaster happens.
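
As an illustration only - a minimal, untested sketch of such a hook; the
script path, the mountpoint and the 95% threshold are made-up examples,
see lvmthin(7) for the exact dmeventd/thin_command contract:

    # lvm.conf - have dmeventd call our script as the pool fills up:
    dmeventd {
        thin_command = "/usr/local/sbin/pool_guard.sh"
    }

    #!/bin/sh
    # /usr/local/sbin/pool_guard.sh (made-up name) - dmeventd runs the
    # thin_command with the pool as argument and the current usage
    # percentage in $DMEVENTD_THIN_POOL_DATA:
    if [ "${DMEVENTD_THIN_POOL_DATA:-0}" -ge 95 ]; then
        # stop new writes on the filesystem sitting on the thin LV...
        fsfreeze -f /srv/thinfs
        # ...or swap the pool's table for the error target:
        # dmsetup suspend vg-pool
        # dmsetup reload vg-pool --table "0 <sectors> error"
        # dmsetup resume vg-pool
    fi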

> I really think that with the good scripting infrastructure currently built 
> into lvm this is a more-or-less solved problem.

It still depends - there is always some sort of 'race' - unless you are
willing to 'give up' very early, so as to always be sure, considering there
are technologies that may write many GB/s...

>> Do NOT take thin snapshots of your root filesystem and you will avoid
>> the thin-pool overprovisioning problem.
> 
> But is someone *really* pushing thinp for the root filesystem? I always used it

You can use a rootfs with thinp - it's very fast for testing e.g. upgrades
and quickly reverting back - there just needs to be enough free space.
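
E.g. one possible workflow - assuming the rootfs lives on thin LV vg/root
and a reasonably recent lvm2 (for --mergethin):

    # before the upgrade - a thin snapshot needs no size argument:
    lvcreate -s vg/root -n root_pre_upgrade

    # upgrade was fine - drop the safety net:
    lvremove vg/root_pre_upgrade

    # upgrade went wrong - merge the snapshot back into the origin
    # (the rollback takes effect on the next activation):
    lvconvert --mergethin vg/root_pre_upgrade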

> In stress testing, I never saw a system crash on a full thin pool, but I 
> was not using it on the root filesystem. Are there any ill effects on 
> system stability which I need to know about?

Depends on the version of the kernel and the filesystem in use.

Note the RHEL/CentOS kernel has lots of backports even though it looks quite old.


> The solution is to use scripting/thin_command with lvm tags. For example:
> - tag all snapshots with a "snap" tag;
> - when usage is dangerously high, drop all volumes with the "snap" tag.

Yep - every user has different plans in mind - scripting gives users the
freedom to adapt this logic to local needs...
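
A small sketch of that tagging approach (all names here are just examples):

    # tag each snapshot at creation time:
    lvcreate -s vg/data -n data_snap1 --addtag snap

    # later, from the monitoring script, drop everything tagged 'snap':
    lvremove -y @snap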

>>> However, I don't have the space for a full copy of every filesystem, so if 
>>> I snapshot, I will automatically overprovision.

As long as the responsible admin controls the space in the thin-pool and
takes action long before the thin-pool runs out of space, all is fine.

If the admin hopes for some kind of magic to happen - we have a problem....
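
The numbers to watch are readily available; e.g. a trivial cron-style check
(vg/pool and the 80% threshold are just example values):

    # current data/metadata usage of the pool:
    lvs -o lv_name,data_percent,metadata_percent vg/pool

    # warn while there is still time to act:
    USED=$(lvs --noheadings -o data_percent vg/pool | tr -d ' ' | cut -d. -f1)
    if [ "${USED:-0}" -ge 80 ]; then
        logger "thin pool vg/pool at ${USED}% - extend it or drop snapshots"
    fi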


>>
>> Back to rule #1 - thin-p is about 'delaying' delivery of real space.
>> If you already have a plan to never deliver the promised space - you need
>> to live with the consequences....
> 
> I am not sure I 100% agree with that. Thinp is not only about "delaying" 
> space provisioning; it clearly is also (mostly?) about fast, modern, usable 
> snapshots. Docker, snapper, stratis, etc. all use thinp mainly for its fast, 
> efficient snapshot capability. Denying that is not so useful and leads to 
> "overwarning" (ie: when snapshotting a volume on a virtually-fillable thin 
> pool).

Snapshots are using space - with the hope that if you 'really' need that
space, you either add this space to your system - or you drop snapshots.

Still the same logic applies....
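
The 'add this space' side can even be automated; a sketch of the lvm.conf
knobs (70/20 are example values) plus the manual variant:

    activation {
        # when pool usage crosses 70%, grow it by 20% of its size
        # (as long as the VG has free extents):
        thin_pool_autoextend_threshold = 70
        thin_pool_autoextend_percent = 20
    }

    # or by hand, again assuming free space in the VG:
    lvextend -L+10G vg/pool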

>> !SNAPSHOTS ARE NOT BACKUPS!
>>
>> This is the key problem with your thinking here (unfortunately you are
>> not 'alone' with this thinking)
> 
> Snapshots are not backups, as they do not protect from hardware problems 
> (and denying that would be lame); however, they are an invaluable *part* of 
> a successful backup strategy. Having multiple rollback targets, even on the 
> same machine, is a very useful tool.

Backups primarily sit on completely different storage.

If you keep backups of data in the same pool:

1.)
An error in a single chunk shared by all your backups + origin means total
data loss - especially in the case where the filesystem uses 'B-trees' and
some 'root node' is lost - it can easily render your origin + all backups
completely useless.

2.)
Problems in thin-pool metadata can make all your origin + backups just an
unordered mess of chunks.


> Again, I don't understand why we are speaking about system crashes. On a 
> root *not* using thinp, I never saw a system crash due to a full data pool.
> 
> Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are way 
> too limited).

Yep - this case is known to be pretty stable.

But as said - with today's 'rush' of development and the load of updates -
users do want to try a 'new distro upgrade' - if it works, all is fine - if
it doesn't, let's have a quick road back - so using a thin volume for the
rootfs is a pretty wanted case.

Trouble is, there are quite a lot of issues that are non-trivial to solve.

There are also some ongoing ideas/projects - one of them was to have thinLVs
with priority to always be fully provisioned - so such a thinLV could never
be the one to have unprovisioned chunks....
Another was better integration of filesystems with 'provisioned' volumes.


Zdenek



