[linux-lvm] Reserve space for specific thin logical volumes

Zdenek Kabelac zkabelac at redhat.com
Wed Sep 13 08:15:44 UTC 2017


On 13.9.2017 at 09:53, Gionatan Danti wrote:
> On 13-09-2017 01:22, matthew patton wrote:
>>> Step-by-step example:
>>  > - create a 40 GB thin volume and subtract its size from the thin
>> pool (USED 40 GB, FREE 60 GB, REFER 0 GB);
>>  > - overwrite the entire volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
>>  > - snapshot the volume (USED 40 GB, FREE 60 GB, REFER 40 GB);
>>
>> And 3 other threads also take snapshots against the same volume, or
>> frankly any other volume in the pool.
>> Since the next step (overwrite) hasn't happened yet or has written
>> less than 20GB, all succeed.
>>
>>  > - completely overwrite the original volume (USED 80 GB, FREE 20 GB,
>> REFER 40 GB);
>>
>> 4 threads all try to write their respective 40GB. After all, they got
>> the green light since their snapshot was allowed to be taken.
>> Your thinLV blows up spectacularly.
>>
>>  > - a new snapshot creation will fail (REFER is higher than FREE).
>> Nobody cares about new snapshot creation attempts at this point.
>>
>>
>>> When do you decide it? (you need to see this is total race-land)
>>
>> exactly!
> 
> In all the examples I gave, the snapshots are supposed to be read-only, or at 
> least never written. I thought that was implicitly clear, since ZFS (used as 
> the example) makes snapshots read-only by default. Sorry for not stating that 
> explicitly.
> 

Ohh, this is a pretty major constraint ;)

But as pointed out multiple times - with scripting around the various fullness 
levels of a thin-pool - several different actions can be programmed, starting 
from fstrim and ending with a plain erase of an unneeded snapshot
(maybe even erasing unneeded files....) - see the sketch below.
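
A minimal sketch of such a scriptlet, assuming a hypothetical VG 'vg' with a
thin-pool 'pool', a filesystem mounted at /mnt/thinlv and a disposable
snapshot 'snap1' (the thresholds and the cleanup policy are illustrative only):

    #!/bin/sh
    # React to thin-pool fullness; could be run from cron or
    # hooked into dmeventd monitoring.
    FULLNESS=$(lvs --noheadings -o data_percent vg/pool | tr -d ' ')
    FULLNESS=${FULLNESS%%.*}   # drop the decimal part

    # Above 70% used: return free blocks from the filesystem.
    [ "$FULLNESS" -ge 70 ] && fstrim /mnt/thinlv

    # Above 90% used: sacrifice the least important snapshot.
    [ "$FULLNESS" -ge 90 ] && lvremove -f vg/snap1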

To get the most robust application - such an app should actually avoid using 
the page-cache (i.e. use direct-io). In that case you are always guaranteed
to get the exact error at the exact time (even without the journaled mount 
option for ext4....)
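
For illustration, a direct-io write that bypasses the page-cache can be done
with plain dd (the device name is just an example):

    # With oflag=direct the writer gets the provisioning error
    # immediately, instead of a deferred writeback failure.
    dd if=/dev/zero of=/dev/vg/thinlv bs=1M count=100 oflag=direct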


> After the last write, the cloned cvol1 is clearly corrupted, but the original 
> volume has no problem at all.

Surely there is a good reason we still keep the 'old snapshots' with us - 
although everyone knows their implementation has aged :)

There are cases where this copying into separate COW areas simply works better 
- especially for short-lived objects with a low number of 'small' changes.

We even support old-style snapshots of thin volumes for this reason - so you 
can use 'bigger' thin-pool chunks, while for a temporary snapshot taken for a
backup you can take an old-style snapshot of the thin volume (example below)...
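
For example - passing an explicit COW size to 'lvcreate -s' on a thin volume
produces an old-style snapshot instead of a thin one (the names and the 2G
size are illustrative):

    # Old-style (COW) snapshot of a thin LV: an explicit -L size
    # allocates a classic COW area rather than a thin snapshot.
    lvcreate -s -L 2G -n backup_snap vg/thinlv

    # ... take the backup from /dev/vg/backup_snap ...

    lvremove -f vg/backup_snap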


> 
> This was more or less the case with classical, fat LVM: a snapshot running out 
> of space *will* fail, but the original volume remains unaffected.

Partially this might get solved in 'some' cases with fully provisioned thinLVs 
within the thin-pool...
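
There is no dedicated 'fully provision' option, but a thinLV can be
pre-allocated simply by writing it end-to-end once - later overwrites then
never need new chunks from the pool, as long as no snapshot shares its chunks
(the device name is hypothetical):

    # Touch every chunk once, so the pool allocation is complete.
    dd if=/dev/zero of=/dev/vg/thinlv bs=1M oflag=direct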

What comes to my mind as a possible supporting solution is an enhancement on 
the LVM2 side: 'forcible' removal of running volumes (i.e. an lvm2 equivalent 
of 'dmsetup remove --force').

ATM lvm2 prevents you from removing 'running/mounted' volumes.

I can well imagine LVM letting you forcibly replace such an LV with an error 
target - so instead of a thinLV you will have a single 'error' target 
snapshot - which could possibly even be auto-cleaned once the volume 
use-count drops to 0 (lvmpolld/dmeventd monitoring, whatever...).

(Of course - we are not solving what happens to an application using/running 
on top of such an error target - hopefully nothing too bad....)
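
What 'dmsetup remove --force' does can also be sketched by hand - swapping the
live table for an 'error' target (the device-mapper name 'vg-thinlv' is
hypothetical):

    # Replace the live mapping with an 'error' target, as
    # 'dmsetup remove --force' does before deferred removal.
    SECTORS=$(blockdev --getsz /dev/mapper/vg-thinlv)
    dmsetup suspend vg-thinlv
    dmsetup reload vg-thinlv --table "0 $SECTORS error"
    dmsetup resume vg-thinlv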

This way you get a very 'powerful' weapon to be used in those 'scriptlets',
so you can drop unneeded volumes ANYTIME you need and reclaim their resources...

Regards

Zdenek



