[linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac
zdenek.kabelac at gmail.com
Mon Sep 11 17:34:18 UTC 2017
On 11.9.2017 at 16:00, Xen wrote:
> Just responding to second part of your email.
>
>>> So this one is manual intervention only... and a last resort just to prevent
>>> a crash, so not really useful in the general situation?
>>
>> Let's simplify the case:
>>
>> You have a 1G thin-pool.
>> You use a 10G thinLV on top of that 1G thin-pool.
>>
>> And you ask for 'sane' behavior??
>
> Why not? Really.
Because every filesystem put on top of a thinLV believes that all blocks on
the device actually exist....
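To make the quoted example concrete, this is roughly how such an overprovisioned setup looks (the VG name "vg0" and the LV names are hypothetical; the commands need root and a real volume group):

```shell
# A 1G thin-pool in the hypothetical VG "vg0"
lvcreate --type thin-pool -L 1G -n tpool vg0

# A thinLV with a 10G *virtual* size - 10x the pool's real capacity
lvcreate --type thin -V 10G --thinpool tpool -n thinvol vg0

# Any filesystem created here will assume it owns 10G of real blocks
mkfs.ext4 /dev/vg0/thinvol
```

The filesystem has no idea the last ~9G are promises, not blocks.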
>> Any idea of having 'reserved' space for 'prioritized' applications, and
>> other such crazy ideas, leads nowhere.
>
> That has existed in Linux filesystems for a long time (the root user's reserved blocks).
Have I said yet that you can't compare a filesystem problem with a block-level
problem? If not ;) let's repeat it: running out of space in a single
filesystem is a completely different fairy-tale from running out of space in a
thin-pool.
>
>> Actually there is very good link to read about:
>>
>> https://lwn.net/Articles/104185/
>
> That was cute.
>
> But we're not asking an aeroplane to keep flying.
IMHO you just don't see the parallel yet....
>> And we believe it's fine to solve exceptional case by reboot.
>
> Well, it's hard to disagree with that, but for me it might take weeks before
> I discover the system is offline.
IMHO that's a problem of proper monitoring.
Still the same song here - you should be actively trying to avoid the car
collision, since trying to resuscitate a seriously injured or even dead
passenger from a demolished car is usually a very complex job with an
unpredictable result...
We do put a number of 'car-protection' safety mechanisms in place - so the
newer the tools and kernel, the better - but still, when you hit the wall at
top speed you can't expect to just 'walk out' easily... and it's way cheaper
to solve the problem in a way where you will NOT crash at all..
>
> Otherwise most services would probably continue.
>
> So now I need to install remote monitoring that checks the system is still up
> and running etc.
Of course you do.
A thin-pool needs attention and care :)
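As a sketch of what such monitoring could look like - a tiny cron-style watcher; the VG/pool names "vg0"/"tpool", the 90% threshold, and mailing root are all assumptions here, not lvm2 defaults:

```shell
#!/bin/sh
# Sketch of a thin-pool watcher: warn by mail when data usage crosses a limit.

THRESHOLD=90   # percent; arbitrary example value

# Succeeds (exit 0) when the given data_percent is at or above THRESHOLD.
over_threshold() {
    awk -v cur="$1" -v max="$THRESHOLD" 'BEGIN { exit !(cur + 0 >= max) }'
}

# Only query lvm2 when it is actually installed (pool name is hypothetical).
if command -v lvs >/dev/null 2>&1; then
    pct=$(lvs --noheadings -o data_percent vg0/tpool 2>/dev/null | tr -d ' ')
    if [ -n "$pct" ] && over_threshold "$pct"; then
        echo "thin-pool vg0/tpool is at ${pct}% data usage" \
            | mail -s "thin-pool warning on $(hostname)" root
    fi
fi
```

Running something like this from cron gets you the email alerting discussed below; hooking a script into dmeventd is the more integrated route.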
> If all solutions require more and more and more and more monitoring, that's
> not good.
It's the best we can provide....
>> So don't expect the lvm2 team to solve this - there is higher-priority work....
>
> Sure, whatever.
>
> Safety is never a prio, right ;-).
We are safe enough (IMHO) to NOT lose committed data.
We cannot guarantee a stable system, though - it's too complex.
lvm2/dm can't fix extX/btrfs/XFS and other kernel-related issues...
Bold men can step in - and fix them....
>> If the system volume IS that important - don't use it with over-provisioning!
>
> System-volume is not overprovisioned.
If the thin-pool has enough blocks to cover all the blocks needed by every
thinLV attached to it - you are not overprovisioning.
> Just something else running in the system....
Use separate pools ;)
(i.e. a 10G system + 3 snapshots needs 40G of data size & an appropriate
metadata size to be safe from overprovisioning)
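The arithmetic behind that sizing, assuming the worst case where every snapshot eventually diverges completely from its origin:

```shell
# Worst-case data sizing for a non-overprovisioned pool: the origin plus
# each snapshot may end up needing a full copy of every block.
ORIGIN_G=10    # the 10G system thinLV from the example
SNAPSHOTS=3
POOL_G=$(( ORIGIN_G * (1 + SNAPSHOTS) ))
echo "pool data size needed: ${POOL_G}G"   # prints: pool data size needed: 40G
```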
> That will crash the ENTIRE SYSTEM when it fills up.
>
> Even if it was not used by ANY APPLICATION WHATSOEVER!!!
A full thin-pool on a recent kernel certainly does NOT randomly crash the
entire system :)
If you believe that's the case - provide a full trace of the crashed kernel
and open a BZ - just be sure you are using an upstream Linux kernel...
> My system LV is not even ON a thin pool.
Again - if you can reproduce it on kernel 4.13, open a BZ and provide a
reproducer.
If you use an older kernel - take a recent one and try to reproduce it there.
If you can't reproduce it - the problem has already been fixed.
It's then up to your kernel provider to either back-port the fix
or give you a fixed newer kernel - nothing for lvm2 to do, really...
> It's a way more practical solution than trying to fix the OOM problem :)
>
> Aye, but in that case no one can tell you to ensure you have auto-expandable
> memory ;-) ;-) ;-) :p :p :p.
I'd probably recommend reading some books about how memory is mapped onto a
block device and what all the constraints and related problems are..
>>> Yes email monitoring would be most important I think for most people.
>> Put mail messaging into a plugin script then.
>> Or use any monitoring software for messages in syslog - this worked
>> pretty well 20 years back - and hopefully still works well :)
>
> Yeah, I guess, but I don't have all this knowledge myself about these
> different kinds of software and how they work; I had hoped that thin LVM
> would work for me without an excessive need for many different kinds of
> knowledge.
We do provide a 'generic' script - unfortunately, every use-case is
basically a different set of rules and constraints.
So the best we have is 'auto-extension'.
We used to try to umount - but that possibly added more problems than
it actually solved...
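For reference, that auto-extension is configured in lvm.conf; a sketch of the relevant settings (the 70/20 values are illustrative - check the defaults your distribution ships):

```
activation {
    # dmeventd must be monitoring the pool for auto-extension to trigger
    monitoring = 1
    # extend once pool data usage reaches 70%...
    thin_pool_autoextend_threshold = 70
    # ...growing the pool by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}
```

This only helps, of course, while the VG still has free extents to grow into.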
>>> I am just asking whether or not there is a clear design limitation that
>>> would ever prevent safety in operation when 100% full (by accident).
>>
>> Don't use over-provisioning if you don't want to see failure.
>
> That's no answer to that question.
There is a lot of technical complexity behind it...
I'd say the main part is that the 'fs' would need to understand it's living
on a provisioned device (something we actually do not want, since you can
change the 'state' at runtime - so the 'fs' would have to be aware & unaware
at the same time ;). Checking with every request whether thin-provisioning
is in place would hurt performance, and doing it only at mount time is
also bad.
Then you need to deal with the fact that writes to a filesystem are
'process'-aware, while writes to a block device are just anonymous page
writes from your page cache.
Have I mentioned yet that the level of problems for a single filesystem is a
totally different story?
So, simply stated: thin-provisioning has its limits. If you are unhappy with
them, then you probably need to look for some other solution - or start
sending patches and improve things...
>
>> It's the same as with RAM: you should not overcommit it if you do not
>> want to see the OOM killer....
>
> But with RAM I'm sure you can typically see how much you have and can thus
> take account of that; a filesystem will report the wrong figure ;-).
Unfortunately you cannot...
The amount of free RAM is a very fictional number ;) and you run into much
bigger problems if you start overcommitting memory in the kernel...
You can't compare a failing malloc in user-space with the OOM killer taking
down Firefox...
A block device runs in-kernel - and as root...
There are no reserves; all you know is that you need to write block XY -
you have no idea what the block is about..
(That's where ZFS/Btrfs were supposed to excel - they KNOW.... :)
Regards,
Zdenek