[linux-lvm] when bringing dm-cache online, consumes all memory and reboots

Zdenek Kabelac zkabelac at redhat.com
Mon Mar 23 09:57:03 UTC 2020

Dne 23. 03. 20 v 9:26 Joe Thornber napsal(a):
> On Sun, Mar 22, 2020 at 10:57:35AM -0700, Scott Mcdermott wrote:
>> have a 931.5 GiB SSD pair in raid1 (mdraid) as cache LV for a
>> data LV on a 1.8 TiB raid1 (mdraid) pair of larger spinning disks.
>> These disks are hosted by a small 4GB big.LITTLE ARM system
>> running 4.4.192-rk3399 (Armbian 5.98 bionic).  Parameters were set
>> with: lvconvert --type cache --cachemode writeback --cachepolicy smq
>> --cachesettings migration_threshold=10000000
> If you crash then the cache assumes all blocks are dirty and performs
> a full writeback.  You have set the migration_threshold extremely high
> so I think this writeback process is just submitting far too much io at once.
> Bring it down to around 2048 and try again.
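To put some numbers on Joe's suggestion: migration_threshold is counted in 512-byte sectors, so the two values differ by more than three orders of magnitude. A quick sketch (the vg/data name is hypothetical; substitute the real cached LV):

```shell
# Hypothetical LV name - replace vg/data with the actual cached LV.
# The threshold can be lowered online with:
#   lvchange --cachesettings migration_threshold=2048 vg/data

# migration_threshold is in 512-byte sectors, so the original setting
# allows roughly 4.8 GiB of migration I/O in flight at once:
echo "$(( 10000000 * 512 / 1024 / 1024 )) MiB"   # 4882 MiB
# while the suggested 2048 sectors caps it at:
echo "$(( 2048 * 512 / 1024 / 1024 )) MiB"       # 1 MiB
```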


Users should do some benchmarking to find the 'useful' size of their
hotspot areas - using nearly 1T of cache for 1.8T of origin doesn't look
like the right ratio for caching
(i.e. it's like having a CPU cache half the size of your DRAM).

A too-big 'cache size' usually leads to way too big caching chunks
(since we try to limit the number of 'chunks' in the cache to 1 million - you
can raise this limit - but it will consume a lot of your RAM space as well).
So IMHO I'd recommend using at most 512K chunks - which gives you
about 256GiB of cache size - but users should still benchmark what is
the best for them...
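The arithmetic behind that recommendation can be sketched as follows (the 931.5 GiB figure is the poster's cache size; all sizes in KiB):

```shell
# The poster's ~931.5 GiB cache expressed in KiB is 976748544 KiB.
# Even at the suggested 512 KiB chunk size it would exceed the
# 1 million chunk limit:
echo "$(( 976748544 / 512 )) chunks"          # 1907712 chunks
# whereas a ~256 GiB cache at 512 KiB chunks stays well under it:
echo "$(( 256 * 1024 * 1024 / 512 )) chunks"  # 524288 chunks
```

This is why an oversized cache forces dm-cache toward larger chunks: the chunk count, not the chunk size, is what is bounded.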

Another hint - lvm2 has introduced support for the new dm-writecache target as well.
So if you intend to accelerate mainly 'write throughput' - dm-cache isn't
the one with the highest performance here.
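A rough command sketch for switching to dm-writecache (the vg/data and vg/fast names are hypothetical; dirty blocks are written back when the old cache is split off):

```shell
# Hypothetical VG/LV names - adjust to the real setup.
# Detach the existing dm-cache layer first (flushes dirty blocks):
lvconvert --splitcache vg/data
# Then attach the fast LV as a writecache instead:
lvconvert --type writecache --cachevol fast vg/data
```

Note dm-writecache only caches writes; reads are served from the origin, so it helps write throughput specifically.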


