[dm-devel] dm-cache issue
Teodor Milkov
tm at del.bg
Sat Nov 19 17:07:50 UTC 2016
On 16.11.2016 16:06, Zdenek Kabelac wrote:
> On 16.11.2016 at 14:45, Teodor Milkov wrote:
>> On 16.11.2016 11:24, Zdenek Kabelac wrote:
>>> My first 'guess' in this reported case is - the disk I/O traffic
>>> seen is
>>> related to the 'reload' of cached chunks from disk back to cache.
>>>
>>> This will happen if there has been an unclean cache shutdown.
>>>
>>> However, what is unclear is why it slows down boot by hours.
>>> Is the cache too big??
>>
>> Indeed, the cache is quite big – an 800GB SSD, but I found experimentally
>> that this
>> is the size where I get good cache hit ratios with my >10TB data volume.
>
> Yep - that's the current trouble with the existing dm-cache target.
> It gets inefficient when maintaining more than 1 million
> cache block entries - recent versions of lvm2 do not even allow
> creating such a cache without forcing it.
> (so for 32k blocks it's ~30G cache data size)
I'm sorry for not being clear: similarly to the OP, my SSD is split among
10 LVs, so each cache is around 80GB.
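For reference, the arithmetic behind the "1 million cache block entries"
figure can be sketched as below. This is only an illustration: the 32 KiB
chunk size is taken from the example quoted above, "80GB" is treated as
80 GiB, and the function name is made up for this sketch.

```python
KIB = 1024
GIB = 1024 ** 3

# Limit quoted above: dm-cache gets inefficient beyond ~1 million blocks.
MAX_BLOCKS = 1_000_000


def cache_block_count(cache_size_bytes, chunk_size_bytes):
    """Number of whole cache blocks (chunks) that fit into the cache."""
    return cache_size_bytes // chunk_size_bytes


chunk = 32 * KIB  # chunk size from the quoted example

# Cache size at which 32 KiB chunks reach the 1 million block limit.
limit_gib = MAX_BLOCKS * chunk / GIB

# Block count for one of the ~80GB caches described above (assumed 80 GiB).
blocks_80g = cache_block_count(80 * GIB, chunk)

print(f"limit with 32 KiB chunks: {limit_gib:.1f} GiB")
print(f"blocks in an 80 GiB cache: {blocks_80g}")
```

With these assumptions an 80 GiB cache holds about 2.6 million blocks of
32 KiB, so even a single one of the ten caches is well above the limit.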
>> As to the 'reload' vs 'flush' – I think it is flushing, because iirc
>> iostat
>> showed lots of SSD reading and HDD writing, but I'm not really sure
>> and need
>> to confirm that.
>>
>> So, are you saying that in case of unclean shutdown this 'reload' is
>> inevitable?
>
> Yes - clean shutdown is mandatory - otherwise the cache can't know its consistency
> and has to refresh itself. Another option would probably be to drop the cache
> and let it rebuild - but you lose the already gained 'knowledge' this way.
>
> Anyway, AFAIK there is ongoing development and an upstreaming process for a new
> cache target which will address a couple of other shortcomings and should
> perform much
> better. lvm2 will supposedly handle the transition to the new format in
> some way
> later.
>
>> How much time it takes obviously depends on the SSD size/speed & HDD
>> speed,
>> but with 800GB SSD it is reasonable to expect very long boot times.
>>
>>> Can you provide full logs from 'deactivation' and following activation?
>>
>> Any hints as to how to collect "full logs from 'deactivation' and
>> following
>> activation"? It happens early in the Debian boot process (I think
>> udev does
>> the activation) and I'm not sure how to enable logging... should I tweak
>> /etc/lvm/lvm.conf?
>
> All you need to collect is basically the 'serial' console log from your
> machine - so if you have another box to capture the serial console log - it's
> the easiest option.
>
> But since you already said you use a ~30 times bigger cache size than
> the size with 'reasonable' performance - I think it's already clear
> where your
> problem is hidden.
>
> Until the new target is deployed - please consider using a
> significantly smaller cache size so the number of cache chunks is not
> above 1 000 000.
Thank you very much for your help! I'll give it another go at debugging
what the problem is.
I found dm-writeboost in write_around_mode (kinda write-through) works
well for me, so if I don't manage to get along with dm-cache I have a plan B.
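Regarding the advice above to keep the number of cache chunks at or below
1 000 000: besides shrinking the cache, the count also drops if the chunk
size grows. A rough sketch of that calculation follows. It assumes the
chunk size has to be a multiple of 32 KiB (my understanding of the lvm2
documentation, not something stated in this thread), treats "80GB" as
80 GiB, and uses a helper name invented for this example.

```python
KIB = 1024
GIB = 1024 ** 3

MAX_BLOCKS = 1_000_000   # limit suggested in the thread
CHUNK_STEP = 32 * KIB    # assumption: chunk size is a multiple of 32 KiB


def min_chunk_size(cache_size_bytes, max_blocks=MAX_BLOCKS, step=CHUNK_STEP):
    """Smallest multiple of `step` that keeps the block count <= max_blocks."""
    # Integer ceiling division avoids floating point rounding surprises.
    needed = -(-cache_size_bytes // max_blocks)
    steps = -(-needed // step)
    return max(1, steps) * step


cache = 80 * GIB
chunk = min_chunk_size(cache)

print(f"chunk size: {chunk // KIB} KiB")
print(f"blocks: {cache // chunk}")
```

Under these assumptions an 80 GiB cache needs chunks of at least 96 KiB to
stay under the limit, which gives roughly 874 000 blocks. Whether larger
chunks still give good hit ratios is a separate question that this sketch
does not answer.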
Best regards,
Teodor