[linux-lvm] Caching policy in machine learning context

Jonas Degrave Jonas.Degrave at ugent.be
Wed Feb 15 13:30:29 UTC 2017


Thanks, I tried your suggestions, and also went back to the mq policy to play
with its parameters. In the end, I tried:

lvchange --cachesettings 'migration_threshold=20000000 sequential_threshold=10000000 read_promote_adjustment=1 write_promote_adjustment=4' VG

With little success. This is probably because the mq policy looks only at the
hit count, rather than the hit rate. At least, that is what I gather from line
595 in the code:
http://lxr.free-electrons.com/source/drivers/md/dm-cache-policy-mq.c?v=3.19#L595
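To see which numbers the policy is actually working with, I now look at the raw
hit/miss counters directly. A rough sketch of how I check them (assuming the
dm-cache status layout documented in the kernel's
Documentation/device-mapper/cache.txt; the field positions may differ between
kernel versions, and VG-lv here is just a placeholder for the device-mapper
name of the cached LV):

#!/bin/bash
# Print read/write hit and miss counters for a cached LV and compute a rough
# overall hit rate. Field positions follow the dm-cache status format in
# Documentation/device-mapper/cache.txt and may vary between kernel versions.
DM_NAME=${1:-VG-lv}   # device-mapper name of the cached LV (placeholder)

dmsetup status "$DM_NAME" | awk '{
    read_hits = $8; read_misses = $9; write_hits = $10; write_misses = $11;
    total = read_hits + read_misses + write_hits + write_misses;
    if (total > 0)
        printf "read hits %d, read misses %d, overall hit rate %.1f%%\n",
               read_hits, read_misses, 100 * (read_hits + write_hits) / total;
}'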

I also wrote a small script so that my users can empty the cache manually if
they want to:

#!/bin/bash
# Drop and rebuild the cache pool so the cache starts out empty again.
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
# Remove the existing cache pool, leaving VG/lv uncached.
lvremove -y VG/lv_cache
# Recreate the cache data and metadata LVs on /dev/sda.
lvcreate -L 445G -n lv_cache VG /dev/sda
lvcreate -L 1G -n lv_cache_meta VG /dev/sda
# Combine them into a cache pool, select the smq policy, and reattach it to VG/lv.
lvconvert -y --type cache-pool --poolmetadata VG/lv_cache_meta VG/lv_cache
lvchange --cachepolicy smq VG
lvconvert --type cache --cachepool VG/lv_cache VG/lv
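
Right after the rebuild I also check which policy and settings the cache really
ended up with. A quick check (assuming an lvm2 version that has the
cache_policy/cache_settings report fields; otherwise dmsetup status on the
cached LV shows the policy name near the end of its status line):

# Verify the policy and settings of the rebuilt cache
# (needs an lvm2 with these report fields; otherwise use `dmsetup status VG-lv`).
lvs -o+cache_policy,cache_settings VG/lv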


So the only remaining option for me would be to write my own policy. This
should be quite simple, as it basically needs to act as if the cache is not
full yet.

Can someone point me in the right direction on how to do this? I have tried to
find the latest version of the code, but the best I could find was a Red Hat
CVS server which times out when connecting:

cvs -d :pserver:cvs@sources.redhat.com:/cvs/dm login cvs
CVS password:
cvs [login aborted]: connect to sources.redhat.com(209.132.183.64):2401 failed: Connection timed out
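
My guess is that the code simply moved out of that CVS tree and into the
mainline kernel, where the policies live under drivers/md/. If that is the
case, something like this should get me the current smq source (just a sketch,
assuming git.kernel.org is the right place to pull from):

# Shallow-clone the mainline kernel and look at the in-tree cache policy sources
# (adjust the URL/branch if a different tree is the authoritative one).
git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
ls linux/drivers/md/dm-cache-policy-smq.c linux/drivers/md/dm-cache-policy.h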


Can someone direct me to the latest source of the smq policy?

Yours sincerely,

Jonas

On 13 February 2017 at 15:33, Zdenek Kabelac <zdenek.kabelac at gmail.com>
wrote:

> On 13.2.2017 at 15:19, Jonas Degrave wrote:
>
>> I am on kernel version 4.4.0-62-generic. I cannot upgrade to kernel 4.9, as
>> it did not play nice with the CUDA drivers:
>> https://devtalk.nvidia.com/default/topic/974733/nvidia-linux-driver-367-57-and-up-do-not-install-on-kernel-4-9-0-rc2-and-higher/
>>
>> Yes, I understand that the cache needs repeated use of blocks before
>> promoting them, but my question is basically: how many? And can I lower that
>> number?
>>
>> In our use case, a user basically reads a certain group of 100 GB of data
>> completely, about 100 times. Then another user logs in and reads a different
>> group of data about 100 times. Yet after a couple of such users, I observe
>> that only 20 GB in total has been promoted to the cache, even though the
>> cache is 450 GB and could easily hold all the data one user would need.
>>
>> So, I come to the conclusion that I need a more aggressive policy.
>>
>> I now have a reported hit rate of 19.0%, even though there is so little data
>> on the volume that 73% of it would fit in the cache. I could probably solve
>> this by making the caching policy more aggressive, and I am looking for a
>> way to do that.
>>
>
> There are two 'knobs'. One is 'sequential_threshold', with which the cache
> tries to avoid promoting long sequential reads into the cache - so if you do
> 100G reads, these likely meet that criterion and are not promoted (and I
> think this one is not configurable for smq).
>
> The other is 'migration_threshold', which limits the bandwidth load on the
> cache device.
>
> You can try to change its value:
>
> lvchange --cachesettings migration_threshold=10000000  vg/cachedlv
>
> (check with dmsetup status)
>
> I am not sure, though, how these things are configurable with the smq cache
> policy.
>
> Regards
>
> Zdenek
>
>