dm-cache size is limited to 946GB

CoolCold coolthecold at gmail.com
Sun Jul 23 13:48:37 UTC 2017

We started adopting new servers for image storage and hit a strange
problem: no caching happens for a cache LV larger than 946GB (so 947GB
and above do not work).

The storage box looks like this:
2x240GB SSD for system (sw raid 1, lvm on top)
20x1.8TB SATA HDD for data (sw raid 10, md124 + lvm on top)
4x960GB SSD for dm-caching purposes (sw raid 5, md125).

Our naive approach was to create a PV from md125 and use all of it as
cache: around 2.6TB of cache for 16TB of "raw" data should be quite
good. The cache was created successfully and showed the whole 2.6TB,
but after copying ~3TB of data from the old box we still saw only
misses for reads and writes in the statistics and almost no activity
in iostat for md125. By "almost no activity" I mean that it showed
some write operations, but zeroes in KB -
https://gist.github.com/CoolCold/f79bb706d4dd1c083a4f4ed0ebd850d5 -
where dm-2 and dm-3 are the cache data and cache meta volumes
respectively.
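
As a sanity check on the sizes quoted above, here is a minimal sketch of
the usable-capacity arithmetic. It assumes the drive sizes are decimal
GB/TB as printed on the label, that the RAID 10 array keeps two copies
of each block, and that the tools report binary units; the helper names
are made up for illustration.

```python
# Rough usable-capacity arithmetic for the two md arrays described above.
# Assumptions (not stated in the post): drive sizes are decimal units,
# RAID 10 uses 2 copies, and reported sizes are binary (TiB).

GB = 10**9
TIB = 2**40


def raid5_usable(n_disks, disk_bytes):
    """RAID 5 loses one disk's worth of space to parity."""
    return (n_disks - 1) * disk_bytes


def raid10_usable(n_disks, disk_bytes, copies=2):
    """RAID 10 stores `copies` copies of every block."""
    return n_disks * disk_bytes // copies


cache_bytes = raid5_usable(4, 960 * GB)     # md125, SSD cache
data_bytes = raid10_usable(20, 1800 * GB)   # md124, HDD data

print(f"cache: {cache_bytes / TIB:.2f} TiB")
print(f"data:  {data_bytes / TIB:.2f} TiB")
print(f"cache/data ratio: {cache_bytes / data_bytes:.0%}")
```

Under those assumptions the numbers come out close to the 2.6TB and
16TB mentioned above, so the sizes themselves look consistent.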

We have "old" servers running a slightly different setup in terms of
the number of drives; they have 350-750GB of space for caching and it
works well. We tried reducing the cache size on the new box: it worked
at 80GB, so we bisected and found that the limit is 946GB.

It doesn't look like any "magic" number (I thought there might be some
problem around 2TB with signed/unsigned values or so), and right now
I'm out of ideas about what the problem may be, so I need your advice.
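
To illustrate the "magic number" check, here is a small sketch that
tests whether any power-of-two boundary falls between 946GB and 947GB,
counted either in bytes or in 512-byte sectors. It assumes the sizes
are binary (GiB); the list of boundaries and the helper names are
arbitrary choices for illustration, not anything taken from the kernel
or LVM sources.

```python
# Does going from 946 to 947 "GB" cross an obvious integer boundary?
# Assumption: sizes are binary (GiB), sectors are 512 bytes.

GIB = 2**30
SECTOR = 512

# Boundaries that commonly show up in signed/unsigned or 32-bit bugs.
LIMITS = {"2**31": 2**31, "2**32": 2**32, "2**40": 2**40, "2**41": 2**41}


def describe(size_gib):
    """Express a size given in GiB as bytes and as 512-byte sectors."""
    size_bytes = size_gib * GIB
    return {"bytes": size_bytes, "sectors": size_bytes // SECTOR}


def crossed_between(lo_gib, hi_gib):
    """List (unit, limit) pairs where lo < limit <= hi."""
    lo, hi = describe(lo_gib), describe(hi_gib)
    crossed = []
    for unit in ("bytes", "sectors"):
        for name, limit in LIMITS.items():
            if lo[unit] < limit <= hi[unit]:
                crossed.append((unit, name))
    return crossed


print("946 -> 947 GiB:", crossed_between(946, 947))
print("1023 -> 1024 GiB:", crossed_between(1023, 1024))
```

With these assumptions nothing is crossed between 946 and 947, while
the 1023 to 1024 step does hit boundaries, which matches the impression
that 946GB is not an obvious overflow point. It does not say anything
about the actual cause.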

LVM and driver versions we are using:

[root@xxx rovchinnikov]# lvs --version
  LVM version:     2.02.166(2)-RHEL7 (2016-11-16)
  Library version: 1.02.135-RHEL7 (2016-11-16)
  Driver version:  4.34.0

Best regards,

