[dm-devel] dm-cache performance behaviour

Andreas Herrmann aherrmann at suse.com
Wed Apr 6 11:58:09 UTC 2016


On Tue, Apr 05, 2016 at 12:12:27PM -0400, Mike Snitzer wrote:
> On Tue, Apr 05 2016 at 10:05am -0400,
> Andreas Herrmann <aherrmann at suse.com> wrote:
> 
> > On Tue, Apr 05, 2016 at 10:36:12AM +0200, Zdenek Kabelac wrote:
> > > On 5.4.2016 at 09:12, Andreas Herrmann wrote:
> > > >Hi,
> > > >
> > > >I've recently looked at performance behaviour of dm-cache and bcache.
> > > >I've repeatedly observed very low performance with dm-cache in
> > > >different tests. (Similar tests with bcache showed no such oddities.)
> > > >
> > > >To rule out user errors that might have caused this, I'll briefly
> > > >describe what I've done and observed.
> > > >
> > > >- tested kernel version: 4.5.0
> > > >
> > > >- backing device: 1.5 TB spinning drive
> > > >
> > > >- caching device: 128 GB SSD (used for both metadata and cache; the
> > > >   size of the metadata part was calculated based on
> > > >   https://www.redhat.com/archives/dm-devel/2012-December/msg00046.html)
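
(For reference, the sizing followed the 4 MB plus 16 bytes per cache block
rule of thumb from the dm-cache documentation. Rough numbers only, assuming
the 256 KiB (512 sector) block size mentioned further down:

    cache blocks  = cache size / block size ~ 128 GB / 256 KiB ~ 500000
    metadata size ~ 4 MB + 16 B * 500000 blocks ~ 12 MB, rounded up generously

The exact figures I used differ a bit, but that is the order of magnitude.)
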
> > > >
> > > >- my test procedure consisted of a sequence of fio runs with
> > > >   different data sets; fio randread performance (bandwidth and
> > > >   IOPS) was compared. fio was invoked using something like
> > > >
> > > >   fio --directory=/cached-device --rw=randread --name=fio-1 \
> > > >     --size=50G --group_reporting --ioengine=libaio \
> > > >     --direct=1 --iodepth=1 --runtime=40 --numjobs=1
> > > >
> > > >   I've iterated over 10 runs for each of numjobs=1,2,3 and varied the
> > > >   name parameter to operate with different data sets.
> > > >
> > > >   This procedure implied that with 3 jobs the underlying data set for
> > > >   the test consisted of 3 files of 50G each, which together exceed
> > > >   the size of the caching device.
> > > >
> > > >- Between some tests I've tried to empty the cache. For dm-cache I did
> > > >   this by unmounting the "compound" cache device, switching to the
> > > >   cleaner policy, zeroing the metadata part of the caching device,
> > > >   recreating the caching device and finally recreating the compound
> > > >   cache device (the backing device was kept unmodified throughout).
> > > >
> > > >   I used dmsetup status to check for success of this operation
> > > >   (checking for #used_cache_blocks).
> > > >   If there is an easier way to do this please let me know -- if it's
> > > >   documented, I've missed it.
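
In commands, that amounts to roughly the following (device names are
placeholders, not my actual setup; "cache0" stands for the compound cache
device):

    umount /cached-device
    dmsetup suspend cache0
    dmsetup reload cache0 --table "0 <origin sectors> cache /dev/mapper/meta \
        /dev/mapper/ssd /dev/mapper/origin 512 1 writeback cleaner 0"
    dmsetup resume cache0
    # poll 'dmsetup status cache0' until the dirty block count reaches 0
    dmsetup remove cache0
    # wipe the metadata so it gets reinitialised on the next activation
    dd if=/dev/zero of=/dev/mapper/meta bs=4k count=1
    # then recreate cache0 with the original table (see the table sketch
    # below) and remount it
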
> > > >
> > > >- dm-cache parameters:
> > > >   * cache_mode: writeback
> > > >   * block size: 512 sectors
> > > >   * migration_threshold 2048 (default)
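
Expressed as a dmsetup table, that corresponds to something like the
following (again with placeholder device names; the length is the size of
the origin device in sectors):

    # start length cache <metadata dev> <cache dev> <origin dev> \
    #     <block size> <#feature args> <features> <policy> <#policy args>
    0 <origin sectors> cache /dev/mapper/meta /dev/mapper/ssd \
        /dev/mapper/origin 512 1 writeback smq 0
    # migration_threshold was left at its default, so it is not set explicitly
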
> > > >
> > > >I've observed two oddities:
> > > >
> > > >   (1) Only fio tests with the first data set created (and thus
> > > >   initially occupying the cache) showed decent performance
> > > >   results. Subsequent fio tests with another data set showed poor
> > > >   performance. I think this indicates that the SMQ policy does not
> > > >   properly promote/demote data to/from the caching device in my tests.
> > > >
> > > >   (2) I've seen results where performance was actually below "native"
> > > >   (w/o caching) performance of the backing device. I think that this
> > > >   should not happen. If a data access falls back to the backing device
> > > >   due to a cache miss I would have expected to see almost the
> > > >   performance of the backing device. Maybe this points to a
> > > >   performance issue in SMQ -- spending too much time in policy code
> > > >   before falling back to the backing device.
> > > >
> > > >I've tried to figure out what actually happened in the SMQ code in
> > > >these cases, but eventually set that aside. Next I want to check whether
> > > >there might be a flaw in my test setup or dm-cache configuration.
> > > 
> > > Hi
> > > 
> > > The dm-cache SMQ/MQ is a 'slow moving' hot-spot cache.
> > 
> > Yep, that is mentioned in a few places in the source code, in the
> > hot-spot handling code.
> > 
> > > So before a block is 'promoted' to the cache - there needs to be a
> > > reason for it - and a plain single read is not enough.
> > 
> > It's not obvious to me when a block finally gets promoted. I had the
> > impression that once the cache is filled with data, getting new data
> > into the cache takes quite some time.
> > 
> > > So if the other cache promotes the block to the cache with a single
> > > block access you may observe different performance.
> > 
> > Yep, that is what my measurements suggest.
> > 
> > > dm-cache is not targeted for 'quick' promoting of read blocks into a
> > > cache - rather 'slow' moving of often used blocks.
> > 
> > If I completely stop using one set of test files (which initially
> > defined the hotspot blocks) and switch to a new set of test files, this
> > "slow" moving of previously often used blocks might be the cause of
> > the lower-than-expected performance in my tests. Would it be
> > possible to tune this behaviour to allow quicker promotion if a user
> > thinks their workload requires it?
> > 
> > > Unsure how that fits your testing environment and what you are
> > > actually trying to test?
> > 
> > The worst case for spinning disks is random access. I've seen some
> > dm-cache benchmark results (fio randread) that showed lower
> > performance than the underlying backing device itself. That was the
> > trigger for me to take a closer look at dm-cache and bcache and to do
> > some performance measurements, especially with random read I/O patterns.
> > 
> > I've observed two oddities (from my point of view); either they are
> > due to setup errors or wrong expectations, or they point to real
> > issues that might be worth looking at or being aware of.
> > I think it's at least worth sharing my test results.
> > 
> > > Regards
> > > 
> > > PS: a 256K dm-cache block size is quite large - it really depends
> > > upon the workload - the minimum supported size is 32K - lvm2
> > > defaults to 64K...
> > 
> > I had chosen 512 as the block size because the documentation mentioned it.
> > 
> > I've kicked off a test with the minimum block size.
> > Let's see whether that changes anything.
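
(For reference, the block size is given in 512-byte sectors: 512 sectors is
256 KiB, the minimum of 64 sectors is 32 KiB, and the lvm2 default of 128
sectors is 64 KiB.)
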
> 
> Are you using smq or mq cache policy?  Please use smq.  It is much
> better about adapting to changing workloads.  mq has since been
> converted over to an alias for smq (in Linux 4.6-rc1).

I've used smq.

> As for your randread fio test, there needs to be some amount of
> redundant access.  randread on its own won't give you that.

Yep.

> fio does have random_distribution (see zipf and pareto; afaik zipf
> is more useful... but I never actually got a compelling fio
> command line together that made use of random_distribution to
> simulate hotspots).

Thanks for the hint. (So far I haven't modified fio's
random_distribution option.)
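
Something along these lines is probably what I'll try next (untested so far;
the zipf theta value is just a starting point, not a recommendation):

    fio --directory=/cached-device --rw=randread --name=fio-zipf \
        --size=50G --group_reporting --ioengine=libaio \
        --direct=1 --iodepth=1 --runtime=40 --numjobs=1 \
        --random_distribution=zipf:1.2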

Out of curiosity: what do you use for performance tests of dm-cache
(e.g. to track regressions) to simulate hot-spots -- some private
scripts?

> Anyway, as Zdenek effectively said: dm-cache isn't a writecache.  If you
> need a writecache then bcache is the only option as of now.  Though
> there is an emerging DM writecache target that has stalled but can be
> revisited, see:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=writecache


Thanks,

Andreas



