[linux-lvm] Testing the new LVM cache feature

Mike Snitzer snitzer at redhat.com
Fri May 30 20:53:59 UTC 2014


On Fri, May 30 2014 at  2:16pm -0400,
Mike Snitzer <snitzer at redhat.com> wrote:

> On Fri, May 30 2014 at 11:28am -0400,
> Richard W.M. Jones <rjones at redhat.com> wrote:
> > 
> > This time the LVM cache test is about 10% slower than the HDD test.
> > I'm not sure what to make of that at all.
> 
> It could be that the 32K cache blocksize increased the metadata overhead
> enough to reduce the performance to that degree.
> 
> And even though you recreated the filesystem it still could be the case
> that the IO issued from ext4 is slightly misaligned.  I'd welcome you
> going back to a blocksize of 64K (you don't _need_ to go to 64K, but it
> seems you're giving up quite a bit of performance now).  And then
> collecting blktraces of the origin volume for the fio run -- to see
> whether two 64K IOs are being issued for each 64K fio IO.  I would think
> it would be fairly clear from the blktrace but maybe not.
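
For reference, something like the following should capture the IO hitting
the origin/MD while fio runs (the device paths here are only examples,
adjust them to whatever your stack actually uses):

  # trace live and dump to a text file
  blktrace -d /dev/md127 -o - | blkparse -i - > md127-trace.txt

  # or trace a fixed window and post-process afterwards
  blktrace -w 60 -d /dev/md127 -o md127
  blkparse -i md127 > md127-trace.txt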

Thinking about this a little more: if the IO that ext4 is issuing to the
cache is aligned on a blocksize boundary (e.g. 64K), we really shouldn't
see _any_ IO from the origin device when you are running fio.  The
reason is we avoid promoting (aka copying) from the origin if an entire
cache block is being overwritten.
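
To make the alignment point concrete (assuming a 64K cache blocksize; the
offsets are just an illustration):

  64K write at offset 0K   -> overwrites cache block 0 entirely, so no data
                              needs to be copied up from the origin
  64K write at offset 32K  -> touches the tail of block 0 and the head of
                              block 1; neither is fully overwritten, so the
                              cache has to read the old data from the origin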

Looking at the fio output from the cache run you did using the 32K
blocksize, it is very clear that the MD array (on sda and sdb) is
involved quite a lot.

And your even older fio run output when using the original 64K blocksize
shows a bunch of IO to md127...

So it seems fairly clear that dm-cache isn't utilizing the cache block
overwrite optimization it has to avoid promotions from the origin.  This
would _seem_ to validate my concern about alignment... or something else
needs to explain why we're not able to avoid promotions.

If you have time to reconfigure with 64K blocksize and rerun the fio
test, please look at the amount of write IO performed by md127 (and sda
and sdb)... and also look at the number of promotions, via 'dmsetup
status' for the cache device, before and after the fio run.
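
Roughly along these lines (the mapped device name and the fio job file
below are placeholders for whatever you actually have):

  dmsetup status VG-testlv      # demotion/promotion counters are in the
                                # middle of the status line, see
                                # Documentation/device-mapper/cache.txt
  cat /sys/block/md127/stat     # cumulative IO counters, incl. sectors written
  fio your-job.fio
  dmsetup status VG-testlv      # promotions ideally shouldn't have grown much
  cat /sys/block/md127/stat     # compare write sectors before vs. after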

We can try to reproduce using a pristine ext4 filesystem on top of
MD with the fio job you provided... and I'm now wondering if we're
getting bitten by DM stacked on MD (due to bvec merge being limited to 1
page, see linux.git commit 8cbeb67a for some additional context).  So it
may be worth trying _without_ MD raid1 just as a test.  Use either sda
or sdb directly as the origin volume.
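
If it helps, the no-MD variant would look roughly like this (device names,
VG name and sizes are placeholders, and the exact option spelling may
differ a bit between lvm2 versions):

  vgcreate vgtest /dev/sda /dev/sdX             # sdX = the SSD
  lvcreate -L 100G -n origin vgtest /dev/sda    # origin straight on sda, no MD
  lvcreate -L 10G  -n cdata  vgtest /dev/sdX
  lvcreate -L 1G   -n cmeta  vgtest /dev/sdX
  lvconvert --type cache-pool --chunksize 64k \
            --poolmetadata vgtest/cmeta vgtest/cdata
  lvconvert --type cache --cachepool vgtest/cdata vgtest/origin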



