[dm-devel] [PATCH 4/4] dm: implement no-clone optimization

Fri Feb 15 14:09:16 UTC 2019

On Thu, 14 Feb 2019, Mike Snitzer wrote:

> On Thu, Feb 14 2019 at 11:54am -0500,
> Mikulas Patocka <mpatocka at redhat.com> wrote:
> 
> > > > x86-64, 2x six-core
> > > > /dev/ram0					2449MiB/s
> > > > /dev/mapper/lin 5.0-rc without optimization	1970MiB/s
> > > > /dev/mapper/lin 5.0-rc with optimization	2238MiB/s
> > > > 
> > > > arm64, quad core:
> > > > /dev/ram0					457MiB/s
> > > > /dev/mapper/lin 5.0-rc without optimization	325MiB/s
> > > > /dev/mapper/lin 5.0-rc with optimization	364MiB/s
> > > > 
> > > > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > > 
> > > Nice performance improvement.  But each device should have its own
> > > mempool for dm_noclone + front padding.  So it should be wired into
> > > dm_alloc_md_mempools().
> > 
> > We don't need to use mempools - if the slab allocation fails, we fall back 
> > to the cloning path that has mempools.
> 
> But the implementation benefits from each DM device having control over
> any extra memory it'd like to use for front padding.  Same as is done
> now for the full-blown DM core with cloning.

If the machine is out of memory, you already have much more serious 
problems to deal with - attempting to optimize I/O by 13% at that point 
doesn't make sense.
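
To make the fallback argument concrete, here is a minimal userspace sketch, 
not the patch itself. All names (fake_bio, noclone_ctx, try_alloc_noclone, 
submit_cloned, simulate_oom) are hypothetical stand-ins: the fast path 
tries an opportunistic allocation, and if that fails the request simply 
takes the existing cloning path, which is assumed to have its own reserved 
memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio; only the field the sketch needs. */
struct fake_bio {
	void *private;
};

/* Hypothetical stand-in for the fast-path bookkeeping structure. */
struct noclone_ctx {
	void *orig_private;
};

enum submit_path { PATH_NOCLONE, PATH_CLONE };

/* Test hook: when true, the opportunistic allocation is forced to fail. */
static bool simulate_oom;

/* Models an allocation that may fail under memory pressure. */
static struct noclone_ctx *try_alloc_noclone(void)
{
	if (simulate_oom)
		return NULL;
	return calloc(1, sizeof(struct noclone_ctx));
}

/* Models the existing cloning path, assumed to be backed by reserves. */
static enum submit_path submit_cloned(struct fake_bio *bio)
{
	(void)bio;
	return PATH_CLONE;
}

/* Try the fast path first; fall back to cloning if allocation fails. */
static enum submit_path submit(struct fake_bio *bio)
{
	struct noclone_ctx *ctx;

	assert(bio != NULL);
	ctx = try_alloc_noclone();
	if (!ctx)
		return submit_cloned(bio);

	/* Save what the fast path would later restore on completion. */
	ctx->orig_private = bio->private;
	/* A real implementation would remap and submit the bio here. */
	free(ctx);
	return PATH_NOCLONE;
}
```

In this sketch a failed allocation costs nothing but the optimization: the 
request is still serviced, just through the slower path.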

> > > It is fine if you don't actually deal with supporting per-bio-data in 
> > > this patch, but a follow-on patch to add support for noclone-based 
> > > per-bio-data shouldn't be expected to refactor the location of the 
> > > mempool allocation (module vs per-device granularity).
> > > 
> > > Mike
> > 
> > I tried to use per-bio-data and other features - and it makes the 
> > structure dm_noclone and function noclone_endio grow:
> > 
> > #define DM_NOCLONE_MAGIC 9693664
> > struct dm_noclone {
> > 	struct mapped_device *md;
> > 	struct dm_target *ti;
> > 	struct bio *bio;
> > 	struct bvec_iter orig_bi_iter;
> > 	bio_end_io_t *orig_bi_end_io;
> > 	void *orig_bi_private;
> > 	unsigned long start_time;
> > 	/* ... per-bio data ... */
> > 	/* DM_NOCLONE_MAGIC */
> > };
> > 
> > And this growth degrades performance on linear target - from 2238MiB/s to 
> > 2145MiB/s.
> 
> It shouldn't if done properly.. for linear there wouldn't be any growth.

That means variable structure length depending on target?
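
For illustration only, a variable-length structure of this kind could be 
sketched in userspace C with a flexible array member. The names 
(noclone_hdr, noclone_alloc_size, noclone_alloc, noclone_per_bio_data) are 
hypothetical and the fields are a simplified subset; the point is just 
that a target requesting zero bytes of per-bio data gets no growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed header followed by optional per-target data. */
struct noclone_hdr {
	void *orig_private;
	unsigned long start_time;
	/* Zero bytes for targets that need no per-bio data. */
	unsigned char per_bio_data[];
};

/* Allocation size depends on how much per-bio data the target asks for. */
static size_t noclone_alloc_size(size_t per_bio_data_size)
{
	return sizeof(struct noclone_hdr) + per_bio_data_size;
}

/* Allocate a zeroed header plus the requested trailing space. */
static struct noclone_hdr *noclone_alloc(size_t per_bio_data_size)
{
	return calloc(1, noclone_alloc_size(per_bio_data_size));
}

/* Return a pointer to the per-target area that follows the header. */
static void *noclone_per_bio_data(struct noclone_hdr *h)
{
	assert(h != NULL);
	return h->per_bio_data;
}
```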

Other targets are so slow that they don't need this optimization at all. 
For example, dm-thin reaches only 80-110MiB/s for the same use case, so an 
optimization that improves the performance of linear by 13% has no effect 
there.
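
The percentages can be checked from the throughput numbers quoted in this 
thread. The small helper below is only a worked calculation; the function 
names gain_percent and print_gains are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Relative improvement of `measured` over `baseline`, in percent. */
static double gain_percent(double baseline, double measured)
{
	assert(baseline > 0.0);
	return (measured - baseline) / baseline * 100.0;
}

/* Print the gains for the MiB/s figures quoted in this thread. */
static void print_gains(void)
{
	/* x86-64 linear: 1970 without, 2238 with the optimization */
	printf("x86-64 no-clone:     %.1f%%\n", gain_percent(1970.0, 2238.0));
	/* arm64 linear: 325 without, 364 with the optimization */
	printf("arm64 no-clone:      %.1f%%\n", gain_percent(325.0, 364.0));
	/* x86-64 linear with the larger dm_noclone structure: 2145 */
	printf("x86-64 larger struct: %.1f%%\n", gain_percent(1970.0, 2145.0));
}
```

On these numbers the gain is about 13.6% on x86-64 and 12.0% on arm64, and 
it shrinks to about 8.9% with the larger structure.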

If we had a target that performs as well as linear or striped, this 
optimization could be enabled for it.

Mikulas