[dm-devel] [PATCH 4/4] dm: implement no-clone optimization
Mikulas Patocka
mpatocka at redhat.com
Thu Feb 14 16:54:56 UTC 2019
On Thu, 14 Feb 2019, Mike Snitzer wrote:
> On Thu, Feb 14 2019 at 10:00am -0500,
> Mikulas Patocka <mpatocka at redhat.com> wrote:
>
> > This patch improves performance of dm-linear and dm-striped targets.
> > Device mapper copies the whole bio and passes it to the lower layer. This
> > copying may be avoided in special cases.
> >
> > This patch changes the logic so that instead of copying the bio we
> > allocate a structure dm_noclone (it has only 4 entries), save the values
> > bi_end_io and bi_private in it, overwrite these values in the bio and pass
> > the bio to the lower block device.
> >
> > When the bio is finished, the function noclone_endio restores the values
> > bi_end_io and bi_private and passes the bio to the original bi_end_io
> > function.
> >
> > This optimization can only be done by dm-linear and dm-striped targets;
> > a target can opt in by setting ti->no_clone = true.
> >
> > Performance improvement:
> >
> > # modprobe brd rd_size=1048576
> > # dd if=/dev/zero of=/dev/ram0 bs=1M oflag=direct
> > # dmsetup create lin --table "0 2097152 linear /dev/ram0 0"
> > # fio --ioengine=psync --iodepth=1 --rw=read --bs=512 --direct=1 --numjobs=12 --time_based --runtime=10 --group_reporting --name=/dev/mapper/lin
> >
> > x86-64, 2x six-core:
> > /dev/ram0                                    2449MiB/s
> > /dev/mapper/lin 5.0-rc without optimization  1970MiB/s
> > /dev/mapper/lin 5.0-rc with optimization     2238MiB/s
> >
> > arm64, quad core:
> > /dev/ram0                                    457MiB/s
> > /dev/mapper/lin 5.0-rc without optimization  325MiB/s
> > /dev/mapper/lin 5.0-rc with optimization     364MiB/s
> >
> > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
>
> Nice performance improvement. But each device should have its own
> mempool for dm_noclone + front padding. So it should be wired into
> dm_alloc_md_mempools().

We don't need to use mempools - if the slab allocation fails, we fall back
to the cloning path that has mempools.
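
The save/restore scheme and the fallback on allocation failure can be
sketched in user-space C. This is only an illustration under stated
assumptions, not the patch itself: struct bio here is a hypothetical
three-field stand-in for the kernel structure, malloc() stands in for the
slab allocation, and submit_noclone(), clone_and_submit(), lower_submit()
and example_endio() are made-up names. Only noclone_endio, bi_end_io and
bi_private come from the patch description.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bio (assumption). */
struct bio;
typedef void (bio_end_io_t)(struct bio *);

struct bio {
	bio_end_io_t *bi_end_io;
	void *bi_private;
};

/* Saved state, reduced to the two fields this sketch needs. */
struct dm_noclone {
	bio_end_io_t *orig_bi_end_io;
	void *orig_bi_private;
};

/* Counts how often the fallback (cloning) path was taken. */
static int clone_path_used;

/* Restore the saved fields, free the state, call the original handler. */
static void noclone_endio(struct bio *bio)
{
	struct dm_noclone *nc = bio->bi_private;

	assert(nc != NULL);
	bio->bi_end_io = nc->orig_bi_end_io;
	bio->bi_private = nc->orig_bi_private;
	free(nc);
	bio->bi_end_io(bio);
}

/* Stand-in for the lower block device: completes the bio at once. */
static void lower_submit(struct bio *bio)
{
	bio->bi_end_io(bio);
}

/* Stand-in for the existing cloning path (no real clone is made here). */
static void clone_and_submit(struct bio *bio)
{
	clone_path_used++;
	bio->bi_end_io(bio);
}

/* Try the no-clone path; fall back to cloning if the allocation fails. */
static void submit_noclone(struct bio *bio, int simulate_alloc_failure)
{
	struct dm_noclone *nc;

	nc = simulate_alloc_failure ? NULL : malloc(sizeof(*nc));
	if (!nc) {
		clone_and_submit(bio);
		return;
	}
	nc->orig_bi_end_io = bio->bi_end_io;
	nc->orig_bi_private = bio->bi_private;
	bio->bi_end_io = noclone_endio;
	bio->bi_private = nc;
	lower_submit(bio);
}

/* Example completion handler: bumps a counter reached via bi_private. */
static void example_endio(struct bio *bio)
{
	(*(int *)bio->bi_private)++;
}
```

In this sketch the caller sees its own bi_end_io and bi_private again by
the time example_endio runs, whichever path was taken.
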
> It is fine if you don't actually deal with supporting per-bio-data in
> this patch, but a follow-on patch to add support for noclone-based
> per-bio-data shouldn't be expected to refactor the location of the
> mempool allocation (module vs per-device granularity).
>
> Mike

I tried to use per-bio-data and other features - and it makes the
structure dm_noclone and function noclone_endio grow:

#define DM_NOCLONE_MAGIC 9693664

struct dm_noclone {
	struct mapped_device *md;
	struct dm_target *ti;
	struct bio *bio;
	struct bvec_iter orig_bi_iter;
	bio_end_io_t *orig_bi_end_io;
	void *orig_bi_private;
	unsigned long start_time;
	/* ... per-bio data ... */
	/* DM_NOCLONE_MAGIC */
};

And this growth degrades performance on the linear target, from 2238MiB/s to
2145MiB/s.
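
To put the x86-64 numbers in relative terms: the larger structure costs
about 4.2% against the small no-clone variant, while still being about
8.9% faster than the cloning path (the small variant is about 13.6%
faster). A small helper for this arithmetic, with pct_change() being a
made-up name used only for illustration:

```c
#include <stdio.h>

/* Relative change from 'from' to 'to', in percent. */
static double pct_change(double from, double to)
{
	return (to - from) / from * 100.0;
}

/* Print the comparisons for the x86-64 dm-linear throughput figures. */
static void print_linear_summary(void)
{
	const double cloned = 1970.0;        /* MiB/s, without optimization */
	const double noclone_small = 2238.0; /* MiB/s, 4-entry dm_noclone */
	const double noclone_big = 2145.0;   /* MiB/s, grown dm_noclone */

	printf("small vs cloned: %+.1f%%\n", pct_change(cloned, noclone_small));
	printf("big vs cloned:   %+.1f%%\n", pct_change(cloned, noclone_big));
	printf("big vs small:    %+.1f%%\n", pct_change(noclone_small, noclone_big));
}
```
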
Mikulas