[dm-devel] [PATCH] dm-writecache: improve performance on Optane-based persistent memory
Mikulas Patocka
mpatocka at redhat.com
Wed Apr 29 20:19:44 UTC 2020
On Wed, 29 Apr 2020, Heinz Mauelshagen wrote:
> On 4/29/20 6:30 PM, Mikulas Patocka wrote:
> > Hi
> >
> > This is the clflushopt patch for the next merge window.
> >
> > Mikulas
> >
> >
> > From: Mikulas Patocka <mpatocka at redhat.com>
> >
> > When testing the dm-writecache target on real Optane-based persistent
> > memory, it turned out that explicit cache flushing with the clflushopt
> > instruction performs better than non-temporal stores for block sizes 1k,
> > 2k and 4k.
> >
> > This patch adds a new function, memcpy_flushcache_optimized, which tests
> > whether clflushopt is available; if it is, it is used instead of
> > memcpy_flushcache.
> >
> > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> >
> > ---
> > drivers/md/dm-writecache.c | 29 ++++++++++++++++++++++++++++-
> > 1 file changed, 28 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/drivers/md/dm-writecache.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-writecache.c	2020-04-29 18:09:53.599999000 +0200
> > +++ linux-2.6/drivers/md/dm-writecache.c	2020-04-29 18:22:36.139999000 +0200
> > @@ -1137,6 +1137,33 @@ static int writecache_message(struct dm_
> > return r;
> > }
> > +static void memcpy_flushcache_optimized(void *dest, void *source, size_t size)
> > +{
> > + /*
> > +  * clflushopt performs better with block size 1024, 2048, 4096
> > +  * non-temporal stores perform better with block size 512
> > +  *
> > +  * block size   512       1024      2048      4096
> > +  * movnti       496 MB/s  642 MB/s  725 MB/s  744 MB/s
> > +  * clflushopt   373 MB/s  688 MB/s  1.1 GB/s  1.2 GB/s
> > +  */
> > +#ifdef CONFIG_X86
> > + if (static_cpu_has(X86_FEATURE_CLFLUSHOPT) &&
> > + likely(boot_cpu_data.x86_clflush_size == 64) &&
> > + likely(size >= 768)) {
> > + do {
> > + memcpy((void *)dest, (void *)source, 64);
> > + clflushopt((void *)dest);
> > + dest += 64;
> > + source += 64;
> > + size -= 64;
> > + } while (size >= 64);
> > + return;
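For readers outside the kernel tree, the quoted loop can be sketched as a
self-contained userspace analog. The names copy_by_cachelines and
flush_line_stub below are illustrative, not from the patch, and the kernel's
clflushopt() is replaced by a no-op stub so the structure can be exercised
outside the kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's clflushopt(); on real hardware this would
 * flush the 64-byte cache line containing p after the store. */
static void flush_line_stub(void *p) { (void)p; }

/* Copy one 64-byte cache line at a time, flushing each line after the
 * store, mirroring the loop in memcpy_flushcache_optimized(). */
static void copy_by_cachelines(void *dest, const void *source, size_t size)
{
	char *d = dest;
	const char *s = source;

	while (size >= 64) {
		memcpy(d, s, 64);
		flush_line_stub(d);
		d += 64;
		s += 64;
		size -= 64;
	}
	/* The kernel patch only takes this path for size >= 768 in
	 * 512-byte-multiple blocks, so no tail remains there; handle one
	 * anyway for the general case. */
	if (size)
		memcpy(d, s, size);
}
```

The 64-byte step matches boot_cpu_data.x86_clflush_size checked in the patch,
which is why the kernel code guards on that value being 64.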
>
>
> Aren't memory barriers needed for ordering before and after the loop?
>
> Heinz
This is called while holding the writecache lock - and wc_unlock serves as
a memory barrier.
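The point that the unlock acts as the barrier can be illustrated with C11
release/acquire ordering. This userspace sketch (hypothetical names, not from
the patch) mirrors the pattern: plain stores made while holding the lock are
published by the release operation, just as wc_unlock orders the stores made
by the copy loop:

```c
#include <stdatomic.h>

static int shared_data;
static atomic_int ready;

/* Plays the role of code running under the writecache lock: ordinary
 * stores, followed by a release operation (analogous to wc_unlock). */
static void writer(void)
{
	shared_data = 42;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Acquire side: if the release store is observed, all stores that
 * preceded it are observed too. */
static int reader(void)
{
	if (atomic_load_explicit(&ready, memory_order_acquire))
		return shared_data;
	return -1;
}
```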
Mikulas