[dm-devel] [PATCH 2/2] dm-writecache
Mikulas Patocka
mpatocka at redhat.com
Tue Nov 28 04:01:43 UTC 2017
On Fri, 24 Nov 2017, Dan Williams wrote:
> On Wed, Nov 22, 2017 at 6:36 PM, Mikulas Patocka <mpatocka at redhat.com> wrote:
> > Hi
> >
> > Here I'm sending slighly update dm-writecache target.
> >
>
> I would expect some theory of operation documentation in the changelog
> or in Documentation/ for a new driver.
It caches writes for a block device in persistent memory. It doesn't cache
reads because reads are supposed to be cached in normal ram.
> > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> >
> [..]
> > +/*
> > + * The clflushopt instruction is very slow on Broadwell, so we use non-temporal
> > + * stores instead.
> > + */
>
> The clflushopt instruction's first instantiation in a cpu was after
> Broadwell. Also, the performance of clflushopt does not much matter
> when running on a system without ADR (asynchronous DRAM refresh) where
> flushing the cpu cache can just leave the writes stranded on their way
> to the DIMM. ADR is not available on general purpose servers until
> ACPI 6.x NFIT-defined persistent memory platforms.
There used to be pcommit instruction, but intel abandoned it - see
https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction
So, according to the document, flushing the cache should be enough for
writes to reach persistent memory.
> > +#ifdef CONFIG_X86_64
>
> In general something is broken if we end up with per-arch ifdefs like
> this in drivers to handle persistent memory. This should be using the
> pmem api or extensions of it, and we need to settle on a mechanism for
> upper-level drivers to ask if pmem is driving platform protected
> persistent memory.
>
> > +#define NT_STORE(dest, src) asm ("movnti %1, %0" : "=m"(dest) : "r"(src))
> > +#define FLUSH_RANGE(dax, ptr, size) do { } while (0)
> > +#define COMMIT_FLUSHED() wmb()
> > +#else
> > +#define NT_STORE(dest, src) ACCESS_ONCE(dest) = (src)
> > +#define FLUSH_RANGE dax_flush
> > +#define COMMIT_FLUSHED() do { } while (0)
>
> Is this just for test purposes? How does the user discover that they
> are running in a degraded mode as far as persistence guarantees? I
> think we should be falling back DM_WRITECACHE_ONLY_SSD mode if we're
> not on a pmem platform.
What degraded mode do you mean? According to that document, flushing cache
should be enough. If it is not enough, what else do I need to do to flush
cache?
The above code is not for test purposes. It is for performance purposes.
On dual-socket Broadwell server with persistent memory, when we write to
persistent memory using cached write instructions and use dax_flush
afterwards to flush cache for the affected range, the performance is about
350MB/s. It is practically unusable - worse than low-end SSDs.
On the other hand, the movnti instruction can sustain performance of one
8-byte write per clock cycle. We don't have to flush cache afterwards, the
only thing that must be done is to flush the write-combining buffer with
the sfence instruction. Movnti has much better throughput than dax_flush.
You argue that I should use the pmem api, but there is no pmem api
providing the movnti intruction. I have no choice, but coding the
instruction directly in assembler.
If you want to create API, take that "NT_STORE" definition and move it to
some x86-specific include file.
Mikulas
More information about the dm-devel
mailing list