[dm-devel] [patch 4/4] dm-writecache: use new API for flushing

Dan Williams dan.j.williams at intel.com
Sat May 26 15:26:36 UTC 2018


On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka at redhat.com> wrote:
>
>
> On Fri, 25 May 2018, Dan Williams wrote:
>
>> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer at redhat.com> wrote:
>> > On Fri, May 25 2018 at  2:17am -0400,
>> > Mikulas Patocka <mpatocka at redhat.com> wrote:
>> >
>> >> On Thu, 24 May 2018, Dan Williams wrote:
>> >>
>> >> > I don't want to grow driver-local wrappers for pmem. You should use
>> >> > memcpy_flushcache directly() and if an architecture does not define
>> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this
>> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't
>> >> > see a need to add a standalone flush operation if all relevant archs
>> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb()
>> >> > directly since all archs define it. Alternatively we could introduce
>> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy
>> >> > routine and memcpy_flushcache() would imply a wmb().
>> >>
>> >> But memcpy_flushcache() on ARM64 is slow.
>>
>> Right, so again, what is wrong with memcpy_flushcache_relaxed() +
>> wmb() or otherwise making memcpy_flushcache() ordered. I do not see
>> that as a trailblazing requirement, I see that as typical review and a
>> reduction of the operation space that you are proposing.
>
> memcpy_flushcache on ARM64 is generally wrong thing to do, because it is
> slower than memcpy and explicit cache flush.
>
> Suppose that you want to write data to a block device and make it
> persistent. So you send a WRITE bio and then a FLUSH bio.
>
> Now - how to implement these two bios on persistent memory:
>
> On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does
> wmb() - this is the optimal implementation.
>
> But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal
> implementation is that the WRITE bio does just memcpy() and the FLUSH bio
> does arch_wb_cache_pmem() on the affected range.
>
> Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture
> doesn't have non-temporal stores. So, memcpy_flushcache() is implemented
> as memcpy() followed by a cache flush.
>
> Now - if you flush the cache immediatelly after memcpy, the cache is full
> of dirty lines and the cache-flushing code has to write these lines back
> and that is slow.
>
> If you flush the cache some time after memcpy (i.e. when the FLUSH bio is
> received), the processor already flushed some part of the cache on its
> own, so the cache-flushing function has less work to do and it is faster.
>
> So the conclusion is - don't use memcpy_flushcache on ARM. This problem
> cannot be fixed by a better implementation of memcpy_flushcache.

It sounds like ARM might be better off with mapping its pmem as
write-through rather than write-back, and skip the explicit cache
management altogether. You speak of "optimal" and "sub-optimal", but
what would be more clear is fio measurements of the relative IOPs and
latency profiles of the different approaches. The reason I am
continuing to push here is that reducing the operation space from
'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the
maintenance long term.




More information about the dm-devel mailing list