[dm-devel] [PATCH] dax: remove the pmem_dax_ops->flush abstraction

Mikulas Patocka mpatocka at redhat.com
Fri Sep 1 02:49:19 UTC 2017



On Thu, 31 Aug 2017, Dan Williams wrote:

> On Thu, Aug 31, 2017 at 6:47 PM, Mikulas Patocka <mpatocka at redhat.com> wrote:
> > The patch abebfbe2f731 ("dm: add ->flush() dax operation support") is
> > buggy. A dm device may be composed of multiple underlying devices and all
> > of them need to be flushed. The patch just routes the flush request to the
> > first device and ignores the other devices.
> >
> > It could be fixed by adding more complex logic to the device mapper. But
> > there is only one implementation of the method pmem_dax_ops->flush - that
> > is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
> > don't need the pmem_dax_ops->flush abstraction at all, we can call
> > arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
> > couldn't ever reach anything different from arch_wb_cache_pmem().
> 
> Unfortunately, this is not true, see usage of DAXDEV_WRITE_CACHE. This
> is for platforms that arrange for cpu caches to be flushed on
> power-fail, like standalone storage appliances, where it would be a
> waste for the kernel to track and flush dirty cachelines for dax.
> Theoretically this could be done on a per-address range basis (think
> battery backing a subset of the system memory). I think we need to fix
> the routing to flush the same dax_device that ->direct_access() was
> invoked.

With my patch, dax_flush checks DAXDEV_WRITE_CACHE and doesn't do anything 
if it's not set - it just doesn't go through device mapper.

> > It should be also pointed out that for some uses of persistent memory it
> > is needed to flush only very small amount of data (such as 1 cacheline),
> > and it would be overkill if we go through that device mapper machinery for
> > a single flushed cache line.
> 
> I think this is more an argument to not enable DAX on that
> device-mapper topology if operation routing impacts performance. DAX
> is meant to get the kernel out of the way most of the time.

I don't understand what do you mean? I am writing a device mapper target 
that uses persistent memory as a cache. When flushing the metadata, 
sometimes it is needed to flush single cache lines. Ideally, I would use 
the clwb instruction for this purpose, but that clwb is hidden behind a 
complicated pmem_dax_ops->flush abstraction (and that abstraction will get 
even more complicated if implemented correctly). How do you think I should 
write that target without DAX?

Mikulas




More information about the dm-devel mailing list