[dm-devel] [PATCH 0/6] Support DAX for device-mapper dm-linear devices

Kani, Toshimitsu toshi.kani at hpe.com
Tue Jun 14 18:00:06 UTC 2016


On Tue, 2016-06-14 at 11:41 -0400, Mike Snitzer wrote:
> On Tue, Jun 14 2016 at  9:50am -0400,
> Jeff Moyer <jmoyer at redhat.com> wrote:
> > "Kani, Toshimitsu" <toshi.kani at hpe.com> writes:
> > > > I had dm-linear and md-raid0 support on my list of things to look
> > > > at; did you have raid0 in your plans?
> > >
> > > Yes, I hope to extend this further, and raid0 is a good candidate.
> >   
> > dm-flakey would allow more xfstests test cases to run.  I'd say that's
> > more important than linear or raid0.  ;-)
>
> Regardless of which target(s) grow DAX support, the most pressing initial
> concern is getting the DM device stacking correct, and verifying that
> IO that crosses pmem device boundaries is being properly split by DM
> core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to
> max_io_len).

Agreed.  I've briefly tested stacking and it seems to work fine.  As for IO
crossing pmem device boundaries, __split_and_process_non_flush() is used
when the device is mounted without the DAX option.  With DAX, this case is
handled by dm_blk_direct_access(), which limits the returned size.  This
leads the caller to iterate (read/write) or fall back to a smaller size
(mmap page fault).
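
To illustrate, here is a simplified sketch of that limiting (not the exact
patch code; error handling is trimmed and the final signature may differ):

  static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
                                   void __pmem **kaddr, pfn_t *pfn, long size)
  {
          struct mapped_device *md = bdev->bd_disk->private_data;
          struct dm_table *map;
          struct dm_target *ti;
          int srcu_idx;
          long ret = -EIO;

          map = dm_get_live_table(md, &srcu_idx);
          if (map) {
                  ti = dm_table_find_target(map, sector);
                  if (ti && ti->type->direct_access) {
                          /*
                           * Clamp the request to the end of this target so
                           * a DAX access never crosses a pmem boundary; the
                           * caller sees a short return and iterates.
                           */
                          long len = max_io_len(sector, ti) << SECTOR_SHIFT;

                          ret = ti->type->direct_access(ti, sector, kaddr,
                                                        pfn, min(len, size));
                  }
          }
          dm_put_live_table(md, srcu_idx);
          return ret;
  }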

> My hope is to nail down the DM core and its dependencies in block etc.
> Doing so in terms of dm-linear doesn't seem like wasted effort
> considering you told me it'd be useful to have for pmem devices.

Yes, I think dm-linear is useful as it gives more flexibility, e.g. it
allows creating a large device from multiple pmem devices.
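
For example, a table like the following (hypothetical names and sizes: two
4 GiB namespaces, 8388608 sectors each) concatenates two pmem devices into
a single 8 GiB linear device:

  dmsetup create pmem-cat <<EOF
  0        8388608  linear /dev/pmem0 0
  8388608  8388608  linear /dev/pmem1 0
  EOF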

> > Also, the next step in this work is to then decide how to determine on
> > what numa node an LBA resides.  We had discussed this at a prior
> > plumbers conference, and I think the consensus was to use xattrs.
> > Toshi, do you also plan to do that work?
>
> How does the associated NUMA node relate to this?  Does the
> DM request_queue need to be set up to only allocate from the NUMA node
> the pmem device is attached to?  I recently added support for this to
> DM.  But there will likely be some code needed to propagate the NUMA
> node id accordingly.

Each pmem device has a sysfs "numa_node" attribute so that tools like
numactl can be used to bind an application to the same locality as the
pmem device (since the CPU accesses pmem directly).  This won't work well
with a mapped device since it can be composed of multiple localities.
Locality info would need to be managed on a per-file basis, as Jeff
mentioned.
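
For instance (the exact sysfs path is illustrative and may vary by kernel
version):

  node=$(cat /sys/block/pmem0/device/numa_node)
  numactl --cpunodebind="$node" --membind="$node" ./app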

Thanks,
-Toshi



