[dm-devel] Persistent memory interface

Doug Dumitru doug at easyco.com
Mon Jun 22 16:50:39 UTC 2015


I would like to comment on the BTT docs a bit.  There are some design
points you might want to consider.

First, real use cases will see essentially no read/write collisions.  If
you think of a file system, reading a block while it is being written, or
writing a single block twice at the same time, just doesn't happen, because
the resulting data would be non-deterministic.  The driver still needs to
handle these cases, but optimizing for them is not all that logical.

Let's start at the BTT table.  It would be useful if we could distinguish
between a stable block and one that is being updated right now.  An option
is to encode the TRIM/ERROR bits as four "states" (stable, updating,
trimmed, error) and just use the BTT entry as an index.  This could probably
point directly to the FLOG entry.  A BTT entry, at four bytes, can be
updated atomically without locks, so two threads can simultaneously update
it to point to the FLOG table and then, after the update, see if they won.
If they did not win, they can wait for the first update to complete.  The
FLOG table could also have a parallel RAM-based BTT2 table to store
spinlocks or linked lists to handle collisions.  Then again, a simple spin
or spin/sleep is probably good enough.
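
A minimal sketch of the four-state entry and the "see if they won" step,
in userspace C11 rather than kernel code.  The layout (2 state bits on top
of a 30-bit index) and every name here (btt_pack, btt_try_claim,
btt_publish) are assumptions for illustration, not the actual BTT format.
It also uses a compare-and-swap instead of a plain store followed by a
re-check, which is a swapped-in way of deciding the winner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top 2 bits hold the state, low 30 bits an index. */
enum btt_state {
    BTT_STABLE   = 0,  /* index names a data block */
    BTT_UPDATING = 1,  /* index names a FLOG slot */
    BTT_TRIMMED  = 2,
    BTT_ERROR    = 3
};

#define BTT_INDEX_BITS 30u
#define BTT_INDEX_MASK ((1u << BTT_INDEX_BITS) - 1u)

static inline uint32_t btt_pack(enum btt_state st, uint32_t index)
{
    assert(index <= BTT_INDEX_MASK);
    return ((uint32_t)st << BTT_INDEX_BITS) | index;
}

static inline enum btt_state btt_state_of(uint32_t e)
{
    return (enum btt_state)(e >> BTT_INDEX_BITS);
}

static inline uint32_t btt_index_of(uint32_t e)
{
    return e & BTT_INDEX_MASK;
}

/*
 * Try to claim an entry for writing by swinging it from its current
 * non-updating value to "updating, pointing at flog_slot".
 * Returns true if this caller won; a loser spins or sleeps and retries.
 */
static inline bool btt_try_claim(_Atomic uint32_t *entry, uint32_t flog_slot)
{
    uint32_t old = atomic_load(entry);

    if (btt_state_of(old) == BTT_UPDATING)
        return false;  /* another writer already owns this entry */
    return atomic_compare_exchange_strong(entry, &old,
                                          btt_pack(BTT_UPDATING, flog_slot));
}

/* The winning writer publishes the new data block when it is done. */
static inline void btt_publish(_Atomic uint32_t *entry, uint32_t new_block)
{
    atomic_store(entry, btt_pack(BTT_STABLE, new_block));
}
```

With 30 index bits this covers 2^30 blocks per table; a real layout would
have to pick the split to match the arena size.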

The same works for readers.  If you read a block, check the BTT table after
you finish the read.  If it is the same, your read was good.  If it changed
underneath you, or is pointing to a FLOG block, then you need to wait or
re-read.  Again, the real-world frequency of collisions is very low.  This
would let you eliminate the RTT table entirely.
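
The read-side check could look roughly like the sketch below.  It is
userspace C11 under assumed names (btt_read_once, entry_is_updating) and
an assumed encoding where state value 1 in the top two bits means
"updating".  It also assumes writes go to a fresh block before the entry
is swung over, and that a freed block is not reused and mapped back to
the same entry while a read is in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE     512u
#define STATE_SHIFT    30u
#define STATE_UPDATING 1u   /* hypothetical encoding of the "updating" state */

static inline bool entry_is_updating(uint32_t e)
{
    return (e >> STATE_SHIFT) == STATE_UPDATING;
}

static inline uint32_t entry_index(uint32_t e)
{
    return e & ((1u << STATE_SHIFT) - 1u);
}

/*
 * Optimistic read: sample the entry, copy the block, sample the entry
 * again.  Returns true if the copy in 'out' is good, false if the caller
 * has to wait or re-read.
 */
static inline bool btt_read_once(_Atomic uint32_t *entry,
                                 const uint8_t *data, uint8_t *out)
{
    assert(entry && data && out);

    uint32_t before = atomic_load(entry);
    if (entry_is_updating(before))
        return false;  /* entry points at a FLOG slot, not at data */

    memcpy(out, data + (size_t)entry_index(before) * BLOCK_SIZE, BLOCK_SIZE);

    /* The read is good only if the mapping did not move underneath us. */
    return atomic_load(entry) == before;
}
```

A caller would loop on btt_read_once(), spinning or sleeping between
attempts; since collisions are rare, the loop almost always runs once.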

One final optimization would be to keep the BTT table in standard RAM as
well as in NV RAM.  If standard RAM is faster, then reads could look up
blocks without touching the NV driver.  For 512G with 512-byte blocks, this
is 1B blocks, or 4G of RAM.  Then again, if the NV RAM is just as fast, this
would not help.  Perhaps an option.
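
The sizing above works out as follows, assuming 512-byte blocks and one
four-byte entry per block.  The helper name btt_map_bytes is made up for
this example.

```c
#include <assert.h>
#include <stdint.h>

/* RAM needed for a shadow copy of the map: one 4-byte entry per block. */
static inline uint64_t btt_map_bytes(uint64_t capacity_bytes,
                                     uint32_t block_size)
{
    assert(block_size != 0);
    return (capacity_bytes / block_size) * sizeof(uint32_t);
}
```

For a 512 GiB device, btt_map_bytes(512ULL << 30, 512) comes to 4 GiB;
with 4096-byte blocks the same device would need 512 MiB.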

I have gotten into a lot of trouble optimizing for fio collisions when
these collisions don't really impact real-workload performance.  The code
has to be "correct" in the collision case, but it does not really need to
be fast.

Doug Dumitru
EasyCo LLC



On Fri, Jun 19, 2015 at 9:50 AM, Verma, Vishal L <vishal.l.verma at intel.com>
wrote:

> On Fri, 2015-06-19 at 12:33 -0400, Mikulas Patocka wrote:
> > Hi
> >
> > I looked at the new persistent memory block device driver
> > (drivers/block/pmem.c and arch/x86/kernel/pmem.c) and it seems that the
> > interface between them is incorrect.
> >
> > If I want to use persistent memory in another driver, for a different
> > purpose, how can I make sure that drivers/block/pmem.c doesn't attach
> > to this piece of memory and export it? It seems not possible.
> > drivers/block/pmem.c attaches to everything without regard that there may
> > be other users of persistent memory.
> >
> > I think a correct solution would be to add a partition table at the
> > beginning of persistent memory area and this partition table would
> > describe which parts belong to which programs - so that different
> > programs could use persistent memory and not step over each other's
> > data. Is there some effort to standardize the partition table ongoing?
> >
> >
> > BTW. some journaling filesystems assume that 512-byte sector is written
> > atomically. drivers/block/pmem.c breaks this requirement. Persistent
> > memory only guarantees 8-byte atomic writes.
>
> Hi Mikulas,
>
> I can answer this part - The idea is that file systems that need sector
> atomicity will use the "Block Translation Table" (BTT) [1]. It would be
> a stacked block device on top of a pmem device (or partition), and file
> systems would be able to use it either for the entire space to get
> atomicity for all blocks, or if they want to use DAX, make two
> partitions, and enable the BTT only on one partition, and use it as the
> logdev.
>
>         -Vishal
>
> [1]: https://lkml.org/lkml/2015/6/17/950
>
> >
> > Mikulas


