[dm-devel] call for slideware ;)

Wed Feb 23 01:21:59 UTC 2011

On Thu, Feb 10 2011 at  9:59am -0500,
Joe Thornber <thornber at redhat.com> wrote:

> Hi Mike,
> 
> On Wed, 2011-02-09 at 18:16 -0500, Mike Snitzer wrote:
> > Joe and/or Heinz,
> > 
> > Could you provide a few slides on the thinp and shared snapshot
> > infrastructure and targets?  Planned features and performance
> > benefits,
> > etc.
> > 
> > 
> 
> I've started a new project on GitHub:
> 
> https://github.com/jthornber/storage-papers
> 
> Heinz and I have started putting stuff in there.

I just had a look at the latest content and have some questions (way
more than I imagine you'd like to see, which means I'm clearly missing
a lot):

1) from "Solution" slide:
   "Space comes from a preallocated ‘pool’, which is itself just another
   logical volume, thus can be resized on demand."
   ...
   "Separate metadata device simplifies extension, this is hidden by the
    LVM system so sys admin unlikely to be aware of it."
    Q: Can you elaborate on the role of the metadata?  It maps between
       physical "area" (allocated from pool) for all writes to the
       logical address space?
    Q: can thinp and snapshot metadata coexist in the same pool?  (I ask
       a similar question below.)

2) from "Block size choice" slide:
   The larger the block size:
   - the less chance there is of fragmentation (describe this)
     Q: can you please "describe this"? :)
   - the less frequently we need the expensive mapping operation
     Q: "expensive" is all relative, seems you contradict the expense of
        the mapping operation in the "Performance" slide?
   - the smaller the metadata tables are, so more of them can be held in core
     at a time. Leading to faster access to the provisioned blocks by
     minimizing reading in mapping information
     Q: "more of them" -- "them" being metadata tables?  So the
        takeaway is more thinp devices available on the same host?
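
For what it's worth, here is how I'm reading the metadata-size point, as
a rough sketch.  It assumes one mapping entry per provisioned block,
which is my assumption and not something the slides state;
mapping_entries and the 1 TiB device size are made up for illustration.

```python
# Hypothetical illustration only: how many mappings a fully provisioned
# device would need, assuming one metadata entry per block.
def mapping_entries(device_bytes, block_bytes):
    """Number of block mappings needed to cover the device (rounds up)."""
    return -(-device_bytes // block_bytes)  # ceiling division


TIB = 1 << 40

for block_kib in (64, 512, 4096):
    entries = mapping_entries(TIB, block_kib * 1024)
    print(f"{block_kib:>5} KiB blocks -> {entries} mappings")
```

If that model is right, every 8x increase in block size cuts the number
of mappings by 8x, which is presumably what lets more of the tables stay
in core.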

3) from "Performance" slide:
   "Expensive operation is mapping in a new ‘area’"
   Q: is an "area" the same as a block in the pool?  If so, why not call
   "block size" "area size"?  Or is "block size" simply more familiar to
   people?  The original snapshot had "chunk size".

4) Q: what did you decide to run with for reads to logical address space
      that wasn't previously mapped?  Just return zeroes, as was
      discussed on lvm-team?
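
To make questions 1 and 4 concrete, this is the toy model I have in my
head, so please correct me where it is wrong.  ThinDevice and its
methods are hypothetical names, not the real target's interface.  It
assumes the metadata is a plain logical-block to pool-block map, that a
pool block is allocated on first write, and that reads of unmapped
blocks return zeroes.

```python
# Toy model only; names and behaviour are assumptions, not the real design.
class ThinDevice:
    def __init__(self, block_size, pool_blocks):
        self.block_size = block_size
        self.pool = [None] * pool_blocks  # stands in for the data volume
        self.next_free = 0                # next unallocated pool block
        self.mapping = {}                 # metadata: logical -> pool block

    def write_block(self, lblock, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        if lblock not in self.mapping:
            # The "expensive" step: map in a new block from the pool.
            if self.next_free >= len(self.pool):
                raise RuntimeError("pool exhausted")
            self.mapping[lblock] = self.next_free
            self.next_free += 1
        self.pool[self.mapping[lblock]] = bytes(data)

    def read_block(self, lblock):
        pblock = self.mapping.get(lblock)
        if pblock is None:
            # Never written, so nothing is allocated: return zeroes.
            return bytes(self.block_size)
        return self.pool[pblock]


dev = ThinDevice(block_size=4, pool_blocks=2)
unmapped = dev.read_block(100)   # no allocation happens here
dev.write_block(100, b"abcd")    # first write allocates pool block 0
dev.write_block(100, b"wxyz")    # rewrite reuses the existing mapping
```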

The "Metadata object" section is where you lose me:

5) I'm not clear on the notion of "external" vs "internal" snapshots.
   Q: can you elaborate on their characteristics?
   Maybe the following question has some relation to external vs
   internal?

6) I'm not clear on how you're going to clone the metadata tree for
   userspace to walk (for snapshot merge, etc).  Is that "clone" really
   a snapshot of the metadata device? -- seems unlikely as you'd need a
   metadata device for your metadata device's snapshots?
   - you said: "Userland will be given the location of an alternative
     superblock for the metadata device. This is the root of a tree of
     blocks referring to other blocks in a variety of data structures
     (btrees, space maps etc.). Blocks will be shared with the ‘live’
     version of the metadata, their reference counts may change as
     sharing is broken, but we know the blocks will never be updated."
     - Q: is this describing an "internal snapshot"?

7) from the "thin" target section:
"All devices stored within a metadata object are instanced with this
target. Be they fully mapped devices, thin provisioned devices, internal
snapshots or external snapshots."
Q: what is a fully mapped device?

8) "The target line:

thin <pool object> <internal device id>"
Q: so by <pool object>, that is the _id_ of a pool object that was
returned from the 'create virtual device' message?

In general my understanding of all this shared store infrastructure is
muddled.  I need the audience to take away the big concepts, not get
tripped up (or trip me up!) on the minutiae.

Subtle inconsistencies and/or opaque explanations aren't helping, e.g.:
1) the detail of "Configuration/Use" for thinp volume
   - "Allocate (empty) logical volume for the thin provisioning pool"
      Q: how can it be "empty"?  Isn't it the data volume you hand to
         the pool target?
   - "Allocate small logical volume for the thin provisioning metadata"
      Q: before in "Solution" slide you said "Separate metadata device
         simplifies extension", can the metadata volume be extended too?
   - "Set up thin provisioning mapped device on aforementioned 2 LVs"
      Q: so there is no distinct step for creating a pool?
      Q: is the pool implicitly created at the time the thinp device is
         created? (doubtful, but the way you enumerated the steps makes
         it misleading/confusing).
      Q: can snapshot and thinp volumes share the same pool?
         (if possible I could see it being brittle?)
         (but expressing such a capability will help the audience "get"
         the fact that the pool is nicely abstracted, a sound design,
         etc).

versus:

2) the description of the 'pool' and 'thin' targets
   - "This (pool) target ties together a shared metadata volume and a
     shared data volume."
     Q: when does the "block size" get defined if it isn't provided in
        the target line of "pool"?
   - "Be they fully mapped devices, thin provisioned devices, internal
     snapshots or external snapshots."
     Q: where does the notion of a thinp-snapshot (or whatever you are
        calling it) get expressed as a distinct target?  This is all
        very opaque to me...

p.s. I was going to hold off sending this and take another pass at your
slides, but decided your feedback on all my Q:s would likely be much
more helpful than me trying to parse the slides again.