[Libguestfs] [PATCH nbdkit 3/4] Add map filter.

Wed Aug 1 10:38:07 UTC 2018

On Tue, Jul 31, 2018 at 05:22:32PM -0500, Eric Blake wrote:
> Is there a syntax for explicitly mentioning a subset is unmapped
> even after a larger mapping is applied first (perhaps useful for
> redacting a portion of a disk containing sensitive information)?

It's a good idea for a TODO item, so I'll add it there.  At the moment
it's possible to express this in the map file, but only by positively
listing the regions you want to be mapped, not by negatively listing
regions you want unmapped.
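
As a sketch of the "positive listing" approach, using only the
start-end and open-ended forms from the documentation quoted further
down: to leave the second megabyte of the output unmapped, list the
regions on either side of it.  The offsets here are illustrative
assumptions, not taken from the patch.

```
 # map file: everything except bytes 1M through 2M-1
 0-1048575     0
 2097152-      2097152
```

The first line maps bytes 0 through 1048575 (inclusive) to the start
of the output, and the second maps everything from byte 2097152 to the
end of the plugin back to the same offset, so no mapping covers the
range in between.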

> >+static int
> >+insert_mapping (struct map *map, const struct mapping *new_mapping)
> >+{
> >+  size_t i;
> >+
> >+  /* Adjust existing mappings if they overlap with this mapping. */
> >+  for (i = 0; i < map->nr_map; ++i) {
> >+    if (mappings_overlap (&map->map[i], new_mapping)) {
> >+      /* The four cases are:
> >+       *
> >+       * existing         +---+
> >+       * new        +-------------------+
> >+       *                       => erase existing mapping
> >+       *
> >+       * existing  +-------------------+
> >+       * new            +---+
> >+       *                       => split existing mapping into two
> 
> should that be 'two/three'?

The existing mapping is split into just two pieces, I think?  The
middle bit, hidden by the new mapping, gets discarded.

> >+       *
> >+       * existing          +-----------+
> >+       * new            +-----+
> >+       *                       => adjust start of existing mapping
> 
> or is it really a case that you first split into two, then adjust
> one of the two

I think the comment is correct, unless I'm misunderstanding what you
mean.  Note that where the new mapping overlaps the existing mapping,
the existing mapping is discarded (to maintain the invariant).
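
The cases above can be sketched as follows.  This is a simplified
illustration, not the code from the patch: struct range,
classify_overlap and trim_existing are hypothetical names, ranges are
inclusive, and it assumes the invariant is that stored mappings never
overlap.  The fourth case (new mapping covering the end of the
existing one) is also an assumption, since it is trimmed from the
quote.  A real implementation would have to adjust the source side of
each surviving piece as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified mapping: an inclusive byte range. */
struct range {
  uint64_t start;   /* first byte, inclusive */
  uint64_t end;     /* last byte, inclusive */
};

enum overlap_case {
  NO_OVERLAP,
  ERASE_EXISTING,   /* new covers the whole of existing */
  SPLIT_EXISTING,   /* new lies strictly inside existing */
  ADJUST_START,     /* new covers the start of existing */
  ADJUST_END        /* new covers the end of existing (assumed case) */
};

static enum overlap_case
classify_overlap (const struct range *existing, const struct range *new_range)
{
  assert (existing->start <= existing->end);
  assert (new_range->start <= new_range->end);

  if (new_range->end < existing->start || new_range->start > existing->end)
    return NO_OVERLAP;
  if (new_range->start <= existing->start && new_range->end >= existing->end)
    return ERASE_EXISTING;
  if (new_range->start > existing->start && new_range->end < existing->end)
    return SPLIT_EXISTING;
  if (new_range->start <= existing->start)
    return ADJUST_START;
  return ADJUST_END;
}

/* Trim 'existing' so that it no longer overlaps 'new_range'.  The
 * surviving pieces are written to out[] and their count (0, 1 or 2)
 * is returned.  The part hidden by the new range is discarded.
 */
static size_t
trim_existing (const struct range *existing, const struct range *new_range,
               struct range out[2])
{
  switch (classify_overlap (existing, new_range)) {
  case NO_OVERLAP:
    out[0] = *existing;
    return 1;
  case ERASE_EXISTING:
    return 0;
  case SPLIT_EXISTING:
    out[0].start = existing->start;
    out[0].end = new_range->start - 1;
    out[1].start = new_range->end + 1;
    out[1].end = existing->end;
    return 2;
  case ADJUST_START:
    out[0].start = new_range->end + 1;
    out[0].end = existing->end;
    return 1;
  case ADJUST_END:
    out[0].start = existing->start;
    out[0].end = new_range->start - 1;
    return 1;
  }
  return 0;
}
```

In the split case two pieces survive, one on each side of the new
mapping, and the middle is dropped rather than kept as a third piece.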

> >+You can also do obscure things like duplicating regions of the source:
> >+
> >+ # map file
> >+ 0,16K  0
> >+ 0,16K  16K
> >+
> >+                   ┌──────────────┬─── ─ ─ ─
> >+ Plugin serves ... │ aaaaaaaaaaaa │ (extra data)
> >+                   │    16K       │
> >+                   └──────────────┴─── ─ ─ ─
> >+ Filter                  │
> >+ transforms ...          └───┬──────────┐
> >+                             │          │
> >+                   ┌─────────▼────┬─────▼────────┐
> >+ Client sees ...   │ aaaaaaaaaaaa │ aaaaaaaaaaaa │
> >+                   └──────────────┴──────────────┘
> >+
> 
> When duplicating things, do we want to document that a single
> transaction is carried out in the order seen by the client (where
> aliases at later bytes overwrite any data written into the earlier
> alias in a long transaction), or do we want to put in hedge wording
> that (in the future) a request might be split into smaller regions
> that get operated on in parallel (thereby making the end contents
> indeterminate when writing to two aliases of the same byte in one
> transaction)?

I think I'd rather leave it unspecified.  I'll add some caveat text to
the documentation.

> 
> >+=head2 C<start-end>
> >+
> >+ start-end     offset
> >+
> >+means that the source region starting at byte C<start> through to byte
> >+C<end> (inclusive) is mapped to C<offset> through to
> >+C<offset+(end-start)> in the output.
> >+
> >+For example:
> >+
> >+ 1024-2047     2048
> >+
> >+maps the region starting at byte 1024 and ending at byte 2047
> >+(inclusive) to bytes 2048-3071 in the output.
> 
> Since you already support '2k', '2m' and such as shorthands for the
> start, is it worth creating a convenient shorthand for expressing
> '3M-1' for an end rather than having to write out 3145727?

Yeah, I thought about that.  Unfortunately you quickly end up needing
to write a parser.  (Hey, what about "2^20-1"?!)
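
To illustrate where the parsing burden starts, here is a minimal
stand-alone sketch of a helper that accepts only a decimal number, an
optional K/M/G suffix, and an optional literal "-1".  The name
parse_end and its behaviour are assumptions made for illustration.
It is not part of the patch and does not use nbdkit's own size
parsing.  Anything more general, such as "2^20-1", is rejected, which
is the point at which a real expression parser would be needed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse "N", "NK", "NM" or "NG", each optionally
 * followed by the literal "-1".  Returns 0 and stores the value in
 * *result on success, or returns -1 on any error.
 */
static int
parse_end (const char *str, uint64_t *result)
{
  char *p;
  uint64_t n;
  uint64_t scale = 1;

  /* strtoull would accept leading whitespace and signs, so insist
   * that the string starts with a digit.
   */
  if (str[0] < '0' || str[0] > '9')
    return -1;

  errno = 0;
  n = strtoull (str, &p, 10);
  if (errno != 0)
    return -1;

  switch (*p) {
  case 'k': case 'K': scale = UINT64_C (1) << 10; p++; break;
  case 'm': case 'M': scale = UINT64_C (1) << 20; p++; break;
  case 'g': case 'G': scale = UINT64_C (1) << 30; p++; break;
  default: break;
  }
  if (n > UINT64_MAX / scale)
    return -1;                  /* multiplication would overflow */
  n *= scale;

  if (p[0] == '-' && p[1] == '1' && p[2] == '\0') {
    if (n == 0)
      return -1;                /* "0-1" would underflow */
    n--;
  }
  else if (*p != '\0')
    return -1;                  /* trailing garbage, e.g. "2^20-1" */

  *result = n;
  return 0;
}
```

With this helper the end field "3M-1" parses to 3145727.  The caller
would still have to split the "start-end" field at the first "-"
before calling it.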

> >+
> >+=head2 C<start> to end of plugin
> >+
> >+ start-        offset
> >+ start         offset
> >+
> >+If the C<end> field is omitted it means "up to the end of the
> >+underlying plugin".
> >+
> >+=head2 Size modifiers
> >+
> >+You can use the usual power-of-2 size modifiers like C<K>, C<M> etc.
> >+
> >+=head2 Overlapping mappings
> >+
> >+If there are multiple mappings in the map file that may apply to a
> >+particular byte of the filter output then it is the last one in the
> >+file which applies.
> >+
> >+=head2 Virtual size
> >+
> >+The virtual size of the filter output finishes at the last byte of the
> >+final mapped region.  Note this is usually different from the size of
> >+the underlying plugin.
> 
> Is there a syntax for explicitly adding an unmapped tail, to make
> the filter's output longer than the underlying plugin's size?

In the later version of this filter, I documented that you can use the
truncate filter to do this.

As you've probably seen from the commit date, I've been working on the
map filter for nearly a month now.  It has been a frustrating
exercise!  I had it all working yesterday (including writes), but
this morning some change I made has completely broken it again :-(

Thanks for the feedback,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW