[dm-devel] Re: dm snapshot: allow live exception store handover between tables

Mike Snitzer snitzer at redhat.com
Wed Nov 11 23:35:36 UTC 2009


On Wed, Nov 11 2009 at  3:46pm -0500,
Mikulas Patocka <mpatocka at redhat.com> wrote:

> > > Hi
> > > 
> > > Please drop snapshot-linking through this "handover_snap" field, it will 
> > > create real or possible problems with locking or with dangling pointers 
> > > (i.e. are both properly locked? Which to lock first? If you destroy a 
> > > snapshot, you must check if something links to it, etc. --- these problems 
> > > could be solved, but if they can be avoided by dropping linking, it's 
> > > better to avoid them).
> > 
> > I agree, lock_snapshots_for_handover() is too complicated.  But by dropping
> > these 'handover_snap' associations we'll lose the ability to have tight
> > constraints on what handover will take place on resume of new snapshot
> > (ideally a source of handover will only have one destination).
> > 
> > The new snapshot (destination of exception handover) is not registered
> > with the origin until after resume's handover_exceptions (via
> > register_snapshot).  And therefore it can't be found via
> > find_snapshot_using_cow().  But that is a good thing as we'll know that
> > if find_snapshot_using_cow does return a snapshot then it is the source
> > of the handover.
> 
> It is registered in snapshot_ctr --- that's how it was originally written. 
> So it should be possible to find it.
>
> You moved the registration away? So move it back.

Why?  That would be counter-productive.

register_snapshot() is skipped if handover will be performed.  The
reason is register_snapshot() looks at the store's chunk_size.  Problem
is we don't have the store yet.  Your handover relied on rereading the
snapshot header.  I no longer read_metadata() at all if the store will
be handed over.
 
> I'm somehow concerned, how this simple handover code (that I wrote 1.5 
> years ago, on a train way from Berlin to Prague) is getting more and more 
> complicated, just for handling some "can't happen" cases.

Believe me, I'd rather not be lingering over the finer points of
exception handover; but once I hit the fact that snapshot_ctr() was
imposing that the COW device must not be suspended it opened a can or
worms (lvchange --refresh exposed this, in general it is fair to assume
the cow device isn't suspended in snapshot_ctr :).  

The handover-based snapshot_ctr() needing to access the COW device
begged the followong questions:
why are we reading the snapshot header via read_metadata()?
Why are we doing any IO to the COW if it will just be handed over?

Answer: we don't have a real need, so don't.  Not only that, but
Alasdair asserted read_metadata() must not be called if another snapshot
is already actively using the COW.  What if something else is changing
it at the same time that the new snapshot_ctr() re-reads it?

Anyway, best to avoid all of that (read_metadata and register_snapshot)
if we're doing exception handover.

> First of all, you don't have to handle user's misbehavior --- if the user 
> writes dd if=/dev/zero of=/dev/hda, he loses his data --- there is no need 
> to protect the user from doing it. Similarly, if the user loads bad dm 
> table, he may lose his data too and there is little that can be done.
> 
> You can add protection from double target load as a bonus --- but do it 
> only as long as it doesn't complicate the code. If this protection from 
> double load would complicate the code too much, then don't do it.

I just posted the latest version of the handover patch (v7); it no
longer has 'handover_snap' pointers.  As such the locking is simplified.

It also handles protecting against double target loads.

> Last night, when talking with Alasdair, we found out that the "active" 
> variable could be used as a token --- there could be at most one active 
> snapshot and on handover, you would set "active=0" on the old snapshot and 
> "active=1" on the new snapshot, always enforcing that there is just one 
> snapshot active.

Right, find_snapshot_using_cow() in v7 of the patch does make use of
active to know to overlook a snapshot on the origin's 'snapshots' list
that are not active (snapshot may be left on the list after handover,
snapshot_dtr() hasn't come through yet, but it is both invalid and
inactive).  v7 handover_exceptions() sets: snap_src->active = 0;

> But generally:
> 1. make handover work the simplest way without any protection (like I 
> wrote it).
> 2. add the protection from double load only if it doesn't complicate the 
> code too much (for example, checking that there is just one "active" 
> snapshot should be relatively easy).

I think v7 is fairly robust; I've tested it pretty extensively both
standalone and with the rest of the snapshot-merge patches.  As always
the full snapshot-merge DM patchset is available here:

http://people.redhat.com/msnitzer/patches/snapshot-merge/kernel/2.6.32/




More information about the dm-devel mailing list