[dm-devel] about the new snapshots

Mikulas Patocka mpatocka at redhat.com
Tue Mar 24 08:40:11 UTC 2009


Hi Jon

The code is here: 
http://people.redhat.com/mpatocka/patches/kernel/new-snapshots/devel

(I'll soon upload a new version that can delete snapshots ... I have 
already written it, but I haven't done much testing yet.)


The generic code is in the dm-multisnap.c file. Fujita+Daniel's exception 
store is in dm-multisnap-fujita-daniel.c; my exception store is in the 
rest of the files.

Each origin is represented by a struct dm_multisnap, which contains a list 
of the attached snapshots (struct dm_multisnap_snap). It also contains a 
pointer to the exception-store-private part (struct 
exception_store_private) and a method table (struct 
dm_multisnap_exception_store).
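
For orientation, a minimal sketch of how these structures relate (the 
field names here are my shorthand for this mail, not the actual 
declarations):

    struct dm_multisnap_snap;               /* one attached snapshot      */
    struct exception_store_private;         /* opaque, owned by the store */
    struct dm_multisnap_exception_store;    /* the method table           */

    struct dm_multisnap {                   /* one origin */
        struct dm_multisnap_snap *snapshots;          /* attached snapshots */
        struct exception_store_private *p;            /* store-private part */
        struct dm_multisnap_exception_store *store;   /* method table       */
    };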

All I/O except reads from the origin is offloaded to a thread. This could 
be optimized further so that more cases are handled directly.


My snapshot store is organized as a B+tree keyed on snapshot ranges: each 
entry contains a key (chunk number, start snapshot ID, end snapshot ID) 
and, as its value, the new chunk where that range is remapped. Thus it 
supports a practically unlimited number of snapshots --- the real limit is 
2^32. It can be used, for example, for rolling snapshots, i.e. taking a 
snapshot every 5 minutes to record activity.
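
The key/value format, as illustrative C (this is not the on-disk layout):

    #include <stdint.h>

    typedef uint64_t chunk_t;

    struct bt_key {
        chunk_t  chunk;         /* chunk number in the origin     */
        uint32_t snap_from;     /* first snapshot ID of the range */
        uint32_t snap_to;       /* last snapshot ID of the range  */
    };

    struct bt_entry {
        struct bt_key key;
        chunk_t new_chunk;      /* where this range is remapped   */
    };

One entry covers a whole range of snapshot IDs, which is why the number of 
snapshots is bounded only by the 32-bit ID space.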


Things you should know if you want to port this implementation into your 
code:

In case someone writes to the snapshots, there may be a need to create 
multiple copies. For example, if you have snapshots 1, 2, 3 and someone 
writes to snapshot 2, that creates the key [chunk number,2-2]. If someone 
then writes to the origin, it needs to create two more records, [chunk 
number,1-1] and [chunk number,3-3], and perform two copies. If there had 
been no write to the snapshot, it would create just one record, [chunk 
number,1-3].

This is done in do_origin_write: the function resets the exception store's 
search state machine with s->store->reset_query and then repeatedly asks 
whether there is anything more to remap (s->store->query_next_remap); if 
there is, it adds an entry to the B+tree (s->store->add_next_remap). When 
it finishes all remaps, or when it fills all 8 kcopyd slots, it submits 
the request to kcopyd.
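
In outline (a compilable sketch; the prototypes are simplified guesses 
from the method names above, not the real ones):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t chunk_t;
    #define KCOPYD_SLOTS 8

    struct dm_multisnap;
    struct dm_multisnap_exception_store {
        void (*reset_query)(struct dm_multisnap *s);
        bool (*query_next_remap)(struct dm_multisnap *s, chunk_t chunk);
        void (*add_next_remap)(struct dm_multisnap *s, chunk_t chunk);
    };
    struct dm_multisnap { struct dm_multisnap_exception_store *store; };

    static void dispatch_to_kcopyd(struct dm_multisnap *s) { (void)s; }

    static void origin_write_sketch(struct dm_multisnap *s, chunk_t chunk)
    {
        int used = 0;

        s->store->reset_query(s);
        while (used < KCOPYD_SLOTS && s->store->query_next_remap(s, chunk)) {
            s->store->add_next_remap(s, chunk);   /* insert into the b+tree */
            used++;                               /* one kcopyd slot used   */
        }
        dispatch_to_kcopyd(s);   /* leftover remaps are done when the bio
                                    is retried (see below)                 */
    }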

Fujita+Daniel's store uses a bitmask; the upside is that it never needs to 
do multiple copies, so these methods are implemented in such a way that 
query_next_remap succeeds at most once and add_next_remap processes all 
the snapshots at once. The downside of the bitmask is that it is limited 
to at most 64 snapshots.
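
The 64-snapshot limit follows directly from the representation: one bit 
per snapshot in a per-chunk 64-bit word. A toy illustration (not 
Fujita+Daniel's actual code):

    #include <stdbool.h>
    #include <stdint.h>

    /* bit n set = snapshot n still shares this chunk */
    static bool query_next_remap_toy(uint64_t shared)
    {
        return shared != 0;   /* succeeds at most once per chunk */
    }

    static uint64_t add_next_remap_toy(uint64_t shared)
    {
        (void)shared;
        /* a single copy serves every snapshot whose bit is set,
           so afterwards nothing is shared any more */
        return 0;
    }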


Another thing (which you'll need to implement in Fujita+Daniel's store 
too) is snapshot-to-snapshot copies. This happens when you write to a 
shared snapshot chunk. The write can't be dispatched directly (because it 
would be reflected in multiple snapshots), so a copy has to be performed. 
I do it in do_snapshot_io; the method is s->store->make_chunk_writeable, 
which duplicates the record in the B+tree and returns the destination 
where the data should be copied. After this method is called, kcopyd 
performs the snapshot-to-snapshot copy.
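
Schematically (only make_chunk_writeable comes from the actual interface; 
the other names and the prototypes are invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t chunk_t;
    typedef uint32_t snapid_t;

    struct dm_multisnap;
    struct dm_multisnap_exception_store {
        void (*make_chunk_writeable)(struct dm_multisnap *s, chunk_t chunk,
                                     snapid_t snapid, chunk_t *dest);
    };
    struct dm_multisnap { struct dm_multisnap_exception_store *store; };

    static bool chunk_is_shared(struct dm_multisnap *s, chunk_t c, snapid_t id)
    { (void)s; (void)c; (void)id; return true; }   /* stub */
    static void kcopyd_snap_to_snap(struct dm_multisnap *s, chunk_t f, chunk_t t)
    { (void)s; (void)f; (void)t; }                 /* stub */

    static void snapshot_write_sketch(struct dm_multisnap *s, chunk_t chunk,
                                      snapid_t snapid)
    {
        chunk_t dest;

        if (!chunk_is_shared(s, chunk, snapid))
            return;   /* chunk is private to this snapshot: write directly */

        /* duplicate the b+tree record, get the private destination */
        s->store->make_chunk_writeable(s, chunk, snapid, &dest);

        /* kcopyd copies the shared chunk to the private one; the bio is
           dispatched to dest after the copy finishes */
        kcopyd_snap_to_snap(s, chunk, dest);
    }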


Another abstraction I added is the check_conflict method (called by 
check_pending_io) --- it checks whether a given chunk+snapid conflicts 
with a kcopyd I/O that is in progress. The exception store fills in a 
union chunk_descriptor; that union is opaque to the generic code, and the 
generic code calls the check_conflict method to test whether there is a 
conflict against that I/O. The rationale is that in my implementation 
there may be many snapshots: one kcopyd action could create exceptions 
for a range of several hundred snapshots, and it would be unsuitable to 
check them one by one for conflicts. For Fujita+Daniel's store, the union 
holds a 64-bit bitmap of which IDs are being copied. For my store, it 
holds the snapid range that is being copied.
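
In other words, something like this (the union's name comes from the 
code; the field names are mine):

    #include <stdbool.h>
    #include <stdint.h>

    union chunk_descriptor {
        uint64_t bitmap;        /* Fujita+Daniel: bit per snapshot ID copied */
        struct {
            uint32_t from;      /* my store: first snapid being copied */
            uint32_t to;        /* my store: last snapid being copied  */
        } range;
    };

    /* the range store answers in O(1) for the whole range, instead of
       testing several hundred snapshots one by one */
    static bool range_check_conflict(union chunk_descriptor *cd, uint32_t snapid)
    {
        return snapid >= cd->range.from && snapid <= cd->range.to;
    }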


When all kcopyd jobs finish, the commit method is called; it updates the 
exception store to reflect the changes made so far. My store uses a 
log-structured layout, so this writes a new commit block that contains 
the updated pointers. Fujita+Daniel's code doesn't have data consistency, 
so it only writes all dirty buffers, hoping that the computer won't crash 
between the individual writes.
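
The crash-safety argument for the log-structured layout, as a sketch 
(hypothetical helpers, not the actual code):

    struct dm_multisnap;
    static void flush_new_data_and_metadata(struct dm_multisnap *s) { (void)s; }
    static void write_commit_block(struct dm_multisnap *s) { (void)s; }

    static void commit_sketch(struct dm_multisnap *s)
    {
        /* new chunks and new b+tree nodes were written to free space;
           the previously committed tree is still intact on disk */
        flush_new_data_and_metadata(s);

        /* one commit block with the updated pointers publishes everything
           atomically; a crash before this point leaves the store at the
           previous commit */
        write_commit_block(s);
    }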


Unlike the old snapshots, my implementation retries the bio after a 
kcopyd job --- when the job finishes, the bio is put back on the queue 
and checked against all the snapshots again. This simplifies the design 
because not all snapshots need to be processed while processing the bio. 
For example, kcopyd has 8 slots; when all the slots are filled, there may 
still be unprocessed snapshots. But it doesn't matter, as they'll be 
handled on retry.
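
The completion path is then trivial (names invented for illustration):

    struct bio;
    struct dm_multisnap;
    static void requeue_bio(struct dm_multisnap *s, struct bio *b)
    { (void)s; (void)b; }                          /* stub */
    static void wake_thread(struct dm_multisnap *s) { (void)s; }

    /* kcopyd completion: don't finish the bio here, just put it back on
       the queue; the thread re-checks it against all snapshots and
       creates any exceptions that didn't fit into the slots before */
    static void copy_done_sketch(struct dm_multisnap *s, struct bio *bio)
    {
        requeue_bio(s, bio);
        wake_thread(s);
    }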

In the old snapshots this no-retry-bio behavior is wrong too; it causes 
the bug https://bugzilla.redhat.com/show_bug.cgi?id=182659. I have 
already patched it 
(http://people.redhat.com/mpatocka/patches/kernel/merging/2.6.28/dm-snapshot-rework-origin-write.patch), 
but the patch was somehow forgotten by Alasdair; you may want to consider 
incorporating it into your patchset.


Mikulas



