[dm-devel] [PATCH for-3.15 1/3] dm: add era target

Mikulas Patocka mpatocka at redhat.com
Tue Mar 11 00:08:14 UTC 2014



On Thu, 6 Mar 2014, Mike Snitzer wrote:

> From: Joe Thornber <ejt at redhat.com>
> 
> dm-era is a target that behaves similar to the linear target.  In
> addition it keeps track of which blocks were written within a user
> defined period of time called an 'era'.  Each era target instance
> maintains the current era as a monotonically increasing 32-bit
> counter.
> 
> Use cases include tracking changed blocks for backup software, and
> partially invalidating the contents of a cache to restore cache
> coherency after rolling back a vendor snapshot.
> 
> dm-era is primarily expected to be paired with the dm-cache target.

Hi

I reviewed dm-era, here is a possible list of bugs:

There is no tracking of in-progress writes. Suppose that this happens:
1) a write is received, era_map sees that metadata_current_marked(era->md, 
   block) is true, so it lets the write through
2) the user sends the checkpoint message
3) the write received in step 1) hits the disk and modifies it
4) the user sends metadata_take_snap message, the write is not in the 
   snapshot, although it modified the disk between steps 2) and 4)
You can use the functions dm_internal_suspend and dm_internal_resume to 
clear in-progress bios - this is lightweight suspend/resume callable only 
from the kernel.

era_destroy calls metadata_close(era->md); without checking that era->md 
is not NULL. This can cause crash if era_destroy is called early in the 
constructor.

era->sectors_per_block should be validated that it is not zero.

There is no write memory barrier between the setting of rpc->result and 
rpc->complete:
                if (r)
                        list_for_each_entry_safe(rpc, tmp, &calls, list)
                                rpc->result = r;
        }

        list_for_each_entry_safe(rpc, tmp, &calls, list) {
                atomic_set(&rpc->complete, 1);
And there is no read memory barrier between the reading of rpc->complete 
and rpc->result:
        wait_event(rpc->wait, atomic_read(&rpc->complete));

        return rpc->result;
You should use completion (include/linux/completion.h) instead of 
rpc->complete and rpc->wait, it simplifies the code and takes care of 
memory barriers.

metadata_space_map_root is not aligned on 64-bit boundary, but it is 
accessed as 64-bit numbers in sm_ll_open_metadata.

Mixing 32-bit and 64-bit current_era (I'm not sure it this is a bug or 
not).

Missing read and write memory barriers around accesses to 
archived_writesets (I'm not sure if this is a bug, but it looks strange).

create_fresh_metadata doesn't free md->tm and md->sm on errors.

format_metadata doesn't undo the effect of create_fresh_metadata if 
write_superblock fails.

superblock_all_zeroes checks the full block for zeroes, although lvm 
zeroes only zeroes 4k at the beginning of a new logical volume.

test_bit and test_and_set_bit take signed integer as a parameter (on some 
architectures the argument is long, on some int), so you must restrict the 
maximum number of blocks to 2^31-1. Or, alternatively, do not use test_bit 
and test_and_set_bit and implement these functions on your own - this will 
allow you to have 64-bit block numbers.

The argument to bitset_size may overflow, it is not checked. If you allow 
64-bit argument to bitset_size, you must also check that the conversion to 
size_t does not overflow.

era_status: /* FIXME: do we need to protect this? */ dm_sm_get_nr_free - 
on 32-bit architectures you need to lock it because 64-bit accesses are 
nonatomic. You can see the implementation of i_size_read and i_size_write 
and do something similar.

era_map doesn't distinguish REQ_FLUSH bios, it should just pass them 
through without accessing bio->bi_iter.bi_sector.


Possible improvements:

atomic64_inc(&md->current_era); - there is no concurrent access to 
current_era, so you can use this to avoid the interlocked instruction:
atomic64_set(&md->current_era, atomic64_read(&md->current_era) + 1)

process_deferred_bios: deferred_lock is never taken from interrupt, so you 
can replace spin_lock_irqsave/spin_unlock_irqrestore with 
spin_lock/spin_unlock.

writeset_test_and_set: test_and_set_bit is interlocked instruction - here, 
no concurrent access happens, so use __test_and_set_bit, which is not 
interlocked. Alternatively - reimplement it if you want 2^31 or more 
blocks - see above.


Mikulas




More information about the dm-devel mailing list