[linux-lvm] Snapshots and disk re-use

Thu Feb 24 00:09:13 UTC 2011

On 23/02/11 23:25, Stuart D. Gathman wrote:
> On Wed, 23 Feb 2011, Jonathan Tripathy wrote:
>
>> Given that I currently follow this procedure for removing and adding
>> customers:
>>
>> To remove customer: zero out customer LV, then remove LV
>> To add customer: create a new LV
>>
>> And given that I want to run backups of a customer's LV using snapshots,
>> i.e. create a snapshot of the customer LV, run rsync, then remove the
>> snapshot: is there anything I should do to prevent cross-customer data
>> leakage?
> You are still ambiguous.  If by "create a new LV", you mean a new LV
> that is not an LVM snapshot, then just zero it out when you create it
> (rather than when you delete it).
This is exactly what I mean. However, I was hoping to avoid zeroing an LV
on creation because of how long it takes (about 40 mins). It takes so
long because I have to select a small enough block size that disk
bandwidth is still available to the other customers using their VPSes.
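
To illustrate the trade-off, here is a minimal Python sketch of zeroing
at a capped rate. Note that it uses an explicit bytes-per-second cap
rather than a small block size, and that the function name, the default
chunk size and the default rate are all assumptions made up for this
example, not values from my setup.

```python
import os
import time


def zero_throttled(path, total_bytes, chunk_size=1 << 20,
                   max_bytes_per_sec=20 << 20):
    """Overwrite the first total_bytes of path with zeros at a capped rate.

    path may be a regular file or a block device node; the defaults for
    chunk_size and max_bytes_per_sec are illustrative assumptions.
    """
    zeros = bytes(chunk_size)
    written = 0
    start = time.monotonic()
    with open(path, "r+b", buffering=0) as dev:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # write() may write fewer bytes than requested; the loop
            # recomputes n from the running total, so that is handled.
            written += dev.write(zeros[:n])
            # If we are ahead of the allowed rate, sleep until we are not.
            earliest = written / max_bytes_per_sec
            elapsed = time.monotonic() - start
            if earliest > elapsed:
                time.sleep(earliest - elapsed)
        os.fsync(dev.fileno())
    return written
```

A lower max_bytes_per_sec leaves more bandwidth for other users of the
disk but makes the zeroing take proportionally longer, which is the
problem described above.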

This is why I'm trying to understand where data is stored when a
snapshot is created for an origin. All my snapshots will be read-only.

So I'm guessing that when a snapshot is created for an origin, there
are two physical copies of the data on disk? (Albeit only one is
accessible at the regular filesystem level.)
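
For comparison, here is a toy Python model of a copy-on-write snapshot,
assuming LVM snapshots work in the classic copy-on-write way: nothing is
copied when the snapshot is created, and a chunk's old contents are
copied into the snapshot's own storage only when the origin first
overwrites that chunk. The class and method names are hypothetical and
this is not LVM code.

```python
class Snapshot:
    """Toy read-only snapshot: stores only chunks the origin has changed."""

    def __init__(self, origin):
        self.origin = origin
        self.cow = {}  # chunk index -> contents at snapshot creation time

    def preserve(self, index, old_data):
        # Keep only the first (oldest) version of each chunk.
        self.cow.setdefault(index, old_data)

    def read(self, index):
        # Unchanged chunks are read straight from the origin.
        if index in self.cow:
            return self.cow[index]
        return self.origin.chunks[index]


class Origin:
    """Toy origin volume made of fixed-size chunks."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy the old chunk to every snapshot before overwriting it.
        for snap in self.snapshots:
            snap.preserve(index, self.chunks[index])
        self.chunks[index] = data
```

In this model there is never a full second copy; the snapshot's storage
holds only old versions of chunks written during the backup. If the real
implementation behaves like this, that storage would still contain
customer data after the snapshot is removed.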

> IDEA - it seems that the device mapper could logically zero an LV by
> simply returning blocks of zero on reads until the corresponding block
> is written.  Yeah, would require overhead to track which blocks have
> been written.  That overhead could be 1 bit for each of fairly large blocks,
> and be fairly small, fit into ram easily, and be stored in a logically
> zeroed block and discarded when the last block is written.  So effectively
> it only requires storing a pointer in meta-data to the current block where
> the bitmap is stored.  I can see that compared to the simplicity of
> simply writing zeroes on allocation, it might not be worth it.
>
This sounds like a really good idea. It would be very useful in a
multi-tenant environment. However, it should be an optional feature.
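
A minimal Python sketch of the logical-zeroing idea quoted above, as I
understand it: reads return zeros for any block that has not yet been
written, tracked with one flag per block. The class name and the use of
an in-memory bytearray as the backing store are assumptions for
illustration; this is not device-mapper code, and it only handles
whole-block writes.

```python
class LazyZeroDevice:
    """Toy device that hides stale data until each block is written."""

    def __init__(self, backing, block_size):
        if len(backing) % block_size != 0:
            raise ValueError("backing size must be a multiple of block_size")
        self.backing = backing          # bytearray, may hold stale data
        self.block_size = block_size
        # One flag per block; a real implementation would pack these
        # into a bitmap (1 bit per block).
        self.written = [False] * (len(backing) // block_size)

    def read_block(self, index):
        if not self.written[index]:
            # Never expose whatever a previous owner left on disk.
            return bytes(self.block_size)
        start = index * self.block_size
        return bytes(self.backing[start:start + self.block_size])

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("only whole-block writes are modelled")
        start = index * self.block_size
        self.backing[start:start + self.block_size] = data
        self.written[index] = True

    def fully_written(self):
        # Once this is true the tracking data could be discarded.
        return all(self.written)
```

A partial write to an unwritten block would need the rest of that block
zeroed first, which is part of the overhead mentioned in the quoted
text.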