[linux-lvm] exposing snapshot block device

Gionatan Danti g.danti at assyoma.it
Tue Oct 22 21:38:34 UTC 2019


Hi,

Il 22-10-2019 18:15 Stuart D. Gathman ha scritto:
> "Old" snapshots are exactly as efficient as thin when there is exactly
> one.  They only get inefficient with multiple snapshots.  On the other
> hand, thin volumes are as inefficient as an old LV with one snapshot.
> An old LV is as efficient, and as anti-fragile, as a partition.  Thin
> volumes are much more flexible, but depend on much more fragile 
> database
> like meta-data.

this is both true and false: while in the single-snapshot case 
performance remains acceptable even with fat snapshots, the btree 
representation (and more modern code) of the "new" (7+ years old by 
now) thin snapshots guarantees significantly higher performance, at 
least in my tests.

Note #1: I know that the old snapshot code uses 4K chunks by default, 
versus the 64K chunks of thinsnap. That said, I recorded higher thinsnap 
performance even when using a 64K chunk size for old fat snapshots.
Note #2: I generally disable thinpool zeroing (as I use a filesystem 
layer on top of thin volumes).
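For reference, the two knobs from the notes above look roughly like this on the command line (VG/LV names and sizes are illustrative, not from my actual setup):

```shell
# Thin pool with 64K chunks and zeroing disabled (-Z n skips zeroing of
# newly provisioned blocks; reasonable when a filesystem sits on top,
# as the filesystem never reads blocks it has not written).
lvcreate --type thin-pool -L 100G --chunksize 64k -Z n -n pool0 vg0

# Old-style (fat) snapshot with a matching 64K chunk size, for an
# apples-to-apples comparison against thinsnap:
lvcreate -s -L 10G --chunksize 64k -n data_snap vg0/data
```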

I 100% agree that old LVM code, with its plain text metadata and 
continuous plain-text backups, is extremely reliable and easy to 
fix/correct.
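To make that concrete: LVM keeps automatic plain-text metadata backups under /etc/lvm/, and restoring one is a single command (VG name and archive filename below are just examples of the usual pattern):

```shell
# /etc/lvm/backup/vg0  - latest metadata (human-readable text)
# /etc/lvm/archive/    - timestamped history of previous versions
# List the archived versions available for restore:
vgcfgrestore --list vg0

# Restore a known-good version; the file can even be hand-edited
# first, which is exactly the manual recovery Stuart describes:
vgcfgrestore -f /etc/lvm/archive/vg0_00042-1234567890.vg vg0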

> For this reason, I always prefer "old" LVs when the functionality of
> thin LVs are not actually needed.  I can even manually recover from
> trashed meta data by editing it, as it is human readable text.

My main use of fat logical volumes is for boot and root filesystems, 
while thin vols (and zfs datasets, but this is another story...) are 
used for data partitions.

The main thing that somewhat scares me is that (if things have not 
changed) thinvol uses a single root btree node: losing it means losing 
*all* thin volumes of a specific thin pool. Coupled with the fact that 
metadata dumps are not as handy as with the old LVM code (no 
vgcfgrestore), this worries me.
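For what it's worth, thin-provisioning-tools can dump the pool metadata to XML, though it is nowhere near as convenient as vgcfgrestore; a sketch, assuming an inactive pool and a recent LVM that allows component activation (names illustrative):

```shell
# Deactivate the pool, then activate its hidden metadata LV:
lvchange -an vg0/pool0
lvchange -ay vg0/pool0_tmeta

# Dump the btree metadata to human-readable XML...
thin_dump /dev/vg0/pool0_tmeta > pool0_metadata.xml

# ...and, in a disaster, write a saved dump back:
thin_restore -i pool0_metadata.xml -o /dev/vg0/pool0_tmeta
lvchange -an vg0/pool0_tmeta
```

Note the dump is only consistent if taken while the pool is inactive (or via a metadata snapshot), which is part of why it is less handy than the old plain-text scheme.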

> The "rollforward" must be applied to the backup image of the snapshot.
> If the admin gets it paired with the wrong backup, massive corruption
> ensues.  This could be automated.  E.g. the full image backup and
> external cow would have unique matching names.  Or the full image 
> backup
> could compute an md5 in parallel, which would be stored with the cow.
> But none of those tools currently exist.

This is the reason why I have not used thin_delta in production: an 
error on my part in recovering the volume (i.e. applying the wrong 
delta) would cause massive data corruption. My current setup for 
instant recovery *and* added resilience is along those lines: RAID -> 
DRBD -> THINPOOL -> THINVOL w/periodic snapshots (with the DRBD layer 
replicating to a sibling machine).
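The periodic-snapshot part of that stack boils down to something like the following, run from cron (volume names and timestamps are hypothetical):

```shell
# Thin snapshots need no pre-allocated size (-s against a thin origin);
# they share the pool's data space through the btree mapping.
lvcreate -s -n data_$(date +%Y%m%d_%H%M) vg0/data

# Instant recovery: activate a snapshot (-K overrides the
# activation-skip flag thin snapshots carry by default) and mount it
# read-only to copy files back:
lvchange -ay -K vg0/data_20191022_2130
mount -o ro /dev/vg0/data_20191022_2130 /mnt/restore
```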

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8



