[linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

Sat Apr 8 11:56:50 UTC 2017

Il 08-04-2017 00:24 Mark Mielke ha scritto:
> 
> We use lvmthin in many areas... from Docker's dm-thinp driver, to XFS
> file systems for PostgreSQL or other data that need multiple
> snapshots, including point-in-time backup of certain snapshots. Then,
> multiple sizes. I don't know that we have 8 TB anywhere right this
> second, but we are using it in a variety of ranges from 20 GB to 4 TB.
> 

Very interesting, this is the exact information I hoped to get. Thank 
you for reporting.

> 
> When you say "nightly", my experience is that processes are writing
> data all of the time. If the backup takes 30 minutes to complete, then
> this is 30 minutes of writes that get accumulated, and subsequent
> performance overhead of these writes.
> 
> But, we usually keep multiple hourly snapshots and multiply daily
> snapshots, because we want the option to recover to different points
> in time. With the classic LVM snapshot capability, I believe this is
> essentially non-functional. While it can work with "1 short lived
> snapshot", I don't think it works at all well for "3 hourly + 3 daily
> snapshots".  Remember that each write to an area will require that
> area to be replicated multiple times under classic LVM snapshots,
> before the original write can be completed. Every additional snapshot
> is an additional cost.

Right. For such a setup, classic LVM snapshot overhead would be 
enormous, grinding all to an halt.

> 
>> I more concerned about lenghtly snapshot activation due to a big,
>> linear CoW table that must be read completely...
> 
> I suspect this is a pre-optimization concern, in that you are
> concerned, and you are theorizing about impact, but perhaps you
> haven't measured it yourself, and if you did, you would find there was
> no reason to be concerned. :-)

For classic (non-thinly provided) LVM snapshot, relatively big metadata 
size was a know problem. Many talks happened on this list for this very 
topic. Basically, when the snapshot metadata size increased above a 
certain point (measured in some GB), snapshot activation failed due to 
timeout on LVM commands. This, in turn, was due that legacy snapshot 
behavior was not really tuned for long-lived, multi-gigabyte snapshots, 
rather for create-backup-remove behavior.

> 
> If you absolutely need a contiguous sequence of blocks for your
> drives, because your I/O patterns benefit from this, or because your
> hardware has poor seek performance (such as, perhaps a tape drive? :-)
> ), then classic LVM snapshots would retain this ordering for the live
> copy, and the snapshot could be as short lived as possible to minimize
> overhead to only that time period.
> 
> But, in practice - I think the LVM authors of the thinpool solution
> selected a default block size that would exhibit good behaviour on
> most common storage solutions. You can adjust it, but in most cases I
> think I don't bother, and just use the default. There is also the
> behaviour of the systems in general to take into account in that even
> if you had a purely contiguous sequence of blocks, your file system
> probably allocates files all over the drive anyways. With XFS, I
> believe they do this for concurrency, in that two different kernel
> threads can allocate new files without blocking each other, because
> they schedule the writes to two different areas of the disk, with
> separate inode tables.
> 
> So, I don't believe the contiguous sequence of blocks is normally a
> real thing. Perhaps a security camera that is recording a 1+ TB video
> stream might allocate contiguous, but basically nothing else does
> this.

True.

> 
> To me, LVM thin volumes is the right answer to this problem. It's not
> particularly new or novel either. Most "Enterprise" level storage
> systems have had this capability for many years. At work, we use
> NetApp and they take this to another level with their WAFL =
> Write-Anywhere-File-Layout. For our private cloud solution based upon
> NetApp AFF 8080EX today, we have disk shelves filled with flash
> drives, and NetApp is writing everything "forwards", which extends the
> life of the flash drives, and allows us to keep many snapshots of the
> data. But, it doesn't have to be flash to take advantage of this. We
> also have large NetApp FAS 8080EX or 8060 with all spindles, including
> 3.5" SATA disks. I was very happy to see this type of technology make
> it back into LVM. I think this breathed new life into LVM, and made it
> a practical solution for many new use cases beyond being just a more
> flexible partition manager.
> 
> --
> 
> Mark Mielke <mark.mielke at gmail.com>

Yeah, CoW-enabled filesystem are really cool ;) Too bad BTRFS has very 
low performance when used as VM backing store...

Thank you very much Mark, I really appreciate the information you 
provided.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8