[linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

Fri Apr 7 22:24:15 UTC 2017

On Fri, Apr 7, 2017 at 5:12 AM, Gionatan Danti <g.danti at assyoma.it> wrote:

> On 07-04-2017 10:19 Mark Mielke wrote:
>
>>
>> I found classic LVM snapshots to suffer terrible performance. I
>> switched to BTRFS as a result, until LVM thin pools became a real
>> thing, and I happily switched back.
>>
>
> So you are now on lvmthin? Can I ask on what pool/volume/filesystem size?

We use lvmthin in many areas, from Docker's dm-thinp driver to XFS file
systems for PostgreSQL or other data that need multiple snapshots,
including point-in-time backups of certain snapshots. The sizes vary.
I don't think we have 8 TB anywhere right this second, but we are using
it at sizes ranging from 20 GB to 4 TB.

>
>> I expect this depends on exactly what access patterns you have, how
>> many accesses will happen during the time the snapshot is held, and
>> whether you are using spindles or flash. Still, even with some attempt
>> to be objective and critical... I think I would basically never use
>> classic LVM snapshots for any purpose, ever.
>>
>
> Sure, but for nightly backups reduced performance should not be a problem.
> Moreover, increasing snapshot chunk size (eg: from default 4K to 64K) gives
> much faster write performance.
>

When you say "nightly", my experience is that processes are writing data
all of the time. If the backup takes 30 minutes to complete, then that is
30 minutes of writes accumulating in the snapshot, and every one of those
writes pays the copy-on-write overhead while the snapshot exists.
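
To make the accumulation concrete, here is a rough sketch of how much
copy-on-write space one classic snapshot might consume during a backup
window. It assumes a simplified model in which the first write into a chunk
copies the whole chunk, and it ignores snapshot metadata. The function name
and the offsets are hypothetical, chosen only for illustration.

```python
def cow_bytes_used(write_offsets, chunk_size):
    """Estimate COW data space consumed by one classic snapshot.

    Simplified model (an assumption, not LVM's actual accounting): the first
    write into a chunk copies the whole chunk; metadata overhead is ignored.
    """
    touched_chunks = {offset // chunk_size for offset in write_offsets}
    return len(touched_chunks) * chunk_size


# Hypothetical byte offsets written to the origin while a backup is running.
offsets = [0, 4096, 8192, 70000]
small = cow_bytes_used(offsets, 4 * 1024)    # 4K chunks
large = cow_bytes_used(offsets, 64 * 1024)   # 64K chunks
print(small, large)
```

In this model a larger chunk size means fewer copy operations but more data
copied per operation, so which one wins depends on the write pattern.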

But we usually keep multiple hourly snapshots and multiple daily
snapshots, because we want the option to recover to different points in
time. With the classic LVM snapshot capability, I believe this is
essentially unworkable. While it can work with "1 short-lived
snapshot", I don't think it works at all well for "3 hourly + 3 daily
snapshots". Remember that under classic LVM snapshots, the first write to
an area after a snapshot is taken requires that area to be copied into
every snapshot's copy-on-write store before the original write can
complete. Every additional snapshot is an additional cost.
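
A minimal sketch of that per-snapshot cost, under stated assumptions: all
snapshots are taken at the same moment, each classic snapshot keeps its own
copy-on-write store, and a thin pool breaks sharing once per chunk no matter
how many snapshots reference it. The function names are hypothetical and this
is a counting model, not how either implementation is actually coded.

```python
def classic_chunk_copies(written_chunks, num_snapshots):
    """Count chunk copies when every snapshot has its own COW store."""
    copied = [set() for _ in range(num_snapshots)]
    copies = 0
    for chunk in written_chunks:
        for store in copied:
            if chunk not in store:
                # First write to this chunk since the snapshot: copy it
                # into this snapshot's store before the write proceeds.
                store.add(chunk)
                copies += 1
    return copies


def thin_chunk_copies(written_chunks, num_snapshots):
    """Count chunk copies when snapshots share blocks (assumed thin model)."""
    if num_snapshots == 0:
        return 0
    # One copy per distinct shared chunk, independent of snapshot count.
    return len(set(written_chunks))


# Hypothetical sequence of chunk numbers written to the origin volume.
writes = [0, 1, 1, 2, 0]
print(classic_chunk_copies(writes, 6), thin_chunk_copies(writes, 6))
```

With "3 hourly + 3 daily" snapshots, the classic count in this model grows
with the number of snapshots, while the thin count does not.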

> I'm more concerned about lengthy snapshot activation due to a big, linear
> CoW table that must be read completely...

I suspect this is a premature-optimization concern, in that you are
concerned, and you are theorizing about impact, but perhaps you haven't
measured it yourself, and if you did, you would find there was no reason
to be concerned. :-)

If you absolutely need a contiguous sequence of blocks for your drives,
because your I/O patterns benefit from this, or because your hardware has
poor seek performance (such as, perhaps a tape drive? :-) ), then classic
LVM snapshots would retain this ordering for the live copy, and the
snapshot could be as short lived as possible to minimize overhead to only
that time period.

But, in practice - I think the LVM authors of the thinpool solution
selected a default block size that would exhibit good behaviour on most
common storage solutions. You can adjust it, but in most cases I think I
don't bother, and just use the default. There is also the behaviour of the
systems in general to take into account in that even if you had a purely
contiguous sequence of blocks, your file system probably allocates files
all over the drive anyways. With XFS, I believe they do this for
concurrency, in that two different kernel threads can allocate new files
without blocking each other, because they schedule the writes to two
different areas of the disk (allocation groups), each with its own inode
tables.

So, I don't believe the contiguous sequence of blocks is normally a real
thing. Perhaps a security camera that is recording a 1+ TB video stream
might allocate contiguous, but basically nothing else does this.

To me, LVM thin volumes are the right answer to this problem. They're not
particularly new or novel either. Most "Enterprise" level storage systems
have had this capability for many years. At work, we use NetApp and they
take this to another level with their WAFL (Write Anywhere File Layout).
For our private cloud solution based upon NetApp AFF 8080EX today, we have
disk shelves filled with flash drives, and NetApp is writing everything
"forwards", which extends the life of the flash drives, and allows us to
keep many snapshots of the data. But, it doesn't have to be flash to take
advantage of this. We also have large NetApp FAS 8080EX or 8060 with all
spindles, including 3.5" SATA disks. I was very happy to see this type of
technology make it back into LVM. I think this breathed new life into LVM,
and made it a practical solution for many new use cases beyond being just a
more flexible partition manager.

-- 
Mark Mielke <mark.mielke at gmail.com>