[linux-lvm] Thin pool performance when allocating lots of blocks

Zdenek Kabelac zdenek.kabelac at gmail.com
Tue Feb 8 21:30:25 UTC 2022

On 08. 02. 22 at 22:02, Demi Marie Obenour wrote:
> On 2/8/22 15:37, Zdenek Kabelac wrote:
>> On 08. 02. 22 at 20:00, Demi Marie Obenour wrote:
>>> Are thin volumes (which start as snapshots of a blank volume) efficient
>>> for building virtual machine images?  Given the nature of this workload
>>> (writing to lots of new, possibly-small files, then copying data from
>>> them to a huge disk image), I expect that this will cause sharing to be
>>> broken many, many times, and the kernel code that breaks sharing appears
>>> to be rather heavyweight.  Furthermore, since zeroing is enabled, this
>>> might cause substantial write amplification.  Turning zeroing off is not
>>> an option for security reasons.
>>> Is there a way to determine if breaking sharing is the cause of
>>> performance problems?  If it is, are there any better solutions?
>> Hi
>> Usually, the smaller the thin chunk size, the smaller the problem gets.
>> With the currently released version of thin-provisioning, the minimum chunk
>> size is 64KiB, so you can't use a smaller value to reduce this impact further.
>> Note - even if you do a lot of tiny 4KiB writes, only the 'first' such write
>> into a 64KiB area breaks sharing; all following writes to the same location
>> no longer have this penalty (also, zeroing with 64KiB is less impactful...)
>> But it's clear thin-provisioning comes at some price - so if it's not good
>> enough given the time constraints, some other solutions might need to be
>> explored (e.g. caching, better hw, splitting the FS into multiple partitions
>> with 'read-only' sections, ...)
> Are the code paths that break sharing as heavyweight as I was worried
> about?  Would a hypothetical dm-thin2 that used dm-bio-prison-v2 be
> faster?

The biggest problem is the chunk size - the smaller the chunk you can use,
the less amplification you get. On the other hand, the amount of metadata
handling increases. Beyond that, a lot depends on parallelization, locking and
disk synchronization.
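
To put rough numbers on that trade-off, here is a small back-of-the-envelope
sketch. It assumes the simplest possible model: the first small write into a
chunk costs about one whole chunk of copying or zeroing, and a fully
provisioned volume needs one mapping per chunk. The function names are made up
for illustration, and real behaviour depends on the kernel version and the
workload.

```python
KIB = 1024
TIB = 1024 ** 4


def first_write_amplification(chunk_size, write_size=4 * KIB):
    """Worst-case ratio of data moved to data written for the first write
    into a chunk, assuming the whole chunk is copied or zeroed (a
    simplifying assumption, not a measurement)."""
    return chunk_size / write_size


def mapping_count(volume_size, chunk_size):
    """Number of chunk mappings if every chunk of the volume is provisioned
    (ceiling division)."""
    return -(-volume_size // chunk_size)


if __name__ == "__main__":
    # Compare a few chunk sizes for 4KiB writes on a 1TiB thin volume.
    for chunk_kib in (64, 256, 1024):
        chunk = chunk_kib * KIB
        print(
            f"chunk {chunk_kib:>4} KiB: "
            f"first-write amplification x{first_write_amplification(chunk):.0f}, "
            f"mappings for 1 TiB: {mapping_count(TIB, chunk)}"
        )
```

Smaller chunks lower the first number and raise the second, which is the
tension described above.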

If you are more interested in this topic, dive into the kernel code.
I'd also suggest doing some good benchmarking.
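
As a starting point for such a benchmark, one option is to time a pass of
small writes that each land in a different chunk, then repeat the same writes.
On a fresh thin volume or snapshot the first pass has to provision or break
sharing for every chunk, while the second pass should not. The sketch below is
only an illustration under those assumptions: the function names and defaults
are hypothetical, it uses buffered I/O with a final fsync, and it overwrites
data at the given path. A dedicated tool such as fio with direct I/O would
give more trustworthy numbers.

```python
import os
import time


def time_writes(path, offsets, size=4096):
    """Write `size` bytes at each offset, fsync once, return elapsed seconds."""
    buf = b"\xa5" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.pwrite(fd, buf, off)
        os.fsync(fd)  # make sure the data actually reached the device
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare_first_vs_rewrite(path, chunk_size=64 * 1024, count=256):
    """Time one small write per chunk, then the same writes again.

    WARNING: this overwrites data at `path`; use a scratch volume only.
    `chunk_size` should match the pool's chunk size (64KiB assumed here).
    """
    offsets = [i * chunk_size for i in range(count)]
    first = time_writes(path, offsets)   # may provision / break sharing
    again = time_writes(path, offsets)   # chunks already private
    return first, again
```

If the first pass is consistently much slower than the second on the thin
volume, but not on a plain linear LV on the same disks, that points at
provisioning and sharing-break costs rather than the hardware.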


