[linux-lvm] Does LVM have any plan/schedule to support btrfs in fsadm

Tue Jun 29 22:32:05 UTC 2021

On 2021-06-29 01:00, Chris Murphy wrote:
> Pretty sure it's fixed since 4.14.
> https://lkml.org/lkml/2019/2/10/23

Hi Chris, the headline states "harden against duplicate fsid". Does it 
mean that the issue is "only" less likely, or was it really solved?

> It's not inherently slow, it's a tracking cost problem as very large
> numbers of extents accumulate. And it also depends on the write
> pattern of the guest file system. If you use Btrfs in a guest on a
> host using Btrfs, it's a lot more competitive. There's certainly room
> for improvement, possibly with some hinting to avoid writing out a
> metric ton of 4KiB blocks as other file systems are prone to doing,
> where btrfs can turn these into  largely sequential writes, they lose
> any locality optimization the guest file system expects for subsequent
> reads. A lot of the locality issue is a factor on rotational devices.
> When talking about hundreds of thousands of extents per VM file, this
> has a noticeable impact on even SSDs, but the much reduced latency
> makes it tolerable for some scenarios.

I think the main issue stems from btrfs striving to do CoW at 4K extent 
granularity. ZFS has a default 128K recordsize that, while incurring a 
fair read/modify/write overhead, works much better with HDDs (for SSDs 
one can lower recordsize to 16K or 32K).
XFS with reflink does something similar, doing CoW at 128K block 
granularity (we had a similar discussion in the past: 
https://www.spinics.net/lists/linux-xfs/msg35679.html)
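To put rough numbers on that trade-off, here is a simplified back-of-the-envelope model (not how any of these filesystems actually accounts for I/O): a 4K guest write into a larger CoW unit forces the whole unit to be rewritten, while a small unit can leave a VM image split into far more extents. The helpers `rmw_amplification` and `worst_case_extents` are hypothetical, the 10 GiB image size is an arbitrary example, and compression, metadata and partial records are ignored.

```python
KIB = 1024
GIB = 1024 ** 3

def rmw_amplification(write_size, granularity):
    """Bytes rewritten per byte written for one aligned write (simplified).

    A write smaller than the CoW unit forces the whole unit to be
    read, modified and rewritten; larger writes cover whole units.
    """
    units = -(-write_size // granularity)  # ceiling division
    return units * granularity / write_size

def worst_case_extents(file_size, granularity):
    """Upper bound on extents if every CoW unit of the file is rewritten
    separately and ends up non-contiguous."""
    return file_size // granularity

for label, g in [("4K", 4 * KIB), ("16K", 16 * KIB), ("32K", 32 * KIB), ("128K", 128 * KIB)]:
    amp = rmw_amplification(4 * KIB, g)
    ext = worst_case_extents(10 * GIB, g)
    print(f"{label:>5}: 4K write -> {amp:4.0f}x rewrite, 10 GiB image -> up to {ext:,} extents")
```

Under this model a 4K unit means no rewrite overhead but up to about 2.6 million extents for a 10 GiB image, while a 128K unit caps it at 81,920 extents at the cost of a 32x rewrite for each isolated 4K write.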

> But I've seen similar problems with VM's on LVM thinp when making many
> snapshots and incurring cow, however temporary (like a btrfs nodatacow
> file that's subject to snapshots or reflink copies; or a backing file
> on xfs likewise reflink copied). There really isn't much better we can
> do than LVM thick in this regard. And if that's the standard bearer,
> it's not much different if you fallocate a nodatacow file.

If I remember correctly, the minimum thin LVM chunk size is 64K, 
making it much less prone to fragmentation. Moreover, it only performs 
CoW when a snapshotted chunk is overwritten for the first time (ZFS 
reallocates on every write, and I think btrfs does something similar).
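A toy model of the difference described above, assuming the origin volume is fully allocated and a snapshot has just been taken, so every chunk starts out shared. It is an illustrative sketch, not dm-thin's actual logic: `first_overwrite_cow_copies` and `redirect_every_write_allocations` are hypothetical helpers, and the write pattern is made up.

```python
KIB = 1024

def first_overwrite_cow_copies(write_offsets, chunk_size):
    """Count chunk copies after a snapshot under a CoW-on-first-overwrite model.

    Assumes every chunk is shared with a freshly taken snapshot: the
    first write to a chunk copies it, later writes to that chunk go in place.
    """
    copied = set()
    for offset in write_offsets:
        chunk = offset // chunk_size
        if chunk not in copied:
            copied.add(chunk)  # break sharing once; the chunk is now exclusive
    return len(copied)

def redirect_every_write_allocations(write_offsets):
    """Allocations if every write is redirected to a newly allocated location."""
    return len(write_offsets)

# 100 scattered 4 KiB overwrites confined to the first 256 KiB of the volume
writes = [(i * 7 % 64) * 4 * KIB for i in range(100)]

copies = first_overwrite_cow_copies(writes, 64 * KIB)
print(f"thin (64K chunks): {copies} chunk copies, {copies * 64} KiB copied")
print(f"redirect-on-write: {redirect_every_write_allocations(writes)} allocations")
```

The point of the comparison is the number of new placements (and thus potential fragmentation), not bytes written: repeated overwrites of the same 64K chunk cost nothing extra after the first one, whereas a filesystem that reallocates on every write scatters each of them.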

In the distant past, I benchmarked a virtual machine running on btrfs 
over a fallocated+nocow file and the results were quite bleak. Maybe 
things have improved more than I imagine... time for some more 
benchmarks, I suppose! Do you have any to share?

> Some databases are cow friendly, notably rocksdb. And sqlite with wal
> enabled is at least not cow unfriendly. The worst offender seems to be
> postgresql but I haven't seen any benchmarking since the multiple
> kernel series of fsync work done on btrfs to improve the performance
> of databases in general; that was kernel 5.8 through 5.11.

Yeah, both PostgreSQL and MySQL tend to be slow on btrfs.

Regards.

-- 
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8