Setting 'nodatacow' on VM image when on Btrfs
lists at colorremedies.com
Thu Jul 9 20:15:50 UTC 2020
On Thu, Jul 9, 2020 at 12:27 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
> On Thu, Jul 09, 2020 at 08:30:14AM -0600, Chris Murphy wrote:
> > It's generally recommended by upstream Btrfs development to set
> > 'nodatacow' using 'chattr +C' on the containing directory for VM
> > images. By setting it on the containing directory it means new VM
> > images inherit the attribute, including images copied to this
> > location.
> > But 'nodatacow' also implies no compression and no data checksums.
> > Could there be use cases in which it's preferred to do compression and
> > checksumming on VM images? In that case the fragmentation problem can
> > be mitigated by periodic defragmentation.
> Setting nodatacow makes sense particularly when using qcow2, since
> qcow2 is already potentially performing copy-on-write.
> Skipping compression and data checksums is reasonable in many cases,
> as the OS inside the VM is often going to be able todo this itself,
> so we don't want double checksums or double compression as it just
> wastes performance.
Good point. I am open to suggestion/recommendation by Felipe Borges
about GNOME Boxes' ability to do installs by injecting kickstarts. It
might sound nutty but it's quite sane to consider the guest do
something like plain XFS (no LVM). But all of my VM's are as you
describe: guest is btrfs with the checksums and compression, on a raw
file with chattr +C set.
> > Is this something libvirt can and should do? Possibly by default with
> > a way for users to opt out?
> The default /var/lib/libvirt/images directory is created by RPM
> at install time. Not sure if there's a way to get RPM to set
> attributes on the dir at this time ?
For lives it's rsync today. I'm not certain if rsync carries over
file attributes. tar does not. Also not sure if squashfs and
unsquashfs do either. So this might mean an anaconda post-install
script is a more reliable way to go, since Anaconda can support rsync
(Live) and rpm (netinstall,dvd) installs. And there is a proposal
dangling in the wind to use plain squashfs (no nested ext4 as today).
> Libvirt's storage pool APIs has support for setting "nodatacow"
> on a per-file basis when creating the images. This was added for
> btrfs benefit, but this is opt in and I'm not sure any mgmt tools
> actually use this right now. So in practice it probably dooesn't
> have any out of the box benefit.
> The storage pool APIs don't have any feature to set nodatacow
> for the directory as a whole, but probably we should add this.
systemd-journald checks for btrfs and sets +C on /var/log/journal if
it is. This can be inhibited by 'touch
> > Another option is to have the installer set 'chattr +C' in the short
> > term. But this doesn't help GNOME Boxes, since the user home isn't
> > created at installation time.
> > Three advantages of libvirt awareness of Btrfs:
> > (a) GNOME Boxes Cockpit, and other users of libvirt can then use this
> > same mechanism, and apply to their VM image locations.
> > (b) Create the containing directory as a subvolume instead of a directory
> > (1) btrfs snapshots are not recursive, therefore making this a
> > subvolume would prevent it from being snapshot, and thus (temporarily)
> > resuming datacow.
> > (2) in heavy workloads there can be lock contention on file
> > btrees; a subvolume is a dedicated file btree; so this would reduce
> > tendency for lock contention in heavy workloads (this is not likely a
> > desktop/laptop use case)
> Being able to create subvolumes sounds like a reasonable idea. We already
> have a ZFS specific storage driver that can do the ZFS equivalent.
> Again though we'll also need mgmt tools modified to take advantage of
> this. Not sure how we would make this all work out of the box, with
> the way we let RPM pre-create /var/lib/libvirt/images, as we'd need
> different behaviour depending on what filesystem you install the RPM
It seems reasonable to me that libvirtd can "own"
/var/lib/libvirt/images and make these decisions. i.e. if it's empty,
and if btrfs, then delete it and recreate as subvolume and also chattr
There's also precedent by systemd to create its own nested subvolumes
for containers run by nspawn.
More information about the libvir-list