[linux-lvm] LVM performance vs direct dm-thin

Thu Feb 3 04:48:47 UTC 2022

On Mon, Jan 31, 2022 at 10:29:04PM +0100, Marian Csontos wrote:
> On Sun, Jan 30, 2022 at 11:17 PM Demi Marie Obenour <
> demi at invisiblethingslab.com> wrote:
> 
>> On Sun, Jan 30, 2022 at 04:39:30PM -0500, Stuart D. Gathman wrote:
>>> Your VM usage is different from ours - you seem to need to clone and
>>> activate a VM quickly (like a vps provider might need to do).  We
>>> generally have to buy more RAM to add a new VM :-), so performance of
>>> creating a new LV is the least of our worries.
>>
>> To put it mildly, yes :).  Ideally we could get VM boot time down to
>> 100ms or lower.
>>
> 
> Out of curiosity, is snapshot creation the main culprit to boot a VM in
> under 100ms? Does Qubes OS use tweaked linux distributions, to achieve the
> desired boot time?

The goal is 100ms from user action until PID 1 starts in the guest.
After that, it’s the job of whatever distro the guest is running.
Storage management is one area that needs to be optimized to achieve
this, though it is not the only one.

> Back to business. Perhaps I missed an answer to this question: Are the
> Qubes OS VMs throw away?  Throw away in the sense like many containers are
> - it's just a runtime which can be "easily" reconstructed. If so, you can
> ignore the safety belts and try to squeeze more performance by sacrificing
> (meta)data integrity.

Why does a trade-off need to be made here?  More specifically, why is it
not possible to be reasonably fast (a few ms) AND safe?

> And the answer to that question seems to be both Yes and No. Classical pets
> vs cattle.
> 
> As I understand it, except of the system VMs, there are at least two kinds
> of user domains and these have different requirements:
> 
> 1. few permanent pet VMs (Work, Personal, Banking, ...), in Qubes OS called
> AppVMs,
> 2. and many transient cattle VMs (e.g. for opening an attachment from
> email, or browsing web, or batch processing of received files) called
> Disposable VMs.
> 
> For AppVMs, there are only "few" of those and these are running most of the
> time so start time may be less important than data safety. Certainly
> creation time is only once in a while operation so I would say use LVM for
> these. And where snapshots are not required, use plain linear LVs, one less
> thing which could go wrong. However, AppVMs are created from Template VMs,
> so snapshots seem to be part of the system.

Snapshots are used and required *everywhere*.  Qubes OS offers
copy-on-write cloning support, and users expect it to be cheap, not
least because renaming a qube is implemented using it.  By default,
AppVM private and TemplateVM root volumes always have at least one
snapshot, to support `qvm-volume revert`.  Start time really matters
too; a user may not wish to have every qube running at once.

In short, performance and safety *both* matter, and data AND metadata
operations are performance-critical.

> But data may be on linear LVs
> anyway as these are not shared and these are the most important part of the
> system. And you can still use old style snapshots for backing up the data
> (and by backup I mean snapshot, copy, delete snapshot. Not a long term
> snapshot. And definitely not multiple snapshots).

Creating a qube is intended to be a cheap operation, so thin
provisioning of storage is required.  Qubes OS also relies heavily
on over-provisioning of storage, so linear LVs and old style snapshots
won’t fly.  Qubes OS does have a storage driver that uses dm-snapshot on
top of loop devices, but that is deprecated, since it cannot provide the
features Qubes OS requires.  As just one example, the default private
volume size is 2GiB, but many qubes use nowhere near this amount of disk
space.

> Now I realized there is the third kind of user domains - Template VMs.
> Similarly to App VM, there are only few of those, and creating them
> requires downloading an image, upgrading system on an existing template, or
> even installation of the system, so any LVM overhead is insignificant for
> these. Use thin volumes.
> 
> For the Disposable VMs it is the creation + startup time which matters. Use
> whatever is the fastest method. These are created from template VMs too.
> What LVM/DM has to offer here is external origin. So the templates
> themselves could be managed by LVM, and Qubes OS could use them as external
> origin for Disposable VMs using device mapper directly. These could be held
> in a disposable thin pool which can be reinitialized from scratch on host
> reboot, after a crash, or on a problem with the pool. As a bonus this would
> also address the absence of thin pool shrinking.

That is an interesting idea I had not considered, but it would add
substantial complexity to the storage management system.  More
generally, the same approach could be used for all volatile volumes,
which are intended to be thrown away after qube shutdown.  Qubes OS even
supports encrypting volatile volumes with an ephemeral key to guarantee
they are unrecoverable.  (Disposable VM private volumes should support
this, but currently do not.)

> I wonder if a pool of ready to be used VMs could solve some of the startup
> time issues - keep $POOL_SIZE VMs (all using LVM) ready and just inject the
> data to one of the VMs when needed and prepare a new one asynchronously. So
> you could have to some extent both the quick start and data safety as a
> solution for the hypothetical third kind of domains requiring them - e.g. a
> Disposable VM spawn to edit a file from a third party - you want to keep
> the state on a reboot or a system crash.

That is also a good idea, but it is orthoganal to which storage driver
is in use.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-lvm/attachments/20220202/aec2a965/attachment.sig>