[libvirt] [Qemu-devel] live snapshot wiki updated

Blue Swirl blauwirbel at gmail.com
Fri Jul 22 15:49:46 UTC 2011

On Fri, Jul 22, 2011 at 8:06 AM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
> On Thu, Jul 21, 2011 at 8:42 PM, Blue Swirl <blauwirbel at gmail.com> wrote:
>> On Thu, Jul 21, 2011 at 6:01 PM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>>> On Thu, Jul 21, 2011 at 3:02 PM, Eric Blake <eblake at redhat.com> wrote:
>>>> Thank you for persisting - you've found another hole that needs to be
>>>> plugged.  It sounds like you are proposing that after a qemu process dies,
>>>> libvirt re-reads the qcow2 metadata headers, and validates that the
>>>> backing file information has not changed in a manner unexpected by libvirt.
>>>>  If it has, then the qemu process that just died was compromised to the
>>>> point that restarting a new qemu process from the old image is now a
>>>> security risk.  So this is _yet another_ security aspect that needs to be
>>>> coded into libvirt as part of hardening sVirt.
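The check Eric describes could start from the fixed part of the qcow2 header. This is a minimal sketch, assuming the version-independent header layout: the big-endian magic `QFI\xfb` at offset 0, a 64-bit `backing_file_offset` at offset 8, and a 32-bit `backing_file_size` at offset 16. It works on an in-memory copy of the image's first bytes. `qcow2_backing_file` and `backing_matches` are hypothetical helpers, not libvirt functions. A real implementation would read from the file, probe the format first, and walk the whole backing chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QCOW2_MAGIC 0x514649fbu /* 'Q' 'F' 'I' 0xfb */

/* qcow2 header fields are stored big-endian. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t be64(const unsigned char *p)
{
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/*
 * Extract the backing file name from the start of a qcow2 image held in
 * `buf`. Returns 1 and fills `out` if a backing file is recorded, 0 if there
 * is none, and -1 if the header is invalid or the name does not fit.
 */
static int qcow2_backing_file(const unsigned char *buf, size_t len,
                              char *out, size_t out_len)
{
    if (len < 20 || be32(buf) != QCOW2_MAGIC)
        return -1;
    uint64_t off = be64(buf + 8);   /* backing_file_offset */
    uint32_t size = be32(buf + 16); /* backing_file_size */
    if (off == 0)
        return 0;                   /* image has no backing file */
    if (off > len || size > len - off || size >= out_len)
        return -1;
    memcpy(out, buf + off, size);
    out[size] = '\0';               /* the name is not NUL-terminated on disk */
    return 1;
}

/* Hypothetical policy check: does the image reference exactly what the
 * management layer expects? `expected` == NULL means "no backing file". */
static int backing_matches(const unsigned char *buf, size_t len,
                           const char *expected)
{
    char name[4096];
    int r = qcow2_backing_file(buf, len, name, sizeof name);
    if (r < 0)
        return 0;
    if (r == 0)
        return expected == NULL;
    return expected != NULL && strcmp(name, expected) == 0;
}
```

If the check fails after a qemu process exits, the image should be treated as untrusted rather than reopened with a new backing file.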
>>> The backing file information changes when image streaming completes.
>>> Before: fedora.img <- my_vm.qed
>>> After: my_vm.qed (fedora.img is no longer referenced)
>>>
>>> The image streaming operation copies data out of fedora.img and
>>> populates my_vm.qed.  When image streaming completes, the backing file
>>> is no longer needed and my_vm.qed is updated to drop the backing file.
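The streaming operation described above can be modelled with a toy copy-on-write image. This is a hypothetical in-memory structure, not the QED or qcow2 on-disk format. Each block is either allocated in the image or read through to the backing file. Streaming copies the missing blocks and then clears the backing reference, which corresponds to the header update that drops fedora.img.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSZ 4

/* Hypothetical toy image: not a real on-disk layout. */
struct image {
    char data[NBLOCKS][BLKSZ];
    bool allocated[NBLOCKS];      /* block present in this image? */
    const struct image *backing;  /* NULL once streaming has finished */
};

/* Read one block, falling through the backing chain if it is unallocated. */
static const char *read_block(const struct image *img, size_t i)
{
    for (; img != NULL; img = img->backing)
        if (img->allocated[i])
            return img->data[i];
    return NULL;                  /* unallocated everywhere (reads as zeroes) */
}

/* Copy every block the image does not own yet out of its backing chain,
 * then drop the backing reference. */
static void stream_image(struct image *img)
{
    for (size_t i = 0; i < NBLOCKS; i++) {
        if (img->allocated[i])
            continue;             /* guest data already here: keep it */
        const char *src = read_block(img->backing, i);
        if (src != NULL) {
            memcpy(img->data[i], src, BLKSZ);
            img->allocated[i] = true;
        }
    }
    img->backing = NULL;          /* backing file no longer referenced */
}
```

Blocks the guest has already written in the image are skipped, so guest data is never overwritten by older backing file contents.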
>>> I think we need to design carefully to prevent QEMU and libvirt from making
>>> incorrect assumptions about who does what.  I really wish that all
>>> this image file business were outside QEMU and libvirt - that we had a
>>> separate storage management service which handled the details.  QEMU
>>> would only do block device operations (no image format manipulation),
>>> and libvirt would only delegate to the storage management service.
>>> Today we seem to be sprinkling a little bit of storage management into
>>> QEMU and a little bit into libvirt :(.
>>>
>>> In that spirit it is much nicer to think of storage like a SAN
>>> appliance where you have LUNs that you access as block devices.  It
>>> also provides an API for snapshotting, cloning LUNs, etc.
>>> Let's move to that model instead of worrying about how to spread
>>> storage logic across QEMU and libvirt.
>> Would the NBD protocol fit this purpose, or is it too simple? Then
>> libvirt would handle the storage format completely and present an NBD
>> interface to QEMU (or give an fd to an external service) and QEMU
>> would not care about the storage format in this mode at all.
> NBD does not support flush (fdatasync).  Therefore it only supports
> the slow cache=writethrough mode in a safe manner.
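Writethrough is the safe mode here because every write is made durable before it is acknowledged, so nothing is left for a later flush to do. Below is a minimal POSIX sketch of that behaviour on the serving side. It assumes a plain file as the backing store. `writethrough_pwrite` is a hypothetical helper, not part of any NBD implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes at `off` with cache=writethrough semantics: return 0
 * only once the data has reached stable storage. A transport with no flush
 * command cannot defer durability, so it is paid for on every write.
 */
static int writethrough_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return fdatasync(fd);          /* durable before the write is acknowledged */
}
```

The cost is one `fdatasync` per write, which is why writethrough is the slow mode. cache=writeback would instead batch writes and rely on the guest's flush requests reaching the storage.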

Maybe NBD could still be used in networked setups as a secondary alternative.

> It would be neat to use virtio-blk as the interface because it can be
> passed through to the guest.  The guest talks directly to the storage
> management service without going through QEMU.  The trick is to do
> something like vhost:
> 1. An ioeventfd for virtqueue (guest->host) kicks
> 2. An irqfd for host->guest kicks
> 3. Shared memory for vring and zero-copy data access
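Items 1 and 2 rely on Linux eventfds. KVM's ioeventfd and irqfd mechanisms (the `KVM_IOEVENTFD` and `KVM_IRQFD` ioctls) attach an eventfd to a guest I/O write or to a guest interrupt. The sketch below shows only the userspace eventfd side: a counter used as a doorbell. Wiring it into KVM is not shown, and `kick` and `drain_kicks` are hypothetical names. The code is Linux-only.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal one "kick": add 1 to the eventfd counter. */
static int kick(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Consume pending kicks: read returns the accumulated count and resets it
 * to zero. On a non-blocking eventfd with nothing pending, read fails. */
static int64_t drain_kicks(int efd)
{
    uint64_t count;
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}
```

Because the counter accumulates, several kicks that arrive before the other side wakes up collapse into a single read. This keeps notification cheap.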
> The storage management service provides a UNIX domain socket over
> which fds can be passed to set up the vhost-like virtio-blk interface.
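On a UNIX domain socket, fds are passed with `SCM_RIGHTS` ancillary data on `sendmsg`/`recvmsg`. Here is a minimal sketch, where `send_fd` and `recv_fd` are hypothetical helpers. In the design above, the descriptors passed would be the eventfds and a shared-memory fd for the vring, but any fd is passed the same way.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected UNIX domain socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                       /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;              /* guarantees correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* kernel installs a copy in the peer */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description as the sender's. Both processes can then use it independently.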
> Moving the image format code into a separate program makes it possible
> to safely write to a backing file while VMs are using it because the
> storage service can be host-wide, not per-VM.  For example, it could
> stream a shared backing file over NFS while VMs using copy-on-write
> images on top of it keep running.  If we ever want to do deduplication
> or other global operations, then this approach is nice too.
>
> To summarize:
> The storage service manages image files including creation, deletion,
> snapshotting, and actual I/O.  QEMU uses a vhost-like virtio-blk
> interface and can pass it directly into the guest.  libvirt uses the
> storage service API without needing to parse image files or keep track
> of backing file relationships.
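To make the split of responsibilities in this summary concrete, here is a toy in-memory model of the bookkeeping such a storage service might do. The API (`svc_create`, `svc_clone`, `svc_stream`, `svc_delete`) is entirely hypothetical, and no I/O is performed. It shows that the service, not libvirt, knows which volume backs which. As a result, the service can refuse to delete a base image that other volumes still use, and it can update the relationship itself when streaming completes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical storage-service API. Clients hold opaque volume handles and
 * never see image formats or backing file paths; only the service records
 * which volume is backed by which.
 */
#define MAX_VOLS 16

typedef int vol_t;                 /* opaque handle, -1 on error */

struct volume {
    bool in_use;
    vol_t parent;                  /* backing volume, -1 if none */
    int children;                  /* volumes currently backed by this one */
};

static struct volume vols[MAX_VOLS];

static bool valid(vol_t v)
{
    return v >= 0 && v < MAX_VOLS && vols[v].in_use;
}

static vol_t alloc_vol(vol_t parent)
{
    for (vol_t i = 0; i < MAX_VOLS; i++) {
        if (!vols[i].in_use) {
            vols[i] = (struct volume){ .in_use = true, .parent = parent };
            if (parent >= 0)
                vols[parent].children++;
            return i;
        }
    }
    return -1;                     /* out of slots */
}

/* Create an empty volume with no backing file. */
static vol_t svc_create(void) { return alloc_vol(-1); }

/* Create a copy-on-write volume on top of `base`. */
static vol_t svc_clone(vol_t base)
{
    return valid(base) ? alloc_vol(base) : -1;
}

/* Image streaming: copy data up from the backing chain (omitted here),
 * then detach the volume from its backing volume. */
static int svc_stream(vol_t v)
{
    if (!valid(v))
        return -1;
    if (vols[v].parent >= 0) {
        vols[vols[v].parent].children--;
        vols[v].parent = -1;
    }
    return 0;
}

/* Delete a volume; refused while other volumes still depend on it. */
static int svc_delete(vol_t v)
{
    if (!valid(v) || vols[v].children > 0)
        return -1;
    if (vols[v].parent >= 0)
        vols[vols[v].parent].children--;
    vols[v].in_use = false;
    return 0;
}
```

For example, after two VM volumes are cloned from one base, deleting the base fails until each clone has either been streamed or deleted.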

Excellent plan. If one day the kernel provides built-in virtio-blk services
that can be passed through libvirt and QEMU to the guest, we'll even have
zero copy all the way.
