[libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

Alexander Graf agraf at suse.de
Tue Sep 7 14:01:41 UTC 2010


On 07.09.2010, at 15:41, Anthony Liguori wrote:

> Hi,
> 
> We've got copy-on-read and image streaming working in QED and before going much further, I wanted to bounce some interfaces off of the libvirt folks to make sure our final interface makes sense.
> 
> Here's the basic idea:
> 
> Today, you can create images based on base images that are copy on write.  With QED, we also support copy on read which forces a copy from the backing image on read requests and write requests.
> 
> In addition to copy on read, we introduce a notion of streaming a block device, which means that we search for an unallocated region of the leaf image and force a copy-on-read operation.
> 
> The combination of copy-on-read and streaming means that you can start a guest based on slow storage (like over the network) and bring in blocks on demand while also having a deterministic mechanism to complete the transfer.
> 
> The interface for copy-on-read is just an option within qemu-img create.  Streaming, on the other hand, requires a bit more thought.  Today, I have a monitor command that does the following:
> 
> stream <device> <sector offset>
> 
> Which will try to stream the minimal amount of data for a single I/O operation and then return how many sectors were successfully streamed.
> 
> The idea about how to drive this interface is a loop like:
> 
> offset = 0
> while offset < image_size:
>   wait_for_idle_time()
>   count = stream(device, offset)
>   offset += count
> 
> Obviously, the "wait_for_idle_time()" requires wide system awareness.  The thing I'm not sure about is whether libvirt would 1) want to expose a similar stream interface and let management software determine idle time, or 2) attempt to detect idle time on its own and provide a higher level interface.  If (2), the question then becomes whether we should try to do this within qemu and provide libvirt a higher level interface.

I'm torn here too. Why not expose both? Have a qemu internal daemon available that gets a sleep time as parameter and an external "pull sectors" command. We'll see which one is more useful, but I don't think it's too much code to justify only having one of the two. And the internal daemon could be started using a command line parameter, which helps non-managed users.
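
To make the two options concrete, here is a minimal sketch in Python, not QEMU code. ToyImage, its stream() method and stream_all() are hypothetical names invented for illustration; the fixed chunk of 4 sectors and the one-item-per-sector model are assumptions as well. stream() plays the role of the external "pull sectors" command, and stream_all() is roughly what an internal daemon with a sleep time parameter would do.

```python
import time


class ToyImage:
    """Toy leaf image on top of a read-only backing image (one item per sector)."""

    def __init__(self, backing):
        self.backing = list(backing)   # slow storage, e.g. over the network
        self.leaf = {}                 # sector -> data already copied locally

    @property
    def size(self):
        return len(self.backing)

    def read(self, sector):
        # Copy-on-read: a read of an unallocated sector pulls it into the leaf.
        if sector not in self.leaf:
            self.leaf[sector] = self.backing[sector]
        return self.leaf[sector]

    def stream(self, offset, max_sectors=4):
        """Force copy-on-read for up to max_sectors sectors starting at offset.

        Returns the number of sectors processed so the caller can advance.
        """
        end = min(offset + max_sectors, self.size)
        for sector in range(offset, end):
            self.read(sector)
        return end - offset


def stream_all(image, sleep_time=0.0, sleep=time.sleep):
    """Drive stream() across the whole image, pausing sleep_time between calls."""
    offset = 0
    calls = 0
    while offset < image.size:
        sleep(sleep_time)   # stand-in for wait_for_idle_time()
        offset += image.stream(offset)
        calls += 1
    return calls
```

Management software that wants to pick idle times itself would call stream() directly in its own loop; a non-managed user would only pass a sleep time and let the loop run to completion.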

> 
> A related topic is block migration.  Today we support pre-copy migration, which means we transfer the block device and then do a live migration.  Another approach is to do the live migration first, run a block server on the source, and use image streaming on the destination to move the device.
> 
> With QED, to implement this one would:
> 
> 1) launch qemu-nbd on the source while the guest is running
> 2) create a qed file on the destination with copy-on-read enabled and a backing file using nbd: to point to the source qemu-nbd
> 3) run qemu -incoming on the destination with the qed file
> 4) execute the migration
> 5) when migration completes, begin streaming on the destination to complete the copy
> 6) when the streaming is complete, shut down the qemu-nbd instance on the source
> 
> This is a bit involved and we could potentially automate some of this in qemu by launching qemu-nbd and providing commands to do some of this.  Again though, I think the question is what type of interfaces would libvirt prefer?  Low level interfaces + recipes on how to do high level things or higher level interfaces?

Is there anything keeping us from making the QMP socket multiplexable? I was thinking of something like:

{ "execute": "nbd_server", "arguments": { "block": "qemu_block_name" } }
{ "return": {} }
<qmp socket turns into nbd socket>

This way we don't require yet another port, don't have to care about conflicts and get internal qemu block names for free.
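
A rough sketch of that handover in Python, using a socketpair and a fake server thread instead of a real qemu. Everything here is an assumption for illustration: nbd_server is the hypothetical command proposed above, fake_qemu() and take_over_socket() are made-up helpers, the QMP greeting and capabilities negotiation are skipped, and the "NBD" traffic is just opaque bytes. The one detail it tries to show is that the client must not read past the end of the JSON reply, otherwise it swallows the first bytes of block data.

```python
import json
import socket
import threading


def read_line(sock):
    """Read one newline-terminated message without consuming bytes past it."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)   # byte-at-a-time so no block data is swallowed
        if not chunk:
            raise ConnectionError("peer closed before end of message")
        buf += chunk
    return bytes(buf)


def fake_qemu(sock, payload):
    """Stand-in server: answer one control command, then switch to raw bytes."""
    request = json.loads(read_line(sock))
    if request.get("execute") != "nbd_server":   # hypothetical command name
        sock.sendall(json.dumps({"error": "unknown command"}).encode() + b"\n")
        sock.close()
        return
    sock.sendall(json.dumps({"return": {}}).encode() + b"\n")
    sock.sendall(payload)                        # socket now carries block data
    sock.close()


def take_over_socket(sock, block_name):
    """Client side: issue the command, then treat the socket as a data channel."""
    cmd = {"execute": "nbd_server", "arguments": {"block": block_name}}
    sock.sendall(json.dumps(cmd).encode() + b"\n")
    reply = json.loads(read_line(sock))
    if "return" not in reply:
        raise RuntimeError(f"command failed: {reply}")
    data = bytearray()
    while chunk := sock.recv(4096):              # read raw bytes until EOF
        data += chunk
    return bytes(data)
```

In a real implementation the client would hand the socket to an NBD client after the reply instead of reading until EOF.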


Alex




