[libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
Alexander Graf
agraf at suse.de
Tue Sep 7 14:01:41 UTC 2010
On 07.09.2010, at 15:41, Anthony Liguori wrote:
> Hi,
>
> We've got copy-on-read and image streaming working in QED and before going much further, I wanted to bounce some interfaces off of the libvirt folks to make sure our final interface makes sense.
>
> Here's the basic idea:
>
> Today, you can create images based on base images that are copy on write. With QED, we also support copy on read which forces a copy from the backing image on read requests and write requests.
>
> In addition to copy on read, we introduce a notion of streaming a block device, which means that we search for an unallocated region of the leaf image and force a copy-on-read operation.
>
> The combination of copy-on-read and streaming means that you can start a guest based on slow storage (like over the network) and bring in blocks on demand while also having a deterministic mechanism to complete the transfer.
>
> The interface for copy-on-read is just an option within qemu-img create. Streaming, on the other hand, requires a bit more thought. Today, I have a monitor command that does the following:
>
> stream <device> <sector offset>
>
> Which will try to stream the minimal amount of data for a single I/O operation and then return how many sectors were successfully streamed.
>
> The idea about how to drive this interface is a loop like:
>
> offset = 0
> while offset < image_size:
>     wait_for_idle_time()
>     count = stream(device, offset)
>     offset += count
>
> Obviously, the "wait_for_idle_time()" requires wide system awareness. The thing I'm not sure about is whether libvirt would 1) want to expose a similar stream interface and let management software determine idle time, or 2) attempt to detect idle time on its own and provide a higher level interface. If (2), the question then becomes whether we should try to do this within qemu and provide libvirt a higher level interface.
I'm torn here too. Why not expose both? Have a qemu-internal daemon that takes a sleep time as a parameter, plus an external "pull sectors" command. We'll see which one is more useful, and I don't think either is so much code that it would justify having only one of the two. The internal daemon could also be started using a command line parameter, which helps non-managed users.
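
To make the "expose both" idea concrete, here is a rough Python sketch of the loop quoted above, driven in the two ways described. It is a toy model and not QEMU code: ToyImage, pull_all, internal_daemon and the CLUSTER size are all names and values made up for illustration, and the image is just a list of sectors. Unlike the proposed monitor command, this stream() does not search for the next unallocated region; it simply walks forward one cluster at a time.

```python
import time

CLUSTER = 8  # sectors copied per stream() call; arbitrary value for this sketch


class ToyImage:
    """Leaf image on top of a backing image, both modelled as lists of sectors."""

    def __init__(self, backing):
        self.backing = backing
        self.local = [None] * len(backing)  # None means "unallocated"

    def read(self, sector):
        # Copy-on-read: reading an unallocated sector first copies it
        # from the backing image into the leaf image.
        if self.local[sector] is None:
            self.local[sector] = self.backing[sector]
        return self.local[sector]

    def stream(self, offset):
        """Force copy-on-read for at most one cluster; return sectors covered."""
        end = min(offset + CLUSTER, len(self.backing))
        for sector in range(offset, end):
            self.read(sector)
        return end - offset


def pull_all(image, wait_for_idle_time):
    """External "pull sectors" driver: the caller decides what idle means."""
    offset = 0
    while offset < len(image.backing):
        wait_for_idle_time()
        offset += image.stream(offset)
    return offset


def internal_daemon(image, sleep_time):
    """Internal variant: the same loop, with idle detection reduced to a sleep."""
    return pull_all(image, lambda: time.sleep(sleep_time))
```

Both entry points share one loop, which is the reason offering both looks cheap: the only difference is who supplies the wait policy.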
>
> A related topic is block migration. Today we support pre-copy migration, which means we transfer the block device and then do a live migration. Another approach is to do the live migration first and then, with a block server running on the source, use image streaming on the destination to move the device.
>
> With QED, to implement this one would:
>
> 1) launch qemu-nbd on the source while the guest is running
> 2) create a qed file on the destination with copy-on-read enabled and a backing file using nbd: to point to the source qemu-nbd
> 3) run qemu -incoming on the destination with the qed file
> 4) execute the migration
> 5) when migration completes, begin streaming on the destination to complete the copy
> 6) when the streaming is complete, shut down the qemu-nbd instance on the source
>
> This is a bit involved and we could potentially automate some of this in qemu by launching qemu-nbd and providing commands to do some of this. Again though, I think the question is what type of interfaces would libvirt prefer? Low level interfaces + recipes on how to do high level things or higher level interfaces?
Is there anything keeping us from making the QMP socket multiplexable? I was thinking of something like:
{ command = "nbd_server" ; block = "qemu_block_name" }
{ result = "done" }
<qmp socket turns into nbd socket>
This way we don't require yet another port, don't have to care about conflicts and get internal qemu block names for free.
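
A minimal sketch of that handshake, assuming a line-delimited JSON framing that is my own simplification and not the real QMP wire format. The "nbd_server" command is only the proposal above, not an existing QMP command, and the raw payload stands in for a real NBD session. serve, fetch and demo are hypothetical helper names.

```python
import json
import socket
import threading


def serve(conn, blocks):
    """Answer one JSON command line, then use the same socket as a raw channel."""
    with conn, conn.makefile("rb") as reader:
        request = json.loads(reader.readline())
        name = request.get("block")
        if request.get("command") != "nbd_server" or name not in blocks:
            conn.sendall(json.dumps({"result": "error"}).encode() + b"\n")
            return
        conn.sendall(json.dumps({"result": "done"}).encode() + b"\n")
        # From here on the socket no longer carries JSON. A real server
        # would speak NBD; this stand-in just sends the block contents.
        conn.sendall(blocks[name])


def fetch(conn, block_name):
    """Client side: send the command, check the reply, then read raw bytes."""
    command = {"command": "nbd_server", "block": block_name}
    conn.sendall(json.dumps(command).encode() + b"\n")
    # Keep using the same buffered reader after the switch, so bytes that
    # arrived together with the reply line are not lost.
    with conn.makefile("rb") as reader:
        reply = json.loads(reader.readline())
        if reply["result"] != "done":
            raise RuntimeError("server refused to switch modes")
        return reader.read()  # reads until the server closes its end


def demo():
    server_end, client_end = socket.socketpair()
    blocks = {"qemu_block_name": b"\x00" * 512}
    worker = threading.Thread(target=serve, args=(server_end, blocks))
    worker.start()
    data = fetch(client_end, "qemu_block_name")
    worker.join()
    client_end.close()
    return data
```

The point of the sketch is only the mode switch: one connection, one command/response exchange, and then the byte stream changes meaning without a second port being opened.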
Alex