[libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

Avi Kivity avi at redhat.com
Sun Sep 12 10:55:44 UTC 2010


  On 09/07/2010 04:41 PM, Anthony Liguori wrote:
> Hi,
>
> We've got copy-on-read and image streaming working in QED and before 
> going much further, I wanted to bounce some interfaces off of the 
> libvirt folks to make sure our final interface makes sense.
>
> Here's the basic idea:
>
> Today, you can create images based on base images that are copy on 
> write.  With QED, we also support copy on read which forces a copy 
> from the backing image on read requests and write requests.

Is copy on read QED specific?  It looks very similar to the commit 
command, except with I/O directions reversed.

IIRC, commit looks like

   for each sector:
     if image.mapped(sector):
         backing_image.write(sector, image.read(sector))

whereas copy-on-read looks like:

   def copy_on_read():
     set_ioprio(idle)
     for each sector:
       if not image.mapped(sector):
           image.write(sector, backing_image.read(sector))
   run_in_thread(copy_on_read)

With appropriate locking.
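
For illustration only, the two loops above can be sketched as runnable
Python against a toy in-memory image. The Image class, its dict-based
sector map, the 512-byte sector size, and the single per-image lock
are all assumptions made for the sketch; none of this is QEMU or QED
code.

```python
import threading

SECTOR_SIZE = 512  # assumed sector size for the toy model


class Image:
    """Toy sparse image: a dict of sector -> bytes, plus an optional backing image."""

    def __init__(self, backing=None):
        self.sectors = {}
        self.backing = backing
        self.lock = threading.Lock()  # stands in for "appropriate locking"

    def mapped(self, sector):
        return sector in self.sectors

    def read(self, sector):
        # Unmapped sectors fall through to the backing image, else read as zeros.
        if sector in self.sectors:
            return self.sectors[sector]
        if self.backing is not None:
            return self.backing.read(sector)
        return b"\0" * SECTOR_SIZE

    def write(self, sector, data):
        self.sectors[sector] = data


def commit(image, nsectors):
    """Push every sector mapped in the image down into its backing image."""
    for sector in range(nsectors):
        with image.lock:
            if image.mapped(sector):
                image.backing.write(sector, image.read(sector))


def copy_on_read(image, nsectors):
    """Pull every sector not yet mapped in the image up from its backing image."""
    for sector in range(nsectors):
        with image.lock:
            if not image.mapped(sector):
                image.write(sector, image.backing.read(sector))
```

Running copy_on_read in the background is then
threading.Thread(target=copy_on_read, args=(top, nsectors)).start(),
with guest writes taking the same lock so that a streamed sector never
overwrites newer guest data.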

>
> In addition to copy on read, we introduce a notion of streaming a 
> block device which means that we search for an unallocated region of 
> the leaf image and force a copy-on-read operation.
>
> The combination of copy-on-read and streaming means that you can start 
> a guest based on slow storage (like over the network) and bring in 
> blocks on demand while also having a deterministic mechanism to 
> complete the transfer.
>
> The interface for copy-on-read is just an option within qemu-img 
> create.  Streaming, on the other hand, requires a bit more thought.  
> Today, I have a monitor command that does the following:
>
> stream <device> <sector offset>
>
> Which will try to stream the minimal amount of data for a single I/O 
> operation and then return how many sectors were successfully streamed.
>
> The idea about how to drive this interface is a loop like:
>
> offset = 0;
> while offset < image_size:
>    wait_for_idle_time()
>    count = stream(device, offset)
>    offset += count
>

This is way too low level for the management stack.

Have you considered using the idle I/O priority class to implement 
this?  That would allow host-wide prioritization.  Not sure how to do 
it cluster-wide; I don't think NFS has a concept of I/O priority.
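
As a rough sketch of what "idle class" means on Linux: a thread can
move itself into the idle I/O class with the ioprio_set system call.
Python's standard library has no wrapper for it, so the example below
goes through ctypes. The helper names are made up for illustration,
and the syscall-number table covers only x86_64 and aarch64 (an
assumption; other architectures are simply reported as unsupported).

```python
import ctypes
import platform

# Constants from the Linux ioprio ABI (linux/ioprio.h).
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_IDLE = 3
IOPRIO_WHO_PROCESS = 1

# ioprio_set syscall numbers; only these two architectures are handled here.
SYS_IOPRIO_SET = {"x86_64": 251, "aarch64": 30}


def ioprio_value(ioclass, data=0):
    """Pack an I/O class and per-class data into one ioprio value."""
    return (ioclass << IOPRIO_CLASS_SHIFT) | data


def set_idle_ioprio(tid=0):
    """Best effort: put a thread (0 = the caller) into the idle I/O class.

    Returns True on success, False if unsupported or if the call fails.
    """
    nr = SYS_IOPRIO_SET.get(platform.machine())
    if platform.system() != "Linux" or nr is None:
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.syscall(nr, IOPRIO_WHO_PROCESS, tid,
                      ioprio_value(IOPRIO_CLASS_IDLE))
    return rc == 0
```

Calling set_idle_ioprio() as the first step of the streaming thread
corresponds to set_ioprio(idle) in the pseudocode above. Whether the
idle class has any effect depends on the host's I/O scheduler.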


-- 
error compiling committee.c: too many arguments to function



