[libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

Sun Sep 12 17:19:31 UTC 2010

On 09/12/2010 11:45 AM, Avi Kivity wrote:
>> Streaming relies on copy-on-read to do the writing.
>
>
> Ah.  You can avoid the copy-on-read implementation in the block format 
> driver and do it completely in generic code.

Copy-on-read takes advantage of temporal locality.  You wouldn't want to 
stream without copy-on-read: without that caching, repeated guest reads 
keep going back to the backing file, which eats into the idle I/O time 
that streaming relies on.
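
To make the caching point concrete, here is a toy model of copy-on-read
(a sketch only, not QEMU code; the class, its fields, and the 4-byte
cluster size are all made up for illustration).  A read that misses
locally pulls the cluster from the backing file and keeps it, so
streaming reduces to reading every cluster once:

```python
class CopyOnReadImage:
    """Toy image layered on a slower backing file (hypothetical model)."""

    def __init__(self, backing, cluster_size=4):
        self.backing = backing            # bytes standing in for the backing file
        self.cluster_size = cluster_size  # illustrative size, not a real default
        self.local = {}                   # cluster index -> data already copied
        self.backing_reads = 0            # number of trips to the backing file

    def read_cluster(self, index):
        if index not in self.local:
            start = index * self.cluster_size
            self.backing_reads += 1
            # Copy-on-read: keep the data locally as a side effect of the read.
            self.local[index] = self.backing[start:start + self.cluster_size]
        return self.local[index]

    def stream(self):
        """Read every cluster once; copy-on-read does the actual writing."""
        count = (len(self.backing) + self.cluster_size - 1) // self.cluster_size
        for index in range(count):
            self.read_cluster(index)


img = CopyOnReadImage(b"abcdefghijkl")
img.read_cluster(1)   # guest read: cluster 1 is fetched and kept
img.read_cluster(1)   # served locally, no second backing read
img.stream()          # only clusters 0 and 2 still need the backing file
```

Clusters the guest has already touched cost the streamer nothing, which
is the temporal-locality benefit in miniature.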

>>>     stream_4():
>>>         increment offset
>>>         if more:
>>>              bdrv_aio_stream()
>>>
>>>
>>> Of course, need to serialize wrt guest writes, which adds a bit more 
>>> complexity.  I'll leave it to you to code the state machine for that.
>>
>> http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d 
>>
>
> Clever - it pushes all the synchronization into the copy-on-read 
> implementation.  But the serialization there hardly jumps out of the 
> code.
>
> Do I understand correctly that you can only have one allocating read 
> or write running?

Cluster allocation, L2 cache allocation, or on-disk L2 allocation?

You only have one on-disk L2 allocation at one time.  That's just an 
implementation detail at the moment.  An on-disk L2 allocation happens 
only when writing to a new cluster that requires a totally new L2 
entry.  Since L2s cover 2GB of logical space, it's a rare event so this 
turns out to be pretty reasonable for a first implementation.
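
The 2GB figure falls out of the table geometry.  A quick check, assuming
64KB clusters, L2 tables spanning four clusters, and 8-byte table
entries (these sizes are assumptions for illustration; only the 2GB
result is stated above):

```python
def l2_coverage(cluster_size, table_clusters, entry_size):
    """Return (entries per L2 table, bytes of logical space one table maps)."""
    table_bytes = table_clusters * cluster_size
    entries = table_bytes // entry_size      # one entry per data cluster
    return entries, entries * cluster_size


# Assumed geometry: 64KB clusters, 4-cluster tables, 8-byte entries.
entries, covered = l2_coverage(64 * 1024, 4, 8)
covered_gb = covered // 1024 ** 3
```

With those numbers a guest has to write into a fresh 2GB region before
another on-disk L2 allocation is needed.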

Parallel on-disk L2 allocations are not that difficult; it's just a 
future TODO.

>>
>> Generally, I think the block layer makes more sense if the interface 
>> to the formats is high level and code sharing is achieved not by 
>> mandating a world view but rather by making libraries of common 
>> functionality.   This is more akin to how the FS layer works in Linux.
>>
>> So IMHO, we ought to add a bdrv_aio_commit function, turn the current 
>> code into a generic_aio_commit, implement a qed_aio_commit, then 
>> somehow do qcow2_aio_commit, and look at what we can refactor into 
>> common code.
>
> What Linux does is have an equivalent of bdrv_generic_aio_commit() 
> which most implementations call (or default to), and only do something 
> if they want something special.  Something like commit (or 
> copy-on-read, or copy-on-write, or streaming) can be implemented 100% in 
> terms of the generic functions (and indeed qcow2 backing files can be 
> any format).

Yes, what I'm really saying is that we should take the 
bdrv_generic_aio_commit() approach.  I think we're in agreement here.
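
The shape of that approach, as a toy sketch (not QEMU code: the
BlockDriver class, the dict-based image, and every function name here
are hypothetical).  A driver may supply its own commit hook; if it
doesn't, the dispatcher falls back to the generic one, and a driver
that needs something special can still call the generic code as a
library:

```python
class BlockDriver:
    """Hypothetical driver record; commit=None means 'use the generic path'."""

    def __init__(self, name, commit=None):
        self.name = name
        self.commit = commit


def generic_commit(image):
    """Format-agnostic commit: push every allocated cluster into the backing image."""
    for index, data in image["allocated"].items():
        image["backing"][index] = data
    image["allocated"].clear()
    return "generic"


def special_commit(image):
    """A driver-specific commit that reuses the generic code, then does extra work."""
    generic_commit(image)
    image["header_dirty"] = False   # stand-in for format-specific bookkeeping
    return "special"


def commit(driver, image):
    """Dispatch to the driver's hook if it has one, else to the generic default."""
    handler = driver.commit or generic_commit
    return handler(image)


plain = BlockDriver("plain")
fancy = BlockDriver("fancy", commit=special_commit)

img_a = {"allocated": {0: b"x"}, "backing": {}, "header_dirty": True}
img_b = {"allocated": {3: b"y"}, "backing": {}, "header_dirty": True}
result_a = commit(plain, img_a)
result_b = commit(fancy, img_b)
```

Most drivers would look like `plain` and carry no commit code of their
own.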

Regards,

Anthony Liguori