[libvirt] [PATCH 1/6] Add new API virDomainStreamDisk[Info] to header and drivers

Mon Apr 11 22:06:54 UTC 2011

On 04/11/2011 04:45 PM, Daniel P. Berrange wrote:
> On Fri, Apr 08, 2011 at 02:26:48PM -0500, Anthony Liguori wrote:
>> On 04/08/2011 11:02 AM, Stefan Hajnoczi wrote:
>>> On Fri, Apr 8, 2011 at 2:31 PM, Daniel P. Berrange<berrange at redhat.com>   wrote:
>>>
>>> I have CCed Anthony and Kevin.  Anthony drove the QED image streaming
>>> and Kevin will probably be interested in the idea of allocating raw
>>> images as a background activity while QEMU runs.
>>>
>>>>     /*
>>>>      * @path: fully qualified filename of the virtual disk
>>>>      * @nregions: filled with the number of @regions structs
>>>>      * @regions: filled with a list of allocated regions
>>>>      *
>>>>      * Query the extents of allocated regions within the
>>>>      * virtual disk file. The offsets in the list of regions
>>>>      * are not guaranteed to be sorted in any explicit order.
>>>>      */
>>>>     int virDomainBlockGetAllocationMap(virDomainPtr dom,
>>>>                                        const char *path,
>>>>                                        unsigned int *nregions,
>>>>                                        virDomainBlockRegionPtr *regions);
>>> QEMU can provide this with its existing .bdrv_is_allocated() function.
>>>   Kevin, do you have any thoughts on whether this API will work well?
>> I think the trouble with this API proposal is that it's overloading
>> concepts.
>>
>> Sparse is not the same thing as CoW to a backing file.
> I don't like to use the term "sparse", since that implies a specific disk
> format (raw file with holes). Rather I use the term 'thin provisioned'
> to refer to any disk format where not all physical sectors have
> yet been allocated. A thin-provisioned disk can trivially be thought
> of as a disk, with a backing file whose sectors are all filled with
> zeros.

It's not so black and white today.

Imagine that you had a qcow2 file and you "streamed" it such that it
was no longer "thin provisioned".  As soon as the guest starts issuing
trim/discards, QEMU could conceivably start defragmenting the image and
truncating it, resulting in a sparse file.

The only time the concept of "fully allocated" really makes sense is for 
a raw image on a simple file system.  Once you start dealing with
things like btrfs and deduplication, all of those useful guarantees are
thrown out the window.
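
To make the raw-image case concrete, here is a minimal sketch of the kind
of check that only means something on a simple file system.  It assumes
Linux, where st_blocks is counted in 512-byte units; looks_sparse() and
path_looks_sparse() are hypothetical names for illustration, not libvirt
or QEMU functions.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* True if the file system reports fewer allocated bytes than the
 * apparent file size.  Assumes st_blocks is in 512-byte units (Linux). */
bool looks_sparse(off_t apparent_size, blkcnt_t blocks_512)
{
    return (long long)blocks_512 * 512 < (long long)apparent_size;
}

/* Hypothetical helper: 1 if sparse, 0 if not, -1 if stat() fails. */
int path_looks_sparse(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return looks_sparse(st.st_size, st.st_blocks) ? 1 : 0;
}
```

On a file system that does compression, deduplication or its own
copy-on-write, the block count no longer tells you which guest sectors
are physically backed, which is the point above.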

I think the real question is, why do you care about what physical 
sectors reside where?  What problem are you trying to solve?

>> For instance, when you expose streaming, the result is still a
>> sparse file.  So you'd have a rather curious API where you called to
>> "allocate" a region in the file which resulted in having a sparse
>> file which you then called again to make it non sparse.  But AFAICT,
>> the API doesn't really tell you these details.
> Copy-on-read streaming does not imply that the result is still
> thin-provisioned. That is a policy decision by the management
> application.

I think your notion of thin provisioning doesn't quite map to how things 
work today.  Unless you're in a very constrained environment, you're 
always thin provisioned.

>> Having two related APIs to expand a copy-on-read image and then to
>> fill in a sparse file is certainly a reasonable thing to do.  I
>> think trying to make a single API that does both without having a
>> flag that basically makes it two APIs is going to be cumbersome.
> On the contrary, having a single API makes life *simpler*. It doesn't
> require any special flag to distinguish the two use cases, since they
> are fundamentally the same thing. Some examples, which include the
> implicit "all zeros" backing file that every disk has, should illustrate
> this:
>
>   - Make a brand new thin-provisioned disk, no backing store,
>     fully allocated
>
>     |0|0|0|0|0|0|0|0|0|
>     | | | | | | | | | |   ->      |0|0|0|0|0|0|0|0|0|
>
>   - Make a brand new thin-provisioned disk, no backing store,
>     1/2 allocated
>
>     |0|0|0|0|0|0|0|0|0|          |0|0|0|0|0|0|0|0|0|
>     | | | | | | | | | |   ->      |0|0|0|0|0| | | | |
>
>   - Make an existing, thin-provisioned disk, no backing store,
>     fully allocated
>
>     |0|0|0|0|0|0|0|0|0|
>     |X| |X|X| | |X| |X|   ->      |X|0|X|X|0|0|X|0|X|
>
>   - Make an existing, thin-provisioned disk, no backing store,
>     1/2 allocated
>
>     |0|0|0|0|0|0|0|0|0|          |0|0|0|0|0|0|0|0|0|
>     |X| |X|X| | |X| |X|   ->      |X|0|X|X|0| |X| |X|
>
>   - Make a brand new thin-provisioned disk, with backing store,
>     independent of backing store, but still thin:
>
>     |0|0|0|0|0|0|0|0|0|
>     |X| |X|X| | |X| |X|          |0|0|0|0|0|0|0|0|0|
>     | | | | | | | | | |   ->      |X| |X|X| | |X| |X|
>
>   - Make an existing thin-provisioned disk, with backing store,
>     independent of backing store, but still thin:
>
>     |0|0|0|0|0|0|0|0|0|
>     |X| |X|X| | |X| |X|          |0|0|0|0|0|0|0|0|0|
>     |Y|Y|Y| | | | | | |   ->      |X| |X|X| | |X| |X|
>
>   - Make an existing thin-provisioned disk, with backing store,
>     independent of backing store, fully allocated:
>
>     |0|0|0|0|0|0|0|0|0|
>     |X| |X|X| | |X| |X|
>     |Y|Y|Y| | | | | | |   ->      |X|0|X|X|0|0|X|0|X|
>
>   - Make a brand new thin-provisioned disk, with 2 backing stores,
>     independent of backing stores & fully allocated:
>
>     |0|0|0|0|0|0|0|0|0|
>     | | |Z|Z| | | |Z| |
>     |X| |X| | | |X| |X|
>     |Y|Y| |Y| | | | | |   ->      |Y|Y|X|Y|0|0|X|Z|X|
>
>
> etc, etc for many more example scenarios. Copy-on-read streaming is really
> not a special case - it is just one of many example scenarios, all of
> which can be managed via the pair of APIs mentioned earlier.

It's just not this simple with modern file systems unfortunately.

The problem is you're mixing a filesystem concept (sparseness) with a 
purely QEMU concept (backing file).  Streaming is the process of merging 
a backing file into the current image without disrupting the backing 
file.  When it is completed and the two are fully merged, the current 
image no longer has a dependency on the backing file.

It's essentially a reverse snapshot merge and is probably closer to 
snapshot merging conceptually than to image sparseness.
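
As a toy model of what streaming does (a sketch under simplifying
assumptions, not QEMU's implementation): treat each image as an array of
clusters, with ' ' marking a cluster the image has not allocated.
stream_from_backing() and UNALLOCATED are hypothetical names.

```c
#include <stddef.h>

#define UNALLOCATED ' '

/* Copy into 'image' every cluster it has not allocated itself but the
 * backing file has.  The backing file is only read, never modified.
 * Returns the number of clusters copied. */
size_t stream_from_backing(char *image, const char *backing,
                           size_t nclusters)
{
    size_t copied = 0;

    for (size_t i = 0; i < nclusters; i++) {
        if (image[i] == UNALLOCATED && backing[i] != UNALLOCATED) {
            image[i] = backing[i];
            copied++;
        }
    }
    return copied;
}
```

Applied to the two-backing-store example quoted above, streaming the
Y image first from the X layer and then from the Z layer gives
|Y|Y|X|Y| | |X|Z|X|.  The image no longer depends on either backing
file, but the two clusters that no layer ever allocated are still
unallocated.  Filling those in is a separate step.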

Regards,

Anthony Liguori

> Regards,
> Daniel