[libvirt] [PATCH 1/6] Add new API virDomainStreamDisk[Info] to header and drivers
Anthony Liguori
anthony at codemonkey.ws
Mon Apr 11 22:06:54 UTC 2011
On 04/11/2011 04:45 PM, Daniel P. Berrange wrote:
> On Fri, Apr 08, 2011 at 02:26:48PM -0500, Anthony Liguori wrote:
>> On 04/08/2011 11:02 AM, Stefan Hajnoczi wrote:
>>> On Fri, Apr 8, 2011 at 2:31 PM, Daniel P. Berrange<berrange at redhat.com> wrote:
>>>
>>> I have CCed Anthony and Kevin. Anthony drove the QED image streaming
>>> and Kevin will probably be interested in the idea of allocating raw
>>> images as a background activity while QEMU runs.
>>>
>>>> /*
>>>> * @path: fully qualified filename of the virtual disk
>>>> * @nregions: filled in the number of @region structs
>>>> * @regions: filled with a list of allocated regions
>>>> *
>>>> * Query the extents of allocated regions within the
>>>> * virtual disk file. The offsets in the list of regions
>>>> * are not guaranteed to be sorted in any explicit order.
>>>> */
>>>> int virDomainBlockGetAllocationMap(virDomainPtr dom,
>>>> const char *path,
>>>> unsigned int *nregions,
>>>> virDomainBlockRegionPtr *regions);
>>> QEMU can provide this with its existing .bdrv_is_allocated() function.
>>> Kevin, do you have any thoughts on whether this API will work well?
>> I think the trouble with this API proposal is that it's overloading
>> concepts.
>>
>> Sparse is not the same thing as CoW to a backing file.
> I don't like to use the term "sparse", since that implies a specific disk
> format (raw file with holes). Rather I use the term 'thin provisioned'
> to refer to any disk format where not all physical sectors have
> yet been allocated. A thin-provisioned disk can trivially be thought
> of as a disk, with a backing file whose sectors are all filled with
> zeros.
It's not so black and white today.
Imagine that you had a qcow2 file, and you "streamed" it such that it
was no longer "thin provisioned". As soon as the guest starts issuing
trim/discards, QEMU could conceivably start defragmenting the image and
truncating it, resulting in a sparse file again.
The only time the concept of "fully allocated" really makes sense is for
a raw image on a simple file system. Once you start dealing with
things like btrfs and deduplication, any of those useful guarantees are
thrown out the window.
I think the real question is, why do you care about what physical
sectors reside where? What problem are you trying to solve?
>> For instance, when you expose streaming, the result is still a
>> sparse file. So you'd have a rather curious API where you called to
>> "allocate" a region in the file which resulted in having a sparse
>> file which you then called again to make it non sparse. But AFAICT,
>> the API doesn't really tell you these details.
> Copy-on-read streaming does not imply that the result is still
> thin-provisioned. That is a policy decision by the management
> application.
I think your notion of thin provisioning doesn't quite map to how things
work today. Unless you're in a very constrained environment, you're
always thin provisioned.
>> Having two related APIs to expand a copy-on-read image and then to
>> fill in a sparse file is certainly a reasonable thing to do. I
>> think trying to make a single API that does both without having a
>> flag that basically makes it two APIs is going to be cumbersome.
> On the contrary, having a single API makes life *simpler*. It doesn't
> require any special flag to distinguish the two use cases, since they
> are fundamentally the same thing. Some examples, which include the
> implicit "all zeros" backing file that every disk has, should illustrate
> this
>
> - Make a brand new thin-provisioned disk, no backing store,
> fully allocated
>
> |0|0|0|0|0|0|0|0|0|
> | | | | | | | | | | -> |0|0|0|0|0|0|0|0|0|
>
> - Make a brand new thin-provisioned disk, no backing store,
> 1/2 allocated
>
> |0|0|0|0|0|0|0|0|0| |0|0|0|0|0|0|0|0|0|
> | | | | | | | | | | -> |0|0|0|0|0| | | | |
>
> - Make an existing, thin-provisioned disk, no backing store,
> fully allocated
>
> |0|0|0|0|0|0|0|0|0|
> |X| |X|X| | |X| |X| -> |X|0|X|X|0|0|X|0|X|
>
> - Make an existing, thin-provisioned disk, no backing store,
> 1/2 allocated
>
> |0|0|0|0|0|0|0|0|0| |0|0|0|0|0|0|0|0|0|
> |X| |X|X| | |X| |X| -> |X|0|X|X|0| |X| |X|
>
> - Make a brand new thin-provisioned disk, with backing store,
> independent of backing store, but still thin:
>
> |0|0|0|0|0|0|0|0|0|
> |X| |X|X| | |X| |X| |0|0|0|0|0|0|0|0|0|
> | | | | | | | | | | -> |X| |X|X| | |X| |X|
>
> - Make an existing thin-provisioned disk, with backing store,
> independent of backing store, but still thin
>
> |0|0|0|0|0|0|0|0|0|
> |X| |X|X| | |X| |X| |0|0|0|0|0|0|0|0|0|
> |Y|Y|Y| | | | | | | -> |X| |X|X| | |X| |X|
>
> - Make an existing thin-provisioned disk, with backing store,
> independent of backing store, fully allocated
>
> |0|0|0|0|0|0|0|0|0|
> |X| |X|X| | |X| |X|
> |Y|Y|Y| | | | | | | -> |X|0|X|X|0|0|X|0|X|
>
> - Make a brand new thin-provisioned disk, with 2 backing stores,
> independent of backing stores & fully allocated:
>
> |0|0|0|0|0|0|0|0|0|
> | | |Z|Z| | | |Z| |
> |X| |X| | | |X| |X|
> |Y|Y| |Y| | | | | | -> |Y|Y|X|Y|0|0|X|Z|X|
>
>
> etc, etc for many more example scenarios. Copy-on-read streaming is really
> not a special case - it is just one of many example scenarios, all of
> which can be managed via the pair of APIs mentioned earlier.
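For reference, the last diagram above (two backing stores, fully
allocated) can be read as a "topmost allocated layer wins" rule with an
implicit all-zeros layer at the bottom. What follows is only a minimal
sketch of that reading in C. The names NSECT, resolve() and flatten() are
hypothetical and are not part of libvirt or QEMU; each layer is modelled
as a string with one character per sector, and ' ' means "not allocated
in this layer".

```c
#include <assert.h>
#include <stddef.h>

#define NSECT 9 /* sectors per toy disk, matching the diagrams */

/*
 * Resolve one sector. layers[0] is the topmost image, later entries are
 * its backing stores. The first layer that has the sector allocated
 * wins; if none does, fall through to the implicit all-zeros layer.
 */
char resolve(const char *const *layers, size_t nlayers, size_t sector)
{
    for (size_t i = 0; i < nlayers; i++) {
        if (layers[i][sector] != ' ')
            return layers[i][sector];
    }
    return '0';
}

/* Produce the fully allocated, standalone view of the whole chain. */
void flatten(const char *const *layers, size_t nlayers, char *out)
{
    assert(layers != NULL && out != NULL);
    for (size_t s = 0; s < NSECT; s++)
        out[s] = resolve(layers, nlayers, s);
    out[NSECT] = '\0';
}
```

Calling flatten() on the three layers of that diagram, listed top
first ("YY Y     ", "X X   X X", "  ZZ   Z "), reproduces its
right-hand side. The sketch deliberately says nothing about where the
resulting bytes physically live on the host file system.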
It's just not this simple with modern file systems unfortunately.
The problem is you're mixing a filesystem concept (sparseness) with a
purely QEMU concept (backing file). Streaming is the process of merging
a backing file into the current image without disrupting the backing
file. When it is completed and the two are fully merged, the current
image no longer has a dependency on the backing file.
It's essentially a reverse snapshot merge and is probably closer to
snapshot merging conceptually than to image sparseness.
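To make the distinction concrete, here is a rough sketch in C of
streaming as described above. It is a toy model under stated
assumptions, not QEMU's implementation: struct toy_image, toy_read() and
toy_stream() are hypothetical names, a "cluster" is a single byte, and
allocation is tracked with a plain bool array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCLUSTERS 9

/* Hypothetical toy image: one byte per cluster plus an allocation flag. */
struct toy_image {
    char data[NCLUSTERS];
    bool allocated[NCLUSTERS];
    const struct toy_image *backing; /* NULL when the image stands alone */
};

/* Read a cluster the way a guest would: top image first, then the chain. */
char toy_read(const struct toy_image *img, size_t c)
{
    for (; img != NULL; img = img->backing) {
        if (img->allocated[c])
            return img->data[c];
    }
    return 0; /* allocated nowhere: reads as zeros */
}

/*
 * Stream: copy every cluster that only the backing chain provides into
 * the top image, then drop the dependency. The backing images are never
 * written to. Returns the number of clusters copied.
 */
size_t toy_stream(struct toy_image *top)
{
    size_t copied = 0;

    assert(top != NULL);
    for (size_t c = 0; c < NCLUSTERS; c++) {
        if (top->allocated[c])
            continue; /* the top image already owns this cluster */
        for (const struct toy_image *b = top->backing; b != NULL; b = b->backing) {
            if (b->allocated[c]) {
                top->data[c] = b->data[c];
                top->allocated[c] = true;
                copied++;
                break;
            }
        }
    }
    top->backing = NULL; /* fully merged: no backing file needed any more */
    return copied;
}
```

Note that clusters allocated nowhere in the chain stay unallocated in
the top image after toy_stream(), so in this model the streamed image is
standalone but can still be sparse. Filling those holes would be a
separate operation.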
Regards,
Anthony Liguori
> Regards,
> Daniel