[libvirt] [PATCH 1/6] Add new API virDomainStreamDisk[Info] to header and drivers

Adam Litke agl at us.ibm.com
Mon May 9 17:51:16 UTC 2011



On 05/09/2011 11:09 AM, Daniel P. Berrange wrote:
> On Mon, May 02, 2011 at 04:29:49PM -0500, Adam Litke wrote:
>> After several long distractions, I am back to working on disk streaming.
>> Before I hit the list with a new series of patches, I was hoping to
>> reach some middle ground on the proposed streaming API.
>>
>> On Fri, 2011-04-08 at 14:31 +0100, Daniel P. Berrange wrote:
>>> On Thu, Apr 07, 2011 at 04:31:59PM -0500, Adam Litke wrote:
>>>>   /*
>>>> + * Disk Streaming
>>>> + */
>>>> +typedef enum {
>>>> +    VIR_STREAM_DISK_ONE   = 1,  /* Stream a single disk unit */
>>>> +    VIR_STREAM_DISK_START = 2,  /* Stream the entire disk */
>>>> +    VIR_STREAM_DISK_STOP  = 4,  /* Stop streaming a disk */
>>>> +} virDomainStreamDiskFlags;
>>>
>>> Using flags to combine two separate tasks into one single API
>>> is rather unpleasant. As raised in the previous patch, the API
>>> should also be taking a offset+length in bytes, then there is
>>> no need for a special case transfer of an individual sector.
>>
>> This is a fair point.  I will work with Stefan to support an
>> offset/length qemu API.  Since there doesn't seem to be a great way to
>> query device sizes, I think we do need a convenient way to request that
>> the entire disk be streamed.  We could either do this with a flag or by
>> overriding (offset==0&&  length==0) to mean stream the entire device.
>> Do you have a preference?
>
> Since length==0 is otherwise meaningless, it is fine to  use that
> to indicate "until end of disk". This is consistent with
> what we do for virStorageVolUpload/Download which allow length=0
> to indicate "until end of disk".
>
>>>> +#define VIR_STREAM_PATH_BUFLEN 1024
>>>> +#define VIR_STREAM_DISK_MAX_STREAMS 10
>>>> +
>>>> +typedef struct _virStreamDiskState virStreamDiskState;
>>>> +struct _virStreamDiskState {
>>>> +    char path[VIR_STREAM_PATH_BUFLEN];
>>>> +    /*
>>>> +     * The unit of measure for size and offset is unspecified.  These fields
>>>> +     * are meant to indicate the progress of a continuous streaming operation.
>>>> +     */
>>>> +    unsigned long long offset; /* Current offset of active streaming */
>>>> +    unsigned long long size;   /* Disk size */
>>>> +};
>>>> +typedef virStreamDiskState *virStreamDiskStatePtr;
>>>> +
>>>> +unsigned long long       virDomainStreamDisk(virDomainPtr dom,
>>>> +                                             const char *path,
>>>> +                                             unsigned long long offset,
>>>> +                                             unsigned int flags);
>>>> +
>>>> +int                      virDomainStreamDiskInfo(virDomainPtr dom,
>>>> +                                                 virStreamDiskStatePtr states,
>>>> +                                                 unsigned int nr_states,
>>>> +                                                 unsigned int flags);
>>>
>>> I would have liked it if we could use the existing JobInfo APIs for
>>> getting progress information, but if we need to allow concurrent
>>> usage for multiple disks per-VM, we can't. I think we should still
>>> use a similar style of API though.
>>
>> The goal is to eventually support concurrent streams.  Therefore, we
>> will probably need to roll our own Status and Abort APIs (see my
>> proposed API below).
>>
>>> There also doesn't appear to be a precise way to determine if the
>>> copying of an entire disk failed part way through, and if so, how
>>> much was actually copied.
>>>
>>> Taking all the previous points together, I think the API needs to
>>> look like this:
>>>
>>>      typedef enum {
>>>          /* If set, virDomainBlockAllocate() will return immediately
>>>           * allowing polling for operation completion&  status
>>>           */
>>>          VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>>>      } virDomainBlockAllocateFlags;
>>>
>>>      /*
>>>       * @path: fully qualified filename of the virtual disk
>>
>> I probably misnamed it, but path is actually the device alias, not a
>> path to an image file.
>
> Hmm, I wonder whether that is a good choice or not. The other
> APIs all use the disk path. Perhaps we could use that as default
> and have a flag to indicate whether the path should be treated as
> a device alias instead. Thus getting both options.

Does libvirt have an option to lookup a device alias by disk path?  If 
so, then I am happy to use the file path and convert it to the form that 
qemu expects.

>>
>>>       * @offset: logical position in bytes, within the virtual disk
>>>       * @length: amount of data to copy, in bytes starting from @offset
>>>       * @flags: One of virDomainBlockAllocateFlags, or zero
>>>       *
>>>       * Ensure the virtual disk @path is fully allocated
>>>       * between @offset and @offset+ at length. If a backing
>>>       * store is present, data will be filled from the
>>>       * backing store, otherwise data will be fileld with
>>>       * zeros
>>>       *
>>>       * If @flags contains VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>>>       * this API will return immediately after initiating the
>>>       * copy, otherwise it will block until copying is complete
>>>       *
>>>       */
>>>      int virDomainBlockAllocate(virDomainPtr dom,
>>>                                 const char *path,
>>>                                 unsigned long long offset,
>>>                                 unsigned long long length,
>>>                                 unsigned int flags);
>>>
>>>
>>>      /*
>>>       * @path: fully qualified filename of the virtual disk
>>>       * @info: allocated struct to return progress info
>>>       *
>>>       * Query the progress of a disk allocation job. This
>>>       * API must be used when virDomainBlockAllocate() was
>>>       * invoked with the VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK
>>>       * flag set.
>>>       *
>>>       * The @info.type field will indicate whether the job
>>>       * was completed successfully, or failed part way
>>>       * through.
>>>       *
>>>       * The @info data progress fields will contain current
>>>       * progress information.
>>>       *
>>>       * The hypervisor driver may optionally chose to also
>>>       * fillin a time estimate for completion.
>>>       */
>>>      int virDomainBlockGetJobInfo(virDomainPtr dom,
>>>                                   const char *path,
>>>                                   virDomainJobInfoPtr info);
>>>
>>>      /*
>>>       * @path: fully qualified filename of the virtual disk
>>>       *
>>>       * Request that a disk allocation job be aborted at
>>>       * the soonest opportunity. This API can only be used
>>>       * when virDomainBlockAllocate() was invoked with the
>>>       * VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK flag set.
>>>       */
>>>      int virDomainBlockAbortJob(virDomainPtr dom,
>>>                                 const char *path);
>>>
>>>
>>>      typedef struct _virDomainBlockRegion virDomainBlockRegion;
>>>      typedef virDomainBlockRegion *virDomainBlockRegionPtr;
>>>      struct _virDomainBlockRegion {
>>>          /* The logical offset within the file of the allocated region */
>>>          unsigned long long offset;
>>>          /* The length of the allocated region */
>>>          unsigned long long length;
>>>      };
>>>
>>>      /*
>>>       * @path: fully qualified filename of the virtual disk
>>>       * @nregions: filled in the number of @region structs
>>>       * @regions: filled with a list of allocated regions
>>>       *
>>>       * Query the extents of allocated regions within the
>>>       * virtual disk file. The offsets in the list of regions
>>>       * are not guarenteed to be sorted in any explicit order.
>>>       */
>>>      int virDomainBlockGetAllocationMap(virDomainPtr dom,
>>>                                         const char *path,
>>>                                         unsigned int *nregions,
>>>                                         virDomainBlockRegionPtr *regions);
>>
>> I am not convinced that we really need the block allocation map stuff.
>> It's a fun toy, but the layout of data within a device is far too
>> low-level of a concept to be exposing to users.  In my opinion,
>> streaming should really be done sequentially from the start of the
>> device to the end (with arbitrary chunk sizes allowed for throttling
>> purposes).  If an application wants to stream according to some special
>> pattern, let them maintain a data structure outside of libvirt to manage
>> that extra complexity.  It's not an error to stream a chunk that is
>> already present.  The populated areas will just complete immediately.
>> Therefore, there isn't a need to maintain a strict record of outstanding
>> regions.
>
> Well we can always add something like this in at a later
> date, so it is fine to drop it.
>
>>> This takes care of things for running guests. It would be
>>> desirable to have the same functionality available when a
>>> guest is not running, via the virStorageVol APIs. Indeed,
>>> this would allow access to the allocation functionality
>>> for disks not explicitly associated with any VM yet.
>>
>> In light of previous discussion about the complexities around storage
>> formats and filesystems, I am not sure how useful such an offline API is
>> going to be in practice.  At any rate, I would prefer to consider that
>> issue separately.
>>
>> So, with my points taken into account I would like to counter with the
>> following API proposal:
>>
>> ==== snip ====
>>
>>      typedef enum {
>>           /* If set, virDomainBlockAllocate() will return immediately
>>            * allowing polling for operation completion&  status
>>            */
>>           VIR_DOMAIN_DISK_STREAM_NONBLOCK,
>>       } virDomainBlockStreamFlags;
>>
>>       /*
>>        * @device: alias of the target block device
>>        * @offset: logical position in bytes, within the virtual disk
>>        * @length: amount of data to copy, in bytes starting from @offset
>>        * @flags: One of virDomainBlockAllocateFlags, or zero
>>        *
>>        * Ensure the virtual disk @device is fully allocated
>>        * between @offset and @offset+ at length. If a backing
>>        * store is present, data will be filled from the
>>        * backing store, otherwise data will be fileld with
>>        * zeros.
>>        *
>>        * If @flags contains VIR_DOMAIN_DISK_STREAM_NONBLOCK,
>>        * this API will return immediately after initiating the
>>        * copy, otherwise it will block until copying is complete
>>        *
>>        */
>>       int virDomainBlockStream(virDomainPtr dom,
>>                                const char *device,
>>                                unsigned long long offset,
>>                                unsigned long long length,
>>                                unsigned int flags);
>>
>>       /*
>>        * @device: alias of the target block device
>>        *
>>        * Request that a disk streaming job be aborted at
>>        * the soonest opportunity. This API can only be used
>>        * when virDomainBlockStream() was invoked with the
>>        * VIR_DOMAIN_DISK_STREAM_NONBLOCK flag set.
>>        */
>>       int virDomainBlockStreamAbort(virDomainPtr dom,
>>                                     const char *device);
>>
>>       typedef enum {
>>           VIR_BLOCK_STREAM_STATE_COMPLETED = 0,
>>           VIR_BLOCK_STREAM_STATE_ACTIVE = 1,
>>           VIR_BLOCK_STREAM_STATE_FAILED = 2,
>>       } virBlockStreamStatus;
>>
>>       typedef struct _virBlockStreamInfo virBlockStreamInfo;
>>       struct _virBlockStreamInfo {
>>           virBlockStreamStatus status;
>
> Coding guidelines don't allow use of enum types in structs
> or as API parameters, instead requiring 'int'.

OK.

>>           unsigned long long remaining; /* number of bytes remaining */
>>       };
>>       typedef virBlockStreamInfo *virBlockStreamInfoPtr;
>
> s/virBlock/virDomainBlock/ in several places here.
>
>>
>>       /*
>>        * @device: alias of the target block device
>>        * @info: allocated struct to return progress info
>>        *
>>        * Query the progress of a disk streaming job. This
>>        * API must be used when virDomainBlockStream() was
>>        * invoked with the VIR_DOMAIN_DISK_STREAM_NONBLOCK
>>        * flag set.
>>        *
>>        */
>>       int virDomainBlockStreamInfo(virDomainPtr dom,
>>                                    const char *device,
>>                                    virBlockStreamInfoPtr info);
>
> Any reason you didn't just use the existing 'virDomainJobInfoPtr'
> struct, with this new API ? In particular I think we want many
> of the other fields from that struct beyond just 'remaining'.

It looks like I can use this structure without problems but given the 
desire to eventually support simultaneous streams, we'll still need to 
write our own start/stop/status routines.




More information about the libvir-list mailing list