[Libvir] PATCH: 0/7 Implementation of storage APIs

Jim Meyering jim at meyering.net
Wed Oct 31 13:16:26 UTC 2007


"Daniel P. Berrange" <berrange at redhat.com> wrote:
> Since the previous discussions didn't really end up anywhere conclusive
> I decided it would be better to have a crack at getting some working code
> to illustrate my ideas. Thus, the following series of 7 patches provide

Hi Dan,

Impressive work!
It's taken me a while just to digest all of this.

...
> Some open questions
>
>  - Is it worth bothering with a UUID for individual storage volumes? It
>    is possible to construct a globally unique identifier for most volumes,
>    by combining various bits of metadata we have such as device unique ID,
>    inode, iSCSI target & LUN, etc. There isn't really any such identifier
>    that fits into the classic libvirt 16 byte UUID.  I've implemented
>    (randomly generated) UUIDs for the virStorageVolPtr object, but I'm
>    inclined to remove them, since it's not much use if they change each
>    time the libvirtd daemon is restarted.
>
>    The 'name' field provides a unique identifier scoped to the storage
>    pool. I think we could add a 'char *key' field, as an arbitrary opaque
>    string, forming a globally unique identifier for the volume. This
>    would serve the same purpose as a UUID, but without the 16-byte
>    constraint, which we can't usefully satisfy.

That sounds good.  And with the key being opaque, no one will
be tempted (or able) to rely on its internal structure.
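
For instance, something like this (the accessor name and the
per-backend examples are pure invention, just to illustrate what
"opaque" would mean for callers):

    /* Hypothetical: return an opaque, globally unique identifier for
     * the volume.  Callers may compare keys for equality and nothing
     * more -- no parsing, no assumptions about length or content. */
    const char *virStorageVolGetKey(virStorageVolPtr vol);

    /* Each backend picks whatever is stable and unique for it: the
     * canonical path for a local file, target IQN + LUN for iSCSI,
     * and so on. */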

>  - For the local directory backend, I've got the ability to choose
>    between file formats on a per-volume basis. eg, /var/lib/xen/images can
>    contain a mix of raw, qcow, vmdk, etc files. This is nice & flexible
>    for the user, but a more complex thing to implement, since it means
>    we have to probe each volume and try & figure out its format each

Have there been requests for this feature?
The probe-and-recognize part doesn't sound too hard (a magic-byte
check, roughly like the sketch below), but if the large majority of
use cases have homogeneous volumes per pool, then for starters at
least, maybe we can avoid the complexity.
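
A minimal sketch of the probing, assuming for simplicity that we
only distinguish qcow from raw:

    #include <stdio.h>
    #include <string.h>

    /* A qcow image starts with the magic bytes "QFI\xfb"; treat
     * anything we don't recognize as raw. */
    static int
    is_qcow(const char *path)
    {
        unsigned char buf[4];
        FILE *fp = fopen(path, "rb");
        if (!fp)
            return 0;
        size_t n = fread(buf, 1, sizeof buf, fp);
        fclose(fp);
        return n == 4 && memcmp(buf, "QFI\xfb", 4) == 0;
    }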

A possible compromise (albeit ugly), _if_ we can dictate naming policy:
let part of a volume name (suffix, substring, component, whatever)
tell libvirtd its type; a rough sketch follows.  As I said, ugly, and
hence probably not worth considering, but I had to say it :-)
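
For example (everything here invented, just to make it concrete):

    #include <string.h>

    enum vol_format { VOL_RAW, VOL_QCOW, VOL_VMDK, VOL_UNKNOWN };

    /* Map a volume's name suffix to its format, so that listing a
     * pool never has to open or probe the files themselves. */
    static enum vol_format
    format_from_name(const char *name)
    {
        const char *dot = strrchr(name, '.');
        if (!dot)
            return VOL_RAW;                /* no suffix: assume raw */
        if (strcmp(dot, ".qcow") == 0)
            return VOL_QCOW;
        if (strcmp(dot, ".vmdk") == 0)
            return VOL_VMDK;
        if (strcmp(dot, ".img") == 0 || strcmp(dot, ".raw") == 0)
            return VOL_RAW;
        return VOL_UNKNOWN;
    }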

>    time we list volumes. If we constrained the choice between formats
>    to be at the pool level instead of the volume level we could avoid
>    probing & thus simplify the code. This is what XenAPI does.
>
>  - If creating non-sparse files, it can take a very long time to do the
>    I/O to fill in the image. In virt-install/virt-manager we have a nice
>    incremental progress display. The API I've got currently, though, is
>    blocking. This blocks the calling application. It also blocks the
>    entire libvirtd daemon, since we are single process. There are a couple
>    of ways we can address this:
>
>      1 Allow the libvirtd daemon to serve each client connection in
>        a separate thread. We'd need to add some mutex locking to the
>        QEMU driver and the storage driver to handle this. It would have
>        been nice to avoid threads, but I don't think we can avoid them
>        much longer.
>
>      2 For long running operations, spawn off a worker thread (or
>        process) to perform the operation. Don't send a reply to the RPC
>        message, instead just put the client on a 'wait queue', and get
>        on with serving other clients. When the worker thread completes,
>        send the RPC reply to the original client.
>
>      3 Have the virStorageVolCreate() method return immediately,
>        giving back the client app some kind of 'job id'. The client app
>        can poll another API, a virStorageVolJob() method, to determine
>        how far the task has got. The implementation in the
>        driver would have to spawn a worker thread to do the actual
>        long operation.

I like the idea of spawning off a thread for a very precise
and limited-scope task.

On first reading, I preferred your #2 worker-thread-based solution:
client apps simply wait -- i.e., they don't have to poll.
But we'd still need another interface for progress feedback, so #3
starts to look better: progress feedback comes almost for free,
since the client is already polling to check for completion.
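
From the client side, #3 might look something like the sketch below.
Every name in it is invented (I've even given the create call a
stand-in signature), so this shows only the shape of the polling loop:

    #include <unistd.h>

    /* Invented job-progress struct, following the description above. */
    typedef struct {
        unsigned long long done;     /* bytes written so far */
        unsigned long long total;    /* bytes to write in total */
        int complete;                /* non-zero when finished */
    } virStorageVolJobInfo;

    extern int virStorageVolCreateAsync(const char *xmldesc); /* -> job id */
    extern int virStorageVolJob(int job, virStorageVolJobInfo *info);
    extern void show_progress(unsigned long long done,
                              unsigned long long total);

    static void
    create_with_progress(const char *xmldesc)
    {
        int job = virStorageVolCreateAsync(xmldesc);
        virStorageVolJobInfo info;

        do {
            if (virStorageVolJob(job, &info) < 0)
                break;                   /* job failed or went away */
            show_progress(info.done, info.total);
            sleep(1);
        } while (!info.complete);
    }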

>        Possibly we can allow creation to be async or blocking by
>        making use of the 'flags' argument to the virStorageVolCreate()
>        method, eg VIR_STORAGE_ASYNC.  If we make async behaviour
>        optional, we still need some work in the daemon to avoid
>        blocking all clients.
>
>    This problem will also impact us when we add cloning of existing
>    volumes.  It already sort of hits us when saving & restoring VMs
>    if they have large amounts of memory. So perhaps we need to go
>    for the general solution of making the daemon threaded per client
>    connection. The ASYNC flag may still be useful anyway to get the
>    incremental progress feedback in the UI.

Could we just treat save/restore as another type of task to hand
out to a worker thread?  (A rough sketch of the mechanics follows
below.)

Otherwise, this (#1) sounds a lot more invasive, but that's just my
relatively uninformed impression.
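
Inside the daemon, the worker-plus-wait-queue mechanics of #2 might
look roughly like this.  Every name is invented, and the two extern
functions stand in for whatever the daemon's real I/O and RPC-reply
machinery would be:

    #include <pthread.h>
    #include <stdlib.h>

    /* Placeholders for existing daemon machinery: */
    extern void do_long_running_create(void);
    extern void send_rpc_reply(int client_fd);

    struct pending_reply {
        int client_fd;                /* client awaiting the RPC reply */
    };

    /* Worker: do the long-running I/O, then wake the queued client. */
    static void *
    create_worker(void *arg)
    {
        struct pending_reply *p = arg;
        do_long_running_create();      /* e.g. fill a non-sparse image */
        send_rpc_reply(p->client_fd);  /* deferred reply, sent only now */
        free(p);
        return NULL;
    }

    /* Dispatcher: instead of replying inline, queue the client and
     * return to the main loop, which keeps serving other clients. */
    static int
    queue_create_job(int client_fd)
    {
        pthread_t tid;
        struct pending_reply *p = malloc(sizeof *p);

        if (!p)
            return -1;
        p->client_fd = client_fd;
        if (pthread_create(&tid, NULL, create_worker, p) != 0) {
            free(p);
            return -1;
        }
        return pthread_detach(tid);
    }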



