[libvirt] [RFC] New domain job control and stat APIs

Eric Blake eblake at redhat.com
Wed Jul 10 13:38:20 UTC 2019


On 7/10/19 7:27 AM, Peter Krempa wrote:
> Currently we don't have a consolidated approach for managing
> asynchronous long-running domain jobs. Historically there were
> long-running jobs which interlocked with each other and thus there was
> only one such job possible at a given time (migration, save, restore, dump).

Yes, I agree with the problem statement. In fact, one of my questions
when starting on incremental backup was whether I'd have to implement
any of this first, or punt it to later. Well, here we are; it's later,
and I still don't have incremental backup in.

> 
> These jobs have a rather inflexible set of APIs:
> virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.
> 
> These don't really allow selecting which job to terminate since there's
> only one; thus if we wanted to add different kinds of jobs which do not
> necessarily interlock but are able to run in parallel, we had to
> introduce another set of APIs.
> 
> This resulted in the creation of the block job APIs:
> virDomainBlockJobAbort, virDomainGetBlockJobInfo
> 
> These allow parallel jobs (discriminated by the disk to which the job
> belongs) but are not universal and do not allow parallel jobs on a single
> disk.
> 
> Similarly, block jobs can also become detached from the disk, e.g. if the
> guest unplugs the disk frontend. That way the job would linger in limbo
> and would not be controllable. (This is certainly a possibility with
> -blockdev.)
> 
> With -blockdev we also get a potentially long-running blockdev-create
> job, which is not bound to any disk, as part of kicking off a snapshot or
> block copy job. This one might also get stuck and in the current state
> is not really controllable.
> 
> Additionally, the upcoming block-backup job will be a combination of the
> above. It's a job which spans multiple disks (thus not really a block
> job in libvirt terminology) but not a domain job either, as there
> can potentially be more than one block backup job. The proposal for
> block-backup introduces its own single-purpose set of APIs for managing
> the backup job only, but abuses the block job and domain job events to
> distribute the async state updates.

At the bare minimum, a push job implementation has to send async state
update events, but a pull job does not. If the backup API goes into
5.6.0 but not your new job control APIs, we can limit the qemu
implementation to JUST pull backups for now (the XML has been shown to
be extensible to add in push support once we also have sane APIs for
managing async state updates with push support).

> 
> With this series I want to introduce a set of APIs for managing the jobs
> which are designed to be universal enough, and a new event, so that nothing
> will need to be retrofitted onto the existing infrastructure.
> 
> An example of the job XML would be:
> 
> <job type='block-commit-active' state='ready'>
>   <config>
>     <disk>vda</disk>
>     <top>vda[1]</top>
>     <base>vda[5]</base>
>   </config>

Where what forms a valid <config> is dependent on the type='...' of the
<job>, correct?

>   <stats>
>    <current>12345</current>
>    <end>12345</end>
>   </stats>
> </job>

Looks reasonable.

> 
> but this will be mostly a topic for the second part of this exercise
> after we discuss the APIs.
> 
> The new infrastructure will also allow adding a flag for all the
> existing APIs which kick off a job so that the job will persist even
> after it finishes. This will also properly implement the statistics for
> a finished migration and similar.
> 
> Obviously we will need to take special care when wiring up these so that
> the old APIs work for old situations and also the events are reported
> correctly.
> 
> The initial idea would be to implement the stats XML both for the domain
> jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
> so that they won't have to infer whether the given job type is already
> reported in the new API.
> 
> Additionally we can also implement flags for the XML getter API that
> will skip the stats gathering as that may require monitor interactions.
> Also one possibility would be to return an abbreviated XML in the
> listing API.

Makes sense.

> ---
>  include/libvirt/libvirt-domain.h | 91 +++++++++++++++++++++++++++++++
>  src/libvirt-domain.c             | 94 ++++++++++++++++++++++++++++++++
>  2 files changed, 185 insertions(+)
> 
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 2dbd74d4f3..dac77771be 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -4485,6 +4485,28 @@ typedef void (*virConnectDomainEventBlockThresholdCallback)(virConnectPtr conn,
>                                                              unsigned long long excess,
>                                                              void *opaque);
> 
> +/**
> + * virConnectDomainEventJobStateCallback:
> + * @conn: connection object
> + * @dom: domain on which the event occurred
> + * @jobname: name of job which changed state
> + * @jobtype: type of the job
> + * @newstate: the new state the job entered
> + * @opaque: application specified data
> + *
> + * The callback occurs when a long running domain job (see virDomainJobList)
> + * changes state.
> + *
> + * The callback signature to use when registering for an event of type
> + * VIR_DOMAIN_EVENT_ID_JOB_STATE with virConnectDomainEventRegisterAny()
> + */
> +typedef void (*virConnectDomainEventJobStateCallback)(virConnectPtr conn,
> +                                                      virDomainPtr dom,
> +                                                      const char *jobname,
> +                                                      virDomainJobType jobtype,
> +                                                      virDomainJobState newstate,
> +                                                      void *opaque);
> +
>  /**
>   * VIR_DOMAIN_EVENT_CALLBACK:
>   *
> @@ -4527,6 +4549,7 @@ typedef enum {
>      VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED = 22, /* virConnectDomainEventDeviceRemovalFailedCallback */
>      VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */
>      VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* virConnectDomainEventBlockThresholdCallback */
> +    VIR_DOMAIN_EVENT_ID_JOB_STATE = 25, /* virConnectDomainEventJobStateCallback */
> 
>  # ifdef VIR_ENUM_SENTINELS
>      VIR_DOMAIN_EVENT_ID_LAST
> @@ -4896,4 +4919,72 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
>                                     int *nparams,
>                                     unsigned int flags);
> 
> +typedef enum {
> +    VIR_DOMAIN_JOB_TYPE_NONE = 0,
> +    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
> +    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_TYPE_LAST
> +# endif
> +} virDomainJobType;
> +
> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> +    VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> +    VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> +    VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> +    VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> +    VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> +    [...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_STATE_LAST
> +# endif
> +} virDomainJobState;

Not all job types will utilize all of the possible job states, but I
don't see that as an issue.
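
To illustrate why a partial use of the state set is harmless: a consumer that only needs to know whether a job can still change state can treat the enum as a whole. A minimal sketch, assuming a hypothetical helper jobStateIsFinal() that is not part of the RFC; the enum is copied from the patch with its elided ([...]) entries left out.

```c
#include <stdbool.h>

/* States copied from the RFC; the entries elided there ([...]) are omitted. */
typedef enum {
    VIR_DOMAIN_JOB_STATE_NONE = 0,
    VIR_DOMAIN_JOB_STATE_RUNNING = 1,
    VIR_DOMAIN_JOB_STATE_READY = 2,
    VIR_DOMAIN_JOB_STATE_FAILED = 3,
    VIR_DOMAIN_JOB_STATE_COMPLETED = 4,
    VIR_DOMAIN_JOB_STATE_ABORTED = 5,
} virDomainJobState;

/* Hypothetical helper (not in the RFC): true once the job can no longer
 * change state on its own and at most waits to be dismissed. */
static bool
jobStateIsFinal(virDomainJobState state)
{
    switch (state) {
    case VIR_DOMAIN_JOB_STATE_FAILED:
    case VIR_DOMAIN_JOB_STATE_COMPLETED:
    case VIR_DOMAIN_JOB_STATE_ABORTED:
        return true;
    case VIR_DOMAIN_JOB_STATE_NONE:
    case VIR_DOMAIN_JOB_STATE_RUNNING:
    case VIR_DOMAIN_JOB_STATE_READY:
        break;
    }
    return false;
}
```

A job type that never enters READY simply never hits that case, so the helper needs no per-type knowledge.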

> +
> +
> +typedef struct _virDomainJob virDomainJob;
> +typedef virDomainJob *virDomainJobPtr;
> +struct _virDomainJob {
> +    char *name;
> +    virDomainJobType type;
> +    virDomainJobState state;
> +
> +    /* possibly overkill? - currently empty*/
> +    virTypedParameterPtr data;

Can the XML provide all the same information that you would place
in typed parameters?

> +    size_t ndata;
> +};
> +
> +
> +void virDomainJobFree(virDomainJobPtr job);
> +
> +int virDomainJobList(virDomainPtr domain,
> +                     virDomainJobPtr **jobs,
> +                     unsigned int flags);
> +
> +char *virDomainJobGetXMLDesc(virDomainPtr domain,
> +                             const char *jobname,
> +                             unsigned int flags);
> +
> +typedef enum {
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> +
> +# ifdef VIR_ENUM_SENTINELS
> +    VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> +# endif
> +} virDomainJobControlOperation;
> +
> +int virDomainJobControl(virDomainPtr domain,
> +                        const char *jobname,

Do we want const char *jobname or int jobid?  Which is easier to work
with? My backup code currently proposes the use of int jobid; switching
to a job name may be more effort. Also, I was able to document that a
jobid==0 implies "the current backup job, if there is exactly one",
regardless of what other job id it may have (I guess a jobname==NULL
would serve the same purpose, but passing around char* instead of int
means you also have to start worrying about malloc lifetimes).
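
A minimal sketch of how a name-based lookup could keep the jobid==0 convention, with NULL standing for "the only job". The jobRecord type and jobLookup() are hypothetical; a real implementation would presumably restrict the NULL shortcut to jobs of the type the calling API targets (e.g. backup), which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-domain job record; only the name matters here. */
typedef struct {
    const char *name;
} jobRecord;

/* Resolve @jobname against @jobs. NULL selects the job only if exactly
 * one exists, mirroring the jobid==0 convention. Returns NULL if no job
 * matches or if the NULL shortcut would be ambiguous. */
static const jobRecord *
jobLookup(const jobRecord *jobs, size_t njobs, const char *jobname)
{
    size_t i;

    if (!jobname)
        return njobs == 1 ? &jobs[0] : NULL;

    for (i = 0; i < njobs; i++) {
        if (strcmp(jobs[i].name, jobname) == 0)
            return &jobs[i];
    }
    return NULL;
}
```

Since the lookup only compares @jobname and never stores it, the malloc-lifetime concern mainly applies to APIs that hand names back to the caller, such as the list API.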

> +                        virDomainJobControlOperation op,
> +                        unsigned int flags);
> +
>  #endif /* LIBVIRT_DOMAIN_H */
> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> index 3d12e7c125..aa5571818f 100644
> --- a/src/libvirt-domain.c
> +++ b/src/libvirt-domain.c
> @@ -12362,3 +12362,97 @@ int virDomainGetLaunchSecurityInfo(virDomainPtr domain,
>      virDispatchError(domain->conn);
>      return -1;
>  }
> +
> +
> +/**
> + * virDomainJobFree:
> + * @job: pointer to virDomainJob object
> + *
> + * Frees the memory associated with @job.
> + */
> +void
> +virDomainJobFree(virDomainJobPtr job)
> +{
> +    [...]
> +}
> +
> +
> +/**
> + * virDomainJobList:
> + * @domain: pointer to a domain
> + * @jobs: Pointer to a variable to store the array containing job description
> + *        objects or NULL if the list is not required.
> + * @flags: optional flags (currently unused, callers should always pass 0)

One possible use for flags: filtering on specific job types.
With one flag bit per supported job type, I could do virDomainJobList(,
JOB_LIST_MIGRATE | JOB_LIST_BACKUP).

Another filter might be whether to list all jobs, active or otherwise,
vs. just the jobs that have completed but are not yet cleaned up (if I
only want to know about the jobs that have a pending status waiting for
me to collect).
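
A minimal sketch of such filtering, assuming hypothetical JOB_LIST_* flag names (one bit per job type, derived from the type's enum value, plus a JOB_LIST_FINISHED bit for jobs awaiting collection) and a hypothetical jobMatchesFilter() helper; only the MIGRATION and BLOCK_PULL types are taken from the RFC.

```c
#include <stdbool.h>

/* Two of the job types from the RFC (the rest were elided there). */
typedef enum {
    VIR_DOMAIN_JOB_TYPE_NONE = 0,
    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
} virDomainJobType;

/* Hypothetical filter flags: one bit per job type, plus one bit
 * selecting only finished jobs that still await collection. */
#define JOB_LIST_TYPE(type)   (1u << (type))
#define JOB_LIST_MIGRATION    JOB_LIST_TYPE(VIR_DOMAIN_JOB_TYPE_MIGRATION)
#define JOB_LIST_BLOCK_PULL   JOB_LIST_TYPE(VIR_DOMAIN_JOB_TYPE_BLOCK_PULL)
#define JOB_LIST_TYPE_MASK    0xffffu
#define JOB_LIST_FINISHED     (1u << 16)

/* Decide whether a job passes @flags. No type bits means "any type",
 * so flags == 0 keeps the RFC's behaviour of listing every job. */
static bool
jobMatchesFilter(virDomainJobType type, bool finished, unsigned int flags)
{
    unsigned int typeBits = flags & JOB_LIST_TYPE_MASK;

    if (typeBits && !(typeBits & JOB_LIST_TYPE(type)))
        return false;
    if ((flags & JOB_LIST_FINISHED) && !finished)
        return false;
    return true;
}
```

Deriving the type bits from the enum value keeps the flags in sync as types are added, at the cost of capping the number of types at the width of the mask (16 here, an arbitrary choice).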

> + *
> + * Collects a list of background jobs associated with @domain and returns it in
> + * an allocated array of virDomainJobPtr structs. The jobs include migration jobs

s/$/,/

> + * block jobs and any other possibly long running asynchronous operation.
> + *
> + * The caller is responsible for freeing the members of the returned @jobs array
> + * using virDomainJobFree and the whole array using free();

s/;/./

> + *
> + * Returns the number of jobs running on @domain on success (optionally filling
> + * @jobs if non-NULL) or -1 on error (value of @jobs is set to NULL).
> + */
> +int
> +virDomainJobList(virDomainPtr domain,
> +                 virDomainJobPtr **jobs,
> +                 unsigned int flags)
> +{
> +    [...]
> +}
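
The ownership rules in the virDomainJobList documentation can be sketched from the caller's side. To stay self-contained, the hypothetical fakeJobList() stands in for the real API (no hypervisor is contacted), the struct omits the typed-parameter members the RFC marks as currently empty, and the virDomainJobFree() body is an assumption since the RFC elides it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified copy of the RFC's struct; typed-parameter members omitted,
 * and plain ints stand in for the job type/state enums. */
typedef struct _virDomainJob virDomainJob;
typedef virDomainJob *virDomainJobPtr;
struct _virDomainJob {
    char *name;
    int type;
    int state;
};

/* Assumed body; the RFC elides it. NULL-safe like other libvirt frees. */
static void
virDomainJobFree(virDomainJobPtr job)
{
    if (!job)
        return;
    free(job->name);
    free(job);
}

static char *
copyString(const char *s)
{
    size_t len = strlen(s) + 1;
    char *ret = malloc(len);

    if (ret)
        memcpy(ret, s, len);
    return ret;
}

/* Hypothetical stand-in following the documented contract: returns the
 * job count, fills @jobs if non-NULL, and sets *jobs to NULL on error. */
static int
fakeJobList(virDomainJobPtr **jobs)
{
    static const char *const names[] = { "migration", "vda-commit" };
    size_t n = sizeof(names) / sizeof(names[0]);
    virDomainJobPtr *list;
    size_t i;

    if (!jobs)
        return (int)n;            /* caller only wants the count */
    *jobs = NULL;

    if (!(list = calloc(n, sizeof(*list))))
        return -1;
    for (i = 0; i < n; i++) {
        if (!(list[i] = calloc(1, sizeof(*list[i]))) ||
            !(list[i]->name = copyString(names[i])))
            goto error;
    }
    *jobs = list;
    return (int)n;

 error:
    for (i = 0; i < n; i++)
        virDomainJobFree(list[i]);  /* unfilled slots are NULL (calloc) */
    free(list);
    return -1;
}

/* Caller side: free each element with virDomainJobFree, then the array
 * itself with free(). Returns the number of jobs printed or -1. */
static int
printJobs(void)
{
    virDomainJobPtr *jobs = NULL;
    int njobs;
    int i;

    if ((njobs = fakeJobList(&jobs)) < 0)
        return -1;
    for (i = 0; i < njobs; i++) {
        printf("%s\n", jobs[i]->name);
        virDomainJobFree(jobs[i]);
    }
    free(jobs);
    return njobs;
}
```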
> +
> +
> +/**
> + * virDomainJobGetXMLDesc:
> + * @domain: pointer to a domain
> + * @jobname: name of the domain job to operate on
> + * @flags: optional flags (currently unused, callers should always pass 0)
> + *
> + * Returns a string containing a UTF-8 encoded XML document describing the
> + * configuration, state and progress of domain job @jobname. Please refer to the
> + * job XML documentation for information on the format of the returned document.
> + *
> + * In case of error NULL is returned. Caller is responsible for free()-ing the
> + * returned string.
> + */
> +char *
> +virDomainJobGetXMLDesc(virDomainPtr domain,
> +                       const char *jobname,
> +                       unsigned int flags)
> +{
> +    [...]
> +}
> +
> +
> +/**
> + * virDomainJobControl:
> + * @domain: pointer to a domain
> + * @jobname: name of the domain job to operate on
> + * @op: operation to perform on @jobname
> + * @flags: optional flags (currently unused, callers should always pass 0)
> + *
> + * Requests change of state of @jobname. Note that it depends on the type of
> + * @jobname whether @op is supported.
> + *
> + * VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE are supported only with

s/are/is/

> + * VIR_DOMAIN_JOB_TYPE_BLOCK_COPY and VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE.
> + *
> + * VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE and VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME
> + * are supported only with VIR_DOMAIN_JOB_TYPE_BLOCK_* type jobs.
> + *
> + * The request to change state is asynchronous and callers should install an
> + * event callback for VIR_DOMAIN_EVENT_ID_JOB_STATE if they wish to be notified
> + * when the state change occured. (Note that the callback may fire before this

either 'occurs' or 'occurred'

> + * API returns).
> + *
> + * Returns 0 on success or -1 on error.
> + */
> +int
> +virDomainJobControl(virDomainPtr domain,
> +                    const char *jobname,
> +                    virDomainJobControlOperation op,
> +                    unsigned int flags)
> +{
> +    [...]
> +}
> 
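
The per-type restrictions in the virDomainJobControl documentation could be checked up front, before any monitor interaction. A minimal sketch, assuming a hypothetical jobControlOpSupported() helper; the enum values after BLOCK_PULL are elided in the RFC, so the compiler numbers them here, and treating ABORT and DISMISS as valid for every job type is also an assumption (whether DISMISS applies probably depends on the job's state too, which this sketch ignores).

```c
#include <stdbool.h>

/* Job types named in the RFC. The values after BLOCK_PULL were elided
 * there, so the compiler numbers them here (an assumption). */
typedef enum {
    VIR_DOMAIN_JOB_TYPE_NONE = 0,
    VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
    VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
    VIR_DOMAIN_JOB_TYPE_BLOCK_COPY,
    VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE,
} virDomainJobType;

typedef enum {
    VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
    VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
    VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
    VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
    VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
    VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
} virDomainJobControlOperation;

/* The VIR_DOMAIN_JOB_TYPE_BLOCK_* types known to this sketch. */
static bool
jobTypeIsBlock(virDomainJobType type)
{
    return type == VIR_DOMAIN_JOB_TYPE_BLOCK_PULL ||
           type == VIR_DOMAIN_JOB_TYPE_BLOCK_COPY ||
           type == VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE;
}

/* Hypothetical check encoding the documented restrictions: FINALIZE only
 * for block copy and active commit, PAUSE/RESUME only for block jobs. */
static bool
jobControlOpSupported(virDomainJobType type, virDomainJobControlOperation op)
{
    switch (op) {
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE:
        return type == VIR_DOMAIN_JOB_TYPE_BLOCK_COPY ||
               type == VIR_DOMAIN_JOB_TYPE_BLOCK_COMMIT_ACTIVE;
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE:
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME:
        return jobTypeIsBlock(type);
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT:
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS:
        return type != VIR_DOMAIN_JOB_TYPE_NONE;
    case VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE:
        break;
    }
    return false;
}
```

A true result only means the request would be accepted; as documented, the resulting state change is still reported asynchronously through VIR_DOMAIN_EVENT_ID_JOB_STATE.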

Overall seems reasonable. Figuring out how to connect existing jobs
(both domain jobs and block jobs) into the new scheme may be an
interesting effort.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
