[libvirt] [PATCHv2 1/5] lib: Introduce API for retrieving bulk domain stats

Eric Blake eblake at redhat.com
Tue Aug 26 16:49:59 UTC 2014


On 08/26/2014 08:14 AM, Peter Krempa wrote:
> The motivation for this API is that management layers that use libvirt
> usually poll for statistics using various split up APIs we currently
> provide. To get all the necessary stuff, the app needs to issue a lot of
> calls and agregate the results.

s/agregate/aggregate/

> 
> The APIs I'm introducing here:
> 1) Returns data in a format that we can expand in the future and is
> (pseudo) hierarchical. The data is returned as typed parameters where
> the fields are constructed as dot-separated strings containing names and
> other stuff in a list of typed params.
> 
> 2) Stats for multiple (all) domains can be queried at once and are
> returned in one call. This will allow to decrease the overhead necessary
> to issue multiple calls per domain multiplied by the count of domains.
> 
> 3) Selectable (bit mask) fields in the returned format. This will allow
> to retrieve only specific stats according to the apps need.
> 
> The stats groups will be enabled using a bit field @stats passed as the
> function argument. A few sample stats groups that this API will support:
> 
> VIR_DOMAIN_STATS_STATE
> VIR_DOMAIN_STATS_CPU
> VIR_DOMAIN_STATS_BLOCK
> VIR_DOMAIN_STATS_INTERFACE
> 
> the returned typed params will use the following scheme
> 
> state.state = running
> state.reason = started
> cpu.count = 8
> cpu.0.state = running
> cpu.0.time = 1234

Not in this patch, but based on that description, I'd expect something like:
block.count = 2
block.0.dev = vda
block.0.rd_req = 123
block.0.rd_bytes = 10000
...
block.1.dev = vdb

It's been through a few rounds of back and forth on design, but this
sounds reasonable to commit to.

> ---
>  include/libvirt/libvirt.h.in |  26 +++++++
>  src/driver.h                 |   9 +++
>  src/libvirt.c                | 180 +++++++++++++++++++++++++++++++++++++++++++
>  src/libvirt_public.syms      |   7 ++
>  4 files changed, 222 insertions(+)
> 
> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> index 47ea695..94eba86 100644
> --- a/include/libvirt/libvirt.h.in
> +++ b/include/libvirt/libvirt.h.in
> @@ -2501,6 +2501,32 @@ int virDomainDetachDeviceFlags(virDomainPtr domain,
>  int virDomainUpdateDeviceFlags(virDomainPtr domain,
>                                 const char *xml, unsigned int flags);
> 
> +typedef struct _virDomainStatsRecord virDomainStatsRecord;
> +typedef virDomainStatsRecord *virDomainStatsRecordPtr;
> +struct _virDomainStatsRecord {
> +    virDomainPtr dom;
> +    int nparams;
> +    virTypedParameterPtr params;
> +};

Too bad we already have the precedence of 'int nparams' instead of
'size_t nparams', but at least it is consistent, and unlikely to overflow.

All our functions list params/nparams, but this struct lists in reverse
order.  It shouldn't hurt to swap the order for consistency.

> +
> +typedef enum {
> +    VIR_DOMAIN_STATS_ALL = (1 << 0), /* return all stats fields
> +                                       implemented in the daemon */
> +    VIR_DOMAIN_STATS_STATE = (1 << 1), /* return domain state */
> +} virDomainStatsTypes;

This is fewer stats groups than what the commit message. Adding more in
later patches is okay, and can happen post-freeze, if the API makes it
in pre-freeze, but you might want to make it clear in the commit message
that you have saved other stats for later patches.

> +
> +int virConnectGetAllDomainStats(virConnectPtr conn,
> +                                unsigned int stats,
> +                                virDomainStatsRecordPtr **retStats,
> +                                unsigned int flags);

Hmmm. If I understand right, stats is a bitmask, and you error out if a
bit is requested that the daemon doesn't support.  On the other hand,
VIR_DOMAIN_STATS_ALL is not a mask representing all other bits, but more
of a flag that says "give me all available stats without regards to the
bitmask".  Would it be better to have the following semantics:

If stats is 0, provide all possible stats.  If stats is non-zero, it is
the bitmask of stats that MUST be provided, where VIR_DOMAIN_STATS_STATE
= 1, VIR_DOMAIN_STATS_CPU = 2, VIR_DOMAIN_STATS_BLOCK = 4.
For convenience, the value VIR_DOMAIN_STATS_ALL is a bitmask of all the
other bits (with these three categories, it would be 7)

Then, in the flags argument, a flags of 0 says to ignore bitmask members
that cannot be provided, while a flags of VIR_DOMAIN_STATS_ENFORCE says
to fail if a bitmask is provided but that stat is not available (and of
course, makes no difference if the stats bitmask was 0).

Thus, these three are the same:
virConnectGetAllDomainStats(conn, 0, &array, 0)
virConnectGetAllDomainStats(conn, 0, &array, VIR_DOMAIN_STATS_ENFORCE)
virConnectGetAllDomainStats(conn, VIR_DOMAIN_STATS_ALL, &array, 0)

while this one guarantees that ALL stats are provided, or that the API
will outright fail:
virConnectGetAllDomainStats(conn, VIR_DOMAIN_STATS_ALL, &array,
VIR_DOMAIN_STATS_ENFORCE)


> +
> +int virDomainListGetStats(virDomainPtr *doms,

I take it *doms is a NULL-terminated array?  (I guess I'll get to the
docs later in the patch).

> +                          unsigned int stats,
> +                          virDomainStatsRecordPtr **retStats,
> +                          unsigned int flags);
> +
> +void virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats);
> +
>  /*
>   * BlockJob API
>   */
> diff --git a/src/driver.h b/src/driver.h
> index ba7c1fc..d5596ab 100644
> --- a/src/driver.h
> +++ b/src/driver.h
> @@ -1191,6 +1191,14 @@ typedef int
>                            unsigned int flags);
> 
> 
> +typedef int
> +(*virDrvDomainListGetStats)(virConnectPtr conn,
> +                            virDomainPtr *doms,
> +                            unsigned int ndoms,
> +                            unsigned int stats,
> +                            virDomainStatsRecordPtr **retStats,
> +                            unsigned int flags);

Ah, so under the hood, we only have to implement one callback, where
doms is NULL for virConnectGetAllDomainStats and non-NULL for
virDomainListGetStats?  I suppose that works, although it might make the
RPC interesting.  I guess I'll see later in the series, but as this is
internal-only, it shouldn't matter for what we commit to in the public API.

> +
> +
> +/**
> + * virConnectGetAllDomainStats:
> + * @conn: pointer to the hypervisor connection
> + * @stats: stats to return, binary-OR of virDomainStatsTypes
> + * @retStats: Pointer that will be filled with the array of returned stats.
> + * @flags: extra flags; not used yet, so callers should always pass 0

Would it make sense for @flags to provide the same filtering as
virConnectListAllDomains(), for selecting a subset of domains to get
stats on?  But can definitely be added later.

> + *
> + * Query statistics for all domains on a given connection.
> + *
> + * Report statistics of various parameters for a running VM according to @stats
> + * field. The statistics are returned as an array of structures for each queried
> + * domain. The structure contains an array of typed parameters containing the
> + * individual statistics. The typed parameter name for each statistic field
> + * consists of a dot-separated string containing name of the requested group
> + * followed by a group specific description of the statistic value.
> + *
> + * The statistic groups are enabled using the @stats parameter which is a
> + * binary-OR of enum virDomainStatsTypes. The following groups are available
> + * (although not necessarily implemented for each storage driver):

s/storage driver/hypervisor/

> + *
> + * VIR_DOMAIN_STATS_ALL: Return all statistics supported by the hypervisor
> + * driver. This allows to query everything the driver supports without getting
> + * an error.

See my above comments about how this feels like it should be something
in the @flags parameter, rather than the @stats parameter (that is, a
flag _ENFORCE that says whether the hypervisor lacking a particular stat
group is fatal).

> + *
> + * VIR_DOMAIN_STATS_STATE: Return domain state and reason for entering that
> + * state. The typed parameter keys are in format:
> + * "state.state" - state of the VM, returned as int from virDomainState enum
> + * "state.reason" - reason for entering given state, returned as in from

s/in/int/

> + *                  virDomain*Reason enmum corresponding to given state.

s/enmum/enum/

> + *
> + * Returns the count of returned statistics strucutres on success, -1 on error.

s/strucutres/structures/

> + * The requested data are returned in the @retStats parameter. The returned
> + * array should be freed by the caller. See virDomainStatsRecordListFree.
> + */
> +int
> +virConnectGetAllDomainStats(virConnectPtr conn,
> +                            unsigned int stats,
> +                            virDomainStatsRecordPtr **retStats,
> +                            unsigned int flags)
> +{
> +    int ret = -1;
> +
> +    VIR_DEBUG("conn=%p, stats=0x%x, retStats=%p, flags=0x%x",
> +              conn, stats, retStats, flags);
> +
> +    virResetLastError();
> +
> +    virCheckConnectReturn(conn, -1);

virCheckNonNullArg(retStats)?  Matters depending on whether you plan to
allow NULL retStats across RPC.

Right now, it looks like you are planning on allowing NULL for retStats
to use the return value to see how many domains there are (or get an
error that a requested stat is impossible).  Is it worth that
complexity, given that virConnectListAllDomains can already return a
count of the number of domains?  Then again, consistency is easier to
copy-and-paste and therefore maintain.


> +/**
> + * virDomainListGetStats:
> + * @doms: NULL terminated array of domains

I'm a little bit surprised that we aren't requiring the user to just
pass a length; but since virConnectListAllDomains returns a
NULL-terminated array, having one less parameter does seem easier on the
caller.

> + * @stats: stats to return, binary-OR of virDomainStatsTypes
> + * @retStats: Pointer that will be filled with the array of returned stats.
> + * @flags: extra flags; not used yet, so callers should always pass 0
> + *
> + * Query statistics for domains provided by @doms. Note that all domains in
> + * @doms must share the same connection.

Yes, that better be enforced :)

> + *
> + * VIR_DOMAIN_STATS_STATE: Return domain state and reason for entering that
> + * state. The typed parameter keys are in format:
> + * "state.state" - state of the VM, returned as int from virDomainState enum
> + * "state.reason" - reason for entering given state, returned as in from
> + *                  virDomain*Reason enmum corresponding to given state.

Same typos as above.  I'm okay with the duplication (it's easier for a
user to read the right docs on the first hit than to have to chase
links), but be sure to fix both places identically.

> + *
> + * Returns the count of returned statistics strucutres on success, -1 on error.

Probably worth a simple mention that the return count may be smaller
than the count of doms passed in.  (Probably not worth mentioning why,
but possible reasons might include: domain no longer exists, duplicate
domains on input consolidated into common domain on output, driver can
only list stats for a running domain so it omits an entire structure for
an offline domain, ...)

> +int
> +virDomainListGetStats(virDomainPtr *doms,
> +                      unsigned int stats,
> +                      virDomainStatsRecordPtr **retStats,
> +                      unsigned int flags)
> +{
> +    virConnectPtr conn = NULL;
> +    virDomainPtr *nextdom = doms;
> +    unsigned int ndoms = 0;
> +    int ret = -1;
> +
> +    VIR_DEBUG("doms=%p, stats=0x%x, retStats=%p, flags=0x%x",
> +              doms, stats, retStats, flags);
> +
> +    virResetLastError();
> +
> +    if (!*doms) {

Missing virCheckNonNullArgReturn(doms, -1) prior to dereferencing *doms.
 (I would have suggested virCheckNonNullArgGoto(doms, error), except
that the error label only works if we have validated the connection).

> +        virReportError(VIR_ERR_INVALID_ARG,
> +                       _("doms array in %s must contain at least one domain"),
> +                       __FUNCTION__);
> +        goto cleanup;

Ouch.  You can't use the cleanup label unless you know the conn is
valid, but you don't validate the conn until...

> +    }
> +
> +    conn = doms[0]->conn;
> +    virCheckConnectReturn(conn, -1);

...here.  All failures before this point have to be direct 'return -1'.
 But you want as few of those as possible, so that the bulk of the error
detection can report the error through the connection.

> +void
> +virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats)
> +{
> +    virDomainStatsRecordPtr *next = stats;
> +
> +    while (*next) {
> +        virTypedParamsFree((*next)->params, (*next)->nparams);
> +        virDomainFree((*next)->dom);
> +        VIR_FREE((*next));

Extra () on that last line (only needed on the first 2 of the three lines).

Looking close, but I'd feel more comfortable with a v3.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 539 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20140826/54542ff0/attachment-0001.sig>


More information about the libvir-list mailing list