[libvirt] [PATCH 1/1] perf: add more perf events support

John Ferlan jferlan at redhat.com
Tue Jul 12 20:01:32 UTC 2016



On 06/29/2016 08:10 PM, Qiaowei Ren wrote:
> With current perf framework, this patch adds support to more perf
                                                       ^^^^^^^
for more perf

> events, including cache missing, cache peference, cpu cycles,

A quick google search turns up "cache references" - there's just too
many peference or peferences references to comment on them all, but they
all need to be "references"

> instrction, etc..

instructions

> 
> Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
> ---
>  docs/formatdomain.html.in                   | 24 +++++++++++
>  docs/schemas/domaincommon.rng               |  4 ++
>  include/libvirt/libvirt-domain.h            | 39 +++++++++++++++++
>  src/libvirt-domain.c                        |  8 ++++
>  src/qemu/qemu_driver.c                      | 23 +++++-----
>  src/util/virperf.c                          | 65 ++++++++++++++++++++++++++++-
>  src/util/virperf.h                          |  4 ++
>  tests/genericxml2xmlindata/generic-perf.xml |  4 ++
>  8 files changed, 158 insertions(+), 13 deletions(-)
> 

I see no changes for virsh.pod, see commit id '3110363d' for a recent
change Peter made in this space...

I think perhaps it may also be worthwhile to "in a separate patch" alter
the 'domstats --perf' description to simply reference the 'perf'
description where each of the collect perf.* events can be listed and
described.

Each of the collectible events could have some sort of tabular output -
see how 'vol-wipe' describes the various supported algorithms.  So much
easier to read than one long sentence.


> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index f660aa6..7999e43 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -1839,6 +1839,10 @@
>      <event name='cmt' enabled='yes'/>
>      <event name='mbmt' enabled='no'/>
>      <event name='mbml' enabled='yes'/>
> +    <event name='cache_misses' enabled='no'/>
> +    <event name='cache_peferences' enabled='no'/>
> +    <event name='instructions' enabled='no'/>
> +    <event name='cpu_cycles' enabled='no'/>
>    </perf>
>    ...
>  </pre>
> @@ -1864,6 +1868,26 @@
>        <td>bandwidth of memory traffic for a memory controller</td>
>        <td><code>perf.mbml</code></td>
>      </tr>
> +    <tr>
> +      <td><code>cache_misses</code></td>
> +      <td>the amount of cache missing by applications running on the platform</td>

is this the count of caches misses?  amount implies perhaps other things.

> +      <td><code>perf.cache_misses</code></td>
> +    </tr>
> +    <tr>
> +      <td><code>cache_peferences</code></td>
> +      <td>the amount of cache hit by applications running on the platform</td>
> +      <td><code>perf.cache_peferences</code></td>

similar is this the count of cache hits, right?

> +    </tr>
> +    <tr>
> +      <td><code>instructions</code></td>
> +      <td>the amount of instructions by applications running on the platform</td>

the count of instructions executed...

> +      <td><code>perf.instructions</code></td>
> +    </tr>
> +    <tr>
> +      <td><code>cpu_cycles</code></td>
> +      <td>the amount of cycles one instruction needs</td>

the number/count of cpu cycles

> +      <td><code>perf.cpu_cycles</code></td>
> +    </tr>
>    </table>
>  
>      <h3><a name="elementsDevices">Devices</a></h3>
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 563cb3c..e41dc3a 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -414,6 +414,10 @@
>                <value>cmt</value>
>                <value>mbmt</value>
>                <value>mbml</value>
> +              <value>cache_misses</value>
> +              <value>cache_peferences</value>
> +              <value>instructions</value>
> +              <value>cpu_cycles</value>
>              </choice>
>            </attribute>
>            <attribute name="enabled">
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 7ea93aa..b79cdb0 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -1947,6 +1947,45 @@ void virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats);
>   */
>  # define VIR_PERF_PARAM_MBML "mbml"
>  
> +/**
> + * VIR_PERF_PARAM_CACHE_MISSES:
> + *
> + * Macro for typed parameter name that represents cache_misses perf
> + * event which can be used to measure the amount of cache missing by

s/amount/count

s/cache missing/cache misses/  (missing is a very different context!)

> + * applications running on the platform. It corresponds to the
> + * "perf.cache_misses" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_CACHE_MISSES "cache_misses"
> +
> +/**
> + * VIR_PERF_PARAM_CACHE_REFERENCES:
> + *
> + * Macro for typed parameter name that represents cache_peferences
> + * perf event which can be used to measure the amount of cache hit

similar... amount/count ... hit/hits

> + * by applications running on the platform. It corresponds to the
> + * "perf.cache_peferences" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_CACHE_REFERENCES "cache_peferences"
> +
> +/**
> + * VIR_PERF_PARAM_INSTRUCTIONS:
> + *
> + * Macro for typed parameter name that represents instructions perf
> + * event which can be used to measure the amount of instructions

similar amount/count

> + * by applications running on the platform. It corresponds to the
> + * "perf.instructions" field in the *Stats APIs.
> + */
> +# define VIR_PERF_PARAM_INSTRUCTIONS "instructions"
> +
> +/**
> + * VIR_PERF_PARAM_CPU_CYCLES:
> + *
> + * Macro for typed parameter name that represents cpu_cycles perf event
> + * which can be used to measure how many cycles one instruction needs.
> + * It corresponds to the "perf.cpu_cycles" field in the *Stats APIs.
> + */

The cycles and instructions seem to me to be things that could be very
large and awkward to print
> +# define VIR_PERF_PARAM_CPU_CYCLES "cpu_cycles"
> +
>  int virDomainGetPerfEvents(virDomainPtr dom,
>                             virTypedParameterPtr *params,
>                             int *nparams,
> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> index 4e71a94..b817e4b 100644
> --- a/src/libvirt-domain.c
> +++ b/src/libvirt-domain.c
> @@ -11452,6 +11452,14 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
>   * "perf.mbml" - the amount of data (bytes/s) sent through the memory controller
>   *               on the socket as unsigned long long. It is produced by mbml
>   *               perf event.
> + * "perf.cache_misses" - the amount of cache missing as unsigned long long.
> + * It is produced by cache_misses perf event.
> + * "perf.cache_peferences" - the amount of cache hit as unsigned long long.
> + * It is produced by cache_peferences perf event.
> + * "perf.instructions" - the amount of instructions as unsigned long long.
> + * It is produced by instructions perf event.
> + * "perf.cpu_cycles" - the amount of cycles one instruction needs as unsigned
> + * long long. It is produced by cpu_cycles perf event.

Similar "amount" vs. "count"

>   *
>   * Note that entire stats groups or individual stat fields may be missing from
>   * the output in case they are not supported by the given hypervisor, are not
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 61d184b..bea753f 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -9613,6 +9613,10 @@ qemuDomainSetPerfEvents(virDomainPtr dom,
>                                 VIR_PERF_PARAM_CMT, VIR_TYPED_PARAM_BOOLEAN,
>                                 VIR_PERF_PARAM_MBMT, VIR_TYPED_PARAM_BOOLEAN,
>                                 VIR_PERF_PARAM_MBML, VIR_TYPED_PARAM_BOOLEAN,
> +                               VIR_PERF_PARAM_CACHE_MISSES, VIR_TYPED_PARAM_BOOLEAN,
> +                               VIR_PERF_PARAM_CACHE_REFERENCES, VIR_TYPED_PARAM_BOOLEAN,
> +                               VIR_PERF_PARAM_INSTRUCTIONS, VIR_TYPED_PARAM_BOOLEAN,
> +                               VIR_PERF_PARAM_CPU_CYCLES, VIR_TYPED_PARAM_BOOLEAN,
>                                 NULL) < 0)
>          return -1;
>  
> @@ -18941,10 +18945,10 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
>  #undef QEMU_ADD_COUNT_PARAM
>  
>  static int
> -qemuDomainGetStatsPerfRdt(virPerfPtr perf,
> -                          virPerfEventType type,
> -                          virDomainStatsRecordPtr record,
> -                          int *maxparams)
> +qemuDomainGetStatsPerfOneEvent(virPerfPtr perf,
> +                               virPerfEventType type,
> +                               virDomainStatsRecordPtr record,
> +                               int *maxparams)

This change seems separable. That is it's own patch to change the name
of the function because it's going to be multiple/general purpose soon.

>  {
>      char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
>      uint64_t value = 0;
> @@ -18980,14 +18984,9 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
>          if (!virPerfEventIsEnabled(priv->perf, i))
>               continue;
>  
> -        switch (i) {
> -        case VIR_PERF_EVENT_CMT:
> -        case VIR_PERF_EVENT_MBMT:
> -        case VIR_PERF_EVENT_MBML:

And removing the need for a switch could be separate too...  Trying to
think if it's still necessary though. virPerfEventIsEnabled will return
NULL if i >= VIR_PERF_EVENT_LAST, so it doesn't seem some future client
could request an event that some older driver doesn't support.

> -            if (qemuDomainGetStatsPerfRdt(priv->perf, i, record, maxparams) < 0)
> -                goto cleanup;
> -            break;
> -        }
> +        if (qemuDomainGetStatsPerfOneEvent(priv->perf, i,
> +                                           record, maxparams) < 0)
> +            goto cleanup;
>      }
>  
>      ret = 0;
> diff --git a/src/util/virperf.c b/src/util/virperf.c
> index 4661ba3..a3d2bc6 100644
> --- a/src/util/virperf.c
> +++ b/src/util/virperf.c
> @@ -38,7 +38,9 @@ VIR_LOG_INIT("util.perf");
>  #define VIR_FROM_THIS VIR_FROM_PERF
>  
>  VIR_ENUM_IMPL(virPerfEvent, VIR_PERF_EVENT_LAST,
> -              "cmt", "mbmt", "mbml");
> +              "cmt", "mbmt", "mbml",
> +              "cache_misses", "cache_peferences",
> +              "instructions", "cpu_cycles");
>  
>  struct virPerfEvent {
>      int type;
> @@ -189,6 +191,60 @@ virPerfRdtEnable(virPerfEventPtr event,
>      return -1;
>  }
>  
> +static int
> +virPerfGeneralEnable(virPerfEventPtr event,
> +                     pid_t pid)

Currently, these are less "General" and more "Hardware" events

Based on what I see in perf_event.h, I suspect the future could hold
getting software, tracepoint, hw_cache, raw, and breakpoint events too.

Perhaps in order to be more "General" the "type" and "config" parameters
could be passed...

> +{
> +    struct perf_event_attr attr;
> +
> +    memset(&attr, 0, sizeof(attr));
> +    attr.size = sizeof(attr);
> +    attr.inherit = 1;
> +    attr.disabled = 1;
> +    attr.enable_on_exec = 0;
> +
> +    switch (event->type) {
> +    case VIR_PERF_EVENT_CACHE_MISSES:
> +        attr.type = PERF_TYPE_HARDWARE;
           ^^^^^^^^^
this doesn't change for any of the cases

> +        attr.config = PERF_COUNT_HW_CACHE_MISSES;
> +        break;
> +    case VIR_PERF_EVENT_CACHE_REFERENCES:
> +        attr.type = PERF_TYPE_HARDWARE;
> +        attr.config = PERF_COUNT_HW_CACHE_REFERENCES;
> +        break;
> +    case VIR_PERF_EVENT_INSTRUCTIONS:
> +        attr.type = PERF_TYPE_HARDWARE;
> +        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> +        break;
> +    case VIR_PERF_EVENT_CPU_CYCLES:
> +        attr.type = PERF_TYPE_HARDWARE;
> +        attr.config = PERF_COUNT_HW_CPU_CYCLES;
> +        break;
> +    }

...Seems like it would be possible to create some sort of static
table/matrix that would be able to convert the VIR_PERF_EVENT_* into
their respective "attr.type" and "attr.config", so that this function
doesn't have the switch and the calling function passes by value the
'type' and 'config'.  Assuming of course the future is to get other events.

> +
> +    event->fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
> +    if (event->fd < 0) {
> +        virReportSystemError(errno,
> +                             _("Unable to open perf event for %s"),
> +                             virPerfEventTypeToString(event->type));
> +        goto error;
> +    }
> +
> +    if (ioctl(event->fd, PERF_EVENT_IOC_ENABLE) < 0) {
> +        virReportSystemError(errno,
> +                             _("Unable to enable perf event for %s"),
> +                             virPerfEventTypeToString(event->type));
> +        goto error;
> +    }
> +
> +    event->enabled = true;
> +    return 0;
> +
> + error:
> +    VIR_FORCE_CLOSE(event->fd);
> +    return -1;
> +}
> +
>  int
>  virPerfEventEnable(virPerfPtr perf,
>                     virPerfEventType type,
> @@ -205,6 +261,13 @@ virPerfEventEnable(virPerfPtr perf,
>          if (virPerfRdtEnable(event, pid) < 0)

I see this PerfRdt reference here which led me to wonder why both aren't
being changed, but I think I understand now. Of course, now I wonder
what the Rdt acronym means (is it Resource Director Technology?). If
it's an acronym, it's nice to see it spelled out at least once. Probably
in some earlier commit which I didn't chase. The lack of function
comments is, well, frustrating.

>              return -1;
>          break;
> +    case VIR_PERF_EVENT_CACHE_MISSES:
> +    case VIR_PERF_EVENT_CACHE_REFERENCES:
> +    case VIR_PERF_EVENT_INSTRUCTIONS:
> +    case VIR_PERF_EVENT_CPU_CYCLES:
> +        if (virPerfGeneralEnable(event, pid) < 0)
> +            return -1;
> +        break;

here is where there could be some sort of tabular/matrix reference to
get the 'type' and 'config' values filled in to pass.

>      case VIR_PERF_EVENT_LAST:
>          virReportError(VIR_ERR_INTERNAL_ERROR,
>                         _("Unexpected perf event type=%d"), type);
> diff --git a/src/util/virperf.h b/src/util/virperf.h
> index 7163410..7129370 100644
> --- a/src/util/virperf.h
> +++ b/src/util/virperf.h
> @@ -28,6 +28,10 @@ typedef enum {
>      VIR_PERF_EVENT_CMT,
>      VIR_PERF_EVENT_MBMT,
>      VIR_PERF_EVENT_MBML,
> +    VIR_PERF_EVENT_CACHE_MISSES,
> +    VIR_PERF_EVENT_CACHE_REFERENCES,
> +    VIR_PERF_EVENT_INSTRUCTIONS,
> +    VIR_PERF_EVENT_CPU_CYCLES,

Not that it matters too much since the numbers are different, but the
order here is different than perf_hw_id. "Sometimes" it's kinder to
access memory sequentially rather than somewhat randomly. Easier to see
for future changers what was already too...

Any reason to not go after the other 6 events in 'perf_hw_id'?

>  
>      VIR_PERF_EVENT_LAST
>  } virPerfEventType;
> diff --git a/tests/genericxml2xmlindata/generic-perf.xml b/tests/genericxml2xmlindata/generic-perf.xml
> index 394d2a6..6428ebd 100644
> --- a/tests/genericxml2xmlindata/generic-perf.xml
> +++ b/tests/genericxml2xmlindata/generic-perf.xml
> @@ -16,6 +16,10 @@
>      <event name='cmt' enabled='yes'/>
>      <event name='mbmt' enabled='no'/>
>      <event name='mbml' enabled='yes'/>
> +    <event name='cache_misses' enabled='no'/>
> +    <event name='cache_peferences' enabled='no'/>
> +    <event name='instructions' enabled='no'/>
> +    <event name='cpu_cycles' enabled='no'/>

All 4 are 'no', maybe make 1 or 2 'yes' (not that it matters, but they
are a different table underneath the covers)


John
>    </perf>
>    <devices>
>    </devices>
> 




More information about the libvir-list mailing list