[libvirt] [PATCH 1/1] perf: add more perf events support

Ren, Qiaowei qiaowei.ren at intel.com
Sat Jul 16 08:37:44 UTC 2016


> -----Original Message-----
> From: John Ferlan [mailto:jferlan at redhat.com]
> Sent: Wednesday, July 13, 2016 4:02 AM
> To: Ren, Qiaowei <qiaowei.ren at intel.com>; libvir-list at redhat.com
> Cc: Peter Krempa <pkrempa at redhat.com>
> Subject: Re: [libvirt] [PATCH 1/1] perf: add more perf events support
> 
> 
> 
> On 06/29/2016 08:10 PM, Qiaowei Ren wrote:
> > With current perf framework, this patch adds support to more perf
>                                                        ^^^^^^^ for more perf
> 
> > events, including cache missing, cache peference, cpu cycles,
> 
> A quick google search turns up "cache references" - there's just too many
> peference or peferences references to comment on them all, but they all need
> to be "references"
> 

John, according perf code from linux kernel, 'cache peference' is from linux kernel code. Certainly 'perf list' command show the 'references', and I will change it to this.

> > instrction, etc..
> 
> instructions
> 
> >
> > Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
> > ---
> >  docs/formatdomain.html.in                   | 24 +++++++++++
> >  docs/schemas/domaincommon.rng               |  4 ++
> >  include/libvirt/libvirt-domain.h            | 39 +++++++++++++++++
> >  src/libvirt-domain.c                        |  8 ++++
> >  src/qemu/qemu_driver.c                      | 23 +++++-----
> >  src/util/virperf.c                          | 65 ++++++++++++++++++++++++++++-
> >  src/util/virperf.h                          |  4 ++
> >  tests/genericxml2xmlindata/generic-perf.xml |  4 ++
> >  8 files changed, 158 insertions(+), 13 deletions(-)
> >
> 
> I see no changes for virsh.pod, see commit id '3110363d' for a recent change
> Peter made in this space...
> 
> I think perhaps it may also be worthwhile to "in a separate patch" alter the
> 'domstats --perf' description to simply reference the 'perf'
> description where each of the collect perf.* events can be listed and described.
> 
> Each of the collectible events could have some sort of tabular output - see how
> 'vol-wipe' describes the various supported algorithms.  So much easier to read
> than one long sentence.
> 
Sure. I will add one patch about this.

> 
> > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > index f660aa6..7999e43 100644
> > --- a/docs/formatdomain.html.in
> > +++ b/docs/formatdomain.html.in
> > @@ -1839,6 +1839,10 @@
> >      <event name='cmt' enabled='yes'/>
> >      <event name='mbmt' enabled='no'/>
> >      <event name='mbml' enabled='yes'/>
> > +    <event name='cache_misses' enabled='no'/>
> > +    <event name='cache_peferences' enabled='no'/>
> > +    <event name='instructions' enabled='no'/>
> > +    <event name='cpu_cycles' enabled='no'/>
> >    </perf>
> >    ...
> >  </pre>
> > @@ -1864,6 +1868,26 @@
> >        <td>bandwidth of memory traffic for a memory controller</td>
> >        <td><code>perf.mbml</code></td>
> >      </tr>
> > +    <tr>
> > +      <td><code>cache_misses</code></td>
> > +      <td>the amount of cache missing by applications running on the
> > + platform</td>
> 
> is this the count of caches misses?  amount implies perhaps other things.
> 
Yes. I will change it.

> > +      <td><code>perf.cache_misses</code></td>
> > +    </tr>
> > +    <tr>
> > +      <td><code>cache_peferences</code></td>
> > +      <td>the amount of cache hit by applications running on the platform</td>
> > +      <td><code>perf.cache_peferences</code></td>
> 
> similar is this the count of cache hits, right?
> 
Yes.

> > +    </tr>
> > +    <tr>
> > +      <td><code>instructions</code></td>
> > +      <td>the amount of instructions by applications running on the
> > + platform</td>
> 
> the count of instructions executed...
> 
> > +      <td><code>perf.instructions</code></td>
> > +    </tr>
> > +    <tr>
> > +      <td><code>cpu_cycles</code></td>
> > +      <td>the amount of cycles one instruction needs</td>
> 
> the number/count of cpu cycles
> 
> > +      <td><code>perf.cpu_cycles</code></td>
> > +    </tr>
> >    </table>
> >
> >      <h3><a name="elementsDevices">Devices</a></h3>
> > diff --git a/docs/schemas/domaincommon.rng
> > b/docs/schemas/domaincommon.rng index 563cb3c..e41dc3a 100644
> > --- a/docs/schemas/domaincommon.rng
> > +++ b/docs/schemas/domaincommon.rng
> > @@ -414,6 +414,10 @@
> >                <value>cmt</value>
> >                <value>mbmt</value>
> >                <value>mbml</value>
> > +              <value>cache_misses</value>
> > +              <value>cache_peferences</value>
> > +              <value>instructions</value>
> > +              <value>cpu_cycles</value>
> >              </choice>
> >            </attribute>
> >            <attribute name="enabled">
> > diff --git a/include/libvirt/libvirt-domain.h
> > b/include/libvirt/libvirt-domain.h
> > index 7ea93aa..b79cdb0 100644
> > --- a/include/libvirt/libvirt-domain.h
> > +++ b/include/libvirt/libvirt-domain.h
> > @@ -1947,6 +1947,45 @@ void
> virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats);
> >   */
> >  # define VIR_PERF_PARAM_MBML "mbml"
> >
> > +/**
> > + * VIR_PERF_PARAM_CACHE_MISSES:
> > + *
> > + * Macro for typed parameter name that represents cache_misses perf
> > + * event which can be used to measure the amount of cache missing by
> 
> s/amount/count
> 
> s/cache missing/cache misses/  (missing is a very different context!)
> 
> > + * applications running on the platform. It corresponds to the
> > + * "perf.cache_misses" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_CACHE_MISSES "cache_misses"
> > +
> > +/**
> > + * VIR_PERF_PARAM_CACHE_REFERENCES:
> > + *
> > + * Macro for typed parameter name that represents cache_peferences
> > + * perf event which can be used to measure the amount of cache hit
> 
> similar... amount/count ... hit/hits
> 
> > + * by applications running on the platform. It corresponds to the
> > + * "perf.cache_peferences" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_CACHE_REFERENCES "cache_peferences"
> > +
> > +/**
> > + * VIR_PERF_PARAM_INSTRUCTIONS:
> > + *
> > + * Macro for typed parameter name that represents instructions perf
> > + * event which can be used to measure the amount of instructions
> 
> similar amount/count
> 
> > + * by applications running on the platform. It corresponds to the
> > + * "perf.instructions" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_INSTRUCTIONS "instructions"
> > +
> > +/**
> > + * VIR_PERF_PARAM_CPU_CYCLES:
> > + *
> > + * Macro for typed parameter name that represents cpu_cycles perf
> > +event
> > + * which can be used to measure how many cycles one instruction needs.
> > + * It corresponds to the "perf.cpu_cycles" field in the *Stats APIs.
> > + */
> 
> The cycles and instructions seem to me to be things that could be very large and
> awkward to print
> > +# define VIR_PERF_PARAM_CPU_CYCLES "cpu_cycles"
> > +
> >  int virDomainGetPerfEvents(virDomainPtr dom,
> >                             virTypedParameterPtr *params,
> >                             int *nparams, diff --git
> > a/src/libvirt-domain.c b/src/libvirt-domain.c index 4e71a94..b817e4b
> > 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -11452,6 +11452,14 @@ virConnectGetDomainCapabilities(virConnectPtr
> conn,
> >   * "perf.mbml" - the amount of data (bytes/s) sent through the memory
> controller
> >   *               on the socket as unsigned long long. It is produced by mbml
> >   *               perf event.
> > + * "perf.cache_misses" - the amount of cache missing as unsigned long long.
> > + * It is produced by cache_misses perf event.
> > + * "perf.cache_peferences" - the amount of cache hit as unsigned long long.
> > + * It is produced by cache_peferences perf event.
> > + * "perf.instructions" - the amount of instructions as unsigned long long.
> > + * It is produced by instructions perf event.
> > + * "perf.cpu_cycles" - the amount of cycles one instruction needs as
> > + unsigned
> > + * long long. It is produced by cpu_cycles perf event.
> 
> Similar "amount" vs. "count"
> 
> >   *
> >   * Note that entire stats groups or individual stat fields may be missing from
> >   * the output in case they are not supported by the given hypervisor,
> > are not diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 61d184b..bea753f 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -9613,6 +9613,10 @@ qemuDomainSetPerfEvents(virDomainPtr dom,
> >                                 VIR_PERF_PARAM_CMT, VIR_TYPED_PARAM_BOOLEAN,
> >                                 VIR_PERF_PARAM_MBMT, VIR_TYPED_PARAM_BOOLEAN,
> >                                 VIR_PERF_PARAM_MBML,
> > VIR_TYPED_PARAM_BOOLEAN,
> > +                               VIR_PERF_PARAM_CACHE_MISSES,
> VIR_TYPED_PARAM_BOOLEAN,
> > +                               VIR_PERF_PARAM_CACHE_REFERENCES,
> VIR_TYPED_PARAM_BOOLEAN,
> > +                               VIR_PERF_PARAM_INSTRUCTIONS,
> VIR_TYPED_PARAM_BOOLEAN,
> > +                               VIR_PERF_PARAM_CPU_CYCLES,
> > + VIR_TYPED_PARAM_BOOLEAN,
> >                                 NULL) < 0)
> >          return -1;
> >
> > @@ -18941,10 +18945,10 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr
> > driver,  #undef QEMU_ADD_COUNT_PARAM
> >
> >  static int
> > -qemuDomainGetStatsPerfRdt(virPerfPtr perf,
> > -                          virPerfEventType type,
> > -                          virDomainStatsRecordPtr record,
> > -                          int *maxparams)
> > +qemuDomainGetStatsPerfOneEvent(virPerfPtr perf,
> > +                               virPerfEventType type,
> > +                               virDomainStatsRecordPtr record,
> > +                               int *maxparams)
> 
> This change seems separable. That is it's own patch to change the name of the
> function because it's going to be multiple/general purpose soon.
> 
Ok. I will separate it from this patch.

> >  {
> >      char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
> >      uint64_t value = 0;
> > @@ -18980,14 +18984,9 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr
> driver ATTRIBUTE_UNUSED,
> >          if (!virPerfEventIsEnabled(priv->perf, i))
> >               continue;
> >
> > -        switch (i) {
> > -        case VIR_PERF_EVENT_CMT:
> > -        case VIR_PERF_EVENT_MBMT:
> > -        case VIR_PERF_EVENT_MBML:
> 
> And removing the need for a switch could be separate too...  Trying to think if
> it's still necessary though. virPerfEventIsEnabled will return NULL if i >=
> VIR_PERF_EVENT_LAST, so it doesn't seem some future client could request an
> event that some older driver doesn't support.
> 
virPerfEventIsEnabled will return false (not NULL) if I >= VIR_PERF_EVENT_LAST. Some future clients could not enable those events that older driver doesn't support.

> > -            if (qemuDomainGetStatsPerfRdt(priv->perf, i, record, maxparams) < 0)
> > -                goto cleanup;
> > -            break;
> > -        }
> > +        if (qemuDomainGetStatsPerfOneEvent(priv->perf, i,
> > +                                           record, maxparams) < 0)
> > +            goto cleanup;
> >      }
> >
> >      ret = 0;
> > diff --git a/src/util/virperf.c b/src/util/virperf.c index
> > 4661ba3..a3d2bc6 100644
> > --- a/src/util/virperf.c
> > +++ b/src/util/virperf.c
> > @@ -38,7 +38,9 @@ VIR_LOG_INIT("util.perf");  #define VIR_FROM_THIS
> > VIR_FROM_PERF
> >
> >  VIR_ENUM_IMPL(virPerfEvent, VIR_PERF_EVENT_LAST,
> > -              "cmt", "mbmt", "mbml");
> > +              "cmt", "mbmt", "mbml",
> > +              "cache_misses", "cache_peferences",
> > +              "instructions", "cpu_cycles");
> >
> >  struct virPerfEvent {
> >      int type;
> > @@ -189,6 +191,60 @@ virPerfRdtEnable(virPerfEventPtr event,
> >      return -1;
> >  }
> >
> > +static int
> > +virPerfGeneralEnable(virPerfEventPtr event,
> > +                     pid_t pid)
> 
> Currently, these are less "General" and more "Hardware" events
> 
> Based on what I see in perf_event.h, I suspect the future could hold getting
> software, tracepoint, hw_cache, raw, and breakpoint events too.
> 
> Perhaps in order to be more "General" the "type" and "config" parameters could
> be passed...
> 
> > +{
> > +    struct perf_event_attr attr;
> > +
> > +    memset(&attr, 0, sizeof(attr));
> > +    attr.size = sizeof(attr);
> > +    attr.inherit = 1;
> > +    attr.disabled = 1;
> > +    attr.enable_on_exec = 0;
> > +
> > +    switch (event->type) {
> > +    case VIR_PERF_EVENT_CACHE_MISSES:
> > +        attr.type = PERF_TYPE_HARDWARE;
>            ^^^^^^^^^
> this doesn't change for any of the cases
> 
> > +        attr.config = PERF_COUNT_HW_CACHE_MISSES;
> > +        break;
> > +    case VIR_PERF_EVENT_CACHE_REFERENCES:
> > +        attr.type = PERF_TYPE_HARDWARE;
> > +        attr.config = PERF_COUNT_HW_CACHE_REFERENCES;
> > +        break;
> > +    case VIR_PERF_EVENT_INSTRUCTIONS:
> > +        attr.type = PERF_TYPE_HARDWARE;
> > +        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> > +        break;
> > +    case VIR_PERF_EVENT_CPU_CYCLES:
> > +        attr.type = PERF_TYPE_HARDWARE;
> > +        attr.config = PERF_COUNT_HW_CPU_CYCLES;
> > +        break;
> > +    }
> 
> ...Seems like it would be possible to create some sort of static table/matrix that
> would be able to convert the VIR_PERF_EVENT_* into their respective
> "attr.type" and "attr.config", so that this function doesn't have the switch and
> the calling function passes by value the 'type' and 'config'.  Assuming of course
> the future is to get other events.
> 
Yes. This is a good idea. And I will add the static table.

> > +
> > +    event->fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
> > +    if (event->fd < 0) {
> > +        virReportSystemError(errno,
> > +                             _("Unable to open perf event for %s"),
> > +                             virPerfEventTypeToString(event->type));
> > +        goto error;
> > +    }
> > +
> > +    if (ioctl(event->fd, PERF_EVENT_IOC_ENABLE) < 0) {
> > +        virReportSystemError(errno,
> > +                             _("Unable to enable perf event for %s"),
> > +                             virPerfEventTypeToString(event->type));
> > +        goto error;
> > +    }
> > +
> > +    event->enabled = true;
> > +    return 0;
> > +
> > + error:
> > +    VIR_FORCE_CLOSE(event->fd);
> > +    return -1;
> > +}
> > +
> >  int
> >  virPerfEventEnable(virPerfPtr perf,
> >                     virPerfEventType type, @@ -205,6 +261,13 @@
> > virPerfEventEnable(virPerfPtr perf,
> >          if (virPerfRdtEnable(event, pid) < 0)
> 
> I see this PerfRdt reference here which led me to wonder why both aren't being
> changed, but I think I understand now. Of course, now I wonder what the Rdt
> acronym means (is it Resource Director Technology?). If it's an acronym, it's nice
> to see it spelled out at least once. Probably in some earlier commit which I didn't
> chase. The lack of function comments is, well, frustrating.
> 
The comments about RDT will be added.

> >              return -1;
> >          break;
> > +    case VIR_PERF_EVENT_CACHE_MISSES:
> > +    case VIR_PERF_EVENT_CACHE_REFERENCES:
> > +    case VIR_PERF_EVENT_INSTRUCTIONS:
> > +    case VIR_PERF_EVENT_CPU_CYCLES:
> > +        if (virPerfGeneralEnable(event, pid) < 0)
> > +            return -1;
> > +        break;
> 
> here is where there could be some sort of tabular/matrix reference to get the
> 'type' and 'config' values filled in to pass.
> 
> >      case VIR_PERF_EVENT_LAST:
> >          virReportError(VIR_ERR_INTERNAL_ERROR,
> >                         _("Unexpected perf event type=%d"), type);
> > diff --git a/src/util/virperf.h b/src/util/virperf.h index
> > 7163410..7129370 100644
> > --- a/src/util/virperf.h
> > +++ b/src/util/virperf.h
> > @@ -28,6 +28,10 @@ typedef enum {
> >      VIR_PERF_EVENT_CMT,
> >      VIR_PERF_EVENT_MBMT,
> >      VIR_PERF_EVENT_MBML,
> > +    VIR_PERF_EVENT_CACHE_MISSES,
> > +    VIR_PERF_EVENT_CACHE_REFERENCES,
> > +    VIR_PERF_EVENT_INSTRUCTIONS,
> > +    VIR_PERF_EVENT_CPU_CYCLES,
> 
> Not that it matters too much since the numbers are different, but the order here
> is different than perf_hw_id. "Sometimes" it's kinder to access memory
> sequentially rather than somewhat randomly. Easier to see for future changers
> what was already too...
> 
> Any reason to not go after the other 6 events in 'perf_hw_id'?
> 
There are a lot of different perf events, but these 4 events will be used by OpenStack, and so currently I only add them in this patch.

> >
> >      VIR_PERF_EVENT_LAST
> >  } virPerfEventType;
> > diff --git a/tests/genericxml2xmlindata/generic-perf.xml
> > b/tests/genericxml2xmlindata/generic-perf.xml
> > index 394d2a6..6428ebd 100644
> > --- a/tests/genericxml2xmlindata/generic-perf.xml
> > +++ b/tests/genericxml2xmlindata/generic-perf.xml
> > @@ -16,6 +16,10 @@
> >      <event name='cmt' enabled='yes'/>
> >      <event name='mbmt' enabled='no'/>
> >      <event name='mbml' enabled='yes'/>
> > +    <event name='cache_misses' enabled='no'/>
> > +    <event name='cache_peferences' enabled='no'/>
> > +    <event name='instructions' enabled='no'/>
> > +    <event name='cpu_cycles' enabled='no'/>
> 
> All 4 are 'no', maybe make 1 or 2 'yes' (not that it matters, but they are a
> different table underneath the covers)
> 
Yes.

Thanks,
Qiaowei




More information about the libvir-list mailing list