[libvirt] [PATCH 1/1] perf: add more perf events support
Ren, Qiaowei
qiaowei.ren at intel.com
Sat Jul 16 08:37:44 UTC 2016
> -----Original Message-----
> From: John Ferlan [mailto:jferlan at redhat.com]
> Sent: Wednesday, July 13, 2016 4:02 AM
> To: Ren, Qiaowei <qiaowei.ren at intel.com>; libvir-list at redhat.com
> Cc: Peter Krempa <pkrempa at redhat.com>
> Subject: Re: [libvirt] [PATCH 1/1] perf: add more perf events support
>
>
>
> On 06/29/2016 08:10 PM, Qiaowei Ren wrote:
> > With current perf framework, this patch adds support to more perf
> ^^^^^^^ for more perf
>
> > events, including cache missing, cache peference, cpu cycles,
>
> A quick google search turns up "cache references" - there's just too many
> peference or peferences references to comment on them all, but they all need
> to be "references"
>
John, I took 'cache peference' from the perf code in the Linux kernel. However, the 'perf list' command shows 'references', so I will change it to that.
> > instrction, etc..
>
> instructions
>
> >
> > Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
> > ---
> > docs/formatdomain.html.in | 24 +++++++++++
> > docs/schemas/domaincommon.rng | 4 ++
> > include/libvirt/libvirt-domain.h | 39 +++++++++++++++++
> > src/libvirt-domain.c | 8 ++++
> > src/qemu/qemu_driver.c | 23 +++++-----
> > src/util/virperf.c | 65 ++++++++++++++++++++++++++++-
> > src/util/virperf.h | 4 ++
> > tests/genericxml2xmlindata/generic-perf.xml | 4 ++
> > 8 files changed, 158 insertions(+), 13 deletions(-)
> >
>
> I see no changes for virsh.pod, see commit id '3110363d' for a recent change
> Peter made in this space...
>
> I think perhaps it may also be worthwhile to "in a separate patch" alter the
> 'domstats --perf' description to simply reference the 'perf'
> description where each of the collect perf.* events can be listed and described.
>
> Each of the collectible events could have some sort of tabular output - see how
> 'vol-wipe' describes the various supported algorithms. So much easier to read
> than one long sentence.
>
Sure. I will add a separate patch for this.
>
> > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > index f660aa6..7999e43 100644
> > --- a/docs/formatdomain.html.in
> > +++ b/docs/formatdomain.html.in
> > @@ -1839,6 +1839,10 @@
> > <event name='cmt' enabled='yes'/>
> > <event name='mbmt' enabled='no'/>
> > <event name='mbml' enabled='yes'/>
> > + <event name='cache_misses' enabled='no'/>
> > + <event name='cache_peferences' enabled='no'/>
> > + <event name='instructions' enabled='no'/>
> > + <event name='cpu_cycles' enabled='no'/>
> > </perf>
> > ...
> > </pre>
> > @@ -1864,6 +1868,26 @@
> > <td>bandwidth of memory traffic for a memory controller</td>
> > <td><code>perf.mbml</code></td>
> > </tr>
> > + <tr>
> > + <td><code>cache_misses</code></td>
> > + <td>the amount of cache missing by applications running on the
> > + platform</td>
>
> is this the count of caches misses? amount implies perhaps other things.
>
Yes. I will change it.
> > + <td><code>perf.cache_misses</code></td>
> > + </tr>
> > + <tr>
> > + <td><code>cache_peferences</code></td>
> > + <td>the amount of cache hit by applications running on the platform</td>
> > + <td><code>perf.cache_peferences</code></td>
>
> similar is this the count of cache hits, right?
>
Yes.
> > + </tr>
> > + <tr>
> > + <td><code>instructions</code></td>
> > + <td>the amount of instructions by applications running on the
> > + platform</td>
>
> the count of instructions executed...
>
> > + <td><code>perf.instructions</code></td>
> > + </tr>
> > + <tr>
> > + <td><code>cpu_cycles</code></td>
> > + <td>the amount of cycles one instruction needs</td>
>
> the number/count of cpu cycles
>
> > + <td><code>perf.cpu_cycles</code></td>
> > + </tr>
> > </table>
> >
> > <h3><a name="elementsDevices">Devices</a></h3>
> > diff --git a/docs/schemas/domaincommon.rng
> > b/docs/schemas/domaincommon.rng index 563cb3c..e41dc3a 100644
> > --- a/docs/schemas/domaincommon.rng
> > +++ b/docs/schemas/domaincommon.rng
> > @@ -414,6 +414,10 @@
> > <value>cmt</value>
> > <value>mbmt</value>
> > <value>mbml</value>
> > + <value>cache_misses</value>
> > + <value>cache_peferences</value>
> > + <value>instructions</value>
> > + <value>cpu_cycles</value>
> > </choice>
> > </attribute>
> > <attribute name="enabled">
> > diff --git a/include/libvirt/libvirt-domain.h
> > b/include/libvirt/libvirt-domain.h
> > index 7ea93aa..b79cdb0 100644
> > --- a/include/libvirt/libvirt-domain.h
> > +++ b/include/libvirt/libvirt-domain.h
> > @@ -1947,6 +1947,45 @@ void
> virDomainStatsRecordListFree(virDomainStatsRecordPtr *stats);
> > */
> > # define VIR_PERF_PARAM_MBML "mbml"
> >
> > +/**
> > + * VIR_PERF_PARAM_CACHE_MISSES:
> > + *
> > + * Macro for typed parameter name that represents cache_misses perf
> > + * event which can be used to measure the amount of cache missing by
>
> s/amount/count
>
> s/cache missing/cache misses/ (missing is a very different context!)
>
> > + * applications running on the platform. It corresponds to the
> > + * "perf.cache_misses" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_CACHE_MISSES "cache_misses"
> > +
> > +/**
> > + * VIR_PERF_PARAM_CACHE_REFERENCES:
> > + *
> > + * Macro for typed parameter name that represents cache_peferences
> > + * perf event which can be used to measure the amount of cache hit
>
> similar... amount/count ... hit/hits
>
> > + * by applications running on the platform. It corresponds to the
> > + * "perf.cache_peferences" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_CACHE_REFERENCES "cache_peferences"
> > +
> > +/**
> > + * VIR_PERF_PARAM_INSTRUCTIONS:
> > + *
> > + * Macro for typed parameter name that represents instructions perf
> > + * event which can be used to measure the amount of instructions
>
> similar amount/count
>
> > + * by applications running on the platform. It corresponds to the
> > + * "perf.instructions" field in the *Stats APIs.
> > + */
> > +# define VIR_PERF_PARAM_INSTRUCTIONS "instructions"
> > +
> > +/**
> > + * VIR_PERF_PARAM_CPU_CYCLES:
> > + *
> > + * Macro for typed parameter name that represents cpu_cycles perf
> > +event
> > + * which can be used to measure how many cycles one instruction needs.
> > + * It corresponds to the "perf.cpu_cycles" field in the *Stats APIs.
> > + */
>
> The cycles and instructions seem to me to be things that could be very large and
> awkward to print
> > +# define VIR_PERF_PARAM_CPU_CYCLES "cpu_cycles"
> > +
> > int virDomainGetPerfEvents(virDomainPtr dom,
> > virTypedParameterPtr *params,
> > int *nparams, diff --git
> > a/src/libvirt-domain.c b/src/libvirt-domain.c index 4e71a94..b817e4b
> > 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -11452,6 +11452,14 @@ virConnectGetDomainCapabilities(virConnectPtr
> conn,
> > * "perf.mbml" - the amount of data (bytes/s) sent through the memory
> controller
> > * on the socket as unsigned long long. It is produced by mbml
> > * perf event.
> > + * "perf.cache_misses" - the amount of cache missing as unsigned long long.
> > + * It is produced by cache_misses perf event.
> > + * "perf.cache_peferences" - the amount of cache hit as unsigned long long.
> > + * It is produced by cache_peferences perf event.
> > + * "perf.instructions" - the amount of instructions as unsigned long long.
> > + * It is produced by instructions perf event.
> > + * "perf.cpu_cycles" - the amount of cycles one instruction needs as
> > + unsigned
> > + * long long. It is produced by cpu_cycles perf event.
>
> Similar "amount" vs. "count"
>
> > *
> > * Note that entire stats groups or individual stat fields may be missing from
> > * the output in case they are not supported by the given hypervisor,
> > are not diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 61d184b..bea753f 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -9613,6 +9613,10 @@ qemuDomainSetPerfEvents(virDomainPtr dom,
> > VIR_PERF_PARAM_CMT, VIR_TYPED_PARAM_BOOLEAN,
> > VIR_PERF_PARAM_MBMT, VIR_TYPED_PARAM_BOOLEAN,
> > VIR_PERF_PARAM_MBML,
> > VIR_TYPED_PARAM_BOOLEAN,
> > + VIR_PERF_PARAM_CACHE_MISSES,
> VIR_TYPED_PARAM_BOOLEAN,
> > + VIR_PERF_PARAM_CACHE_REFERENCES,
> VIR_TYPED_PARAM_BOOLEAN,
> > + VIR_PERF_PARAM_INSTRUCTIONS,
> VIR_TYPED_PARAM_BOOLEAN,
> > + VIR_PERF_PARAM_CPU_CYCLES,
> > + VIR_TYPED_PARAM_BOOLEAN,
> > NULL) < 0)
> > return -1;
> >
> > @@ -18941,10 +18945,10 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr
> > driver, #undef QEMU_ADD_COUNT_PARAM
> >
> > static int
> > -qemuDomainGetStatsPerfRdt(virPerfPtr perf,
> > - virPerfEventType type,
> > - virDomainStatsRecordPtr record,
> > - int *maxparams)
> > +qemuDomainGetStatsPerfOneEvent(virPerfPtr perf,
> > + virPerfEventType type,
> > + virDomainStatsRecordPtr record,
> > + int *maxparams)
>
> This change seems separable. That is it's own patch to change the name of the
> function because it's going to be multiple/general purpose soon.
>
OK. I will split it out into a separate patch.
> > {
> > char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
> > uint64_t value = 0;
> > @@ -18980,14 +18984,9 @@ qemuDomainGetStatsPerf(virQEMUDriverPtr
> driver ATTRIBUTE_UNUSED,
> > if (!virPerfEventIsEnabled(priv->perf, i))
> > continue;
> >
> > - switch (i) {
> > - case VIR_PERF_EVENT_CMT:
> > - case VIR_PERF_EVENT_MBMT:
> > - case VIR_PERF_EVENT_MBML:
>
> And removing the need for a switch could be separate too... Trying to think if
> it's still necessary though. virPerfEventIsEnabled will return NULL if i >=
> VIR_PERF_EVENT_LAST, so it doesn't seem some future client could request an
> event that some older driver doesn't support.
>
virPerfEventIsEnabled will return false (not NULL) if i >= VIR_PERF_EVENT_LAST, so a future client could not enable events that an older driver doesn't support.
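To illustrate the point, here is a simplified stand-alone model of that check. It is not the real virperf.c code; the names demoPerf, demoEventIsEnabled and DEMO_EVENT_LAST are made up for this mail, and I assume the enabled state is kept in one flag per event:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for VIR_PERF_EVENT_LAST: cmt, mbmt, mbml plus the four
 * events added by this patch. */
enum { DEMO_EVENT_LAST = 7 };

/* Hypothetical model of the per-domain perf state. */
struct demoPerf {
    bool enabled[DEMO_EVENT_LAST];
};

/* Returns false, never NULL, for a missing object or an event
 * number outside the range this build knows about. */
static bool
demoEventIsEnabled(const struct demoPerf *perf, int type)
{
    if (!perf || type < 0 || type >= DEMO_EVENT_LAST)
        return false;

    return perf->enabled[type];
}
```

With a guard like this in front, the stats loop can call one helper for every enabled event and the switch is not needed.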
> > - if (qemuDomainGetStatsPerfRdt(priv->perf, i, record, maxparams) < 0)
> > - goto cleanup;
> > - break;
> > - }
> > + if (qemuDomainGetStatsPerfOneEvent(priv->perf, i,
> > + record, maxparams) < 0)
> > + goto cleanup;
> > }
> >
> > ret = 0;
> > diff --git a/src/util/virperf.c b/src/util/virperf.c index
> > 4661ba3..a3d2bc6 100644
> > --- a/src/util/virperf.c
> > +++ b/src/util/virperf.c
> > @@ -38,7 +38,9 @@ VIR_LOG_INIT("util.perf"); #define VIR_FROM_THIS
> > VIR_FROM_PERF
> >
> > VIR_ENUM_IMPL(virPerfEvent, VIR_PERF_EVENT_LAST,
> > - "cmt", "mbmt", "mbml");
> > + "cmt", "mbmt", "mbml",
> > + "cache_misses", "cache_peferences",
> > + "instructions", "cpu_cycles");
> >
> > struct virPerfEvent {
> > int type;
> > @@ -189,6 +191,60 @@ virPerfRdtEnable(virPerfEventPtr event,
> > return -1;
> > }
> >
> > +static int
> > +virPerfGeneralEnable(virPerfEventPtr event,
> > + pid_t pid)
>
> Currently, these are less "General" and more "Hardware" events
>
> Based on what I see in perf_event.h, I suspect the future could hold getting
> software, tracepoint, hw_cache, raw, and breakpoint events too.
>
> Perhaps in order to be more "General" the "type" and "config" parameters could
> be passed...
>
> > +{
> > + struct perf_event_attr attr;
> > +
> > + memset(&attr, 0, sizeof(attr));
> > + attr.size = sizeof(attr);
> > + attr.inherit = 1;
> > + attr.disabled = 1;
> > + attr.enable_on_exec = 0;
> > +
> > + switch (event->type) {
> > + case VIR_PERF_EVENT_CACHE_MISSES:
> > + attr.type = PERF_TYPE_HARDWARE;
> ^^^^^^^^^
> this doesn't change for any of the cases
>
> > + attr.config = PERF_COUNT_HW_CACHE_MISSES;
> > + break;
> > + case VIR_PERF_EVENT_CACHE_REFERENCES:
> > + attr.type = PERF_TYPE_HARDWARE;
> > + attr.config = PERF_COUNT_HW_CACHE_REFERENCES;
> > + break;
> > + case VIR_PERF_EVENT_INSTRUCTIONS:
> > + attr.type = PERF_TYPE_HARDWARE;
> > + attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> > + break;
> > + case VIR_PERF_EVENT_CPU_CYCLES:
> > + attr.type = PERF_TYPE_HARDWARE;
> > + attr.config = PERF_COUNT_HW_CPU_CYCLES;
> > + break;
> > + }
>
> ...Seems like it would be possible to create some sort of static table/matrix that
> would be able to convert the VIR_PERF_EVENT_* into their respective
> "attr.type" and "attr.config", so that this function doesn't have the switch and
> the calling function passes by value the 'type' and 'config'. Assuming of course
> the future is to get other events.
>
Yes, this is a good idea. I will add the static table.
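A rough sketch of what the table might look like follows. The names exampleEventType, exampleEventAttr, exampleEventAttrs and exampleEventAttrLookup are placeholders for this mail and not existing libvirt code; the enum is a local copy so the fragment stands alone, listed in the kernel's perf_hw_id order:

```c
#include <stddef.h>
#include <linux/perf_event.h>

/* Local stand-in for the new VIR_PERF_EVENT_* values, in the same
 * order as the kernel's enum perf_hw_id. */
typedef enum {
    EXAMPLE_EVENT_CPU_CYCLES,
    EXAMPLE_EVENT_INSTRUCTIONS,
    EXAMPLE_EVENT_CACHE_REFERENCES,
    EXAMPLE_EVENT_CACHE_MISSES,

    EXAMPLE_EVENT_LAST
} exampleEventType;

/* Hypothetical table entry: the perf_event_attr fields that differ
 * from one event to the next. */
struct exampleEventAttr {
    unsigned int type;          /* value for attr.type */
    unsigned long long config;  /* value for attr.config */
};

static const struct exampleEventAttr exampleEventAttrs[EXAMPLE_EVENT_LAST] = {
    [EXAMPLE_EVENT_CPU_CYCLES] =
        { PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES },
    [EXAMPLE_EVENT_INSTRUCTIONS] =
        { PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS },
    [EXAMPLE_EVENT_CACHE_REFERENCES] =
        { PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES },
    [EXAMPLE_EVENT_CACHE_MISSES] =
        { PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES },
};

/* Returns the table entry for @event, or NULL if @event is out of range. */
static const struct exampleEventAttr *
exampleEventAttrLookup(int event)
{
    if (event < 0 || event >= EXAMPLE_EVENT_LAST)
        return NULL;

    return &exampleEventAttrs[event];
}
```

The caller would look up the entry and pass 'type' and 'config' to the enable function, so the switch inside it goes away. Adding software or tracepoint events later would then only mean new rows in the table.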
> > +
> > + event->fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
> > + if (event->fd < 0) {
> > + virReportSystemError(errno,
> > + _("Unable to open perf event for %s"),
> > + virPerfEventTypeToString(event->type));
> > + goto error;
> > + }
> > +
> > + if (ioctl(event->fd, PERF_EVENT_IOC_ENABLE) < 0) {
> > + virReportSystemError(errno,
> > + _("Unable to enable perf event for %s"),
> > + virPerfEventTypeToString(event->type));
> > + goto error;
> > + }
> > +
> > + event->enabled = true;
> > + return 0;
> > +
> > + error:
> > + VIR_FORCE_CLOSE(event->fd);
> > + return -1;
> > +}
> > +
> > int
> > virPerfEventEnable(virPerfPtr perf,
> > virPerfEventType type, @@ -205,6 +261,13 @@
> > virPerfEventEnable(virPerfPtr perf,
> > if (virPerfRdtEnable(event, pid) < 0)
>
> I see this PerfRdt reference here which led me to wonder why both aren't being
> changed, but I think I understand now. Of course, now I wonder what the Rdt
> acronym means (is it Resource Director Technology?). If it's an acronym, it's nice
> to see it spelled out at least once. Probably in some earlier commit which I didn't
> chase. The lack of function comments is, well, frustrating.
>
I will add comments explaining RDT.
> > return -1;
> > break;
> > + case VIR_PERF_EVENT_CACHE_MISSES:
> > + case VIR_PERF_EVENT_CACHE_REFERENCES:
> > + case VIR_PERF_EVENT_INSTRUCTIONS:
> > + case VIR_PERF_EVENT_CPU_CYCLES:
> > + if (virPerfGeneralEnable(event, pid) < 0)
> > + return -1;
> > + break;
>
> here is where there could be some sort of tabular/matrix reference to get the
> 'type' and 'config' values filled in to pass.
>
> > case VIR_PERF_EVENT_LAST:
> > virReportError(VIR_ERR_INTERNAL_ERROR,
> > _("Unexpected perf event type=%d"), type);
> > diff --git a/src/util/virperf.h b/src/util/virperf.h index
> > 7163410..7129370 100644
> > --- a/src/util/virperf.h
> > +++ b/src/util/virperf.h
> > @@ -28,6 +28,10 @@ typedef enum {
> > VIR_PERF_EVENT_CMT,
> > VIR_PERF_EVENT_MBMT,
> > VIR_PERF_EVENT_MBML,
> > + VIR_PERF_EVENT_CACHE_MISSES,
> > + VIR_PERF_EVENT_CACHE_REFERENCES,
> > + VIR_PERF_EVENT_INSTRUCTIONS,
> > + VIR_PERF_EVENT_CPU_CYCLES,
>
> Not that it matters too much since the numbers are different, but the order here
> is different than perf_hw_id. "Sometimes" it's kinder to access memory
> sequentially rather than somewhat randomly. Easier to see for future changers
> what was already too...
>
> Any reason to not go after the other 6 events in 'perf_hw_id'?
>
There are a lot of different perf events, but these 4 are the ones OpenStack will use, so for now I only add them in this patch.
> >
> > VIR_PERF_EVENT_LAST
> > } virPerfEventType;
> > diff --git a/tests/genericxml2xmlindata/generic-perf.xml
> > b/tests/genericxml2xmlindata/generic-perf.xml
> > index 394d2a6..6428ebd 100644
> > --- a/tests/genericxml2xmlindata/generic-perf.xml
> > +++ b/tests/genericxml2xmlindata/generic-perf.xml
> > @@ -16,6 +16,10 @@
> > <event name='cmt' enabled='yes'/>
> > <event name='mbmt' enabled='no'/>
> > <event name='mbml' enabled='yes'/>
> > + <event name='cache_misses' enabled='no'/>
> > + <event name='cache_peferences' enabled='no'/>
> > + <event name='instructions' enabled='no'/>
> > + <event name='cpu_cycles' enabled='no'/>
>
> All 4 are 'no', maybe make 1 or 2 'yes' (not that it matters, but they are a
> different table underneath the covers)
>
Yes.
Thanks,
Qiaowei