[libvirt] [PATCH 2/3] Qemu: add CMT support

Ren, Qiaowei qiaowei.ren at intel.com
Mon Jul 20 13:50:54 UTC 2015


> -----Original Message-----
> From: Daniel P. Berrange [mailto:berrange at redhat.com]
> Sent: Monday, July 20, 2015 5:32 PM
> To: Ren, Qiaowei
> Cc: libvir-list at redhat.com
> Subject: Re: [libvirt] [PATCH 2/3] Qemu: add CMT support
> 
> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
> > One RFC in
> > https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> >
> > CMT (Cache Monitoring Technology) can be used to measure the usage of
> > cache by VM running on the host. This patch will extend the bulk stats
> > API (virDomainListGetStats) to add this field. Applications based on
> > libvirt can use this API to achieve cache usage of VM. Because CMT
> > implementation in Linux kernel is based on perf mechanism, this patch
> > will enable perf event for CMT when VM is created and disable it when
> > VM is destroyed.
> >
> > Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
> 
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > index 4cfae03..8c678c9 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
> >
> >  #undef QEMU_ADD_COUNT_PARAM
> >
> > +static int
> > +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
> > +                        virDomainObjPtr dom,
> > +                        virDomainStatsRecordPtr record,
> > +                        int *maxparams,
> > +                        unsigned int privflags ATTRIBUTE_UNUSED)
> 
> So this is a method that is used to collect per-domain information
> 
> > +{
> > +    qemuDomainObjPrivatePtr priv = dom->privateData;
> > +    FILE *fd;
> > +    unsigned long long cache = 0;
> > +    int scaling_factor = 0;
> > +
> > +    if (priv->cmt_fd <= 0)
> > +        return -1;
> > +
> > +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("Unable to read cache data"));
> > +        return -1;
> > +    }
> > +
> > +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
> > +    if (!fd) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("Unable to open CMT scale file"));
> > +        return -1;
> > +    }
> > +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("Unable to read CMT scale file"));
> > +        VIR_FORCE_FCLOSE(fd);
> > +        return -1;
> > +    }
> > +    VIR_FORCE_FCLOSE(fd);
> 
> But this data you are reading is global to the entire host.
> 

In fact this data is per-domain. When the perf syscall is called to enable the perf event for a domain, the pid of that domain's QEMU process is passed, so the counter only covers that domain.
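
For illustration only, here is a minimal standalone sketch of that call pattern (not code from the patch; the helper name open_cmt_counter is made up):

    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <linux/perf_event.h>

    /* Open an llc_occupancy counter restricted to one process. */
    static int open_cmt_counter(pid_t domain_pid, int event_type)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = event_type;   /* dynamic PMU type read from
                                     /sys/devices/intel_cqm/type */
        attr.config = 1;          /* llc_occupancy, as in the patch */
        attr.inherit = 1;         /* also count children of the process */

        /* pid >= 0 and cpu == -1: count for this process on any CPU,
         * so reads from the returned fd cover this domain only. */
        return syscall(__NR_perf_event_open, &attr, domain_pid, -1, -1, 0);
    }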

> > +
> > +    cache *= scaling_factor;
> > +
> > +    if (virTypedParamsAddULLong(&record->params,
> > +                                &record->nparams,
> > +                                maxparams,
> > +                                "cache.current",
> > +                                cache) < 0)
> > +        return -1;
> > +
> > +    return 0;
> > +}
> > +
> 
> 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index ba84182..00b889d 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> 
> > +/*
> > + * Enable CMT (Cache Monitoring Technology) to measure the cache
> > + * usage of a VM running on the node.
> > + *
> > + * Because the hypervisor implements CMT support based on the perf
> > + * mechanism, we should enable a perf event for CMT. The function
> > + * 'sys_perf_event_open' is a wrapper around the perf syscall.
> > + */
> > +#ifdef __linux__
> > +static long sys_perf_event_open(struct perf_event_attr *hw_event,
> > +                                pid_t pid, int cpu, int group_fd,
> > +                                unsigned long flags)
> > +{
> > +    return syscall(__NR_perf_event_open, hw_event, pid, cpu,
> > +                   group_fd, flags);
> > +}
> > +
> > +static int
> > +qemuCmtEnable(virDomainObjPtr vm)
> > +{
> > +    qemuDomainObjPrivatePtr priv = vm->privateData;
> > +    struct perf_event_attr cmt_attr;
> > +    int event_type;
> > +    FILE *fp;
> > +
> > +    fp = fopen("/sys/devices/intel_cqm/type", "r");
> > +    if (!fp) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("CMT is not available on this host"));
> > +        return -1;
> > +    }
> > +    if (fscanf(fp, "%d", &event_type) != 1) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("Unable to read event type file."));
> > +        VIR_FORCE_FCLOSE(fp);
> > +        return -1;
> > +    }
> > +    VIR_FORCE_FCLOSE(fp);
> > +
> > +    memset(&cmt_attr, 0, sizeof(struct perf_event_attr));
> > +    cmt_attr.size = sizeof(struct perf_event_attr);
> > +    cmt_attr.type = event_type;
> > +    cmt_attr.config = 1;
> > +    cmt_attr.inherit = 1;
> > +    cmt_attr.disabled = 1;
> > +    cmt_attr.enable_on_exec = 0;
> > +
> > +    priv->cmt_fd = sys_perf_event_open(&cmt_attr, vm->pid, -1, -1, 0);
> > +    if (priv->cmt_fd < 0) {
> > +        virReportSystemError(errno,
> > +                             _("Unable to open perf type=%d for pid=%d"),
> > +                             event_type, vm->pid);
> > +        return -1;
> > +    }
> > +
> > +    if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_ENABLE) < 0) {
> > +        virReportSystemError(errno, "%s",
> > +                             _("Unable to enable perf event for CMT"));
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +#else
> > +static int qemuCmtEnable(virDomainObjPtr vm) {
> > +    virReportUnsupportedError();
> > +    return -1;
> > +}
> > +#endif
> > +
> >  int qemuProcessStart(virConnectPtr conn,
> >                       virQEMUDriverPtr driver,
> >                       virDomainObjPtr vm,
> > @@ -4954,6 +5026,11 @@ int qemuProcessStart(virConnectPtr conn,
> >      if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0)
> >          goto cleanup;
> >
> > +    VIR_DEBUG("Setting CMT perf counter");
> > +    if (qemuCmtEnable(vm) < 0)
> > +        virReportSystemError(errno, "%s",
> > +                             _("CMT is not available on this host"));
> > +
> >      /* finally we can call the 'started' hook script if any */
> >      if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
> >          char *xml = qemuDomainDefFormatXML(driver, vm->def, 0);
> > @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> >      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
> >      priv->nbdPort = 0;
> >
> > +    /* Disable CMT */
> > +    if (priv->cmt_fd > 0) {
> 
> You can't rely on keeping an open file descriptor for the guest because libvirtd
> may be restarted.
> 

Sorry, I don't really get the meaning of this. You mean that when libvirtd is restarted, the resources which the domain opened should be closed, right?
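
If the worry is that this fd is simply lost across a libvirtd restart, I guess it could be re-opened when the daemon reattaches to the still-running domain. A rough sketch (the name qemuCmtReattach is just for illustration; it reuses the qemuCmtEnable() added by this patch):

    /* Sketch only: re-create the per-domain CMT fd after a libvirtd
     * restart, e.g. from the path that reconnects to running domains. */
    static void
    qemuCmtReattach(virDomainObjPtr vm)
    {
        if (qemuCmtEnable(vm) < 0)
            VIR_WARN("Unable to re-enable CMT perf event for domain %s",
                     vm->def->name);
    }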

> > +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
> > +            virReportSystemError(errno, "%s",
> > +                                 _("Unable to disable perf event for CMT"));
> > +        }
> > +        VIR_FORCE_CLOSE(priv->cmt_fd);
> > +    }
> > +
> >      if (priv->agent) {
> >          qemuAgentClose(priv->agent);
> >          priv->agent = NULL;
> 
> 
> Conceptually I think this approach to implementation is flawed. While you are
> turning on/off the perf events for each QEMU process, the data collection does
> not distinguish data from each QEMU process - the data reported is host wide.
> So this really doesn't make much sense IMHO.
> 

As mentioned above, the data reported is only for the domain.

> I'm also wondering whether this is really going to be sufficiently useful on its
> own. CPUs have countless other performance counters that I would imagine
> apps/admins will want to read in order to analyse QEMU performance, beyond
> this new CMT feature. The domain stats API won't really scale up to dealing with
> arbitrary perf event counter reporting so I'm not much of a fan of just special
> casing CMT in this way.
> 
> IOW, if we want to support host performance analysis in libvirt, then we
> probably want to design a set of APIs specifically for this purpose, but I could
> well see us saying that this is out of scope for libvirt and apps should just use
> the Linux perf interfaces directly.
> 

Yes, I see what you mean. Maybe libvirt doesn't have to be responsible for supporting host performance analysis.

But I guess cache usage is still important for each domain: if apps based on libvirt can obtain this information, they will be better able to check and confirm that a domain is working normally, just like the cpu/memory/block/... stats that libvirt already supports. Do you think so?

In fact, CMT support in the kernel was not initially based on perf; it used cgroups to report cache usage. But the cgroup implementation was rejected by the kernel community, and in the end the perf-based implementation was merged.

Thanks,
Qiaowei
