[libvirt] [PATCH 2/3] Qemu: add CMT support

Ren, Qiaowei qiaowei.ren at intel.com
Fri Sep 25 14:14:41 UTC 2015


> -----Original Message-----
> From: libvir-list-bounces at redhat.com [mailto:libvir-list-bounces at redhat.com]
> On Behalf Of Ren, Qiaowei
> Sent: Monday, August 10, 2015 9:06 AM
> To: 'Daniel P. Berrange'
> Cc: 'libvir-list at redhat.com'
> Subject: Re: [libvirt] [PATCH 2/3] Qemu: add CMT support
> 
> 
> > -----Original Message-----
> > From: Ren, Qiaowei
> > Sent: Tuesday, July 21, 2015 4:00 PM
> > To: Daniel P. Berrange
> > Cc: libvir-list at redhat.com
> > Subject: RE: [libvirt] [PATCH 2/3] Qemu: add CMT support
> >
> > On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> > > On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
> > >>
> > >> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
> > >>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
> > >>>> One RFC in
> > >>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> > >>>>
> > >>>> CMT (Cache Monitoring Technology) can be used to measure the
> > >>>> usage of cache by a VM running on the host. This patch extends
> > >>>> the bulk stats API (virDomainListGetStats) to add this field.
> > >>>> Applications based on libvirt can use this API to obtain the
> > >>>> cache usage of a VM. Because the CMT implementation in the Linux
> > >>>> kernel is based on the perf mechanism, this patch enables the
> > >>>> perf event for CMT when a VM is created and disables it when the
> > >>>> VM is destroyed.
> > >>>>
> > >>>> Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
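[For readers skimming the archive: the statistic added here is meant to be
consumed through the bulk stats API. Below is a minimal consumer sketch; the
"cache.current" field name is a hypothetical placeholder modelled on existing
parameter names such as "balloon.current", since the actual name is defined by
the patch itself.]

#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
    virDomainStatsRecordPtr *records = NULL;
    int nrecords, i, j;

    if (!conn)
        return 1;

    /* stats = 0 requests all stats groups the driver supports */
    nrecords = virConnectGetAllDomainStats(conn, 0, &records, 0);

    for (i = 0; i < nrecords; i++) {
        for (j = 0; j < records[i]->nparams; j++) {
            if (strcmp(records[i]->params[j].field, "cache.current") == 0)
                printf("%s: cache occupancy %llu bytes\n",
                       virDomainGetName(records[i]->dom),
                       records[i]->params[j].value.ul);
        }
    }

    if (records)
        virDomainStatsRecordListFree(records);
    virConnectClose(conn);
    return 0;
}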
> > >>>>
> > >>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > >>>> index 4cfae03..8c678c9 100644
> > >>>> --- a/src/qemu/qemu_driver.c
> > >>>> +++ b/src/qemu/qemu_driver.c
> > >>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
> > >>>>
> > >>>>  #undef QEMU_ADD_COUNT_PARAM
> > >>>>
> > >>>> +static int
> > >>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
> > >>>> +                        virDomainObjPtr dom,
> > >>>> +                        virDomainStatsRecordPtr record,
> > >>>> +                        int *maxparams,
> > >>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
> > >>>
> > >>> So this is a method that is used to collect per-domain information
> > >>>
> > >>>> +{
> > >>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
> > >>>> +    FILE *fd;
> > >>>> +    unsigned long long cache = 0;
> > >>>> +    int scaling_factor = 0;
> > >>>> +
> > >>>> +    if (priv->cmt_fd <= 0)
> > >>>> +        return -1;
> > >>>> +
> > >>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read cache data"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +
> > >>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
> > >>>> +    if (!fd) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to open CMT scale file"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read CMT scale file"));
> > >>>> +        VIR_FORCE_FCLOSE(fd);
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    VIR_FORCE_FCLOSE(fd);
> > >>>
> > >>> But this data you are reading is global to the entire host.
> > >>>
> > >>
> > >> In fact this data is only per-domain. When the perf syscall is
> > >> called to enable the perf event for a domain, the pid of that
> > >> domain is passed.
> > >
> > > Ah, I see - you rely on the open file descriptor to be associated
> > > with the VM pid.
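[For anyone unfamiliar with this part of perf: a rough sketch of opening an
event against a specific pid, assuming the intel_cqm PMU layout in sysfs. The
config value for llc_occupancy is hard-coded here for brevity; real code would
parse it from /sys/devices/intel_cqm/events/llc_occupancy.]

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_cmt_event(pid_t qemu_pid)
{
    struct perf_event_attr attr;
    int type = 0;
    FILE *fp;

    /* The kernel publishes the dynamic PMU type for intel_cqm in sysfs */
    fp = fopen("/sys/devices/intel_cqm/type", "r");
    if (!fp || fscanf(fp, "%d", &type) != 1) {
        if (fp)
            fclose(fp);
        return -1;
    }
    fclose(fp);

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = 1;      /* llc_occupancy (event=0x01 in sysfs) */
    attr.inherit = 1;     /* follow child threads of the QEMU process */
    attr.disabled = 1;    /* enabled later via PERF_EVENT_IOC_ENABLE */

    /* pid = qemu_pid, cpu = -1: count for this process on any CPU, which
     * is what makes the resulting data per-domain rather than host-wide */
    return syscall(__NR_perf_event_open, &attr, qemu_pid, -1, -1, 0);
}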
> > >
> > >
> > >>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> > >>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
> > >>>>      priv->nbdPort = 0;
> > >>>> +    /* Disable CMT */
> > >>>> +    if (priv->cmt_fd > 0) {
> > >>>
> > >>> You can't rely on keeping an open file descriptor for the guest
> > >>> because libvirtd may be restarted.
> > >>>
> > >>
> > >> Sorry, I don't really get the meaning of this. You mean that when
> > >> libvirtd is restarted, those resources which the domain opened
> > >> should be closed, right?
> > >
> > > No, when libvirtd is restarted, the domains must all continue
> > > running without loss of state. You open the FD when starting the
> > > guest, then libvirtd is restarted, and now someone wants to query
> > > the perf data. The perf FD will not be open anymore because libvirtd
> > > was restarted. At least you'd need to re-open the file descriptor
> > > when libvirtd starts up again, for any running guest. I'm not really
> > > convinced we want to keep the perf file descriptors open for all the
> > > domains for the entire time they are running. We should really only
> > > open them when we actually want to read the collected data.
> > >
> >
> > Got it! We should open/disable them when reading the data.
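[In other words, something like the following open-read-close pattern, so that
no state needs to survive a daemon restart. A sketch only; open_cmt_event() is
the hypothetical helper from the sketch above.]

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

static int read_cmt_occupancy(pid_t qemu_pid, uint64_t *cache)
{
    int fd = open_cmt_event(qemu_pid);
    int ret = -1;

    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0 &&
        read(fd, cache, sizeof(*cache)) == (ssize_t)sizeof(*cache))
        ret = 0;

    close(fd);    /* nothing is left open between stats queries */
    return ret;
}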
> >
> > >>
> > >>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
> > >>>> +            virReportSystemError(errno, "%s",
> > >>>> +                                 _("Unable to disable perf event for CMT"));
> > >>>> +        }
> > >>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
> > >>>> +    }
> > >>>> +
> > >>>>      if (priv->agent) {
> > >>>>          qemuAgentClose(priv->agent);
> > >>>>          priv->agent = NULL;
> > >>>
> > >>>
> > >>> Conceptually I think this approach to implementation is flawed.
> > >>> While you are turning on/off the perf events for each QEMU
> > >>> process, the data collection does not distinguish data from each
> > >>> QEMU process - the data reported is host wide. So this really
> > >>> doesn't make much sense IMHO.
> > >>>
> > >>
> > >> As mentioned above, the data reported is only for the domain.
> > >>
> > >>> I'm also wondering whether this is really going to be sufficiently
> > >>> useful on its own. CPUs have countless other performance counters
> > >>> that I would imagine apps/admins will want to read in order to
> > >>> analyse QEMU performance, beyond this new CMT feature. The domain
> > >>> stats API won't really scale up to dealing with arbitrary perf
> > >>> event counter reporting, so I'm not much of a fan of just special
> > >>> casing CMT in this way.
> > >>>
> > >>> IOW, if we want to support host performance analysis in libvirt,
> > >>> then we probably want to design a set of APIs specifically for
> > >>> this purpose, but I could well see us saying that this is out of
> > >>> scope for libvirt and apps should just use the linux perf
> > >>> interfaces directly.
> > >>>
> > >>
> > >> Yes, I get what you mean. Maybe libvirt doesn't have to be
> > >> responsible for supporting host performance analysis.
> > >>
> > >> But I guess cache usage should be important for each domain. If
> > >> apps based on libvirt can obtain this information, they will be
> > >> able to better check and confirm that a domain works normally,
> > >> like the stats for cpu/memory/block/... which are already
> > >> supported in libvirt now. Do you think so?
> > >
> > > I'm not saying cache usage is unimportant. There are quite a lot of
> > > other hardware event counters in modern CPUs though, so I'm asking
> > > why we should add just this new special Intel event, and not any of
> > > the other existing performance counters that are useful in
> > > diagnosing performance issues.
> >
> > Ah, I guess you mean that the best way is providing a set of utility
> > methods for perf operations, like the existing cgroup support. Then we
> > could use these interfaces to support a lot of necessary perf counters.
> >
> > If we have to do so, maybe I can first implement such methods and then
> > you can help review them.
> >
> > > Also, I'm thinking that QEMU has many different threads - VCPU
> > > threads, I/O threads, emulator threads and so on. I could see users
> > > wanting distinct profiling for these different functional areas of
> > > QEMU too, instead of just whole-QEMU granularity.
> > >
> >
> > Yes. I believe such features would be useful for some users. I am
> > currently working on adding features like CMT into OpenStack, and as
> > far as I know, profiling for VCPU / IO / emulator threads should not
> > be necessary for OpenStack. ^-^
> >
> 
> Hi Daniel,
> 
> What do you think about it now? If we have to add CMT support in Nova, what
> do you think is the best way? Can we directly use the Linux perf interface
> in OpenStack?
> 

Hi Daniel,

If we first add utility methods for perf operations into libvirt, like the existing cgroup support, and then implement the CMT feature on top of them, do you think that would be OK?

Thanks,
Qiaowei
