[libvirt] [PATCH 2/3] Qemu: add CMT support
Ren, Qiaowei
qiaowei.ren at intel.com
Tue Jul 21 07:59:40 UTC 2015
On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
>>
>> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
>>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
>>>> One RFC in
>>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
>>>>
>>>>
>>>> CMT (Cache Monitoring Technology) can be used to measure the usage
>>>> of cache by VM running on the host. This patch will extend the
>>>> bulk stats API (virDomainListGetStats) to add this field.
>>>> Applications based on libvirt can use this API to achieve cache
>>>> usage of VM. Because CMT implementation in Linux kernel is based
>>>> on perf mechanism, this patch will enable perf event for CMT when
>>>> VM is created and disable it when VM is destroyed.
>>>>
>>>> Signed-off-by: Qiaowei Ren <qiaowei.ren at intel.com>
>>>>
>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>> index 4cfae03..8c678c9 100644
>>>> --- a/src/qemu/qemu_driver.c
>>>> +++ b/src/qemu/qemu_driver.c
>>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
>>>>
>>>>  #undef QEMU_ADD_COUNT_PARAM
>>>> +static int
>>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
>>>> +                        virDomainObjPtr dom,
>>>> +                        virDomainStatsRecordPtr record,
>>>> +                        int *maxparams,
>>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
>>>
>>> So this is a method that is used to collect per-domain information
>>>
>>>> +{
>>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
>>>> +    FILE *fd;
>>>> +    unsigned long long cache = 0;
>>>> +    int scaling_factor = 0;
>>>> +
>>>> +    if (priv->cmt_fd <= 0)
>>>> +        return -1;
>>>> +
>>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read cache data"));
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
>>>> +    if (!fd) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to open CMT scale file"));
>>>> +        return -1;
>>>> +    }
>>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
>>>> +        virReportSystemError(errno, "%s",
>>>> +                             _("Unable to read CMT scale file"));
>>>> +        VIR_FORCE_FCLOSE(fd);
>>>> +        return -1;
>>>> +    }
>>>> +    VIR_FORCE_FCLOSE(fd);
>>>
>>> But this data you are reading is global to the entire host.
>>>
>>
>> In fact this data is per-domain. When the perf syscall is called to
>> enable the perf event for a domain, the pid of that domain is passed.
>
> Ah, I see - you rely on the open file descriptor to be associated with the VM pid.
>
>
>>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
>>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
>>>>      priv->nbdPort = 0;
>>>>
>>>> +    /* Disable CMT */
>>>> +    if (priv->cmt_fd > 0) {
>>>
>>> You can't rely on keeping an open file descriptor for the guest
>>> because libvirtd may be restarted.
>>>
>>
>> Sorry, I don't really get the meaning of this. You mean that when
>> libvirtd is restarted, those resources which the domain opened should
>> be closed, right?
>
> No, when libvirtd is restarted, the domains must all continuing running without
> loss of state. You open the FD when starting the guest, then libvirtd is restarted,
> now someone wants to query the perf data. The perf FD will not be open
> anymore because libvirtd was restarted. At least you'd need to re-open the file
> descriptor when libvirtd starts up again, for any running guest. I'm not really
> convinced we want to keep the perf file descriptors open for all the domains for
> the entire time they are running. Should really only open them when we actually
> want to read the collected data.
>
Got it! We should open the perf event when reading the data and disable it afterwards.
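Something like the following query-time flow, perhaps, where the fd would be freshly opened via perf_event_open() for the domain's pid on each stats call and never kept across a libvirtd restart (names and the scale handling are illustrative):

```c
/* Sketch of the query-time pattern suggested above: enable the event,
 * read the raw count, apply the scale factor parsed from sysfs, then
 * disable the event, so no fd has to stay open while the domain runs.
 * The fd is assumed to come from perf_event_open() bound to the
 * domain's pid; all names are illustrative. */
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>

static long long
query_llc_occupancy(int fd, double scale)
{
    uint64_t count = 0;

    if (fd < 0)
        return -1;
    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;
    if (read(fd, &count, sizeof(count)) != sizeof(count))
        return -1;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* raw counter value times the llc_occupancy.scale factor = bytes */
    return (long long)(count * scale);
}
```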
>>
>>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
>>>> +            virReportSystemError(errno, "%s",
>>>> +                                 _("Unable to disable perf event for CMT"));
>>>> +        }
>>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
>>>> +    }
>>>> +
>>>>      if (priv->agent) {
>>>>          qemuAgentClose(priv->agent);
>>>>          priv->agent = NULL;
>>>
>>>
>>> Conceptually I think this approach to implementation is flawed. While
>>> you are turning on/off the perf events for each QEMU process, the data
>>> collection does not distinguish data from each QEMU process - the data
>>> reported is host wide. So this really doesn't make much sense IMHO.
>>>
>>
>> As mentioned above, the data reported is per-domain.
>>
>>> I'm also wondering whether this is really going to be sufficiently
>>> useful on its own. CPUs have countless other performance counters that
>>> I would imagine apps/admins will want to read in order to analyse QEMU
>>> performance, beyond this new CMT feature. The domain stats API won't
>>> really scale up to dealing with arbitrary perf event counter reporting
>>> so I'm not much of a fan of just special casing CMT in this way.
>>>
>>> IOW, if we want to support host performance analysis in libvirt,
>>> then we probably want to design an set of APIs specifically for this
>>> purpose, but I could well see us saying that this is out of scope
>>> for libvirt and apps shoud just use the linux perf interfaces directly.
>>>
>>
>> Yes, I see what you mean. Maybe libvirt doesn't have to be
>> responsible for host performance analysis in general.
>>
>> But I think cache usage is important for each domain: if apps based
>> on libvirt can obtain this information, they will be better able to
>> check and confirm that a domain is working normally, just like the
>> cpu/memory/block/... stats that libvirt already supports. Do you
>> think so?
>
> I'm not saying cache usage is unimportant. There are quite a lot of other
> hardware event counters in modern CPUs though, so I'm asking why we should
> add just this new special intel event, and not any of the other existing
> performance counters that are useful in diagnosing performance issues.
Ah, I guess you mean that the best approach is to provide a set of utility methods for perf operations, similar to the cgroup support. We could then use these interfaces to support all the necessary perf counters.
If so, I can try to implement such methods first, and then you could help review them.
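For illustration, such a utility layer might start with an enum of supported events plus small helpers keyed on it, loosely mirroring how the cgroup helpers are organized (all names here are hypothetical, not an existing libvirt API):

```c
/* Illustrative sketch of a generic perf-utility layer: one enum of
 * supported events and a name lookup, to which enable/read/disable
 * helpers could be added.  All names are hypothetical. */
#include <stddef.h>
#include <string.h>

typedef enum {
    VIR_PERF_EVENT_CMT,          /* CPU cache occupancy (CMT) */
    VIR_PERF_EVENT_CYCLES,       /* CPU cycles */
    VIR_PERF_EVENT_INSTRUCTIONS, /* retired instructions */
    VIR_PERF_EVENT_LAST
} virPerfEventType;

static const char *virPerfEventNames[VIR_PERF_EVENT_LAST] = {
    "cmt", "cycles", "instructions",
};

const char *
virPerfEventTypeToString(virPerfEventType type)
{
    if (type < 0 || type >= VIR_PERF_EVENT_LAST)
        return NULL;
    return virPerfEventNames[type];
}
```

New counters would then only need a new enum value and name, with the perf_event_open() plumbing shared underneath.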
> Also, I'm thinking that QEMU has many different threads - VCPU threads, I/O
> threads, emulator threads and so on. I could see users want to have distinct
> profiling for these different functional areas of QEMU too instead of just whole-
> QEMU granularity.
>
Yes, I believe such features would be useful for some users. I am currently working on adding features like CMT to OpenStack; as far as I know, separate profiling for VCPU / I/O / emulator threads is not necessary for OpenStack. ^-^
Thanks,
Qiaowei