[libvirt] [PATCHv5 13/19] conf: Add resctrl monitor configuration

John Ferlan jferlan at redhat.com
Mon Oct 15 17:03:28 UTC 2018



On 10/15/18 11:25 AM, Wang, Huaqiang wrote:
> 
> On 10/13/2018 6:29 AM, John Ferlan wrote:
>>
>> On 10/12/18 3:10 AM, Wang, Huaqiang wrote:
>>>> -----Original Message-----

[...]

>>>> IOW: What is cache_occupancy measuring? Each cache? The entire
>>>> thing? If there are no cache elements, then what?
>>> cache_occupancy is measured per cache bank. An Intel two-socket Xeon
>>> CPU is considered as two cache banks, one cache bank per socket. The
>>> typical output for each monitor in this case is:
>>>
>>>       cpu.cache.0.name=vcpus_1
>>>       cpu.cache.0.vcpus=1
>>>       cpu.cache.0.bank.count=2          <--- 2 cache banks
>>>       cpu.cache.0.bank.0.id=0           <--- bank.id.0 cache_occupancy
>>>       cpu.cache.0.bank.0.bytes=9371648    _|
>>>       cpu.cache.0.bank.1.id=1           <--- bank.id.1 cache_occupancy
>>>       cpu.cache.0.bank.1.bytes=1081344    _|
>>>
>>> If you want the total cache occupancy across banks for the VM vcpu
>>> threads of this monitor, you need to add them up.
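>>> For instance, for the output above, the total occupancy reported by
>>> this monitor would be 9371648 + 1081344 = 10452992 bytes (roughly
>>> 10 MiB).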
>>>
>> So if you have:
>>
>>     <monitor... vcpus=0-1>
>>
>> what do you get in output for cache_occupancy? 0 + 1?
> 
> Yes. The output is the sum over the two vcpus.
> 
> For cache bank 0:
>     vcpus_0-1.bank.0.bytes = vcpus_0.bank.0.bytes + vcpus_1.bank.0.bytes
> For cache bank 1:
>     vcpus_0-1.bank.1.bytes = vcpus_0.bank.1.bytes + vcpus_1.bank.1.bytes
> 
>>
>>>> I honestly think this just needs to be simplified as much as possible.
> 
> "I honestly think this just needs to be simplified as much as possible."
> 
> I reconsidered your comment (quoted above); do you mean the XML
> configuration for 'monitor' needs to be simplified as well?
> 

This is/was a comment regarding default stuff which you are removing.

> What I think is that, even after removing the 'default monitor' and
> 'default allocation' concepts, the XML configuration for monitors
> (whether 'all', 'm-to-n', or 'one-to-one') still needs this kind of
> arrangement.
> 
> Take an example: a VM has 4 vcpus; vcpus 0 and 1 run a cache-sensitive
> workload and want to hold private L3 cache, while there is no specific
> requirement for the remaining vcpus beyond monitoring their cache usage.
> 
> Then we could create a cache allocation for vcpus 0 and 1 as well as a
> monitor reporting the cache those two vcpus actually use. For vcpus 2
> and 3, we create only a monitor.
> 
> The XML configuration is as follows (no change to the general rules
> compared with my previous examples):
> 
>     <cachetune vcpus='0-1'>
>       <cache id='0' level='3' type='both' size='3' unit='MiB'/>
>       <cache id='1' level='3' type='both' size='3' unit='MiB'/>
>       <monitor level='3' vcpus='0-1'/>
>     </cachetune>
>     <cachetune vcpus='2-3'>
>       <monitor level='3' vcpus='2-3'/>
>     </cachetune>
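> 
> With this configuration, vcpus 0 and 1 get a dedicated 3 MiB allocation
> on each cache bank plus a monitor over that allocation, while vcpus 2
> and 3 stay under the default schemata but are still monitored.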
>  
> Any suggestion from you is welcome.
> 


I'm not sure what the question is, and I'm not sure it matters at this
point. If you only create an allocation for a given <cachetune> or
<memorytune> entry, then that's all that will be reported, which is what
I was trying to point out. It's not about whether something else may or
may not exist; it's about what gets reported and can be queried via the
XML.

> 
>>>> When you monitor specific vcpus within a cachetune, then you get what?
>>> In this case, the monitor you created covers only the specific vcpus
>>> you listed for it.
>>>
>>> The following two configurations satisfy your scenario; in both, the
>>> only monitor reports the cache usage of the thread of vcpu 2.
>>>
>>>       <cachetune vcpus='2-4'>
>>>         <cache id='0' level='3' type='both' size='3' unit='MiB'/>
>>>         <cache id='1' level='3' type='both' size='3' unit='MiB'/>
>>>         <monitor level='3' vcpus='2'/>
>>>       </cachetune>
>>>
>>>       <cachetune vcpus='2-4'>
>>>         <monitor level='3' vcpus='2'/>
>>>       </cachetune>
>>>
>> Perhaps my question was mistyped or misinterpreted. In the above top
>> example, if we have <monitor ... vcpus='2-4'>, then do the values in
>> <cache> have any impact on the calculation compared to when they
>> aren't there?
> 
> Perhaps I still don't understand you well ...
> The <cache> entries will have a significant influence on the monitor
> output if they exist and vcpus 2-4 demand much more cache than the
> allocation can offer; if the cache the allocation offers is much bigger
> than what vcpus 2-4 actually use, the influence will be tiny.
> 
> In the other case, where there are no <cache> entries, as shown in the
> second example, the output is still influenced by the cache the
> allocation offers. The difference from the first example is that the
> first example uses the cache resources of its own allocation, while the
> second uses the default allocation defined in /sys/fs/resctrl/schemata,
> whose cache is shared by multiple system tasks.
> 

The question was related to how <monitor> is defined and was trying to
further describe my feeling about whether a default was necessary.

>>
>>>
>>>> If the cachetune has no specific cache entries, you get what?
>>> If there is no cache entry in the cachetune, the monitor still reports
>>> the vcpu threads' cache utilization per cache bank.
>>> No cache entry in the cachetune means the threads use the allocation
>>> policy of the default cache allocation, which is the file
>>> /sys/fs/resctrl/schemata.
>>>
>>> If valid cache entries are provided in the cachetune, an allocation
>>> will be created for the threads of the vcpus listed in the <cachetune>
>>> 'vcpus' attribute. Supposing the allocation is the directory
>>> /sys/fs/resctrl/p0, the cache resource limitation is applied to those
>>> threads.
>>>
>>> The monitor does not care whether the vcpu threads are allowed to
>>> access a limited amount of cache lines or not; it only reports the
>>> amount of cache that has been accessed.
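>>>
>>> To illustrate, a rough sketch of the resctrl layout involved (the
>>> group names 'p0' and 'vcpus_2' here are illustrative):
>>>
>>>     /sys/fs/resctrl/
>>>       schemata                         <- default allocation
>>>       p0/                              <- allocation from <cachetune>
>>>         schemata                       <- limits from <cache> entries
>>>         mon_groups/
>>>           vcpus_2/                     <- monitor group from <monitor>
>>>             mon_data/
>>>               mon_L3_00/llc_occupancy  <- bank 0 occupancy, bytes
>>>               mon_L3_01/llc_occupancy  <- bank 1 occupancy, bytes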
>>>
>>>> If you monitor
>>>> multiple vcpus within a cachetune then you get what? (An
>>>> aggregation of all?)
>>> Yes.
>>> Supposing you have this vcpus setting for <cachetune>:
>>>      <cachetune vcpus='0-4,8' ..../>
>>>
>>> and you choose to monitor the cache usage of vcpus 0, 3, and 8, then
>>> you create the following monitor entry inside the cachetune entry;
>>> from the monitor output you will get the aggregate cache occupancy
>>> for the threads of vcpus 0, 3, and 8.
>>>
>>>      <cachetune vcpus='0-4,8'>
>>>        <monitor level='3' vcpus='0,3,8'/>
>>>      </cachetune>
>>>
>>>> This whole default and specific description doesn't make sense.
>>> Sorry for the confusion; I'll try to refine the descriptions.
>>>
>> In this last case if you also had
>>
>>     <monitor level='3' vcpus='4'/>
>>     <monitor level='3' vcpus='0-4,8'/>
>>
>> then I'd expect the values output for "0-4,8" to match those I could
>> add up myself from "4" and "0-3,8".  True?
> 
> Yes.
> 

and this essentially solidifies the point I was making above.
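That is, per cache bank: bytes("0-4,8") = bytes("4") + bytes("0-3,8").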

>>
>> Is it apparent yet why I'm saying mentioning default just confuses
>> things?  If not, I'm not sure what else I can do to explain.
> 
> I agree with the conclusion that 'default xxx' is a confusing thing.
> 
> But I hope you understand that a monitor with the same vcpu list as the
> allocation is created along with the allocation itself, whether or not
> you define a <monitor> in <cachetune> with the same 'vcpus' setting as
> the allocation in the XML configuration. This is the behavior of the
> kernel resctrl fs.
> To get the cache utilization information for the whole allocation,
> enabling this system-created monitor is the most economical way in
> terms of saving RMIDs.
> 

Sure, one cannot have too many monitors because there are limitations.
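For reference, the RMID limit the hardware imposes can be read from the
resctrl info directory (per the kernel resctrl documentation):

    # cat /sys/fs/resctrl/info/L3_MON/num_rmids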


[...]

>>> I forgot to free it. Will be added.
>>>
>> Again, Coverity
> 
> Thank you again. I hope someday I can wield the power of Coverity ...
> 

It's nice to have, but it has its own issues. Learning to tell a real
issue from a false positive takes a while. I'm sure there are other code
analyzers out there.

[...]

> As stated in a prior paragraph, I will remove 'default monitor' and
> 'default allocation' and clean up the code and comments.
> 
> Do I miss anything?
> 

I hope not; it's time-consuming to read/comprehend everything. I see the
need to post more because it doesn't necessarily make sense without
understanding the future, but long series mean long reviews, long
reviews mean more questions, and more questions mean deeper responses on
the mailing list.  In the long run I hope we get something acceptable to
be used by/for libvirt to describe/summarize the depths that are CAT. I
think we're getting closer, that's for sure.

> BTW, I find the 'virsh domstats --cpu-total' output for monitors,
> introduced in patch 18, is not good enough.
> Currently it is:
> "
> Domain: 'ubuntu16.04-base'
>   cpu.cache.monitor.count=2
>   cpu.cache.0.name=vcpus_0
>   cpu.cache.0.vcpus=0
>   cpu.cache.0.bank.count=2
>   cpu.cache.0.bank.0.id=0
>   cpu.cache.0.bank.0.bytes=9371648
>   cpu.cache.0.bank.1.id=1
>   cpu.cache.0.bank.1.bytes=1081344
>   cpu.cache.1.name=vcpus_3
>   cpu.cache.1.vcpus=3
>   cpu.cache.1.bank.count=2
>   cpu.cache.1.bank.0.id=0
>   cpu.cache.1.bank.0.bytes=630784
>   cpu.cache.1.bank.1.id=1
>   cpu.cache.1.bank.1.bytes=10452992
> "
> I may change the output to the following by adding 'monitor' to each
> line:
> 
> Domain: 'ubuntu16.04-base'
>   cpu.cache.monitor.count=2
>   cpu.cache.monitor.0.name=vcpus_0
>   cpu.cache.monitor.0.vcpus=0
>   cpu.cache.monitor.0.bank.count=2
>   cpu.cache.monitor.0.bank.0.id=0
>   cpu.cache.monitor.0.bank.0.bytes=9371648
>   cpu.cache.monitor.0.bank.1.id=1
>   cpu.cache.monitor.0.bank.1.bytes=1081344
>   cpu.cache.monitor.1.name=vcpus_3
>   cpu.cache.monitor.1.vcpus=3
>   cpu.cache.monitor.1.bank.count=2
>   cpu.cache.monitor.1.bank.0.id=0
>   cpu.cache.monitor.1.bank.0.bytes=630784
>   cpu.cache.monitor.1.bank.1.id=1
>   cpu.cache.monitor.1.bank.1.bytes=10452992
> 
> Please take this change into consideration when you review patch 18.

Some day we'll get there.

John

BTW: Next week is KVM Forum - so that usually means less activity on
this list and less time for reviews.

[...]



