[libvirt] [V4] RFC for support cache tune(CAT) in libvirt

Daniel P. Berrange berrange at redhat.com
Fri Jan 13 09:41:22 UTC 2017


On Fri, Jan 13, 2017 at 09:38:44AM +0800, 乔立勇(Eli Qiao) wrote:
> 
> virsh capabilities
> 
> 
> 
>     <cache>
> 
>       <bank id='0' type="l3" size="56320" units="KiB" cpus="0,1,2,6,7,8"/>
>  <--------------------- level 3 cache is per socket, so group them by
> socket id
> 
>            <control unit="KiB" min="2816"/>
> 
>       <bank id='1' type="l3" size="56320" units="KiB"
> cpus="3,4,5,9,10,11"/>
> 
>       <bank id='2' type="l2" size="256" units="KiB" cpus="0"/>
> 
>       <bank id='3' type="l2" size="256" units="KiB" cpus="1"/>
> 
>       <bank id='4' type="l2" size="256" units="KiB" cpus="2"/>
> 
>       <bank id='5' type="l2" size="256" units="KiB" cpus="3"/>
> 
>       <bank id='6' type="l2" size="256" units="KiB" cpus="4"/>
> 
> ...
> 
>      </cache>
> 
> 
> 
> Opens
> 
> 1.      how about add socket id to bank for bank type = l3 ?

This isn't needed - with the 'cpu' IDs here, the application can
look at the topology info in the capabilities to find out what
socket the logical CPU is part of.
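
As a rough illustration of that lookup, here is a minimal sketch. The
<topology> part of the sample follows the existing capabilities format,
while the <cache>/<bank> elements (and their placement under <host>) are
only the layout proposed in this thread, so treat them as assumptions.
The function name sockets_for_banks is made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up capabilities document. The <cache>/<bank> part is the
# layout proposed in this thread, not a finalized schema; placing it under
# <host> is an assumption made for this sketch.
CAPS_XML = """
<capabilities>
  <host>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='2'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='2'>
            <cpu id='3' socket_id='1' core_id='0' siblings='3'/>
            <cpu id='4' socket_id='1' core_id='1' siblings='4'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <cache>
      <bank id='0' type='l3' size='56320' units='KiB' cpus='0,1'/>
      <bank id='1' type='l3' size='56320' units='KiB' cpus='3,4'/>
    </cache>
  </host>
</capabilities>
"""


def sockets_for_banks(caps_xml):
    """Map each cache bank id to the set of socket ids its CPUs belong to."""
    root = ET.fromstring(caps_xml)

    # Build logical CPU id -> socket id from the host topology.
    cpu_to_socket = {
        int(cpu.get("id")): int(cpu.get("socket_id"))
        for cpu in root.findall("./host/topology/cells/cell/cpus/cpu")
    }

    # Resolve the 'cpus' list of every bank through that table.
    result = {}
    for bank in root.iter("bank"):
        cpus = [int(c) for c in bank.get("cpus").split(",")]
        result[int(bank.get("id"))] = {cpu_to_socket[c] for c in cpus}
    return result


BANK_SOCKETS = sockets_for_banks(CAPS_XML)
```

For an L3 bank the resulting set should contain exactly one socket id,
which is why a separate socket attribute on <bank> would be redundant.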

> 2.      do we really want to expose l2/l3 cache for now , they are per core
> resource and linux kernel don't support l2 yet (depend on hardware)?

We don't need to report all levels of cache - we just need the XML
schema to allow it by design.

> 3.      if enable CDP in resctrl, for bank type=l3 , it will be split to
> l3data l3code, should expose this ability.
> 
>       <bank type="l3" size="56320" units="KiB" cpus="0,1,2,6,7,8"/>
>  <--------------------- level 3 cache is per socket, so group them by
> socket id
> 
>            <control unit="KiB" min="2816" cdp="enabled"/>

'cdp' is Intel-specific terminology. We need to use some more generic
description. Perhaps we want this when CDP is enabled:

            <control unit="KiB" min="2816" scope="data"/>
            <control unit="KiB" min="2816" scope="code"/>

and when it's disabled just

            <control unit="KiB" min="2816" scope="both"/>

If we have this scope option, then we'll need it when reporting too...

> ## Provide a new API to get the avail cache on each bank, such as the
> output are:
> 
> 
> 
>    id=0
> 
>    type=l3

...eg

  scope=data

>    avail=56320


> 
>    total = ?? <--------- do we need this?

That info is static and available from capabilities, so we
don't need to repeat it here IMHO.

> 
> 
> 
>    id=1
> 
>    type=l3
> 
>    avail=56320
> 
> 
> 
>    id=3
> 
>    type=l2
> 
>    avail=256
> 
> 
> 
> Opens:
> 
> ·         Don't expose the avail cache information if the host can not do
> the allocation of that type cache(eg, for l2 currently) ?

This API should only report info about cache banks that support allocation.

> ·         We can not make all of the cache , the reservation amount is the
> min_cbm_len (=1) * min_unit .

If there is some minimum amount which is reserved and cannot be
allocated, we should report that in the capabilities XML too.
eg

            <control unit="KiB" min="2816" reserved="5632" scope="both"/>

> ·         do we need to expose total?

No, that's available in capabilities XML

> ##  enable CAT for a domain
> 
> 
> 
> 1 Domain XML changes
> 
> 
> 
>    <cputune>
> 
>        <cache id="1" host_id="0" type="l3" size="5632" unit="KiB"/>
> 
>        <cache id="2" host_id="1" type="l3" size="5632" unit="KiB"/>
> 
> 
> 
>        <cpu_cache vcpus="0-3" id="1"/>
> 
>        <cpu_cache vcpus="4-7" id="2"/>
> 
>        <iothread_cache iothreads="0-1" id="1"/>
> 
>        <emulator_cache id="2"/>
> 
>    </cputune>
> 
> 2. Extend cputune command ?

Do we need the ability to change cache allocation for a running
guest ?  If so, then we need to extend cputune command, if not
we can ignore it.

> Opens:
> 
> 
> 
> 1. Do we accept to extend existed API ? or using new API/virsh?
> 
> 2. How to calculate cache size -> CBM bit?
> 
> 
> 
> eg:
> 
> 5632/ 2816 = 2 bits
> 
> 5733/ 2816 = 2 bits or 3 bits?

In the capabilities XML we report the min unit granularity:

            <control unit="KiB" min="2816" scope="both"/>

So in the XML, we should report an error if the requested
size is *not* a multiple of the reported unit granularity.
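
A minimal sketch of that check, using the numbers from the question
above. The function name size_to_cbm_bits is hypothetical, and the
granularity is assumed to come from the min attribute of the <control>
element proposed in this thread.

```python
def size_to_cbm_bits(size_kib, granularity_kib):
    """Return the number of CBM bits needed for size_kib.

    Raises ValueError when the size is not a positive multiple of the
    granularity, rather than silently rounding up or down.
    """
    if granularity_kib <= 0:
        raise ValueError("granularity must be positive")
    if size_kib <= 0:
        raise ValueError("requested cache size must be positive")

    bits, remainder = divmod(size_kib, granularity_kib)
    if remainder != 0:
        raise ValueError(
            "requested size %d KiB is not a multiple of %d KiB"
            % (size_kib, granularity_kib)
        )
    return bits
```

With a granularity of 2816 KiB, a request for 5632 KiB maps to 2 bits,
while a request for 5733 KiB is rejected, so the "2 bits or 3 bits?"
ambiguity never arises.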

> ## Restriction for using cache tune on multiple sockets' host.
> 
> 
> 
> The l3 cache is per socket resource, kernel need to know about what's
> affinity looks like, so for a VM which running on a multiple socket's host,
> it should have NUMA setting or vcpuset pin setting. Or cache tune will fail.

Yep, we need to report an error if cache allocation is requested
without CPU pinning being requested for the VM.
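
Something along these lines, as a sketch only. The <cache> element
under <cputune> is the syntax proposed in this thread, and the function
name check_cache_requires_pinning is made up. For brevity the sketch
only looks at <vcpupin> and ignores other ways of constraining
placement, such as the cpuset attribute on <vcpu> or NUMA tuning.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML fragments using the <cache> syntax proposed in
# this thread; <vcpupin> is the existing pinning element.
PINNED_XML = """
<domain>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-2'/>
    <cache id='1' host_id='0' type='l3' size='5632' unit='KiB'/>
  </cputune>
</domain>
"""

UNPINNED_XML = """
<domain>
  <cputune>
    <cache id='1' host_id='0' type='l3' size='5632' unit='KiB'/>
  </cputune>
</domain>
"""


def check_cache_requires_pinning(domain_xml):
    """Raise ValueError if cache allocation is requested without vcpupin."""
    root = ET.fromstring(domain_xml)
    cputune = root.find("cputune")
    if cputune is None:
        return  # nothing requested, nothing to validate

    wants_cache = cputune.find("cache") is not None
    has_pinning = cputune.find("vcpupin") is not None
    if wants_cache and not has_pinning:
        raise ValueError("cache allocation requires CPU pinning")
```

Calling check_cache_requires_pinning(PINNED_XML) returns quietly, while
the unpinned variant raises the error.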


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|



