[libvirt] RFC for support Intel RDT/CAT in libvirt

Martin Kletzander mkletzan at redhat.com
Mon Jan 9 12:52:38 UTC 2017


On Wed, Dec 21, 2016 at 09:51:44AM +0000, Qiao, Liyong wrote:
> Hi folks
>
>I would like to start a discussion about how to support a new CPU feature in
>libvirt. CAT support is not fully merged into the Linux kernel yet; the
>target release is 4.10, and all patches have been merged into the Linux tip
>branch, so there won't be further interface/design changes.
>
>## Background
>
>Intel RDT is a toolkit for CPU resource QoS, covering resources such as the
>LLC (L3) cache and memory bandwidth usage. These fine-grained resource
>control features are very useful in cloud environments, which run lots of
>noisy instances. Libvirt already supports CMT/MBMT/MBML, but those are only
>for resource usage monitoring; this proposal is to support CAT to control a
>VM's L3 cache quota.
>
>## CAT interface in kernel
>
>In kernel, a new resource interface has been introduced under /sys/fs/resctrl,
>it’s used for resource control, for more information, refer
>Intel_rdt_ui [ https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/tree/Documentation/x86/intel_rdt_ui.txt?h=x86/cache ]
>
>The kernel requires an L3 cache schemata to be provided before a task is
>added to a new partition. This interface is too low-level for a virtual
>machine user, so the proposal is to let libvirt manage the schemata on the
>host.
>

I don't quite understand this paragraph.

>## What will libvirt do?
>
>### Questions:
>
>To enable CAT support in libvirt, we need to think about the following questions:
>
>  1.  Only set CAT when a VM has its CPUs pinned; that is to say, L3 cache is
>      a per-socket resource. On a host with 2 CPU sockets, each socket has
>      its own cache, which cannot be shared.

It makes sense to only do it when the vCPUs are pinned.  It can happen that
someone will want to pin a vCPU to multiple threads that are on different
sockets, and at that point it's their fault.

>  2.  Which cache allocation policy should be used? The options look like:
>      a.  The VM has its own dedicated L3 cache and can also share the other L3 cache.
>      b.  The VM can only use the caches allocated to it.

I thought we needed to provide options for both of these ^^.  However, the
difference is a setting for the default top-most hierarchical points, so,
actually, the admin needs to make that decision and set it outside of libvirt.

If VM should use only .25 of the cache and *can* use the rest, then (in cgroups
terms) / should have 0fff and /domain should be ffff.

If the VM should use just a .25, but the rest of the system can access it as
well, then / = ffff and /domain = f000.

If they are supposed to be exclusive, then / should be either not set or 0fff
and /domain = f000.  In all cases, libvirt should not touch /, just /domain.

That's as far as I understand it.
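Those three cases can be sketched with plain bitmask arithmetic (a toy 16-bit
CBM to match the "ffff" notation above; real CBM widths are hardware-dependent
and come from /sys/fs/resctrl/info/L3/cbm_mask):

```python
# Toy illustration of the three mask scenarios above, using a 16-bit
# capacity bitmask (CBM).  Real mask widths and contiguity rules come
# from the hardware via /sys/fs/resctrl/info/L3/.

def top_ways(n, total=16):
    """A contiguous CBM covering the top n cache ways out of `total`."""
    return ((1 << n) - 1) << (total - n)

FULL = (1 << 16) - 1                       # ffff: all 16 ways

# VM guaranteed the top quarter, but *can* use the rest too:
#   / = 0fff, /domain = ffff
shared = {"/": FULL >> 4, "/domain": FULL}

# VM capped at its quarter, rest of the system may overlap it:
#   / = ffff, /domain = f000
capped = {"/": FULL, "/domain": top_ways(4)}

# Mutually exclusive: the two masks must not overlap
#   / = 0fff, /domain = f000
exclusive = {"/": FULL >> 4, "/domain": top_ways(4)}
assert exclusive["/"] & exclusive["/domain"] == 0
```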

>      c.  Have some pre-defined policies and priorities for a VM, like COB [1]
>
>  3.  Should some L3 cache be reserved for the host system's own usage? (related to 2)
>  4.  What is the unit for L3 cache allocation? (related to 2)
>

I think it should be size (as opposed to percentage).  We should add a way to
check for the size of the caches.
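One possible way to discover those sizes is to walk the Linux sysfs cacheinfo
tree; a best-effort sketch (the paths are standard sysfs, but availability
depends on the kernel and hardware):

```python
# Sketch: map each distinct L3 cache on the host (keyed by its
# shared_cpu_list) to its size in KiB, using sysfs cacheinfo.

import glob
import re

def parse_cache_size(text):
    """Parse a sysfs cache size string such as '20480K' into KiB."""
    m = re.match(r"(\d+)K", text.strip())
    return int(m.group(1)) if m else None

def l3_cache_sizes():
    sizes = {}
    for index in glob.glob("/sys/devices/system/cpu/cpu*/cache/index*"):
        try:
            with open(index + "/level") as f:
                if f.read().strip() != "3":
                    continue
            with open(index + "/shared_cpu_list") as f:
                cpus = f.read().strip()
            with open(index + "/size") as f:
                size = parse_cache_size(f.read())
        except OSError:
            continue  # attribute missing on this kernel; skip
        if size is not None:
            sizes.setdefault(cpus, size)
    return sizes
```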

>### Propose Changes
>
>XML domain user interface changes:
>
>Option 1: explicitly specify cache allocation for a VM
>

I vote for this option as it's more introspectable and predictable from my point
of view.

>1. Work with NUMA nodes
>
>Some cloud orchestration software uses NUMA + vCPU pinning together, so we
>can enable CAT support within the NUMA infrastructure.
>
>Expose how much L3 cache a VM wants reserved, and require that the L3 cache
>be bound to a specific CPU socket, just as we do for NUMA nodes.
>
>This is a domain XML example, generated by OpenStack Nova, for allocating
>LLC (L3 cache) when booting a new VM:
>
><domain>
> <cputune>
>   <vcpupin vcpu='0' cpuset='19'/>
>   <vcpupin vcpu='1' cpuset='63'/>
>   <vcpupin vcpu='2' cpuset='83'/>
>   <vcpupin vcpu='3' cpuset='39'/>
>   <vcpupin vcpu='4' cpuset='40'/>
>   <vcpupin vcpu='5' cpuset='84'/>
>   <emulatorpin cpuset='19,39-40,63,83-84'/>
> </cputune>

This part ^^ describes the settings for the domain in the host.

>...
> <cpu mode='host-model'>
>   <model fallback='allow'/>
>   <topology sockets='3' cores='1' threads='2'/>
>   <numa>
>     <cell id='0' cpus='0-1' memory='2097152' l3cache='1408' unit='KiB'/>
>     <cell id='1' cpus='2-5' memory='4194304' l3cache='5632' unit='KiB'/>
>   </numa>
> </cpu>

This part ^^ describes how the domain will look from the guest's point of
view.  It looks like the domain has 1408 KiB of L3 cache.  It needs to be
somewhere else, like in the top part, for example.

Since at some point it can be something other than L3, I would choose a
slightly different schema to allow for readable updates.  As to what place it
should be defined in (cputune/memtune/cachetune), I'm afraid of voting
because it's already so messy that I won't like my choice a few minutes after
making it.  It needs to be done per-thread, though.
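One possible shape for such a schema, purely illustrative (the element and
attribute names here are invented, not an agreed-on libvirt interface):

```xml
<cputune>
  <!-- hypothetical: cache level is explicit so this extends beyond L3,
       and allocation is expressed per group of vCPUs -->
  <cachetune vcpus='0-1'>
    <cache level='3' size='1408' unit='KiB'/>
  </cachetune>
  <cachetune vcpus='2-5'>
    <cache level='3' size='5632' unit='KiB'/>
  </cachetune>
</cputune>
```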

>...
></domain>
>
>Refer to [http://libvirt.org/formatdomain.html#elementsCPUTuning]
>
>So finally we can calculate, for each CPU socket (cell), how much L3 cache
>we need to allocate for a VM.
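That calculation can be sketched roughly as follows (the vCPU pinning comes
from the example XML; the pCPU-to-socket map is invented for illustration):

```python
# Sketch: given per-cell l3cache requests, vCPU->pCPU pinning, and a
# pCPU->socket map, work out how much L3 to reserve on each socket.

def l3_per_socket(cells, vcpu_pin, pcpu_socket):
    """cells: {cell_id: (vcpu_list, l3cache_kib)} -> {socket_id: KiB}."""
    reserved = {}
    for vcpus, l3 in cells.values():
        sockets = {pcpu_socket[vcpu_pin[v]] for v in vcpus}
        if len(sockets) != 1:
            # a cell pinned across sockets cannot get one L3 allocation
            raise ValueError("cell spans multiple sockets")
        s = sockets.pop()
        reserved[s] = reserved.get(s, 0) + l3
    return reserved

# Pinning taken from the example XML above; the socket map is made up.
pin = {0: 19, 1: 63, 2: 83, 3: 39, 4: 40, 5: 84}
socket_of = {19: 0, 63: 0, 83: 1, 39: 1, 40: 1, 84: 1}
cells = {0: ([0, 1], 1408), 1: ([2, 3, 4, 5], 5632)}
```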
>
>2. Work with vCPU pinning
>
>Leaving the NUMA part aside, the CAT setting should be tied to the CPU core
>setting: we can apply a CAT policy if the VM has vCPU pinning set (only then
>is the VM guaranteed not to be scheduled onto another CPU socket).
>
>The CPU socket to allocate cache on can be calculated just as in 1.
>
>We may need to enable both 1 and 2.
>

No, please no setting in multiple locations with overlapping meanings.

I started getting lost in the rest of the mail, sorry for skipping that.

># Support CAT in libvirt itself or leverage other software
>
>COB is the Intel Cache Orchestrator Balancer (COB); please refer to
>http://clrgitlab.intel.com/rdt/cob/tree/master
>
>COB supports some pre-defined policies: it monitors CPU/cache usage and
>cache misses and performs cache allocation based on the policy in use.
>
>If COB supported monitoring specified processes (VM processes) and accepted
>priority definitions, it would be good to reuse.
>
>So in the end the question is:
>  *   Support fine-grained LLC control, letting the user specify the cache allocation
>  *   Support pre-defined policies, with the user specifying an LLC allocation priority
>

I'm for the first option.

We eventually need to do it with another tool, because otherwise we'd have to
support the host-level settings too, and people won't want to install libvirt
just to manage CAT allocations.

The links for the COB you provided don't work, but there's a tiny little
helper [1] that manages cache allocations.

[1] https://lkml.org/lkml/2017/1/3/171

>Reference
>
>[1] COB http://clrgitlab.intel.com/rdt/cob/tree/master
>[2] CAT intro: https://software.intel.com/en-us/articles/software-enabling-for-cache-allocation-technology
>[3] kernel Intel_rdt_ui [ https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/tree/Documentation/x86/intel_rdt_ui.txt?h=x86/cache ]
>
>
>
>
>Best Regards
>
>Eli Qiao (乔立勇), OpenStack Core team, OTC Intel
>--
>
