[libvirt] "[V3] RFC for support cache tune in libvirt"

Marcelo Tosatti mtosatti at redhat.com
Thu Jan 12 10:48:01 UTC 2017


On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> Hi, it's really good to have you involved in supporting CAT in
> libvirt/OpenStack.
> Replies inline.
> 
> 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti <mtosatti at redhat.com>:
> 
> >
> > Hi,
> >
> > Comments/questions related to:
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> >
> > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> >
> > What does allocation of code/data look like?
> >
> 
> My plan is to expose new options:
> 
> virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> 
> Please note that you can use either l3 or l3data/l3code (the latter only
> if CDP is enabled when mounting the resctrl fs).

Fine. However, you should be able to emulate a type=both reservation
(non cdp) by writing a schemata file with the same CBM bits:

		L3code:0=0x000ff;1=0x000ff
		L3data:0=0x000ff;1=0x000ff

(*)

I don't see how this interface enables that possibility.

I suppose it would be easier for mgmt software to have it
done automatically:

virsh cachetune kvm02 --l3 size_in_kbytes

That would create the reservations as in (*) in resctrlfs when
the host is CDP enabled.

(Also, please use kbytes, or give a reason not to use
kbytes.)

Note: exposing the unit size is fine, as mgmt software might
decide on a placement of VMs which reduces the amount of L3
cache reservation rounding (although I doubt anyone is going
to care about that in practice).
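
A minimal sketch of what "done automatically" could look like, assuming
the CBM has already been computed from the requested size. The function
name type_both_schemata and its parameters are hypothetical; the label
spelling and the 0x-prefixed masks mirror the notation in (*) above and
are not necessarily the exact format the kernel expects.

```python
def type_both_schemata(cbm, cache_ids, cdp_enabled, cbm_len=20):
    """Build schemata lines for a type=both reservation.

    On a CDP host the same CBM is written for code and data, as in (*).
    On a non-CDP host a single L3 line is produced.
    """
    digits = (cbm_len + 3) // 4  # hex digits needed to print a full mask
    masks = ";".join(
        "{}=0x{:0{}x}".format(cache_id, cbm, digits) for cache_id in cache_ids
    )
    labels = ["L3code", "L3data"] if cdp_enabled else ["L3"]
    return ["{}:{}".format(label, masks) for label in labels]


if __name__ == "__main__":
    for line in type_both_schemata(0xFF, [0, 1], cdp_enabled=True):
        print(line)
```

With this shape the caller never chooses between --l3 and
--l3data/--l3code; the CDP state of the host decides which lines are
written.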

> > 2) 'nodecachestats' command:
> >
> >         3. Add new virsh command 'nodecachestats':
> >         This API is to expose the various cache resources left on each
> >         hardware unit (cpu socket).
> >         It will be formatted as:
> >         <resource_type>.<resource_id>: left size KiB
> >
> > Does this take into account that only contiguous regions of cbm masks
> > can be used for allocations?
> >
> >
> Yes, it accounts for contiguous CBM regions; in other words, it is the
> cache amount that the default cbm represents.
> 
> resctrl doesn't allow setting a non-contiguous cbm (this is restricted by
> hardware).

OK.
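
To illustrate why contiguity matters for the "left size" figure, here is
a rough sketch that reports the largest contiguous run of free CBM bits
rather than the total number of free bits. The function name
largest_free_run is hypothetical, and the sketch assumes reservations do
not share bits with each other, which is a simplification.

```python
def largest_free_run(used_masks, cbm_len):
    """Return the length of the longest contiguous run of unused CBM bits."""
    used = 0
    for mask in used_masks:
        used |= mask  # union of all bits already reserved

    best = run = 0
    for bit in range(cbm_len):
        if used & (1 << bit):
            run = 0  # a reserved bit ends the current free run
        else:
            run += 1
            best = max(best, run)
    return best


if __name__ == "__main__":
    # Bits 0-3 and 8-11 are reserved: 12 bits are free in total,
    # but the largest single allocation can only use 8 of them.
    print(largest_free_run([0x00F0F], 20))
```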

> 
> 
> > Also, it should return the amount of free cache on each cacheid.
> >
> 
> yes, it is.  resource_id == cacheid

OK.
> >
> > 3) The interface should support different sizes for different
> > cache-ids. See the KVM-RT use case at
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> >
> 
> I don't think it's good to let the user specify cache-ids while doing
> cache allocation.

This is necessary for our usecase.

> The cache ids used should depend on the cpu affinity the VM is set to.

The cache ids configuration should match the cpu affinity configuration.

> eg.
> 
> 1. For hosts that have only one cache id (single-socket hosts), we don't
> need to set the cache id.

Right.

> 2. If there are multiple cache ids (sockets), the user should set the
> vcpu -> pcpu mapping (define a cpuset for the VM); then we (libvirt) need
> to compute how much cache to set on which cache id.
> That is to say, the user should set the cpu affinity before cache
> allocation.
> 
> I know that most uses of CAT are for NFV. As far as I know, NFV uses
> NUMA and cpu pinning (vcpu -> pcpu mapping), so we don't need to worry
> about which cache id we set the cache size on.
> 
> So, just let the user specify the cache size (my proposal here is a cache
> unit count) and let libvirt determine how much cache to set on which
> cache id.

OK, fine: it's OK not to expose this to the user and instead calculate
it internally in libvirt, as long as you recompute the schemata whenever
cpu affinity changes. But using different cache-ids in the schemata is
necessary for our use case.
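
A rough sketch of the internal calculation, assuming libvirt already has
a pcpu -> cache id map for the host and the vcpu -> pcpu pinning for the
VM. Both function names and the shape of the inputs are hypothetical.

```python
def cache_ids_in_use(vcpu_pinning, cpu_to_cache_id):
    """Return the sorted cache ids touched by a VM's vcpu -> pcpu pinning.

    vcpu_pinning maps a vcpu number to the set of pcpus it may run on.
    cpu_to_cache_id maps a pcpu number to the id of its L3 cache.
    """
    ids = set()
    for pcpus in vcpu_pinning.values():
        for cpu in pcpus:
            ids.add(cpu_to_cache_id[cpu])
    return sorted(ids)


def needs_recompute(old_pinning, new_pinning, cpu_to_cache_id):
    """True if an affinity change moves the VM onto a different set of caches."""
    return (cache_ids_in_use(old_pinning, cpu_to_cache_id)
            != cache_ids_in_use(new_pinning, cpu_to_cache_id))


if __name__ == "__main__":
    # Hypothetical two-socket host: pcpus 0-3 on cache 0, pcpus 4-7 on cache 1.
    topology = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}
    before = {0: {0}, 1: {1}}
    after = {0: {0}, 1: {4}}
    print(cache_ids_in_use(before, topology))
    print(needs_recompute(before, after, topology))
```

The schemata would then carry one entry per cache id returned, each with
its own size where the use case calls for that.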

> >
> > 4) Usefulness of exposing minimum unit size.
> >
> > Rather than specify unit sizes (which forces the user
> > to convert every time the command is executed), why not specify
> > in kbytes and round up?
> >
> 
> I accept this. I proposed exposing the minimum unit size because I'd like
> to let the user specify the unit count (which, as you say, is not good).
> 
> As you know, the minimum unit size is decided by hardware.
> E.g., on a host with 56320 KiB of cache and a max cbm length of 20
> (fffff), the minimum cache unit is 56320/20 = 2816 KiB.
> 
> If we allow the user to specify a cache size instead of a cache unit
> count, the user may set the cache to 2817 KiB, and we would round it up
> to 2816 * 2, so 2815 KiB would be wasted.

Yes, but the user can work out the wasted amount if necessary, provided
you expose the cache unit size (again, I doubt this will happen in
practice because the granularity of the CBM bits is small compared to
the cache size).

The problem with the cache unit count specification is that it does not
work across different hosts: if a user saves the "cache unit count"
value manually in a XML file, then uses that XML file on a different
host, the reservation on the new host can become smaller than desired,
which violates expectations.
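
The rounding from the quoted example, and the portability problem with
unit counts, can be sketched as follows. The function name
round_up_reservation is hypothetical, and the second host (20480 KiB of
L3, 20 CBM bits) is made up purely for illustration.

```python
def round_up_reservation(size_kb, cache_size_kb, cbm_len):
    """Convert a kbytes request into CBM bits, rounding up.

    Returns (bits, actual_kb, wasted_kb).
    """
    unit_kb = cache_size_kb // cbm_len   # kbytes covered by one CBM bit
    bits = -(-size_kb // unit_kb)        # ceiling division
    actual_kb = bits * unit_kb
    return bits, actual_kb, actual_kb - size_kb


if __name__ == "__main__":
    # Host from the quoted example: 56320 KiB of L3, 20 CBM bits.
    print(round_up_reservation(2817, 56320, 20))   # (2, 5632, 2815)

    # Hypothetical second host: 20480 KiB of L3, 20 CBM bits (1024 KiB unit).
    # A stored "2 units" would give only 2048 KiB there, while a stored
    # size of 5632 KiB is converted again and still satisfied.
    print(round_up_reservation(5632, 20480, 20))   # (6, 6144, 512)
```

Storing the size in kbytes means the conversion is redone per host, so
the reservation never silently shrinks.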

> Anyway, I am open to using a KiB size and letting libvirt calculate the
> cbm bits. I am wondering whether we need to report that the
> actual_cache_size can be up to 5632 KiB even if the user wants 2816 KiB
> of cache.

Another thing I did in resctrltool is to have a safety margin for
allocations: do not let the user allocate all of the cache (that is,
leave 0 bytes for the default group). I used one cache unit as the
minimum:

        if ret == ERR_LOW_SPACE:
            print "Warning: free space on default mask is <= %d\n" % (kbytes_per_bit_of_cbm)
            print "use --force to force"
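
For context, a sketch of the kind of check that could produce
ERR_LOW_SPACE. Only the names ERR_LOW_SPACE and kbytes_per_bit_of_cbm
come from the snippet above; the function, its parameters, the constant
values, and the exact comparison (chosen here to match the "<=" in the
warning text) are assumptions, not resctrltool's actual code.

```python
OK = 0
ERR_LOW_SPACE = 1  # name from the snippet above; the value here is arbitrary


def check_default_group(default_free_bits, request_bits,
                        kbytes_per_bit_of_cbm, force=False):
    """Reject a request that would leave the default group too little cache.

    default_free_bits is the number of CBM bits currently left to the
    default group; request_bits is the size of the new reservation.
    """
    remaining_kb = (default_free_bits - request_bits) * kbytes_per_bit_of_cbm
    # Assumed threshold: one cache unit, mirroring the "<=" in the warning.
    if remaining_kb <= kbytes_per_bit_of_cbm and not force:
        return ERR_LOW_SPACE
    return OK


if __name__ == "__main__":
    print(check_default_group(20, 19, 2816))              # low space
    print(check_default_group(20, 19, 2816, force=True))  # forced through
```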


> >
> >       <resctrl name='L3' unit='KiB' cache_size='56320'
> > cache_unit='2816'/>
> >
> > As noted in item 1 of
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html,
> > "1) Convertion of kbytes (user specification) --> number of CBM bits
> > for host.",
> > the format in which the size is stored is kbytes, so it's awkward
> > to force users and OpenStack to perform the conversion themselves
> > (and zero benefits... nothing changes if you know the unit size).
> 
> 
> 
> Hmm, as I see it libvirt is just a user-space API; I'm not sure whether
> libvirt should bypass some low-level details.





