[libvirt] Yet another RFC for CAT

Mon Sep 4 15:57:31 UTC 2017

On Mon, Sep 04, 2017 at 04:14:00PM +0200, Martin Kletzander wrote:
> * The current design (finally something libvirt-related, right?)
> 
> The discussion ended with the following conclusions (to the best of my
> knowledge; there were so many discussions about so many things that I
> would spend too much time looking up all of them):
> 
> - Users should not need to specify bit masks, such complexity should be
>   abstracted.  We'll use sizes (e.g. 4MB)
> 
> - Multiple vCPUs might need to share the same allocation.
> 
> - Exclusivity of allocations is to be assumed, that is only unoccupied
>   cache should be used for new allocations.
> 
> The last point seems trivial but it's actually a very specific condition
> that, if removed, can cause several problems.  If it's hard to grasp the
> last point together with the second one, you're on the right track.  If
> not, then I'll try to make a point for why the last point should be
> removed in 3... 2... 1...
> 
> * Design flaws
> 
> 1) Users cannot specify any allocation that would share only part with
>    some other allocation of the domain or the default group.
> 
> 2) It was not specified what to do with the default resource group.
>    There might be several ways to approach this, with varying pros and
>    cons:
> 
>     a) Treat it as any other group.  That is any bit set for this group
>        will be excluded from usable bits when creating new allocation
>        for a domain.
> 
>         - Very predictable behaviour
> 
>         - You will not be able to allocate any amount of cache without
>           previous setting for the default group as that will have all
>           the bits set which will make all the cache unusable
> 
>     b) Automatically remove the appropriate amount of bits that are
>        needed for new domains.
> 
>         - No need to do any change to the system settings in order to
>           use this new feature
> 
>         - We would have to change system settings, which is generally
>           frowned upon when done "automatically" as a side effect of
>           starting a domain, especially for such a scarce resource as
>           cache
> 
>         - The change to system settings would not be entirely
>           predictable
> 
>     c) Act like it doesn't exist and don't remove its allocations from
>        consideration
> 
>         - Doesn't really make sense, as system processes might be
>           trashing the cache just like any VM, especially since all VM
>           processes without allocations will be placed in the default
>           group as well
> 
> 3) There is no way for users to know what the particular settings are
>    for any running domain.
> 
> The first point was deemed a corner case.  Fair enough on its own, but
> considering point 2 and its solutions, it is rather difficult for me to
> justify it.  Also, let's say you have a domain with 4 vCPUs, out of
> which you know 1 might be trashing the cache; you don't want to
> restrict it completely, and the others will utilize the cache very
> nicely.  Sensible allocations for such a domain's vCPUs might be:
> 
>  vCPU  0:   000f
>  vCPUs 1-3: ffff
> 
> as you want vCPUs 1-3 to utilize even the part of cache that might get
> trashed by vCPU 0.  Or they might share some data (especially
> guest-memory-related).
> 
> The case above is not possible to set up with only a per-vcpu(s) scalar
> setting.  And there are more as you might imagine now.  For example how
> do we behave with iothreads and emulator threads?
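
To make the masks above concrete, here is a small Python sketch.  It
assumes a 16-way cache (implied by the four-hex-digit masks); the
variable and function names are made up for illustration and are not
libvirt or resctrl API.  It shows which ways are shared and why two
exclusive scalar sizes cannot describe this layout.

```python
TOTAL_WAYS = 16  # assumption: one bit per cache way, 16 ways in total


def ways(mask: int) -> int:
    """Count the cache ways (set bits) covered by a capacity mask."""
    return bin(mask).count("1")


vcpu0 = 0x000f      # vCPU 0 is limited to the low 4 ways
vcpus_1_3 = 0xffff  # vCPUs 1-3 may use all 16 ways

# Ways both groups can use, and ways only vCPUs 1-3 can use.
shared = vcpu0 & vcpus_1_3
only_1_3 = vcpus_1_3 & ~vcpu0

# With exclusive, size-only allocations the two groups would need
# ways(vcpu0) + ways(vcpus_1_3) ways, which exceeds the whole cache.
needed_if_exclusive = ways(vcpu0) + ways(vcpus_1_3)

print(f"shared ways:      {shared:04x}")
print(f"only vCPUs 1-3:   {only_1_3:04x}")
print(f"exclusive needs:  {needed_if_exclusive} of {TOTAL_WAYS} ways")
```
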

Ok, I see what you're getting at.  I've actually forgotten what
our current design looks like though :-)

What level of granularity were we allowing within a guest ?
All vCPUs use separate cache regions from each other, or all
vCPUs use a shared cache region, but separate from other guests,
or a mix ?

> * My suggestion:
> 
> - Provide an API for querying and changing the allocation of the
>   default resource group.  This would be similar to setting and
>   querying hugepage allocations (see virsh's freepages/allocpages
>   commands).

Reasonable

> - Let users specify the starting position in addition to the size, i.e.
>   not only specifying "size", but also "from".  If "from" is not
>   specified, the whole allocation must be exclusive.  If "from" is
>   specified it will be set without checking for collisions.  The latter
>   needs them to query the system or know what settings are applied
>   (this should be the case all the time), but is better than adding
>   non-specific and/or meaningless exclusivity settings (how do you
>   specify part-exclusivity of the cache as in the example above)

I'm concerned about the idea of not checking 'from' for collisions,
if a mix of guests with & without 'from' is allowed.

eg consider

 * Initially 24 MB of cache is free, starting at 8MB
 * run guest A   from=8M, size=8M
 * run guest B   size=8M
     => libvirt sets from=16M, so doesn't clash with A
 * stop guest A
 * run guest C   size=8M
     => libvirt sets from=8M, so doesn't clash with B
 * restart guest A
     => now clashes with guest C, whereas if you had
        left guest A running, then C would have
        got from=24MB and avoided clash

IOW, if we're to allow users to set 'from', I think we need to
have an explicit flag to indicate whether this is an exclusive
or shared allocation. That way guest A would set 'exclusive',
and so at least see an error when it got a clash with guest
C in the example.
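
The sequence above can be replayed with a toy allocator.  This is only
a sketch: the Allocator class, its first-fit placement and the
'exclusive' semantics (a clash is an error if either side is exclusive)
are my assumptions for illustration, not existing libvirt code.  Sizes
are in MB.

```python
class CacheClash(Exception):
    """Raised when an allocation overlaps an exclusive one."""


class Allocator:
    def __init__(self, free_from, free_size):
        self.lo = free_from
        self.hi = free_from + free_size
        self.running = {}  # guest name -> (start, end, exclusive)

    def _overlaps(self, start, end):
        return [name for name, (s, e, _) in self.running.items()
                if start < e and s < end]

    def start(self, name, size, from_=None, exclusive=True):
        if from_ is None:
            # No 'from': pick the first gap that fits, always exclusive.
            exclusive = True
            begin = self.lo
            while begin + size <= self.hi:
                clashes = self._overlaps(begin, begin + size)
                if not clashes:
                    break
                # Skip past everything we collided with and retry.
                begin = max(self.running[n][1] for n in clashes)
            else:
                raise CacheClash(f"no room for {name}")
        else:
            # Explicit 'from': honour it, but report clashes that
            # involve an exclusive allocation on either side.
            begin = from_
            if begin < self.lo or begin + size > self.hi:
                raise ValueError(f"{name} is outside the free cache")
            clashes = self._overlaps(begin, begin + size)
            if clashes and (exclusive or
                            any(self.running[n][2] for n in clashes)):
                raise CacheClash(f"{name} clashes with {clashes}")
        self.running[name] = (begin, begin + size, exclusive)
        return begin

    def stop(self, name):
        del self.running[name]


cache = Allocator(free_from=8, free_size=24)
print("A at", cache.start("A", 8, from_=8))  # explicit, exclusive
print("B at", cache.start("B", 8))           # auto-placed after A
cache.stop("A")
print("C at", cache.start("C", 8))           # reuses A's old region
try:
    cache.start("A", 8, from_=8)             # restart A
except CacheClash as err:
    print("error:", err)
```

Without the exclusive flag on A the last call would silently overlap
guest C, which is the situation I'd like to avoid.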

> - After starting a domain, fill in any missing information about the
>   allocation (I'm generalizing here, but for now it would only be the
>   optional "from" attribute)
> 
> - Add settings not only for vCPUs, but also for other threads as we do
>   with pinning, schedulers, etc.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|