[Libvir] CPU pinning of domains at creation time

Thu Oct 11 15:03:59 UTC 2007

On Thu, Oct 11, 2007 at 09:00:14AM -0400, Daniel Veillard wrote:
>   There are a few things I gathered on this issue. This affects 
> NUMA setups, where basically if a domain must be placed on a given cell
> it is not good to let the hypervisor place it first with its own heuristics
> and then later migrate it to a different set of CPUs; it is better to
> instruct the hypervisor to start said domain on the given set.
>    - For Xen it is possible to instruct the hypervisor by passing 
>      (cpus '2,3') in the SExpr where the argument is a list of
>      the physical processors allowed
>    - For KVM I think the standard way would be to select the 
>      cpuset using sched_setaffinity() between the fork of the 
>      current process and the exec of the qemu process
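A minimal sketch of the fork/exec approach for KVM, assuming Linux and glibc. The child restricts its own affinity with sched_setaffinity() before exec'ing the emulator. The mask survives execve() and is inherited by threads the process creates later, so qemu's vCPU threads all land in the same set. The helper names spawn_with_affinity() and run_pinned() are hypothetical, not libvirt APIs.

```c
#define _GNU_SOURCE            /* for sched_setaffinity() and CPU_* macros */
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, restrict the child to the CPUs in mask,
 * then exec argv (e.g. a qemu command line).  The affinity mask is
 * preserved across execve() and inherited by every thread the new
 * program creates, so qemu's vCPU threads all end up confined to mask.
 * Returns the child pid, or -1 if fork() failed. */
static pid_t spawn_with_affinity(char *const argv[], const cpu_set_t *mask)
{
    assert(argv != NULL && argv[0] != NULL && mask != NULL);

    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or -1 on fork failure */

    /* Child: do only minimal work between fork() and exec. */
    if (sched_setaffinity(0, sizeof(*mask), mask) < 0)
        _exit(126);
    execvp(argv[0], argv);
    _exit(127);                     /* exec failed */
}

/* Hypothetical convenience wrapper: spawn, wait, and return the
 * child's exit code, or -1 on any failure. */
static int run_pinned(char *const argv[], const cpu_set_t *mask)
{
    int status;
    pid_t pid = spawn_with_affinity(argv, mask);
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the '2,3' case, a caller would CPU_ZERO() a cpu_set_t, CPU_SET() CPUs 2 and 3, and pass the qemu argv. Only whole-process affinity can be set this way, because no vCPU threads exist yet.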

Yep, as with Xen, this will only let you specify a coarse mapping at the time
the VM is created, i.e. you can say 'this VM is allowed to run on pCPUs 1 & 3',
but you can't say 'this VM's vCPU 1 is allowed on pCPU 1 and vCPU 2 is allowed
on pCPU 3'. This is because KVM has one thread per vCPU, and at the time the
VM is created the vCPU threads don't yet exist, so there's nothing to pin.
Not a huge problem really, just something we should document.

Basically we are setting VM affinity at creation time; vCPU affinity
can then be set once the VM is running, to fine-tune placement.

>    - there is no need (from a NUMA perspective) to do fine grained
>      allocation at that point, as long as the domain can be restricted
>      to a given cell at startup, then if needed virDomainPinVcpu() can be
>      used later to do more precise pinning in order to try to optimize
>      placement
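For the later fine-grained step, virDomainPinVcpu(dom, vcpu, cpumap, maplen) takes a byte array in which physical CPU n is bit (n % 8) of byte (n / 8). A minimal sketch of building that map without linking libvirt follows. It mirrors the VIR_CPU_MAPLEN()/VIR_USE_CPU() macros; the helper name make_cpumap() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build a cpumap in the layout virDomainPinVcpu()
 * expects (CPU n -> bit n % 8 of byte n / 8), for a host with maxcpus
 * physical CPUs.  The caller frees the result; *maplen receives its
 * length in bytes.  Returns NULL on allocation failure. */
static unsigned char *make_cpumap(const int *cpus, size_t ncpus,
                                  int maxcpus, int *maplen)
{
    int len = (maxcpus + 7) / 8;          /* same as VIR_CPU_MAPLEN(maxcpus) */
    unsigned char *map = calloc((size_t)len, 1);
    if (map == NULL)
        return NULL;
    for (size_t i = 0; i < ncpus; i++) {
        assert(cpus[i] >= 0 && cpus[i] < maxcpus);
        map[cpus[i] / 8] |= (unsigned char)(1u << (cpus[i] % 8));
    }
    *maplen = len;
    return map;
}
```

With libvirt linked, pinning vCPU 1 to pCPUs 2 and 3 would then be virDomainPinVcpu(dom, 1, map, maplen) with map built from {2, 3}.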

Yep, from a NUMA pov we're only concerned with the VM memory allocation,
so it is sufficient to consider VM affinity and not vCPU affinity.

>    - to be able to instruct the hypervisor at creation time, adding the
>      information in the domain XML description looks like the most natural
>      way (another option would be to force the use of virDomainDefineXML,
>      add a call using the resulting virDomainPtr to define the set, and
>      then use virDomainCreate to do the actual start)
>      + the good point of having this embedded in the XML is that
>        we still have all the information about the domain settings in
>        the XML if we want to restart it later
>      + the bad point is that we need to fetch and carry this extra
>        information when doing XML dumps so as not to lose it, for example
>        when manipulating the domain to add or remove devices
>    - extracting a cpuset can still be a heavy operation, for example
>      if using xend one needs one RPC per vCPU in the domain, the cpuset
>      being constructed by logically OR'ing all the cpumaps used by the
>      vCPUs of the domain (though in most cases this will be the full
>      map after the first vCPU, so the loop can stop immediately)
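The OR'ing described above, with the early exit once the map is full, could look like the sketch below. It assumes the per-vCPU cpumaps have already been fetched (by whatever RPC or hypercall) and stored back to back, maplen bytes each, in the same one-bit-per-pCPU layout. The names domain_cpuset() and map_is_full() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if bits for CPUs 0..ncpus-1 are all set in map. */
static int map_is_full(const unsigned char *map, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (!(map[cpu / 8] & (1u << (cpu % 8))))
            return 0;
    return 1;
}

/* Hypothetical helper: write into out the union (bitwise OR) of nvcpus
 * cpumaps stored back to back in maps.  Stops as soon as every physical
 * CPU is covered, since further ORs cannot change the result.
 * Returns how many vCPU maps were actually examined. */
static size_t domain_cpuset(const unsigned char *maps, size_t nvcpus,
                            size_t maplen, int ncpus, unsigned char *out)
{
    assert(maplen * 8 >= (size_t)ncpus);
    memset(out, 0, maplen);
    for (size_t v = 0; v < nvcpus; v++) {
        const unsigned char *m = maps + v * maplen;
        for (size_t b = 0; b < maplen; b++)
            out[b] |= m[b];
        if (map_is_full(out, ncpus))
            return v + 1;           /* early exit: union is already full */
    }
    return nvcpus;
}
```

In the common unpinned case the first vCPU's map already covers every pCPU, so only one map is examined.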

Fetching /xend/domain/%s?op=vcpuinfo lets us get info for all vCPUs
in a domain in a single RPC, doesn't it? In any case we should first
just try the hypercall; in all normal scenarios that'll work fine.

>    - for the mapping at the XML level I suggest a simple extension
>      of the <vcpu>n</vcpu> element to
>      <vcpu cpuset='2,3'>n</vcpu>
>      with a limited syntax which is just the comma-separated list of
>      allowed CPU numbers (only emitted if the code actually detects that
>      such a cpuset is in effect, i.e. in general it won't be added).
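The limited syntax proposed above (a bare comma-separated list of CPU numbers, no ranges) is simple to parse and to emit. The same string also fits the Xen SExpr form (cpus '2,3'). The sketch below converts between that string and a one-bit-per-CPU byte map. The names cpuset_parse() and cpuset_format() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "2,3"-style cpusets: digits separated by single
 * commas, no ranges or spaces.  Sets the matching bits in map (maplen
 * bytes, zeroed first).  Returns 0 on success, -1 on malformed input or
 * a CPU number that does not fit in the map. */
static int cpuset_parse(const char *str, unsigned char *map, size_t maplen)
{
    memset(map, 0, maplen);
    const char *p = str;
    for (;;) {
        if (!isdigit((unsigned char)*p))
            return -1;                      /* empty field or junk */
        char *end;
        unsigned long cpu = strtoul(p, &end, 10);
        if (cpu >= maplen * 8)
            return -1;                      /* out of range (or overflow) */
        map[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
        if (*end == '\0')
            return 0;
        if (*end != ',')
            return -1;
        p = end + 1;
    }
}

/* Hypothetical formatter: write the set bits of map as "2,3" into buf.
 * Returns the string length, or -1 if buf is too small. */
static int cpuset_format(const unsigned char *map, size_t maplen,
                         char *buf, size_t buflen)
{
    size_t used = 0;
    assert(buflen > 0);
    buf[0] = '\0';
    for (size_t cpu = 0; cpu < maplen * 8; cpu++) {
        if (!(map[cpu / 8] & (1u << (cpu % 8))))
            continue;
        int n = snprintf(buf + used, buflen - used, "%s%zu",
                         used ? "," : "", cpu);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Rejecting anything beyond digits and commas keeps the grammar as narrow as proposed. Ranges such as '0-3' could be added later without breaking existing strings.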

It doesn't make sense to me to include the info at a vCPU level in the
XML, since our granularity at creation time is only at the VM level.
When dumping XML, the VM's affinity is basically the union of the affinity
of all of its vCPUs.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|