[Libvir] CPU pinning of domains at creation time
ryanh at us.ibm.com
Thu Oct 11 15:45:44 UTC 2007
* Daniel Veillard <veillard at redhat.com> [2007-10-11 08:01]:
> There are a few things I gathered on this issue. This affects
> NUMA setups, where basically if a domain must be placed on a given cell
> it is not good to let the hypervisor place it first with its own heuristics
> and then later migrate it to a different set of CPUs, but better to
> instruct the hypervisor to start said domain on the given set.
> - For Xen it is possible to instruct the hypervisor by passing
> (cpus '2,3') in the SExpr where the argument is a list of
> the physical processors allowed
A bit more detail here just FYI:
Xen takes the cpu list and converts that into an affinity bitmap that is
then applied to each vcpu allocated to the guest.
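
To make the conversion concrete, here is a minimal Python sketch of the idea. It is not Xen's actual code, and the function names are made up for illustration: the cpu list is folded into one bitmap, and that same bitmap is handed to every vcpu of the guest.

```python
def cpus_to_bitmap(cpus):
    """Return an int with bit N set for every physical CPU N in cpus."""
    bitmap = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative: %r" % cpu)
        bitmap |= 1 << cpu
    return bitmap


def affinity_for_guest(cpus, nr_vcpus):
    """Hypothetical helper: give each vcpu the same affinity bitmap."""
    bitmap = cpus_to_bitmap(cpus)
    return [bitmap] * nr_vcpus
```

With (cpus '2,3') and a two-vcpu guest, both vcpus end up with bits 2 and 3 set.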
> - For KVM I think the standard way would be to select the
> cpuset using sched_setaffinity() between the fork of the
> current process and the exec of the qemu process
> - there is no need (from a NUMA perspective) to do fine grained
> allocation at that point, as long as the domain can be restricted
> to a given cell at startup, then if needed virDomainPinVcpu() can be
> used later to do more precise pinning in order to try to optimize
kvm-46 added user-space allocated memory, which means that we can use
libnuma/numactl to set the appropriate node.
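
For the sched_setaffinity() step quoted above, a rough sketch of the fork/setaffinity/exec sequence could look like the following. This is Python rather than the C that libvirt would use, spawn_pinned is a hypothetical name, and it only works on Linux, where os.sched_setaffinity is available. The affinity mask set in the child is kept across the exec, so the new program starts already restricted to the given CPUs.

```python
import os


def spawn_pinned(argv, cpus):
    """Fork, restrict the child to `cpus`, then exec argv in the child.

    Returns the child's pid to the parent.
    """
    pid = os.fork()
    if pid == 0:
        try:
            # pid 0 means "the calling process", i.e. the child here.
            os.sched_setaffinity(0, cpus)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if setting the affinity or the exec failed.
            os._exit(127)
    return pid
```

The caller would pass the emulator command line as argv and the CPUs of the target cell as cpus, then wait on the returned pid as usual.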
> - to be able to instruct the hypervisor at creation time adding the
> information in the domain XML description looks the more natural way
> (another option would be to force to use virDomainDefineXML, add a
> call using the resulting virDomainPtr to define the set, and
> then virDomainCreate would be used to do the actual start)
> + the good point of having this embedded in the XML is that
> we still have all the information about the domain settings in
> the XML, if we want to restart it later
> + the bad point is that we need to fetch and carry this extra
> information when doing XML dumps to not lose it for example
> when manipulating the domain to add or remove devices
> - extracting a cpuset can still be a heavy operation, for example
> if using xend one needs one RPC per vcpu in the domain, the cpuset
> being constructed by OR'ing logically all cpumaps used by the
> vcpus of the domain (though in most cases this will be the full
> map after the first CPU and can be stopped immediately)
Yeah, that might be a decent patch to xend - build up an array of
affinity masks for each vcpu.
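
The OR'ing with early exit described in the quote can be sketched like this. It is an illustration only: domain_cpuset is a made-up name, and cpumaps are modelled as plain integers rather than the byte arrays libvirt uses. If vcpu_cpumaps is a lazy iterable that does one RPC per vcpu, stopping at the first full map also skips the remaining RPCs.

```python
def domain_cpuset(vcpu_cpumaps, nr_pcpus):
    """OR the per-vcpu cpumaps together, stopping once the map is full.

    vcpu_cpumaps: iterable of ints, bit N set if the vcpu may run on CPU N.
    nr_pcpus: number of physical CPUs on the host.
    """
    full = (1 << nr_pcpus) - 1
    cpuset = 0
    for cpumap in vcpu_cpumaps:
        cpuset |= cpumap & full
        if cpuset == full:
            break  # no later vcpu can add a CPU, so stop fetching
    return cpuset
```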
> - for the mapping at the XML level I suggest to use a simple extension
> to the <vcpu>n</vcpu> and extend it to
> <vcpu cpuset='2,3'>n</vcpu>
> with a limited syntax which is just the comma separated list of
> allowed CPU numbers (the attribute is only emitted if the code actually
> detects that such a cpuset is in effect, i.e. in general it won't be added).
I think we should support the same cpuset notation that Xen supports,
which means including ranges (1-4) and negation (^1). These two
features make describing large ranges much more compact.
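
As a rough sketch of what parsing that notation involves (Python for brevity, parse_cpuset is a hypothetical name, and this is not the Xen or libvirt parser): each comma-separated entry is a single CPU or a range, and a leading ^ excludes it. I am assuming here that exclusions apply regardless of where they appear in the list and that a negated range such as ^2-3 is accepted.

```python
def parse_cpuset(spec):
    """Parse a cpuset string such as '1-4,^1' into a set of CPU numbers."""
    include, exclude = set(), set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty entry in cpuset %r" % spec)
        target = include
        if item.startswith("^"):
            # Negated entry: collect it separately and subtract at the end.
            target = exclude
            item = item[1:]
        first, sep, last = item.partition("-")
        lo = int(first)  # raises ValueError on malformed numbers
        hi = int(last) if sep else lo
        if hi < lo:
            raise ValueError("reversed range in cpuset %r" % spec)
        target.update(range(lo, hi + 1))
    return include - exclude
```

So '2,3' gives CPUs 2 and 3, while '1-4,^1' gives 2, 3 and 4.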
> Internally implementing this should not be too hard, I would probably refactor
> some of the existing parsing code, provide functions to get the cpuset and
> the number of physical processors.
> Does this sound okay?
Yeah, I think this covers everything we'd need.
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh at us.ibm.com