[libvirt] [RFC] Data in the <topology> element in the capabilities XML

Thu Jan 17 09:36:56 UTC 2013

On Thu, Jan 17, 2013 at 12:12:35AM +0100, Peter Krempa wrote:
> On 01/16/13 21:24, Daniel P. Berrange wrote:
> >On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
> >>On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
> >>>On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
> >>>>----- Original Message -----
> >>>>From: Daniel P. Berrange <berrange at redhat.com>
> >>>>To: Peter Krempa <pkrempa at redhat.com>
> >>>>Cc: Jiri Denemark <jdenemar at redhat.com>, Amador Pahim <apahim at redhat.com>, libvirt-list at redhat.com, dougsland at redhat.com
> >>>>Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
> >>>>Subject: Re: [libvirt] [RFC] Data in the <topology> element in the capabilities XML
> >>>>
> >>>>On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
> >>>>>On 01/16/13 19:11, Daniel P. Berrange wrote:
> >>>>>>On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
> >>>>>>>Hi everybody,
> >>>>>>>
> >>>>>>>a while ago there was a discussion about changing the data that is
> >>>>>>>returned in the <topology> sub-element:
> >>>>>>>
> >>>>>>><capabilities>
> >>>>>>><host>
> >>>>>>><cpu>
> >>>>>>><arch>x86_64</arch>
> >>>>>>><model>SandyBridge</model>
> >>>>>>><vendor>Intel</vendor>
> >>>>>>><topology sockets='1' cores='2' threads='2'/>
> >>>>>>>
> >>>>>>>
> >>>>>>>As of today, the data provided here is taken from the nodeinfo
> >>>>>>>detection code, so it is wrong whenever the fallback mechanisms
> >>>>>>>are used.
> >>>>>>>
> >>>>>>>To get a useful count, the user has to multiply the data by the
> >>>>>>>number of NUMA nodes in the host. When the fallback detection code
> >>>>>>>is used for nodeinfo, however, the NUMA node count used to compute
> >>>>>>>the CPU count has to be 1 instead of the actual number.
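The arithmetic described above can be sketched in Python. This is a minimal illustration only: the helper name and the `fallback_used` flag are hypothetical (the capabilities XML does not say whether the fallback was taken), and the NUMA node count is assumed to come from the separate NUMA <topology> data.

```python
import xml.etree.ElementTree as ET


def logical_cpu_count(topology_xml: str, numa_nodes: int, fallback_used: bool) -> int:
    """Estimate logical CPUs from the host <cpu><topology/> element.

    Hypothetical helper: 'sockets' is treated as sockets-per-NUMA-node,
    so the product is multiplied by the node count, except when the
    nodeinfo fallback detection was used, where the multiplier is 1.
    """
    topo = ET.fromstring(topology_xml)
    sockets = int(topo.get("sockets"))
    cores = int(topo.get("cores"))
    threads = int(topo.get("threads"))
    nodes = 1 if fallback_used else numa_nodes
    return nodes * sockets * cores * threads


# The host <topology> line quoted above, on a hypothetical 2-node host.
example = "<topology sockets='1' cores='2' threads='2'/>"
print(logical_cpu_count(example, numa_nodes=2, fallback_used=False))  # 8
print(logical_cpu_count(example, numa_nodes=2, fallback_used=True))   # 4
```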
> >>>>>>>
> >>>>>>>As Jiri proposed, I think we should change this output to use
> >>>>>>>separate detection code that does not take NUMA nodes into account
> >>>>>>>and instead provides data the way the "lscpu" command does.
> >>>>>>>
> >>>>>>>This change will make the data provided by the element standalone
> >>>>>>>and also usable in guest XMLs to mirror the host's topology.
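If the element reported standalone, host-wide counts as proposed, a management app could copy it straight into a guest definition. A minimal Python sketch, assuming those proposed semantics (not the current sockets-per-node behaviour); `mirror_host_topology` is a hypothetical helper, while `<vcpu>` and `<cpu><topology .../></cpu>` are the usual domain XML elements.

```python
import xml.etree.ElementTree as ET


def mirror_host_topology(host_caps_xml: str) -> str:
    """Build a guest <vcpu>/<cpu> fragment mirroring the host topology.

    Hypothetical helper; assumes the host <topology> reports standalone
    (host-wide) socket/core/thread counts as proposed in this thread.
    """
    caps = ET.fromstring(host_caps_xml)
    host_topo = caps.find("./host/cpu/topology")
    sockets = int(host_topo.get("sockets"))
    cores = int(host_topo.get("cores"))
    threads = int(host_topo.get("threads"))

    # Wrap both elements in a throwaway parent so one string is returned.
    frag = ET.Element("fragment")
    vcpu = ET.SubElement(frag, "vcpu")
    vcpu.text = str(sockets * cores * threads)
    cpu = ET.SubElement(frag, "cpu")
    ET.SubElement(cpu, "topology", sockets=str(sockets),
                  cores=str(cores), threads=str(threads))
    return ET.tostring(frag, encoding="unicode")


caps = """<capabilities><host><cpu><arch>x86_64</arch>
<topology sockets='1' cores='2' threads='2'/></cpu></host></capabilities>"""
print(mirror_host_topology(caps))
```

Under today's sockets-per-node semantics the same copy would undercount on multi-node hosts, which is the concern raised in the replies.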
> >>>>>>Well, there are two parts which need to be considered here: what we
> >>>>>>report in the host capabilities, and how the guest XML is configured.
> >>>>>>
> >>>>>>From a historical compatibility point of view, I don't think we should
> >>>>>>be changing the host capabilities at all. Simply document that 'sockets'
> >>>>>>is treated as sockets-per-node everywhere, and that it is wrong on
> >>>>>>machines where a socket can internally have multiple NUMA nodes.
> >>>>>I'm also somewhat concerned about changing this output for
> >>>>>historical reasons.
> >>>>>>Apps should be using the separate NUMA <topology> data in the capabilities
> >>>>>>instead of the CPU <topology> data, to get accurate CPU counts.
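A CPU count taken from the NUMA <topology> data amounts to counting the <cpu> children of each cell. The Python sketch below assumes the host/topology/cells/cell/cpus/cpu layout of the capabilities XML; the helper name and sample data are hypothetical. It yields logical CPUs only and says nothing about which of them are cores and which are threads.

```python
import xml.etree.ElementTree as ET


def numa_cpu_counts(caps_xml: str) -> dict:
    """Map each NUMA cell id to the number of <cpu> entries it lists."""
    caps = ET.fromstring(caps_xml)
    counts = {}
    for cell in caps.findall("./host/topology/cells/cell"):
        counts[int(cell.get("id"))] = len(cell.findall("./cpus/cpu"))
    return counts


# Hypothetical two-node host; real capabilities XML carries more elements.
caps = """<capabilities><host><topology><cells num='2'>
  <cell id='0'><cpus num='2'><cpu id='0'/><cpu id='1'/></cpus></cell>
  <cell id='1'><cpus num='2'><cpu id='2'/><cpu id='3'/></cpus></cell>
</cells></topology></host></capabilities>"""

per_cell = numa_cpu_counts(caps)
print(per_cell, sum(per_cell.values()))
```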
> >>>>>From the NUMA <topology> data, management apps can't tell whether a
> >>>>>CPU is a core or a thread. For example, oVirt/VDSM bases its
> >>>>>decisions on this information.
> >>>>Then, we should add information to the NUMA topology XML to indicate
> >>>>which of the child <cpu> elements are sibling cores or threads.
> >>>>
> >>>>Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.
> >>>
> >>>>In this case, we will also need to add the thread siblings and
> >>>>perhaps even core siblings information to allow reliable detection.
> >>>The combination of core_id/socket_id lets you determine that. If two
> >>><cpu> elements have the same socket_id, then they are cores or threads
> >>>within the same socket. If two <cpu> elements have the same socket_id
> >>>and core_id, then they are threads within the same core.
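The rule above can be sketched in Python by grouping <cpu> elements on their (socket_id, core_id) pair. This is an illustration using the attribute names proposed in this thread, not existing libvirt output; the helper name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_socket_core(cells_xml: str) -> dict:
    """Group logical CPU ids by (socket_id, core_id).

    Implements the rule as stated: same socket_id means same socket, and
    same socket_id plus core_id means threads of the same core. The
    socket_id/core_id attributes are the ones proposed in this thread.
    """
    groups = defaultdict(list)
    for cpu in ET.fromstring(cells_xml).iter("cpu"):
        key = (int(cpu.get("socket_id")), int(cpu.get("core_id")))
        groups[key].append(int(cpu.get("id")))
    return dict(groups)


# Hypothetical single-cell host: one socket, two cores, two threads each.
cell = """<cell id='0'><cpus num='4'>
  <cpu id='0' socket_id='0' core_id='0'/>
  <cpu id='1' socket_id='0' core_id='0'/>
  <cpu id='2' socket_id='0' core_id='1'/>
  <cpu id='3' socket_id='0' core_id='1'/>
</cpus></cell>"""

print(group_by_socket_core(cell))
```

Each group with more than one member is then read as a set of thread siblings.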
> >>
> >>Not true for the AMD Magny-Cours 6100 series, where different cores
> >>can share the same physical_id and core_id without being threads.
> >>These processors have two NUMA nodes inside the same "package" (aka
> >>socket), and the two nodes share the same set of core IDs. Annoying.
> >
> >I don't believe there's a problem with that. This example XML
> >shows a machine with 4 NUMA nodes and 2 sockets, each socket
> >containing 2 NUMA nodes of 2 cores with 2 threads each, giving
> >16 logical CPUs:
> >
> >     <topology>
> >       <cells num='4'>
> >         <cell id='0'>
> >           <cpus num='4'>
> >             <cpu id='0' socket_id='0' core_id='0'/>
> >             <cpu id='1' socket_id='0' core_id='0'/>
> >             <cpu id='2' socket_id='0' core_id='1'/>
> >             <cpu id='3' socket_id='0' core_id='1'/>
> >           </cpus>
> >         </cell>
> >         <cell id='1'>
> >           <cpus num='4'>
> >             <cpu id='4' socket_id='0' core_id='0'/>
> >             <cpu id='5' socket_id='0' core_id='0'/>
> >             <cpu id='6' socket_id='0' core_id='1'/>
> >             <cpu id='7' socket_id='0' core_id='1'/>
> >           </cpus>
> >         </cell>
> >         <cell id='2'>
> >           <cpus num='4'>
> >             <cpu id='8'  socket_id='1' core_id='0'/>
> >             <cpu id='9'  socket_id='1' core_id='0'/>
> >             <cpu id='10' socket_id='1' core_id='1'/>
> >             <cpu id='11' socket_id='1' core_id='1'/>
> >           </cpus>
> >         </cell>
> >         <cell id='3'>
> >           <cpus num='4'>
> >             <cpu id='12' socket_id='1' core_id='0'/>
> >             <cpu id='13' socket_id='1' core_id='0'/>
> >             <cpu id='14' socket_id='1' core_id='1'/>
> >             <cpu id='15' socket_id='1' core_id='1'/>
> >           </cpus>
> >         </cell>
> >       </cells>
> >     </topology>
> >
> >I believe there's enough info there to determine all the co-location
> >aspects of all the sockets/core/threads involved.
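One way to read the example XML: CPUs 0 and 4 share socket_id='0' and core_id='0' but sit in different cells, so the cores only come apart correctly if the cell id is part of the grouping key. The Python sketch below, an illustration rather than anything libvirt ships, rebuilds the 4-cell example and groups it both ways; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def example_cells_xml() -> str:
    """Rebuild the 4-cell example above (cells 0-1 in socket 0, 2-3 in socket 1)."""
    parts = ["<cells num='4'>"]
    for cell in range(4):
        parts.append(f"<cell id='{cell}'><cpus num='4'>")
        for i in range(4):
            cpu_id = cell * 4 + i
            # Within each cell: CPUs i=0,1 are core 0 and i=2,3 are core 1.
            parts.append(f"<cpu id='{cpu_id}' socket_id='{cell // 2}' core_id='{i // 2}'/>")
        parts.append("</cpus></cell>")
    parts.append("</cells>")
    return "".join(parts)


def group_cpus(cells_xml: str, include_cell: bool) -> dict:
    """Group CPU ids into cores keyed by (socket_id, core_id), optionally prefixed by cell id."""
    cores = defaultdict(list)
    for cell in ET.fromstring(cells_xml).findall("cell"):
        for cpu in cell.findall("./cpus/cpu"):
            key = (int(cpu.get("socket_id")), int(cpu.get("core_id")))
            if include_cell:
                key = (int(cell.get("id")),) + key
            cores[key].append(int(cpu.get("id")))
    return dict(cores)


xml = example_cells_xml()
naive = group_cpus(xml, include_cell=False)
with_cell = group_cpus(xml, include_cell=True)
print(len(naive), sorted(len(v) for v in naive.values()))
print(len(with_cell), sorted(len(v) for v in with_cell.values()))
```

With the cell in the key, the 16 logical CPUs come out as 8 cores of 2 threads; without it they collapse into 4 "cores" of 4, which is the Magny-Cours ambiguity raised earlier.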
> 
> Well, not for all machines out there in the wild. This is very
> similar to the approach libvirt uses now to detect the topology, and
> it is not enough to detect threads on AMD Bulldozer, because the CPUs
> corresponding to the threads have different core_ids (the kernel also
> considers them to be cores). This is unfortunate for virtualization
> management tools such as oVirt, which still consider the AMD
> Bulldozer "module" to be one core with two threads, even though it
> registers as two cores.
> 
> For AMD Bulldozer to be detected correctly, we would need to expose
> the thread_ids along with thread-sibling information to determine
> which two threads belong together.
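Thread-sibling information of this kind is what the Linux kernel exposes in sysfs as /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The Python sketch below turns such lists into sibling groups. Whether a given kernel reports the two cores of a Bulldozer module as thread siblings depends on the kernel, so the sample data is hypothetical, as are the helper names.

```python
import glob
import os
from typing import Dict, FrozenSet, Set


def parse_cpu_list(text: str) -> FrozenSet[int]:
    """Parse a Linux CPU list such as '0-1' or '0,4,8-9' into a set of ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def read_thread_siblings(sysfs_root: str = "/sys/devices/system/cpu") -> Dict[int, str]:
    """Read thread_siblings_list for every cpuN directory (Linux only)."""
    result: Dict[int, str] = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        cpu_dir = path.split(os.sep)[-3]  # e.g. "cpu12"
        with open(path) as f:
            result[int(cpu_dir[3:])] = f.read()
    return result


def sibling_groups(siblings_by_cpu: Dict[int, str]) -> Set[FrozenSet[int]]:
    """Collapse per-CPU sibling lists into the distinct sibling groups."""
    return {parse_cpu_list(text) for text in siblings_by_cpu.values()}


# Hypothetical Bulldozer-like host: two modules, each reported as a sibling pair
# even though all four CPUs carry different core_ids.
print(sibling_groups({0: "0-1\n", 1: "0-1\n", 2: "2-3\n", 3: "2-3\n"}))
```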

NB, the socket_id / core_id values in the above XML are *not* intended
to be in any way related to similarly named values in /proc/cpuinfo.
They are values libvirt assigns to show the topology accurately.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|