[libvirt] [Qemu-devel] Exposing and calculating CPU APIC IDs (was Re: [RFC 1/3] target-i386: moving registers of vmstate from cpu_exec_init() to x86_cpu_realizefn())

Igor Mammedov imammedo at redhat.com
Thu Feb 13 10:00:57 UTC 2014


On Tue, 21 Jan 2014 11:10:30 +0100
Andreas Färber <afaerber at suse.de> wrote:

> Am 21.01.2014 10:51, schrieb Chen Fan:
> > On Tue, 2014-01-21 at 10:31 +0100, Igor Mammedov wrote:
> >> On Tue, 21 Jan 2014 15:12:45 +0800
> >> Chen Fan <chen.fan.fnst at cn.fujitsu.com> wrote:
> >>> On Mon, 2014-01-20 at 13:29 +0100, Igor Mammedov wrote:
> >>>> On Fri, 17 Jan 2014 17:13:55 -0200
> >>>> Eduardo Habkost <ehabkost at redhat.com> wrote:
> >>>>> On Wed, Jan 15, 2014 at 03:37:04PM +0100, Igor Mammedov wrote:
> >>>>>> I recall there were objections to it, since the APIC ID contains
> >>>>>> topology information and it's not trivial for the user to get it
> >>>>>> right. The last idea discussed to fix this was to not expose the
> >>>>>> APIC ID to the user but rather to introduce a QOM hierarchy like:
> >>>>>>   /machine/node/N/socket/X/core/Y/thread/Z
> >>>>>> and use it in the user interface as a means to specify an arbitrary
> >>>>>> CPU, letting QEMU calculate the APIC ID from this path.
> >>>>>>
> >>>>>> But nobody took on implementing it yet.
> >>>>>
> >>>>> We're taking so long to get a decent interface implemented that part
> >>>>> of me is considering exposing the APIC ID directly, as suggested
> >>>>> before, and requiring libvirt to calculate topology-aware APIC IDs[1]
> >>>>> to properly implement CPU hotplug (and possibly other tasks).
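
(As an aside: the "topology-aware" calculation here is the usual Intel
scheme of packing socket/core/thread indices into the APIC ID, with each
field width rounded up to a power of two. A minimal sketch follows; the
helper names are illustrative, not QEMU's actual code.)

    #include <stdint.h>

    /* ceil(log2(count)): bits needed to address 'count' items */
    static unsigned bits_for_count(unsigned count)
    {
        unsigned bits = 0;

        count -= 1;
        while (count) {
            bits++;
            count >>= 1;
        }
        return bits;
    }

    /* Pack topology indices into an APIC ID.  Because the field widths
     * are rounded up to powers of two, the resulting IDs can be sparse
     * (non-contiguous). */
    static uint32_t apicid_from_topo(unsigned nr_cores, unsigned nr_threads,
                                     unsigned socket, unsigned core,
                                     unsigned thread)
    {
        unsigned thread_bits = bits_for_count(nr_threads);
        unsigned core_bits = bits_for_count(nr_cores);

        return (socket << (core_bits + thread_bits)) |
               (core << thread_bits) |
               thread;
    }
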
> >>>> If you are speaking about the
> >>>> 'qemu will core dump with "-smp 254, sockets=2, cores=3, threads=2"'
> >>>> bug (http://patchwork.ozlabs.org/patch/301272/), then it's a
> >>>> limitation of the ACPI implementation. I'm going to refactor it to
> >>>> use full APIC IDs instead of a bitmap, so that we won't ever run
> >>>> into the issue regardless of the supported CPU count.
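
(A worked example of the sparseness, using the formula sketched above:
with sockets=2, cores=3, threads=2, cores=3 rounds up to 2 core bits and
threads=2 takes 1 thread bit, so thread 1 of core 2 on socket 1 gets APIC
ID (1 << 3) | (2 << 1) | 1 = 13, even though only 12 threads exist. A
bitmap sized by CPU count therefore overflows as soon as the per-level
counts are not powers of two, which is what switching to full APIC IDs
avoids.)
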
> >>>>
> >>>>>
> >>>>> Another part of me is hoping that the libvirt developers ask us to
> >>>>> please not do that, so I can use it as an argument against exposing
> >>>>> the APIC IDs directly the next time we discuss this.  :)
> >>>>
> >>>> Why not try your /machine/node/N/socket/X/core/Y/thread/Z idea first?
> >>>> It would benefit not only CPU hotplug but also '-numa' and topology
> >>>> description in general.
> >>>>
> >>> Has there been any plan/model for this idea? Would we need to add a
> >>> new option to the QEMU command line?
> >> I suppose we can start with an internal default implementation first.
> >>
> >> One way could be:
> >>  1. let the machine prebuild an empty QOM tree
> >>     /machine/node/N/socket/X/core/Y/thread/Z
> >>  2. add node, socket, core and thread properties to the CPU and link
> >>     each CPU into the respective link<> created in #1
> >>
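A rough sketch of what #1 could look like, using container_get() to create
the intermediate containers. This is an illustration only: error handling
is omitted, MAX_CPUS is a hypothetical placeholder, and the
object_property_add_link() prototype has varied between QEMU versions (the
simple (obj, name, type, &target, errp) form is used here).

    #include <stdio.h>            /* snprintf */
    #include "qom/object.h"       /* container_get(), link properties */
    #include "hw/qdev-core.h"     /* qdev_get_machine() */

    #define MAX_CPUS 256          /* hypothetical bound for the sketch */

    static Object *cpu_slots[MAX_CPUS];  /* hypothetical link targets */

    static void machine_prebuild_cpu_tree(unsigned nodes, unsigned sockets,
                                          unsigned cores, unsigned threads)
    {
        unsigned index = 0;
        unsigned n, s, c, t;

        for (n = 0; n < nodes; n++) {
            for (s = 0; s < sockets; s++) {
                for (c = 0; c < cores; c++) {
                    char path[128];
                    Object *core;

                    snprintf(path, sizeof(path),
                             "/node/%u/socket/%u/core/%u", n, s, c);
                    /* creates any missing containers along the path */
                    core = container_get(qdev_get_machine(), path);

                    for (t = 0; t < threads; t++) {
                        char name[32];

                        snprintf(name, sizeof(name), "thread[%u]", t);
                        /* empty link<CPU>; filled in later when the CPU
                         * thread is created or hot-plugged.  TYPE_X86_CPU
                         * comes from the target-i386 headers. */
                        object_property_add_link(core, name, TYPE_X86_CPU,
                                                 &cpu_slots[index++], NULL);
                    }
                }
            }
        }
    }
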
> > Thanks. I hope I can find some time to write some patches implementing
> > it.
> 
> Please give us a few hours to reply. :)
> 
> /machine/node seems too broad a term to me.
> You can't prebuild the full tree, you can only prepare the nodes.
> core[Y]/thread[Z] was previously discussed as syntax.
> 
> The important part to decide on will be what is going to be a child<> and
> what a link<>. Has anyone played with the Intel Quark platform, for
> instance (Galileo board or upcoming Edison card)? On a regular
> mainboard, we would have socket[X] as a link<x86_64-cpu>, which might
> point to a child<cpu> /machine/memory-node[W]/cpu[X]. But if we do so, we
> can't reassign it to another memory node - is that acceptable? With Quark
> (or Qseven modules etc.) there would be a container object, rather than
> the /machine itself, that has a child<i386-cpu> instead of a
> link<i386-cpu>. I guess the memory nodes could still be on the /machine,
> though.
> The other point of discussion between Anthony and me was whether core[Y]
> should be a link<> or a child<>, and the same for thread. I believe a
> child<> is better, as it enforces that unrealizing the CPU will unrealize
> all its cores and all its threads in the future.
In terms of the parent/child relationship, I guess we are not going to come
up with a uniform design, since boards can differ very much in that aspect.

I was rather thinking in terms of providing a stable/uniform CLI/QMP NUMA
interface on top of the QOM tree.
At startup we potentially have the CPU topology information and the set of
NUMA nodes, so we could pre-build containers up to the point where CPU
threads are attached, pre-create empty link<CPU> properties, and fill them
in later with the actual CPU threads.
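
A matching sketch of the "fill them later" part (again illustrative only;
the object_property_set_link() argument order has differed between QEMU
versions, and error handling is omitted):

    /* Attach a freshly created or hot-plugged CPU thread to its
     * pre-created link<CPU> slot. */
    static void machine_plug_cpu_thread(Object *cpu, unsigned node,
                                        unsigned socket, unsigned core,
                                        unsigned thread)
    {
        char path[128], name[32];
        Object *parent;

        snprintf(path, sizeof(path), "/node/%u/socket/%u/core/%u",
                 node, socket, core);
        parent = container_get(qdev_get_machine(), path);

        snprintf(name, sizeof(name), "thread[%u]", thread);
        object_property_set_link(parent, cpu, name, NULL);
    }

Management could then address an arbitrary CPU by a stable path such as
/machine/node/1/socket/0/core/2/thread[1] (resolved internally with
object_resolve_path()), without the user ever seeing an APIC ID.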

> 
> More issues may pop up when thinking about it for longer than a few
> minutes. But yes, we need to start investigating this; so far I've had
> other priorities, like getting the CPUState mess I created cleaned up.
> 
> Regards,
> Andreas
> 
