[libvirt] [RFC] Memory hotplug for qemu guests and the relevant XML parts
Daniel P. Berrange
berrange at redhat.com
Wed Jul 30 10:08:04 UTC 2014
On Tue, Jul 29, 2014 at 05:05:23PM +0100, Daniel P. Berrange wrote:
> On Tue, Jul 29, 2014 at 04:40:50PM +0200, Peter Krempa wrote:
> > On 07/24/14 17:03, Peter Krempa wrote:
> > > On 07/24/14 16:40, Daniel P. Berrange wrote:
> > >> On Thu, Jul 24, 2014 at 04:30:43PM +0200, Peter Krempa wrote:
> > >>> On 07/24/14 16:21, Daniel P. Berrange wrote:
> > >>>> On Thu, Jul 24, 2014 at 02:20:22PM +0200, Peter Krempa wrote:
> >
> > >>
> > >>>> So from that POV, I'd say that when we initially configure the
> > >>>> NUMA / huge page information for a guest at boot time, we should
> > >>>> be doing that wrt to the 'maxMemory' size, instead of the current
> > >>>> 'memory' size. ie the actual NUMA topology is all setup upfront
> > >>>> even though the DIMMS are not present for some of this topology.
> > >>>>
> > >>>>> "address" determines the address in the guest's memory space where the
> > >>>>> memory will be mapped. This is optional and not recommended to be set
> > >>>>> by the user (except for special cases).
> > >>>>>
> > >>>>> For expansion the model="pflash" device may be added.
> > >>>>>
> > >>>>> For migration the target VM needs to be started with the hotplugged
> > >>>>> modules already specified on the command line, which is in line with how
> > >>>>> we treat devices currently.
> > >>>>>
> > >>>>> My suggestion above contrasts with the approach Michal and Martin took
> > >>>>> when adding the numa and hugepage backing capabilities as they describe
> > >>>>> a node while this describes the memory device beneath it. I think those
> > >>>>> two approaches can co-exist while being mutually exclusive. Simply when
> > >>>>> using memory hotplug, the memory will need to be specified using the
> > >>>>> memory modules. Non-hotplug guests could use the approach defined
> > >>>>> originally.
> > >>>>
> > >>>> I don't think it is viable to have two different approaches for configuring
> > >>>> NUMA / huge page information. Apps should not have to change the way they
> > >>>> configure NUMA/hugepages when they decide they want to take advantage of
> > >>>> DIMM hotplug.
> > >>>
> > >>> Well, the two approaches are orthogonal in the information they store.
> > >>> The existing approach stores the memory topology from the point of view
> > >>> of the NUMA node, whereas the <device>-based approach stores it from the
> > >>> point of view of the memory module.
> > >>
> > >> Sure, they are clearly designed from different POV, but I'm saying that
> > >> from an application POV it is very unpleasant to have 2 different ways
> > >> to configure the same concept in the XML. So I really don't want us to
> > >> go down that route unless there is absolutely no other option to achieve
> > >> an acceptable level of functionality. If that really were the case, then
> > >> I would strongly consider reverting everything related to NUMA that we
> > >> have just done during this dev cycle and not releasing it as is.
> > >>
> > >>> The difference is that the existing approach currently wouldn't allow
> > >>> splitting a numa node into more memory devices to allow
> > >>> plugging/unplugging them.
> > >>
> > >> There's no reason why we have to assume 1 memory slot per guest or
> > >> per node when booting the guest. If the user wants the ability to
> > >> unplug, they could set their XML config so the guest has arbitrary
> > >> slot granularity. E.g. if I have a guest with
> > >>
> > >> - memory == 8 GB
> > >> - max-memory == 16 GB
> > >> - NUMA nodes == 4
> > >>
> > >> Then we could allow them to specify 32 memory slots each 512 MB
> > >> in size. This would allow them to plug/unplug memory from NUMA
> > >> nodes in 512 MB granularity.
> >
> > In real hardware you can still plug in modules of different sizes
> > (e.g. 1 GiB + 2 GiB) ...
>
> I was just illustrating that as an example of the default we'd
> write into the XML if the app hadn't explicitly given any slot
> info themselves. If doing it manually you can of course list
> the slots with arbitrary sizes, each a different size.
>
> > > Well, while this makes it pretty close to real hardware, the emulated
> > > one doesn't have a problem with plugging "dimms" of weird
> > > (non-power-of-2) sizes. And we are losing flexibility due to that.
> > >
> >
> > Hmm, now that the rest of the Hugepage stuff was pushed and the release
> > is rather soon. What approach should I take? I'd rather avoid crippling
> > the interface for memory hotplug and having to add separate APIs and
> > other stuff, and mostly I'd like to avoid having to re-do it after
> > consumers of libvirt deem it to be inflexible.
>
> NB, as a general point of design, it isn't our goal to always directly
> expose every possible way of configuring things that QEMU allows. If
> there are multiple ways to achieve the same end goal it is valid for
> libvirt to pick a particular approach and not expose all possible QEMU
> flexibility. This is especially true if this makes cross-hypervisor
> support of the feature more practical.
>
> Looking at the big picture, we've got a bunch of memory-related
> configuration sets
>
> - Guest NUMA topology setup, assigning vCPUs and RAM to guest nodes
>
> <cpu>
> <numa>
> <cell id='0' cpus='0' memory='512000'/>
> <cell id='1' cpus='1' memory='512000'/>
> <cell id='2' cpus='2-3' memory='1024000'/>
> </numa>
> </cpu>
>
> - Request the use of huge pages, optionally different size
> per guest NUMA node
>
> <memoryBacking>
> <hugepages/>
> </memoryBacking>
>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB' nodeset='0,1'/>
> <page size='1' unit='GiB' nodeset='2'/>
> </hugepages>
> </memoryBacking>
>
> - Mapping of guest NUMA nodes to host NUMA nodes
>
> <numatune>
> <memory mode="strict" nodeset="1-4,^3"/>
> <memnode cellid="0" mode="strict" nodeset="1"/>
> <memnode cellid="1" mode="strict" nodeset="2"/>
> </numatune>
>
>
> At the QEMU level, aside from the size of the DIMM, the memory slot
> device lets you
>
> 1. Specify guest NUMA node to attach to
> 2. Specify host NUMA node to assign to
> 3. Request use of huge pages, optionally with size
[snip]
> So I think it is valid for libvirt to expose the memory slot feature
> just specifying the RAM size and the guest NUMA node and infer huge
> page usage, huge page size and host NUMA node from existing data that
> libvirt has in its domain XML document elsewhere.
I meant to outline how I thought hotplug/unplug would interact with
the existing data.
When first booting the guest
- If the XML does not include any memory slot info, we should
  add the minimum possible number of memory slots to match the
  per-guest NUMA node config.
- If XML does include slots, then we must validate that the
sum of the memory for slots listed against each guest NUMA
node matches the memory set in /cpu/numa/cell/@memory
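To make the boot-time case concrete, here is a sketch of a config where
one 512 MiB module backs each of two guest NUMA cells, with headroom
reserved for hotplug. The <maxMemory>, slots attribute and
<memory model='dimm'> device element are illustrative of the proposal
in this thread, not committed libvirt syntax:

```xml
<domain type='kvm'>
  <!-- current RAM; equals the sum of all plugged modules -->
  <memory unit='KiB'>1048576</memory>
  <!-- hotplug ceiling; NUMA/huge page setup is sized against this -->
  <maxMemory unit='KiB' slots='16'>4194304</maxMemory>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='524288'/>
      <cell id='1' cpus='2-3' memory='524288'/>
    </numa>
  </cpu>
  <devices>
    <!-- one 512 MiB module per guest NUMA cell; the per-cell sum
         must match /cpu/numa/cell/@memory at boot -->
    <memory model='dimm'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
    </memory>
    <memory model='dimm'>
      <target>
        <size unit='KiB'>524288</size>
        <node>1</node>
      </target>
    </memory>
  </devices>
</domain>
```

Host NUMA placement and huge page usage for each module would be
inferred from the existing <numatune> and <memoryBacking> elements
rather than repeated on the device.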
When huge pages are in use we need to make sure we validate that
we're adding slots whose size is a multiple of the huge page size.
The code should already be validating that each NUMA node's size is
a multiple of the configured huge page size for that node.
When hotplugging / unplugging
- Libvirt would update the /cpu/numa/cell/@memory attribute
> and the /memory element to reflect the newly added/removed DIMM.
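As a sketch of the hotplug step, plugging a 256 MiB module into guest
NUMA cell 1 might be requested with a device fragment like this
(again, proposed syntax from this thread, not a final interface):

```xml
<!-- hypothetical fragment passed to a device-attach API;
     size is 256 MiB, targeted at guest NUMA cell 1 -->
<memory model='dimm'>
  <target>
    <size unit='KiB'>262144</size>
    <node>1</node>
  </target>
</memory>
```

On success, libvirt would grow both /memory and
/cpu/numa/cell[@id='1']/@memory by 262144 KiB; unplug would shrink
them by the same amount.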
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|