[libvirt] [RFC PATCH 00/12] Add support for memory hotplug

Peter Krempa pkrempa at redhat.com
Mon Feb 9 14:15:11 UTC 2015


On Mon, Feb 09, 2015 at 08:38:52 -0500, John Ferlan wrote:
> On 01/30/2015 08:20 AM, Peter Krempa wrote:
> > !! this series applies on top of the cleanup series posted earlier !!
> > 
> > Hi,
> > this is my try to implement memory hotplug in libvirt (for the qemu) driver.
> > This series is almost code-complete but is lacking tests and docs as I wanted
> > to agree on design first before attempting to write the documentation.
> > 
> > Additionally this series is also lacking code that would fix memballoon
> > handling but I'm waiting on a clousure how it's supposed to work from qemu's
> > side as it appears to be broken there too.
> > 
> > The basic XML used to add a memory device:
> >     <memory model='acpi-dimm'>
> >       <target>
> >         <size unit='KiB'>524287</size>
> >         <node>0</node>
> >       </target>
> >     </memory>
> > 
> > The <target> subelement is mandatory, whereas the <source> subelement (that
> > contains source numa nodes, source page size ) is optional and is inferred
> > from the NUMA tuning for given target numa node.
> > 
> > Please note that at least one guest numa node has to be configured for the
> > guest for this to work (limitation of qemu).
> > 
> 
> Been kind of back-burner thinking about this series overall for the last
> couple of days. After reading Daniel's review - I think he touched on a
> few things I was pondering. Since I haven't followed the upstream QEMU
> work on this - I assume it's following that work very closely. I also
> assume other hypervisor's would/could be interested in providing similar
> functionality (hot plug memory) - I'm left wondering - is the model
> chosen flexible enough for that? You certainly will need a "healthy
> dose" of documentation to describe how to configure values properly.
> Perhaps even an extra utility to output what's available to be provided
> for the XML.
> 
> The one area that seems to be very QEMU implementation specific is the
> "<address type='acpi-dimm' slot='#' base='0xHEX'/>" XML. Maybe that
> becomes a "<driver name='qemu' slot='#' base='0xHEX'/>" instead leaving
> "<address type='acpi-dimm'/>" or even empty? Allowing users to provide
> "base" implies either extra validation to ensure it's 'correct' (format
> & usage/duplication) or documentation to describe what/how to provide
> the data. Perhaps even an extra utility to provide "valid values". If

Firstly, the address is by default generated automatically by qemu. We
represent it only for migration purposes so that the address doesn't
change in that case. The user ideally shouldn't provide any address and
let the hypervisor assign the module itself. That is the main reason I
chose to do it via the <address> element. Users usually don't touch
those while having the slot and base in the device itself would likely
incline the users to provide some values.

We can also make the address type qemu specific, but I still think that
the right place for the slot and base data is the address element.

Secondly, for validating of the actual data, we use qemu. The data is
passed through to the emulator and if it is incorrect it will either be
tweaked to the actual value used or rejected. We could (and I probably
will add this check in the next version) check the slot ID as it has to
be less than the maximum count configured at startup.

For the address qemu is doing alignment and other checks that I don't
want to represent in libvirt as it would be mostly black magic and
require us to mirror qemu. Basically the user shouldn't touch the base
address ... ever.

> QEMU is expecting specific values - maybe it'd be in our best interests
> not to expect the values. I suppose I'm having a difficult time
> visualizing the values and what they express. Also what if someone
> attempts to "poor man's clone" a live guest (dumpxml > file, edit file
> to change name, remove/change uuid, change disk source data, etc., then
> start). If they don't change those addresses - what will happen?  I

That is actually exactly what they should do. The address represents the
start address in the physical address space in the guest where the
memory module is placed at thus it can be duplicated.

> would think it would be "expected of" libvirt to ensure what's passed to
> QEMU is "uniquely expressed" anyway, right? What's still not clear to me
> is the relationship between 'acpi-dimm' and NUMA slots/nodes.
> 
> FWIW:  As an aside, after patch8, it would be beneficial to update
> qemuxml2argv-memory-hotplug.args in order to include the expected output
> of the "pc-dimm,node=%d,memdev=mem%s,id=%s" and "-object xxx -device
> xxxx"...

Tests are still missing for this series. I'll be tweaking them before
the non-RFC posting.

> 
> A tangent... When I see 'acpi-' in the XML - I think firmware or some
> external memory device which firmware needs to find/recognize... Long

Well it actually is this case. If a piece of the unused physicall
address space would become additional memory, then without any
interaction between the hardware (in this case qemu) and the guest
operating system the kernel wouldn't be able to guess that some memory
appeared and is free to use. ACPI in this case is used to notify the
guest kernel of the new memory that appeared.

> ago I worked on a project that used a memory device as a cluster
> interconnect (memory channel). A "practical use" in a virtual world
> could be as an interconnect for migrations. This is what I thought of
> when I saw a <memory> "device" with 'acpi-...' as opposed to a memory
> device that's providing host NUMA nodes/slots to the guest as extra
> available hot-plug memory.

That sounds more like RDMA-migration to me.

> 
> John
> 

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20150209/9a402e9f/attachment-0001.sig>


More information about the libvir-list mailing list