[libvirt] RAM backend and guest ABI (was Re: [Qemu-devel] [PATCH v2] pc: memhp: enforce minimal 128Mb alignment for pc-dimm)

Eduardo Habkost ehabkost at redhat.com
Thu Oct 29 18:16:57 UTC 2015


(CCing Michal and libvir-list, so libvirt team is aware of this
restriction)

On Thu, Oct 29, 2015 at 02:36:37PM +0100, Igor Mammedov wrote:
> On Tue, 27 Oct 2015 14:36:35 -0200
> Eduardo Habkost <ehabkost at redhat.com> wrote:
> 
> > On Tue, Oct 27, 2015 at 10:14:56AM +0100, Igor Mammedov wrote:
> > > On Tue, 27 Oct 2015 10:53:08 +0200
> > > "Michael S. Tsirkin" <mst at redhat.com> wrote:
> > > 
> > > > On Tue, Oct 27, 2015 at 09:48:37AM +0100, Igor Mammedov wrote:
> > > > > On Tue, 27 Oct 2015 10:31:21 +0200
> > > > > "Michael S. Tsirkin" <mst at redhat.com> wrote:
> > > > > 
> > > > > > On Mon, Oct 26, 2015 at 02:24:32PM +0100, Igor Mammedov wrote:
> > > > > > > Yep, it's a workaround, but it works around QEMU's broken virtio
> > > > > > > implementation in a simple way without the need for guest-side changes.
> > > > > > > 
> > > > > > > Without a foreseeable virtio fix it makes memory hotplug unusable, and even
> > > > > > > if there were a virtio fix it wouldn't help old guests, since you've said
> > > > > > > that a virtio fix would require changes on both the QEMU and guest sides.
> > > > > > 
> > > > > > What makes it not foreseeable?
> > > > > > Apparently only the fact that we have a work-around in place so no one
> > > > > > works on it.  I can code it up pretty quickly, but I'm flat out of time
> > > > > > for testing as I'm going on vacation soon, and hard freeze is pretty
> > > > > > close.
> > > > > I can lend a hand with the testing part.
> > > > > 
> > > > > > 
> > > > > > GPA space is kind of cheap, but wasting it in chunks of 512M
> > > > > > seems way too aggressive.
> > > > > the hotplug region is sized with a 1Gb alignment reserve per DIMM, so we
> > > > > aren't actually wasting anything here.
> > > > >
> > > > 
> > > > If I allocate two 1G DIMMs, what will be the gap size? 512M? 1G?
> > > > It's too much either way.
> > > the minimum would be 512M, and if the backend uses 1Gb huge pages the gap
> > > will be the backend's natural alignment (i.e. 1Gb).
> > 
> > Is backend configuration even allowed to affect the machine ABI? We need
> > to be able to change backend configuration when migrating the VM to
> > another host.
> For now, one has to use the same type of backend on both sides,
> i.e. if the source uses a 1Gb huge page backend then the target also
> needs to use it.
> 

The page size of the backend doesn't even depend on the QEMU arguments, but on
the kernel command line or hugetlbfs mount options. So it's possible to
have exactly the same QEMU command line on source and destination (with
an explicit versioned machine-type) and get a VM that can't be
migrated? That means we are breaking our guarantees about migration and
guest ABI.
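
To make the scenario concrete, here is a made-up example (the command line
and mount options below are only an illustration, not taken from a real
setup): both hosts run the identical QEMU invocation, but the hugetlbfs
mount backing mem-path uses a different page size on each side:

    # identical command line on source and destination:
    qemu-system-x86_64 -machine pc-i440fx-2.4 \
        -m 4G,slots=4,maxmem=16G \
        -object memory-backend-file,id=mem1,size=1G,mem-path=/hugepages \
        -device pc-dimm,id=dimm1,memdev=mem1

    # source host: /hugepages mounted with 2M pages
    mount -t hugetlbfs -o pagesize=2M none /hugepages

    # destination host: /hugepages mounted with 1G pages
    mount -t hugetlbfs -o pagesize=1G none /hugepages

If the DIMM's GPA depends on the backend's natural alignment, those two
guests end up with different memory layouts even though the QEMU
arguments match.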


> We could change this for the next machine type to always force
> the maximum alignment (1Gb); then it would be possible to switch
> between backends with different alignments.

I'm not sure what the best solution is here. If always using 1GB is too
aggressive, we could require management to ask for an explicit alignment
via a -machine option if they know they will need a specific backend page
size.

BTW, are you talking about the behavior introduced by
aa8580cddf011e8cedcf87f7a0fdea7549fc4704 ("pc: memhp: force gaps between
DIMM's GPA") only, or was the backend page size already affecting GPA
allocation before that commit?
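
For reference, the mechanism I have in mind is just the usual rounding of
the next free address in the hotplug region up to the backend's alignment.
A simplified sketch in C (an illustration under that assumption, not
QEMU's actual allocation code, and ignoring the extra gap added by the
commit above):

    /* Toy model: the next DIMM address is rounded up to the backend's
     * alignment before the DIMM is mapped. Not QEMU code. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }

    int main(void)
    {
        uint64_t next_free = 0x100000000ULL;  /* start of hotplug region (example) */
        uint64_t align_2m  = 2ULL << 20;      /* backend using 2M pages */
        uint64_t align_1g  = 1ULL << 30;      /* backend using 1G huge pages */

        /* First DIMM: both page sizes give the same address here... */
        printf("2M-backed DIMM at 0x%" PRIx64 "\n", align_up(next_free, align_2m));
        printf("1G-backed DIMM at 0x%" PRIx64 "\n", align_up(next_free, align_1g));

        /* ...but once a 256M DIMM has been plugged, the next free GPA
         * differs depending on the backend's page size: */
        uint64_t after_256m = next_free + (256ULL << 20);
        printf("next 2M-aligned GPA: 0x%" PRIx64 "\n", align_up(after_256m, align_2m));
        printf("next 1G-aligned GPA: 0x%" PRIx64 "\n", align_up(after_256m, align_1g));
        return 0;
    }

If that rounding is the only effect, the question above boils down to
whether it already depended on the backend before that commit.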

-- 
Eduardo
