[libvirt] adding a new libvirt xml element for File Descriptor backed memory for use with vhost-user

Mooney, Sean K sean.k.mooney at intel.com
Thu May 12 18:57:49 UTC 2016



> -----Original Message-----
> From: Daniel P. Berrange [mailto:berrange at redhat.com]
> Sent: Thursday, May 12, 2016 5:28 PM
> To: Mooney, Sean K <sean.k.mooney at intel.com>
> Cc: libvir-list at redhat.com
> Subject: Re: [libvirt] adding a new libvirt xml element for File
> Descriptor backed memory for use with vhost-user
> 
> On Thu, May 12, 2016 at 04:00:29PM +0000, Mooney, Sean K wrote:
> > > > Today it is possible to use Libvirt to spawn a VM without hugepage
> > > > memory but with a file descriptor backed memdev via the
> > > > qemu:commandline element.
> > > >
> > > >   <qemu:commandline>
> > > >     <qemu:arg value='-object'/>
> > > >     <qemu:arg value='memory-backend-file,id=mem,size=1024M,mem-path=/var/lib/libvirt/qemu,share=on'/>
> > > >     <qemu:arg value='-numa'/>
> > > >     <qemu:arg value='node,memdev=mem'/>
> > > >     <qemu:arg value='-mem-prealloc'/>
> > > >   </qemu:commandline>
> > > >
> > > > I created a proof of concept patch to Nova to demonstrate that
> > > > this works; however, to support this use case in Nova a new XML
> > > > element is required.
> > > > https://review.openstack.org/#/c/309565/1
> > > >
> > > > I would like to propose the introduction of a new subelement to
> > > > the memoryBacking element to request file descriptor backed memory:
> > > >
> > > > <memoryBacking>
> > > >    <filedescriptor size_mb="1024" path="/var/lib/libvirt/qemu"
> > > >                    prealloc="true" shared="on" />
> > > > </memoryBacking>
> > >
> > > Specifying a size is not required - we already know how big memory
> > > must be for the guest.
> > >
> > > We already have a memAccess='shared' attribute against the <numa>
> > > element that is used to determine if the underlying memory should be
> > > set up as shared.  We could define a further element that lets us
> > > control memory access mode for guests without NUMA topology
> > > specified.
> > [Mooney, Sean K] Hi, yes, the reason I added the shared attribute was to
> > cater for the case of guests without NUMA topology. For guests with NUMA
> > topology I agree that using memAccess='shared' on the cell is
> > better for consistency with hugepage memory.
> >
> > >   <memoryBacking>
> > >      <access mode="shared"/>
> > >   </memoryBacking>
> > >
> > > For huge pages it seems we unconditionally pass --mem-prealloc. I'm
> > > thinking we could perhaps make that configurable via an element
> > >
> > >
> > >   <memoryBacking>
> > >      <allocation mode="immediate|ondemand"/>
> > >   </memoryBacking>
> > >
> > > to control use of -mem-prealloc or not.
> > [Mooney, Sean K] For the vhost-user case -mem-prealloc is
> > required because you are basically doing DMA, so you really want the
> > memory to be allocated.
> > Generically though, from a Libvirt point of view, I do think it makes
> > sense for this to be configurable, to allow oversubscription of memory
> > for higher density.
> > >
> > > So all that remains is a way to request file based backing of RAM.
> > > As with huge pages, I think we should hide the actual path from the
> > > user. We should just use /dev/shm as the backing for non-hugepage RAM.
> > > For this we could define something like
> > >
> > >    <memoryBacking>
> > >        <source type="file|anonymous"/>
> > >    </memoryBacking>
> > >
> > [Mooney, Sean K] For some reason, when I used /dev/shm I could only
> > boot one instance at a time. That was my first choice, but maybe we
> > would have to create a file per instance under /dev/shm to make it work.
> 
> QEMU should create the file itself - it's no different to our use of
> hugetlbfs in fact. Possibly you hit a limit on the amount of memory allowed
> to be used via /dev/shm - IIRC the mount point is limited to 50% of RAM by
> default.
> 
> If you use /var/lib/libvirt/ as the location you get a real file backed
> by disk, so akin to putting the VM on swap IIUC !
[Mooney, Sean K] That was my initial assumption too. However, when you use
/var/lib/libvirt/ or /dev/shm, QEMU does not create a file in the directory.
What I think is happening is that it does not actually create a file, just
a file descriptor that is mapped to a memory region. I believe it merely
uses the path to determine what the default page size should be when
allocating the file-backed memory. This is something that we can look into though.
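
One possible explanation, which is an assumption on my part and not
something I have checked against the QEMU source, is that the backing file
is created and then unlinked straight away, so the mapping stays valid while
nothing shows up in the directory. A minimal Python sketch of that pattern
(the function name and the "backing_sketch." prefix are made up for
illustration):

```python
import mmap
import os
import tempfile


def map_unlinked_file(directory, size):
    """Create a temp file in `directory`, unlink it, and map it MAP_SHARED.

    Returns (mapping, leftover), where `leftover` lists any directory
    entries that still carry our hypothetical prefix after the unlink.
    """
    fd, path = tempfile.mkstemp(prefix="backing_sketch.", dir=directory)
    try:
        os.unlink(path)         # the name goes away, the open fd stays valid
        os.ftruncate(fd, size)  # give the backing store its size
        mapping = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
    finally:
        os.close(fd)            # the mapping keeps its own reference
    leftover = [name for name in os.listdir(directory)
                if name.startswith("backing_sketch.")]
    return mapping, leftover


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        mapping, leftover = map_unlinked_file(directory, 4096)
        mapping[:5] = b"hello"
        print("entries left in directory:", leftover)
        print("mapping still usable:", bytes(mapping[:5]))
        mapping.close()
```

If QEMU behaves along these lines, an empty directory would not by itself
tell us whether a file was ever created there.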

> 
> > > Putting that all together, to get what you want we'd have
> > >
> > >    <memoryBacking>
> > >        <source type="file"/>
> > >        <access mode="shared"/>
> > >        <allocation mode="immediate"/>
> > >    </memoryBacking>
> > >
> > [Mooney, Sean K]
> > Yes, this seems like it would be a clean way to address this use case.
> > Can you gauge how small or large a change this would be? It's been a
> > while since I worked with C directly, but if you could point me in the
> > right direction in the Libvirt codebase I would be happy to look at
> > creating an RFC patch.
> 
> First there's defining the XML extensions - needs
> docs/schemas/domaincommon.rng and src/conf/domain_conf.{c,h} to be
> changed.
> 
> Then there's wiring up QEMU XML -> ARGV conversion -
> src/qemu/qemu_command.c and adding test cases in
> tests/qemuxml2argvtest.c
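
To make sure I understand the intended XML -> ARGV mapping, here is a rough
Python sketch of it. This is only an illustration, not Libvirt code: the
function name, the fixed id "mem" and the /dev/shm default are assumptions
taken from the examples earlier in this thread, and the real conversion
would be C code in src/qemu/qemu_command.c.

```python
import xml.etree.ElementTree as ET


def memory_backing_to_argv(xml_text, size_mb, mem_path="/dev/shm"):
    """Map the proposed <memoryBacking> children to QEMU arguments.

    Hypothetical helper: size_mb comes from the guest memory size and
    mem_path is chosen by the driver, so neither is read from the XML.
    """
    root = ET.fromstring(xml_text)
    source = root.find("source")
    access = root.find("access")
    allocation = root.find("allocation")

    argv = []
    if source is not None and source.get("type") == "file":
        shared = access is not None and access.get("mode") == "shared"
        share = "on" if shared else "off"
        argv += [
            "-object",
            f"memory-backend-file,id=mem,size={size_mb}M,"
            f"mem-path={mem_path},share={share}",
            "-numa",
            "node,memdev=mem",
        ]
    if allocation is not None and allocation.get("mode") == "immediate":
        argv.append("-mem-prealloc")
    return argv


if __name__ == "__main__":
    example = """
    <memoryBacking>
        <source type="file"/>
        <access mode="shared"/>
        <allocation mode="immediate"/>
    </memoryBacking>
    """
    print(memory_backing_to_argv(example, 1024))
```

For the combined example above this yields the same arguments as the
qemu:commandline workaround, except that mem-path points at /dev/shm.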
> 
> > From the Nova side, assuming Libvirt is extended for this feature, should
> > I open a blueprint to extend the existing guest memory backing support
> > in parallel with the Libvirt implementation, or wait until after it is
> > supported in Libvirt to start the Nova discussion? In either case I
> > think we agree that any support in Nova would depend on Libvirt
> > support to be accepted in upstream Nova.
> 
> You're going to hit the deadline for approval of Newton specs in Nova
> fairly soon, and unless the libvirt impl is done before then, I think it
> is unlikely you'd get a spec approved. So by all means work on this in
> parallel, but be realistic about chances of approval in Nova for this
> cycle.
[Mooney, Sean K] Actually, I was assuming that this would be completed early
in Ocata, as it requires changes in Libvirt first.
> 
> 
> Regards,
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|