[libvirt] [PATCH 01/12] Introduce virNodeHugeTLB

Daniel P. Berrange berrange at redhat.com
Fri May 30 16:48:08 UTC 2014


On Fri, May 30, 2014 at 03:42:01PM +0200, Michal Privoznik wrote:
> On 30.05.2014 14:46, Daniel P. Berrange wrote:
> >On Fri, May 30, 2014 at 01:41:06PM +0200, Michal Privoznik wrote:
> >>On 30.05.2014 10:52, Daniel P. Berrange wrote:
> >>>On Thu, May 29, 2014 at 10:32:35AM +0200, Michal Privoznik wrote:
> >>>>  /**
> >>>>+ * virNodeHugeTLB:
> >>>>+ * @conn: pointer to the hypervisor connection
> >>>>+ * @type: type
> >>>>+ * @params: pointer to memory parameter object
> >>>>+ *          (return value, allocated by the caller)
> >>>>+ * @nparams: pointer to number of memory parameters; input and output
> >>>>+ * @flags: extra flags; not used yet, so callers should always pass 0
> >>>>+ *
> >>>>+ * Get information about host's huge pages. On input, @nparams
> >>>>+ * gives the size of the @params array; on output, @nparams gives
> >>>>+ * how many slots were filled with parameter information, which
> >>>>+ * might be less but will not exceed the input value.
> >>>>+ *
> >>>>+ * As a special case, calling with @params as NULL and @nparams
> >>>>+ * as 0 on input will cause @nparams on output to contain the
> >>>>+ * number of parameters supported by the hypervisor. The caller
> >>>>+ * should then allocate @params array, i.e.
> >>>>+ * (sizeof(@virTypedParameter) * @nparams) bytes and call the API
> >>>>+ * again.  See virDomainGetMemoryParameters() for an equivalent
> >>>>+ * usage example.
> >>>>+ *
> >>>>+ * Returns 0 in case of success, and -1 in case of failure.
> >>>>+ */
> >>>>+int
> >>>>+virNodeHugeTLB(virConnectPtr conn,
> >>>>+               int type,
> >>>>+               virTypedParameterPtr params,
> >>>>+               int *nparams,
> >>>>+               unsigned int flags)
> >>>
> >>>What is the 'type' parameter doing ?
> >>
> >>Ah, it should be named numa_node rather than type. If type==-1, then overall
> >>statistics are returned (number of {available,free} pages accumulated across
> >>all NUMA nodes), if type >= 0, info on the specific NUMA node is returned.
> >>
> >>>
> >>>I think in general this API needs a different design. I'd like to have
> >>>an API that can request info for all page sizes on all NUMA nodes in a
> >>>single call. I also think the static unchanging data should be part of
> >>>the cpu + NUMA info in the capabilities XML. So the API only reports
> >>>info which is changing - ie the available pages.
> >>
> >>The only problem is, the size of huge pages pool is not immutable. Now it's
> >>possible for 2M huge pages to be allocated dynamically:
> >>
> >># echo 8 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> >>
> >>and it may be possible for 1GB too in future (what if kernel learns how to
> >>do it?). In general, the only thing that we can take unalterable for now is
> >>the default size of huge pages. And I wouldn't bet on that either.
> >
> >Yes, you can in theory change the number of huge pages at an arbitrary
> >time, but realistically people mostly only do it immediately at boot.
> >With 1 GB pages it will be impossible to do it any time except immediately
> >at boot. If you wait a little while, then memory will be too fragmented
> >for you to be able to dynamically allocate more 1 GB pages. The same
> >is true of 2MB pages to a lesser degree.
> 
> IMO no. Processes never ever see physical addresses (PA). All they see are
> virtual addresses (VA). So there is a possibility for the kernel to rearrange
> physical memory without effect on the processes in order to gain bigger
> segments of free memory.

Applications aren't typically the problem - it is the kernel's own data
structures that often cannot be moved at all, so over time they will
cause free physical RAM regions to become very fragmented. It is a real
problem with huge page usage, which will be an order of magnitude worse
for 1GB size pages.
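
For reference, here is a minimal C sketch of how the changing part of this
data (pool size and free pages, host-wide or per NUMA node) could be read
from sysfs. The host-wide path is the one used in the echo example quoted
above; the per-node path is the kernel's layout under
/sys/devices/system/node. The helper names hugepage_sysfs_path() and
read_counter() are hypothetical, made up for illustration, and are not
part of libvirt or of the patch under review.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: build the sysfs path of a huge page pool counter.
 * node < 0 selects the host-wide pool, node >= 0 a single NUMA node.
 * page_kb is the page size in KiB (2048 for 2M, 1048576 for 1G).
 * field is a counter file name such as "nr_hugepages" or "free_hugepages".
 * Returns 0 on success, -1 if the path does not fit into buf.
 */
static int
hugepage_sysfs_path(char *buf, size_t buflen, int node,
                    unsigned long page_kb, const char *field)
{
    int n;

    if (node < 0)
        n = snprintf(buf, buflen,
                     "/sys/kernel/mm/hugepages/hugepages-%lukB/%s",
                     page_kb, field);
    else
        n = snprintf(buf, buflen,
                     "/sys/devices/system/node/node%d/hugepages/"
                     "hugepages-%lukB/%s",
                     node, page_kb, field);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/*
 * Hypothetical helper: read one unsigned decimal counter from a file.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed.
 */
static int
read_counter(const char *path, unsigned long long *value)
{
    FILE *fp = fopen(path, "r");
    int rc;

    if (!fp)
        return -1;

    rc = (fscanf(fp, "%llu", value) == 1) ? 0 : -1;
    fclose(fp);
    return rc;
}
```

A caller would build the path for, say, node 0 and "free_hugepages" with
page_kb 2048, then pass it to read_counter(). Because the counters can
change at any time, each read is only a snapshot.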

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



