[libvirt] [PATCH 1/4] Introduce virNodeAllocPages
Michal Privoznik
mprivozn at redhat.com
Thu Sep 25 09:01:54 UTC 2014
On 23.09.2014 21:49, Eric Blake wrote:
> On 09/18/2014 02:24 AM, Michal Privoznik wrote:
>> A long time ago in a galaxy far, far away it has been decided
>> that libvirt will manage not only domains but host as well. And
>> with my latest work on qemu driver supporting huge pages, we miss
>> the cherry on top: an API to allocate huge pages on the run.
>> Currently users are forced to log into the host and adjust the
>> huge pages pool themselves. However, with this API the problem
>> is gone - they can both size up and size down the pool.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>
>> +/**
>> + * virNodeAllocPages:
>> + * @conn: pointer to the hypervisor connection
>> + * @npages: number of items in the @pageSizes array
>
> I'd mention '@pageSizes and @pageCounts arrays'
>
>> + * @pageSizes: which huge page sizes to allocate
>> + * @pageCounts: how many pages should be allocated
>> + * @startCell: index of first cell to allocate pages on
>> + * @cellCount: number of consecutive cells to allocate pages on
>> + * @flags: extra flags; binary-OR of virNodeAllocPagesFlags
>> + *
>> + * Sometimes, when trying to start a new domain, it may be
>> + * necessary to reserve some huge pages in the system pool which
>> + * can be then allocated by the domain. This API serves that
>> + * purpose. On its input, @pageSizes and @pageCounts are arrays
>> + * of the same cardinality of @npages. The @pageSizes contains
>> + * page sizes which are to be allocated in the system (the size
>> + * unit is kibibytes), and @pageCounts then contains the number
>> + * of pages to reserve. Depending on @flags specified, the number
>> + * of pages is either added into the pool and the pool is sized
>> + * up (@flags have VIR_NODE_ALLOC_PAGES_ADD set) or the number of
>
> Technically, VIR_NODE_ALLOC_PAGES_ADD is 0, so it can't be set :) Do we
> even need this named constant, or can we just drop it?
Well, I'd like to keep it to explicitly state that the pool is added to.
You know:
    unsigned int flags = 0;
    if (<condition>)
        flags |= VIR_NODE_ALLOC_PAGES_SET;
    else
        flags |= VIR_NODE_ALLOC_PAGES_ADD;
    virNodeAllocPages(..., flags);
This is just a hint for users; the named value is certainly not needed. I
was following the pre-existing precedent of VIR_DOMAIN_AFFECT_CURRENT.
>
>> + * pages is looked at the new size of pool and the pool can be
>> + * both sized up or down (@flags have VIR_NODE_ALLOC_PAGES_SET
>
> The wording is a bit awkward; maybe:
>
> If @flags is 0 (VIR_NODE_ALLOC_PAGES_ADD), each pool corresponding to
> @pageSizes grows by the number of pages specified in the corresponding
> @pageCounts. If @flags contains VIR_NODE_ALLOC_PAGES_SET, each pool
> mentioned is resized to the given number of pages.
>
>> + * set). The pages pool can be allocated over several NUMA nodes
>> + * at once, just point at @startCell and tell how many subsequent
>> + * NUMA nodes should be taken in.
>
> Is there shorthand such as @startCell==-1 for allocating across all NUMA
> nodes?
Yes there is. I've updated the documentation.
> Is there checking that @npages cannot exceed the number of NUMA
> nodes available starting at @startCell through @cellCount?
Well, that's a question for 3/4, but, er, how to put it. No, not
currently, but I'll update 3/4.
>
> If I understand correctly, usage is something like (pseudo-code):
>
> virNodeAllocPages(conn, 2, (int[]){ [0]=2M, [1]=1G },
> (int[]){ [0]=1024, [1]=1 },
> 1, 2, 0)
>
> which proceeds to add 1024 pages of size 2M, and 1 page of size 1G, to
> the pools associated both with NUMA node 1 and NUMA node 2 (that is, I
> just allocated 6G across two nodes).
Exactly!
>
>> + *
>> + * Returns: the number of nodes successfully adjusted or -1 in
>> + * case of an error.
>
> Can this function partially succeed? That is, if I pass @npages==2 and
> @cellCount==2, will I ever get a non-negative result less than 4? If
> the result is 2, which allocations failed (are all allocations on the
> first cell attempted before any on the second, or are all allocations to
> the first pool size attempted across multiple cells before revisiting
> cells to allocate in the second pool size)?
Since there's no rollback on partially successful returns (in fact, the
majority of our APIs have no rollback at all), I find it okay to leave
it like that. Moreover, runtime huge page allocation is still a bit of
magic: the first call may fail while a later attempt succeeds. Besides,
we already have an API to fetch the pool sizes (virNodeGetFreePages), so
if the Alloc() partially fails, users may check which pools succeeded
and which need to be adjusted again.
>
>> + */
>> +int
>> +virNodeAllocPages(virConnectPtr conn,
>> + unsigned int npages,
>> + unsigned int *pageSizes,
>> + unsigned long long *pageCounts,
>> + int startCell,
>> + unsigned int cellCount,
>> + unsigned int flags)
>> +{
>> + VIR_DEBUG("conn=%p npages=%u pageSizes=%p pageCounts=%p "
>> + "startCell=%d cellCount=%u flags=%x",
>> + conn, npages, pageSizes, pageCounts, startCell,
>> + cellCount, flags);
>> +
>> + virResetLastError();
>> +
>> + virCheckConnectReturn(conn, -1);
>> + virCheckNonZeroArgGoto(npages, error);
>> + virCheckNonNullArgGoto(pageSizes, error);
>> + virCheckNonNullArgGoto(pageCounts, error);
>> +
>
> Where is the validation that startCell and cellCount make sense? I'd
> assume we want to ensure cellCount is positive? Or is there a special
> case of a count of 0 for visiting all NUMA cells without knowing in
> advance how many there are?
Oh right. I'll add the check. If we ever want to go with your idea, we
can just drop the check.
>
>> +++ b/src/libvirt_public.syms
>> @@ -677,6 +677,7 @@ LIBVIRT_1.2.8 {
>> virDomainListGetStats;
>> virDomainOpenGraphicsFD;
>> virDomainStatsRecordListFree;
>> + virNodeAllocPages;
>> } LIBVIRT_1.2.7;
>
> Need a new section for 1.2.9.
I'm still not fully used to our new release style, apparently :) Thanks
for catching that.
Michal