[libvirt] [PATCH 1/4] Introduce virNodeAllocPages

Michal Privoznik mprivozn at redhat.com
Thu Sep 25 09:01:54 UTC 2014


On 23.09.2014 21:49, Eric Blake wrote:
> On 09/18/2014 02:24 AM, Michal Privoznik wrote:
>> A long time ago in a galaxy far, far away it was decided
>> that libvirt would manage not only domains but the host as
>> well. And with my latest work on the qemu driver supporting
>> huge pages, we are missing the cherry on top: an API to
>> allocate huge pages at runtime. Currently users are forced
>> to log into the host and adjust the huge pages pool
>> themselves. However, with this API the problem is gone -
>> they can size the pool both up and down.
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>
>> +/**
>> + * virNodeAllocPages:
>> + * @conn: pointer to the hypervisor connection
>> + * @npages: number of items in the @pageSizes array
>
> I'd mention '@pageSizes and @pageCounts arrays'
>
>> + * @pageSizes: which huge page sizes to allocate
>> + * @pageCounts: how many pages should be allocated
>> + * @startCell: index of first cell to allocate pages on
>> + * @cellCount: number of consecutive cells to allocate pages on
>> + * @flags: extra flags; binary-OR of virNodeAllocPagesFlags
>> + *
>> + * Sometimes, when trying to start a new domain, it may be
>> + * necessary to reserve some huge pages in the system pool which
>> + * can be then allocated by the domain. This API serves that
>> + * purpose. On its input, @pageSizes and @pageCounts are arrays
>> + * of the same cardinality of @npages. The @pageSizes contains
>> + * page sizes which are to be allocated in the system (the size
>> + * unit is kibibytes), and @pageCounts then contains the number
>> + * of pages to reserve. Depending on @flags specified, the number
>> + * of pages is either added into the pool and the pool is sized
>> + * up (@flags have VIR_NODE_ALLOC_PAGES_ADD set) or the number of
>
> Technically, VIR_NODE_ALLOC_PAGES_ADD is 0, so it can't be set :)  Do we
> even need this named constant, or can we just drop it?

Well, I'd like to keep it to explicitly state that the pool is added to. 
You know:

unsigned int flags = 0;

if (<condition>)
    flags |= VIR_NODE_ALLOC_PAGES_SET;
else
    flags |= VIR_NODE_ALLOC_PAGES_ADD;

virNodeAllocPages(..., flags);

This is just a hint for users; the named value is certainly not needed. 
I thought of following the pre-existing VIR_DOMAIN_AFFECT_CURRENT, 
which is also zero.
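
To be concrete, the pair of constants would look something like this 
(a sketch; ADD is the implicit default, in the spirit of 
VIR_DOMAIN_AFFECT_CURRENT):

typedef enum {
    VIR_NODE_ALLOC_PAGES_ADD = 0,      /* add @pageCounts to the pool */
    VIR_NODE_ALLOC_PAGES_SET = 1 << 0, /* set pool size to @pageCounts */
} virNodeAllocPagesFlags;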

>
>> + * pages is looked at the new size of pool and the pool can be
>> + * both sized up or down (@flags have VIR_NODE_ALLOC_PAGES_SET
>
> The wording is a bit awkward; maybe:
>
> If @flags is 0 (VIR_NODE_ALLOC_PAGES_ADD), each pool corresponding to
> @pageSizes grows by the number of pages specified in the corresponding
> @pageCounts.  If @flags contains VIR_NODE_ALLOC_PAGES_SET, each pool
> mentioned is resized to the given number of pages.
>
>> + * set).  The pages pool can be allocated over several NUMA nodes
>> + * at once, just point at @startCell and tell how many subsequent
>> + * NUMA nodes should be taken in.
>
> Is there shorthand such as @startCell==-1 for allocating across all NUMA
> nodes?

Yes, there is: @startCell == -1. I've updated the documentation.
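
For illustration, assuming -1 keeps the virNodeGetFreePages 
convention (all cells; I'd expect @cellCount to be ignored then, but 
that is exactly the kind of detail the docs must spell out), a 
hypothetical call reserving 512 x 2MiB pages on every NUMA node 
would be:

/* Sizes are in KiB; -1 selects all cells (cellCount assumed
 * to be ignored in that case). */
virNodeAllocPages(conn, 1,
                  (unsigned int[]){ 2048 },
                  (unsigned long long[]){ 512 },
                  -1, 0, 0);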

> Is there checking that @npages cannot exceed the number of NUMA
> nodes available starting at @startCell through @cellCount?

Well, that's a question for 3/4, but, er, how to put it: no, not 
currently. I'll update 3/4.

>
> If I understand correctly, usage is something like (pseudo-code):
>
> virNodeAllocPages(conn, 2, (int[]){ [0]=2M, [1]=1G },
>                    (int[]){ [0]=1024, [1]=1 },
>                    1, 2, 0)
>
> which proceeds to add 1024 pages of size 2M, and 1 page of size 1G, to
> the pools associated both with NUMA node 1 and NUMA node 2 (that is, I
> just allocated 6G across two nodes).

Exactly!
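
In compilable C (sizes are in kibibytes) that would be something 
like:

/* Add 1024 x 2MiB pages and 1 x 1GiB page to the pools of NUMA
 * nodes 1 and 2 each -- 3GiB per node, 6GiB in total. */
virNodeAllocPages(conn, 2,
                  (unsigned int[]){ 2048, 1048576 },
                  (unsigned long long[]){ 1024, 1 },
                  1, 2, 0);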

>
>> + *
>> + * Returns: the number of nodes successfully adjusted or -1 in
>> + * case of an error.
>
> Can this function partially succeed?  That is, if I pass @npages==2 and
> @cellCount==2, will I ever get a non-negative result less than 4?  If
> the result is 2, which allocations failed (are all allocations on the
> first cell attempted before any on the second, or are all allocations to
> the first pool size attempted across multiple cells before revisiting
> cells to allocate in the second pool size)?

Since there's no rollback on partially successful returns (in fact, 
the majority of our APIs have no rollback at all), I find it okay to 
leave it like that. Moreover, runtime huge page allocation is still a 
bit of magic: a first call may fail while a later attempt succeeds. 
Besides, we already have an API to fetch the pool sizes 
(virNodeGetFreePages), so if the Alloc() partially fails, users can 
check which pools succeeded and which need to be adjusted again, as 
in the sketch below.
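
For instance (a sketch only; the helper name and the one-shot retry 
are made up for illustration, and virNodeGetFreePages reports *free* 
pages, so this assumes nothing is consuming them concurrently):

#include <stdio.h>
#include <libvirt/libvirt.h>

/* Resize the 2MiB pool on cell 0 to 1024 pages, query how far we
 * got, and retry once if the pool fell short. */
static int
resize_2m_pool(virConnectPtr conn)
{
    unsigned int sizes[] = { 2048 };          /* page size in KiB */
    unsigned long long want[] = { 1024 };
    unsigned long long have[1];

    if (virNodeAllocPages(conn, 1, sizes, want, 0, 1,
                          VIR_NODE_ALLOC_PAGES_SET) < 0)
        return -1;                            /* nothing was adjusted */

    if (virNodeGetFreePages(conn, 1, sizes, 0, 1, have, 0) < 0)
        return -1;

    if (have[0] < want[0]) {
        fprintf(stderr, "pool short by %llu pages, retrying\n",
                want[0] - have[0]);
        return virNodeAllocPages(conn, 1, sizes, want, 0, 1,
                                 VIR_NODE_ALLOC_PAGES_SET);
    }
    return 0;
}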

>
>> + */
>> +int
>> +virNodeAllocPages(virConnectPtr conn,
>> +                  unsigned int npages,
>> +                  unsigned int *pageSizes,
>> +                  unsigned long long *pageCounts,
>> +                  int startCell,
>> +                  unsigned int cellCount,
>> +                  unsigned int flags)
>> +{
>> +    VIR_DEBUG("conn=%p npages=%u pageSizes=%p pageCounts=%p "
>> +              "startCell=%d cellCount=%u flags=%x",
>> +              conn, npages, pageSizes, pageCounts, startCell,
>> +              cellCount, flags);
>> +
>> +    virResetLastError();
>> +
>> +    virCheckConnectReturn(conn, -1);
>> +    virCheckNonZeroArgGoto(npages, error);
>> +    virCheckNonNullArgGoto(pageSizes, error);
>> +    virCheckNonNullArgGoto(pageCounts, error);
>> +
>
> Where is the validation that startCell and cellCount make sense?  I'd
> assume we want to ensure cellCount is positive?  Or is there a special
> case of a count of 0 for visiting all NUMA cells without knowing in
> advance how many there are?

Oh right. I'll add the check. If we ever want to go with your idea, we 
can just drop the check.
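
Concretely, probably just mirroring the argument checks already in 
place (a sketch):

    virCheckNonZeroArgGoto(cellCount, error);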

>
>> +++ b/src/libvirt_public.syms
>> @@ -677,6 +677,7 @@ LIBVIRT_1.2.8 {
>>           virDomainListGetStats;
>>           virDomainOpenGraphicsFD;
>>           virDomainStatsRecordListFree;
>> +        virNodeAllocPages;
>>   } LIBVIRT_1.2.7;
>
> Need a new section for 1.2.9.

I'm still not fully used to our new release style, apparently :) Thanks 
for catching that.
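
That is, the hunk should open a new section instead, along these 
lines (sketch):

LIBVIRT_1.2.9 {
    global:
        virNodeAllocPages;
} LIBVIRT_1.2.8;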

Michal



