[libvirt] [PATCH v2] lib: Add note that bulk stats API queries may overrun RPC buffers

Wed May 24 10:55:16 UTC 2017

On Tue, May 23, 2017 at 05:29:28PM +0200, Michal Privoznik wrote:
>On 05/23/2017 05:19 PM, Martin Kletzander wrote:
>> On Tue, May 23, 2017 at 05:07:40PM +0200, Michal Privoznik wrote:
>>> On 05/23/2017 04:35 PM, Martin Kletzander wrote:
>>>> On Tue, May 23, 2017 at 04:23:30PM +0200, Peter Krempa wrote:
>>>>> Hint that the users should limit the number of VMs queried in the bulk
>>>>> stats API.
>>>>> ---
>>>>> v2:
>>>>> - added a suggestion of the number of queried VMs (valid after bump to
>>>>> 32M message)
>>>>>
>>>>> src/libvirt-domain.c | 8 ++++++++
>>>>> 1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
>>>>> index 310b91b37..aef061943 100644
>>>>> --- a/src/libvirt-domain.c
>>>>> +++ b/src/libvirt-domain.c
>>>>> @@ -11315,6 +11315,10 @@ virConnectGetDomainCapabilities(virConnectPtr
>>>>> conn,
>>>>>  * VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF and/or
>>>>>  * VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER for all other states.
>>>>>  *
>>>>> + * Note that this API is prone to exceeding maximum RPC message size
>>>>> on hosts
>>>>> + * with lots of VMs so it's suggested to use virDomainListGetStats
>>>>> with a
>>>>> + * reasonable list of VMs as the argument.
>>>>> + *
>>>>>  * Returns the count of returned statistics structures on success, -1
>>>>> on error.
>>>>>  * The requested data are returned in the @retStats parameter. The
>>>>> returned
>>>>>  * array should be freed by the caller. See
>>>>> virDomainStatsRecordListFree.
>>>>> @@ -11386,6 +11390,10 @@ virConnectGetAllDomainStats(virConnectPtr
>>>>> conn,
>>>>>  * Note that any of the domain list filtering flags in @flags may be
>>>>> rejected
>>>>>  * by this function.
>>>>>  *
>>>>> + * Note that this API is prone to exceeding maximum RPC if querying
>>>>> too many VMs
>>>>> + * with lots of statistics. It's suggested to query in batches of
>>>>> 10VMs, which
>>>>> + * should be good enough for VMs with 3000 disks + networks.
>>>>> + *
>>>>
>>>> Coming to think about it... Why don't we just batch this ourselves under
>>>> the hood and just return the merged result?
>>>
>>> Because:
>>>
>>> https://www.redhat.com/archives/libvir-list/2017-May/msg00088.html
>>>
>>
>> Not on the RPC level, the API would just be syntax sugar to
>> virDomainListGetStats() if a flag was passed
>> (e.g.
>> VIR_DOMAIN_GET_ALL_STATS_I_DONT_REALLY_CARE_IF_THIS_IS_DONE_IN_ONE_LIBVIRT_CALL)
>
>We can't guarantee atomicity of such call, so I don't see a reason for

Yes, that's why it would *only* happen if that flag was present.  That's
precisely why I added info about the flag.

>us to implement it. On the other hand, it's us who knows the limits for
>RPC, not users. So they might have some troubles writing it. But then
>again, we would have some troubles too because we might be talking to an
>older server with lower limits thus we'd need some guessing too.
>

We would definitely need guessing.  You cannot predict the size very
nicely.

>Also, the flag would expose what we want to hide - RPC layer. It should
>be no concern for users how we pass data around.
>

No, this has nothing to do with RPC.  Let me "pseudocode" this:

int
virConnectGetAllDomainStats(virConnectPtr conn,
                            ...,
                            unsigned int flags)
{
    bool legacy = flags & VIR_CONNECT_GET_ALL_DOMAIN_STATS_OK_IF_NOT_ATOMIC;

    ...

    ret = conn->driver->connectGetAllDomainStats(conn, ..., flags);
    if (ret < 0 && lastError == RPC_SIZE_ERROR) {
        if (legacy || !conn->driver->domainListGetStats)
            goto cleanup;

        virResetLastError();

        ndomains = conn->driver->connectListAllDomains(conn, ..., flags);
        if (ndomains < 0)
            goto cleanup;

        /* halve the number of domains requested until the RPC error is
         * avoided and next batches start from the number that worked
         * and for each batch call domainListGetStats() and then merge
         * the results.
         *
         * I was trying to implement it, but since it looks like nobody
         * else likes the idea, it would be wasting my time. */
    }

    ...
}

Or we can expose it as a new helper function.  I see no RPC exposition
here.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20170524/f88882e0/attachment-0001.sig>