[libvirt] [RFC][scale] new API for querying domains stats
Michal Privoznik
mprivozn at redhat.com
Tue Jul 1 09:47:32 UTC 2014
On 01.07.2014 11:33, Daniel P. Berrange wrote:
> On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
>> On 01.07.2014 09:09, Francesco Romani wrote:
>>> Hi everyone,
>>>
>>> I'd like to discuss possible APIs and plans for new query APIs in libvirt.
>>>
>>> I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
>>> VDSM is the node management daemon, which is in charge of, among many other things, gathering
>>> host statistics and per-Domain/VM statistics.
>>>
>>> Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans
>>> to scale much further, possibly reaching thousands in the not-so-distant future.
>>> At the moment we use one thread per VM to gather the VM stats (CPU, network, disk),
>>> and this obviously scales poorly.
>>
>> I think this is your main problem. Why not have a single thread that manages
>> the list of domains to query and issues the APIs periodically, instead of
>> one thread per domain?
>
> You suffer from round trip time on every API call if you serialize it all
> in a single thread. eg if every API call is 50ms and you want to check
> once per second, you can only monitor 20 VMs before you take more time than
> you have available. This really sucks when the majority of that 50ms is a
> sleep in poll() waiting for the RPC response.
Unless you have a bulk query API, which incurs the RTT only once ;)
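To put numbers on that: a back-of-the-envelope model of the serialized vs. bulk cost. The 45 ms poll() share and the per-domain processing cost are illustrative assumptions based on the 50 ms figure above, not measurements:

```python
# Rough model of the timing argument: 50 ms per synchronous call, of
# which ~45 ms is assumed to be poll() sleep waiting on the RPC reply.
# Integer milliseconds keep the arithmetic exact.

PERIOD_MS = 1000          # we want fresh stats once per second
CALL_MS = 50              # one synchronous libvirt API call
RTT_SLEEP_MS = 45         # assumed share of that spent sleeping in poll()

# Serialized, one call per VM: the full 50 ms is paid per domain.
max_vms_serialized = PERIOD_MS // CALL_MS

# Hypothetical bulk call: pay the RTT once, then only the per-domain
# server-side processing (the remaining ~5 ms) for each extra domain.
PER_DOMAIN_MS = CALL_MS - RTT_SLEEP_MS
max_vms_bulk = (PERIOD_MS - RTT_SLEEP_MS) // PER_DOMAIN_MS

print(max_vms_serialized)  # 20, matching the estimate above
print(max_vms_bulk)        # 191 once the RTT is amortized
```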
>
>>> This is made only worse by the fact that VDSM is a Python 2.7 application, and Python 2.x
>>> notoriously behaves badly with threads. We are already working to improve our code,
>>> but I'd like to bring the discussion here and see if and when the querying API can be improved.
>>>
>>> We currently use these APIs for our sampling:
>>> virDomainBlockInfo
>>> virDomainGetInfo
>>> virDomainGetCPUStats
>>> virDomainBlockStats
>>> virDomainBlockStatsFlags
>>> virDomainInterfaceStats
>>> virDomainGetVcpusFlags
>>> virDomainGetMetadata
>>>
>>> What we'd like to have is
>>>
>>> * asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>>> This would be just awesome. Either a single callback or a different one per call is fine
>>> (let's discuss this!).
>>> Please note that we are much more concerned about thread reduction than about performance
>>> numbers. We have had reports of the thread count becoming a real problem, while performance
>>> so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>>
>> I'm not a big fan of this approach. I mean, IIRC Python has the Global
>> Interpreter Lock (GIL), which effectively prevents two threads from running
>> concurrently. So while in C this would make perfect sense, it doesn't in
>> Python. The callbacks would be called from the event loop, which, given how
>> frequently you dump the info, will block other threads. Therefore I'm afraid
>> the approach would bring no speed-up, but rather a slow-down.
>
> I'm not sure I agree with your assessment here. If we consider a single
> API call, the time this takes to complete is made up of a number of parts
>
> 1. Time to write() the RPC call to the socket
> 2. Time for libvirtd to process the RPC call
> 3. Time to recv() the RPC reply from the socket
>
> 1. Time to write() the RPC call to the socket
> 2. Time for libvirtd to process the RPC call
> 3. Time to recv() the RPC reply from the socket
>
> 1. Time to write() the RPC call to the socket
> 2. Time for libvirtd to process the RPC call
> 3. Time to recv() the RPC reply from the socket
> ...and so on..
>
> If the time for item 2 dominates over the time for items 1 & 3 (which
> it should really) then the client thread is going to be sleeping in a
> poll() for the bulk of the duration of the libvirt API call. If we had
> an async API mechanism, then the VDSM time would essentially be consumed
> with
>
> 1. Time to write() the RPC call to the socket
> 2. Time to write() the RPC call to the socket
> 3. Time to write() the RPC call to the socket
> 4. Time to write() the RPC call to the socket
> 5. Time to write() the RPC call to the socket
> 6. Time to write() the RPC call to the socket
> 7. wait for replies to start arriving
> 8. Time to recv() the RPC reply from the socket
> 9. Time to recv() the RPC reply from the socket
> 10. Time to recv() the RPC reply from the socket
> 11. Time to recv() the RPC reply from the socket
> 12. Time to recv() the RPC reply from the socket
> 13. Time to recv() the RPC reply from the socket
> 14. Time to recv() the RPC reply from the socket
>
Well, in the async form you also need to account for the time spent in the
callbacks:
1. write(serial=1, ...)
2. write(serial=2, ...)
..
7. wait for replies
8. recv(serial=x1, ...) // there's no guarantee on order of replies
9. callback(serial=x1, ...)
10. recv(serial=x2, ...)
11. callback(serial=x2, ...)
And it's the callback times I'm worried about. I'm not saying we should
not add the callback APIs. What I'm really saying is I have doubts it
will help python apps. It will definitely help scaling C applications
though.
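To illustrate: the serial matching itself is a cheap dict lookup; it is the callback body, which runs on the event loop thread (and in CPython holds the GIL), that worries me. A minimal sketch of the dispatch from steps 8-11 above, with all names illustrative rather than a proposed libvirt API:

```python
# Minimal sketch of matching out-of-order RPC replies to callbacks by
# serial number, as in steps 8-11 above. All names are illustrative.

class AsyncDispatcher(object):
    def __init__(self):
        self._pending = {}      # serial -> callback
        self._next_serial = 0

    def send(self, request, callback):
        """Register the callback and return the serial put on the wire."""
        self._next_serial += 1
        self._pending[self._next_serial] = callback
        # ... a real client would write(serial, request) to the socket ...
        return self._next_serial

    def on_reply(self, serial, payload):
        """Called from the event loop; replies may arrive in any order."""
        callback = self._pending.pop(serial)
        callback(payload)       # runs on the event loop thread -- a slow
                                # callback stalls everything else

results = []
d = AsyncDispatcher()
s1 = d.send("blockStats(vm1)", results.append)
s2 = d.send("cpuStats(vm1)", results.append)
d.on_reply(s2, {"cpu_time": 123})   # the later call's reply arrives first
d.on_reply(s1, {"rd_bytes": 456})
print(results)
```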
> Of course there's a limit to how many outstanding async calls you can
> make before the event loop gets 100% busy processing the responses,
> but I don't think that makes async calls worthless. Even if we had the
> bulk list API calls, async calling would be useful, because it would
> let VDSM fire off requests for disk, net, cpu, mem stats in parallel
> from a single thread.
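Fair point. The gain from firing requests in parallel can be shown with simple arithmetic: write all the requests up front, then collect the replies, so the per-call round trips overlap instead of adding up. The millisecond figures below are illustrative assumptions; nothing here is real libvirt API:

```python
# Simulated timings for the pipelined single-thread approach. The
# millisecond figures are illustrative assumptions, not measurements.

RTT_MS = 45     # assumed time a reply spends in flight
WRITE_MS = 1    # assumed cost of one non-blocking write()
RECV_MS = 4     # assumed cost of reading and decoding one reply

def serialized_time_ms(n_calls):
    # write, sleep for the reply, read -- repeated for every call
    return n_calls * (WRITE_MS + RTT_MS + RECV_MS)

def pipelined_time_ms(n_calls):
    # write everything up front, wait one RTT for replies to start
    # arriving, then the recv()s come back to back
    return n_calls * WRITE_MS + RTT_MS + n_calls * RECV_MS

# e.g. disk, net, cpu and mem stats for one VM, from a single thread
print(serialized_time_ms(4))   # 200 ms when each call waits its turn
print(pipelined_time_ms(4))    # 65 ms when the round trips overlap
```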
>
> Regards,
> Daniel
>
Michal