[libvirt] [RFC][scale] new API for querying domains stats

Tue Jul 1 09:33:17 UTC 2014

On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
> On 01.07.2014 09:09, Francesco Romani wrote:
> >Hi everyone,
> >
> >I'd like to discuss possible APIs and plans for new query APIs in libvirt.
> >
> >I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM;
> >VDSM is the node management daemon, which is in charge of, among many other things,
> >gathering host statistics and per-Domain/VM statistics.
> >
> >Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans
> >to scale much further, and possibly to reach thousands in the not so distant future.
> >At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk),
> >and this obviously scales poorly.
> 
> I think this is your main problem. Why not have only one thread that would
> manage a list of domains to query and issue the APIs periodically, instead of
> having one thread per domain?

You suffer from round trip time on every API call if you serialize it all
in a single thread. E.g. if every API call takes 50ms and you want to check
once per second, you can only monitor 20 VMs before you take more time than
you have available. This really sucks when the majority of that 50ms is a
sleep in poll() waiting for the RPC response.
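
To put rough numbers on that, here is a back-of-the-envelope sketch of the
serialized budget. The 50ms latency and the one-second interval come from the
example above; the function name and the figure of eight calls per VM per
cycle (one for each API in the sampling list quoted below) are purely
illustrative assumptions.

```python
def max_vms_serial(call_latency_ms, interval_ms, calls_per_vm=1):
    """VMs one thread can poll per interval when every call is serialized.

    Hypothetical helper: assumes each call costs a fixed round trip and
    that nothing overlaps.
    """
    return interval_ms // (call_latency_ms * calls_per_vm)


# One 50ms call per VM, polled once per second.
one_call = max_vms_serial(call_latency_ms=50, interval_ms=1000)

# Assumption: eight calls per VM per cycle, each costing the same 50ms.
eight_calls = max_vms_serial(call_latency_ms=50, interval_ms=1000, calls_per_vm=8)

print(one_call, eight_calls)
```

With one call per VM the budget runs out at 20 VMs; with eight serialized calls
per VM it runs out at 2.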

> >This is made only worse by the fact that VDSM is a python 2.7 application, and notoriously
> >python 2.x behaves very badly with threads. We are already working to improve our code,
> >but I'd like to bring the discussion here and see if and when the querying API can be improved.
> >
> >We currently use these APIs for our sampling:
> >   virDomainBlockInfo
> >   virDomainGetInfo
> >   virDomainGetCPUStats
> >   virDomainBlockStats
> >   virDomainBlockStatsFlags
> >   virDomainInterfaceStats
> >   virDomainGetVcpusFlags
> >   virDomainGetMetadata
> >
> >What we'd like to have is
> >
> >* asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
> >   This would be just awesome. Either a single callback or a different one per call is fine
> >   (let's discuss this!).
> >   Please note that we are much more concerned about thread reduction than about performance
> >   numbers. We have had reports of the thread count becoming a real problem, while performance
> >   so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
> 
> I'm not a big fan of this approach. I mean, IIRC python has this Global
> Interpreter Lock (GIL), which effectively prevents two threads from running
> concurrently. So while in C this would make perfect sense, it doesn't do so
> in python. The callbacks would be called from the event loop, which, given
> how frequently you dump the info, will block other threads. Therefore I'm
> afraid the approach would not bring any speedup, but rather a slowdown.

I'm not sure I agree with your assessment here. If we consider a single
API call, the time this takes to complete is made up of a number of parts,
and with serialized calls the same sequence repeats for every call:

 1. Time to write() the RPC call to the socket
 2. Time for libvirtd to process the RPC call
 3. Time to recv() the RPC reply from the socket

 1. Time to write() the RPC call to the socket
 2. Time for libvirtd to process the RPC call
 3. Time to recv() the RPC reply from the socket
 ...and so on..

If the time for item 2 dominates over the time for items 1 & 3 (which
it should really) then the client thread is going to be sleeping in a
poll() for the bulk of the duration of the libvirt API call. If we had
an async API mechanism, then the VDSM time would essentially be consumed
with

 1. Time to write() the RPC call to the socket
 2. Time to write() the RPC call to the socket
 3. Time to write() the RPC call to the socket
 4. Time to write() the RPC call to the socket
 5. Time to write() the RPC call to the socket
 6. Time to write() the RPC call to the socket
 7. wait for replies to start arriving
 8. Time to recv() the RPC reply from the socket
 9. Time to recv() the RPC reply from the socket
 10. Time to recv() the RPC reply from the socket
 11. Time to recv() the RPC reply from the socket
 12. Time to recv() the RPC reply from the socket
 13. Time to recv() the RPC reply from the socket

Of course there's a limit to how many outstanding async calls you can
make before the event loop gets 100% busy processing the responses,
but I don't think that makes async calls worthless. Even if we had the
bulk list API calls, async calling would be useful, because it would
let VDSM fire off requests for disk, net, cpu, mem stats in parallel
from a single thread.
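
As a rough illustration of the difference, the sketch below simulates both
patterns from a single thread. It is not libvirt code: fake_rpc() is a
hypothetical stand-in for one RPC, the 20ms server latency is an assumed
figure, and it uses Python 3's asyncio (which the Python 2.7 VDSM mentioned
above does not have) only because it makes the pattern easy to show.

```python
import asyncio
import time

SERVER_LATENCY_S = 0.02  # assumed time libvirtd spends on one call


async def fake_rpc(name):
    """Hypothetical stand-in for one RPC: the client just waits on the server."""
    await asyncio.sleep(SERVER_LATENCY_S)
    return name, "ok"


async def poll_serial(calls):
    # write, wait, recv; only then start the next call
    return [await fake_rpc(c) for c in calls]


async def poll_pipelined(calls):
    # issue every request up front, then collect the replies as they arrive
    return await asyncio.gather(*(fake_rpc(c) for c in calls))


def timed(coro_fn, calls):
    """Run one polling strategy and return (replies, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(calls))
    return result, time.perf_counter() - start


CALLS = ["disk", "net", "cpu", "mem"]
serial_result, serial_time = timed(poll_serial, CALLS)
pipelined_result, pipelined_time = timed(poll_pipelined, CALLS)

print("serial:    %.3fs" % serial_time)
print("pipelined: %.3fs" % pipelined_time)
```

Both variants return the same replies; the serialized one pays the server
latency once per call, while the pipelined one pays it roughly once in total,
since the waits overlap.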

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|