[libvirt] [RFC][scale] new API for querying domains stats

Wed Jul 2 15:56:23 UTC 2014

----- Original Message -----
> From: "Michal Privoznik" <mprivozn at redhat.com>
> To: "Francesco Romani" <fromani at redhat.com>, libvir-list at redhat.com
> Sent: Tuesday, July 1, 2014 11:19:04 AM
> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

> > Right now we aim for a number of VM per node in the (few) hundreds, but we
> > have big plans
> > to scale much more, and to possibly reach thousands in a not so distant
> > future.
> > At the moment, we use one thread per VM to gather the VM stats (CPU,
> > network, disk),
> > and of course this scales poorly.
> 
> I think this is your main problem. Why not have only one thread that
> would manage list of domains to query and issue the APIs periodically
> instead of having one thread per domain?

Indeed it is. I'm personally addressing this problem in VDSM.
It is mostly an inheritance from past times, when this wasn't yet a big problem.
We are moving toward a thread pool of fixed size to handle the sampling.
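
To make the idea concrete, here is a minimal sketch of sampling many VMs from
a fixed-size pool instead of one thread per VM. It is not the VDSM code:
sample_domain, sample_all and POOL_SIZE are hypothetical names, the sampler
returns dummy data instead of calling libvirt, and it is written for
Python 3 (on 2.7, concurrent.futures would have to come from a backport,
which I am assuming here).

```python
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # fixed, independent of the number of VMs (hypothetical value)


def sample_domain(name):
    """Hypothetical stand-in for the per-VM libvirt stats calls."""
    return {"name": name, "cpu_time": 0}


def sample_all(domain_names, pool):
    """Sample every domain through the shared pool; return stats keyed by name."""
    results = pool.map(sample_domain, domain_names)
    return {stats["name"]: stats for stats in results}


if __name__ == "__main__":
    names = ["vm%03d" % i for i in range(200)]
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        stats = sample_all(names, pool)
    print(len(stats))
```

The number of threads stays at POOL_SIZE no matter how many VMs run on the
node; the cost is that a slow call delays the samples queued behind it.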

> > This is made only worse by the fact that VDSM is a python 2.7 application,
> > and notoriously
> > python 2.x behaves very badly with threads. We are already working to
> > improve our code,
> > but I'd like to bring the discussion here and see if and when the querying
> > API can be improved.
> >
> > We currently use these APIs for our sampling:
> >    virDomainBlockInfo
> >    virDomainGetInfo
> >    virDomainGetCPUStats
> >    virDomainBlockStats
> >    virDomainBlockStatsFlags
> >    virDomainInterfaceStats
> >    virDomainGetVcpusFlags
> >    virDomainGetMetadata
> >
> > What we'd like to have is
> >
> > * asynchronous APIs for querying domain stats
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
> >    This would be just awesome. Either a single callback or a different one
> >    per call is fine
> >    (let's discuss this!).
> >    please note that we are much more concerned about thread reduction than
> >    about performance
> >    numbers. We had reports of the thread count becoming a real problem, while
> >    performance so far
> >    is not yet a concern
> >    (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
> 
> I'm not a big fan of this approach. I mean, IIRC python has this Big
> Python Lock, which effectively prevents two threads from running concurrently.

It has the GIL, yes. Only one thread can run Python code at any given time.
This, however, is not true for extension modules written in C, which, if carefully
designed (read: coded to properly release the GIL), can run concurrently.
This is one of the reasons why threading in Python is tolerated for I/O,
even though it is never recommended.

AFAIK/IIRC the code of the libvirt module for Python allows this, so we should
be good to go.
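
A small sketch of the effect, assuming nothing about libvirt itself:
time.sleep is implemented in C and releases the GIL while it waits, so it
stands in here for a binding call blocked on an RPC round trip.
blocking_call and run_concurrently are hypothetical names used only for
this illustration.

```python
import threading
import time


def blocking_call(duration):
    # time.sleep releases the GIL while waiting, so other threads can run.
    time.sleep(duration)


def run_concurrently(n_threads, duration):
    """Run n_threads blocking calls in parallel; return wall-clock seconds."""
    threads = [
        threading.Thread(target=blocking_call, args=(duration,))
        for _ in range(n_threads)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    elapsed = run_concurrently(4, 0.2)
    print("4 threads x 0.2s took %.2fs" % elapsed)
```

If the GIL were held during the wait, the four calls would serialize and
take about four times the single-call duration; with the GIL released the
total stays close to one call.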

> So while in C this would make perfect sense, it doesn't do so in python.
> The callbacks would be called from the event loop, which given how
> frequently you dump the info will block other threads. Therefore I'm
> afraid the approach would not bring any speed up, rather slow down.

I'm not sure about this. I think quite the opposite: performance-wise
we can gain something, even though yes, all the callbacks will pile up in the
event loop. Surely this will greatly reduce the GIL battle

http://dabeaz.blogspot.it/2010/01/python-gil-visualized.html

(which is improved in Python >= 3.2, but we are on 2.7 for the foreseeable future),
and it will reduce our thread proliferation, which is an immediate and real
concern of ours.

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani