[libvirt] [PATCH 0/4] improve virConnectListAllInterfaces()

Guido Günther agx at sigxcpu.org
Sat Sep 26 11:19:32 UTC 2015


On Fri, Sep 25, 2015 at 01:48:41PM -0400, Laine Stump wrote:
> On 09/25/2015 01:27 PM, Daniel P. Berrange wrote:
> >On Fri, Sep 25, 2015 at 05:22:30PM +0100, Daniel P. Berrange wrote:
> >>On Fri, Sep 25, 2015 at 11:13:52AM -0400, Laine Stump wrote:
> >>>There's a bit of background about this here:
> >>>
> >>>https://www.redhat.com/archives/augeas-devel/2015-September/msg00001.html
> >>>
> >>>In short, virt-manager is calling the virInterface APIs and that ties
> >>>up a libvirt thread (and CPU core) for a very long time on hosts that
> >>>have a large number of interfaces. These patches don't cure the
> >>>problem (I don't know that there really is a cure other than "Don't DO
> >>>that!"), but they do fix a couple of bugs I found while investigating,
> >>>and make a substantial improvement in the amount of time used by
> >>>virConnectListAllInterfaces().
> >>>
> >>>One thing that I wondered about while investigating this - a big use
> >>>of CPU by virConnectListAllInterfaces() comes from the need to
> >>>retrieve the MAC address of every interface. The MAC addresses are
> >>>both
> >>>
> >>>1) returned to the caller in the interface objects and
> >>>
> >>>2) sent to the policykit ACL checking to decide which interfaces to include in
> >>>the list.
> >>>
> >>>I'm 90% confident that
> >>>
> >>>1) most callers don't really care that they're getting the MAC address
> >>>along with interface name (virt-manager, for example, follows up with
> >>>a virInterfaceGetXMLDesc() anyway)), and
> >>>
> >>>2) there is not even a single host *anywhere* that is using libvirt
> >>>policykit ACLs to limit the list of host interfaces viewable by a
> >>>client.
> >>>
> >>>So we could add a flag to not return MAC addresses, which would allow
> >>>cutting down the time to list all devices to something < 1
> >>>second). But this would be at the expense of no longer having the
> >>>possibility to limit the list with policykit according to MAC
> >>>address. Since all host interface information is available to all
> >>>users via the file system, for example, I don't see this as a huge
> >>>issue, but it would change behavior, so I don't feel comfortable doing
> >>>it. I don't like the idea of a single API call taking > 1 minute to
> >>>return either, though. Does anyone have an opinion about this?
> >>Ultimately 500 interfaces, each ifcfg-XXX file 300 bytes in size
> >>on average is only 150 KB of data. Given the amount of data we
> >>are consuming, here I think it is reasonable to expect we can
> >>process than in a tiny fraction of a second. So there's clearly
> >>a serious algorithmic / data structure flaw here if its taking
> >>minutes.
> >>
> >>By the sounds of the thread you quote, its in augeas itself, so I
> >>think we really need to focus on addressing that root cause as a
> >>priority rather than try to work around it.
> >>
> >>As a side note, we might consider adding new API to netcf so that
> >>we can fetch the entire interface set + macs in one api call to
> >>netcf, though I doubt it'd chance performance that much.
> >So, I instrumented the netcf and augeas code to checking timings.
> 
> What did you use? I tried using perf and oprofile, but all I could get them
> to tell me was that a ton of time was being spent in strcmp(), so either it
> couldn't figure out who was the caller due to missing stack frame pointers,
> or I just didn't know the right commandline options. (The last time I did
> any serious profiling I used some custom code (written by someone else at a
> previous employer) that massaged xml format output from oprofile. A lot has
> changed since then.)
> 
> >The aug_get calls time less than a millisecond, as do the various
> >other calls. I found the bulk of the time is actually coming from
> >the netcf function "get_augeas", which in turns call "aug_load"
> >for every single damn netcf function call.
> 
> I remember David Lutterkort talking about exactly that problem several years
> ago and *thought* I remembered that he had put something into augeas to only
> reread the files if they had changed. Has my memory failed me again? Or is
> augeas doing something and netcf just isn't taking advantage of it?

It's at least in the releas notes:

0.7.3 - 2010-08-06
    aug_load: only reparse files that have actually changed; greatly
    speeds up reloading

Cheers,
 -- Guido

> 
> >Either we need to stop loading congfig files on every fnuction
> >call in netcf, or we need to add a netcf bulk data API call,
> >so that libvirt can load /all/ the data it needs in 1 single
> >API call.
> 
> I much prefer (1) :-)
> 
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 




More information about the libvir-list mailing list