[libvirt] RFC: Improve performance of macvtap device creation

Tony Krowiak akrowiak at linux.vnet.ibm.com
Fri Oct 30 15:56:31 UTC 2015

On 10/30/2015 06:49 AM, Michal Privoznik wrote:
> On 29.10.2015 18:48, Laine Stump wrote:
>> On 10/29/2015 12:49 PM, Tony Krowiak wrote:
>>> For a guest domain defined with a large number of macvtap devices, it
>>> takes an exceedingly long time to boot the guest. In a test of a guest
>>> domain configured with 82 macvtap devices, it took over two minutes
>>> for the guest to boot. An strace of the ioctl calls during guest start
>>> up showed the SIOCGIFFLAGS ioctl literally being invoked 3,403 times.
>>> I was able to isolate the source of the ioctl calls to
>>> the *virNetDevMacVLanCreateWithVPortProfile* function
>>> in *virnetdevmacvlan.c*. The macvtap interface name is created by
>>> looping over a counter variable, starting with zero, and appending the
>>> counter value to 'macvtap'.
>> I've wondered ever since the first time I saw that code why it was done
>> that way, and why there had never been any performance complaints.
>> Lacking any complaints, I promptly forgot about it (until the next time
>> I went past the code for some other tangentially related reason.)
>> Since you're the first to complain, you have the honor of fixing it :-)
>>> With each iteration, a call is made to *virNetDevExists* (SIOCGIFFLAGS
>>> ioctl) to determine whether a device with that name already exists, until
>>> a unique name is found. In the test case cited above, to create an
>>> interface name for the 82nd macvtap device, the *virNetDevExists*
>>> function will be called for interface names 'macvtap0' to 'macvtap80'
>>> before it is determined that 'macvtap81' can be used. So if N is the
>>> number of macvtap interfaces defined for a guest, the SIOCGIFFLAGS
>>> ioctl will be invoked (N x N + N)/2 times to find unused macvtap
>>> device names (for N = 82 that is (6724 + 82)/2 = 3,403, matching the
>>> strace count). That assumes only one guest is being started; who
>>> knows how many times the ioctl may have to be called in an
>>> installation running a large number of guests defined with macvtap
>>> devices.
> Not only that, but until c0d162c68c2f19af8d55a435a9e372da33857048
> (contained in v1.2.2~32), if two threads were starting a domain
> concurrently, they even competed with each other in that specific area
> of the code.
>>> I was able to reduce the amount of time for starting a guest domain
>>> defined with 82 macvtap devices from over 2 minutes to about 14
>>> seconds by keeping track of the interface name suffixes previously
>>> used. I defined two static bitmaps (virBitmap), one each for macvtap
>>> and macvlan device name suffixes. When a macvtap/macvlan device is
>>> created, the index of the next clear bit (virBitmapNextClearBit) is
>>> retrieved to create the name. If an interface with that name does not
>>> exist, the device is created and the bit at the index used to create
>>> the interface name is set (virBitmapSetBit). When a macvtap/macvlan
>>> device is deleted, if the interface name has the pattern 'macvtap%d'
>>> or 'macvlan%d', the suffix is parsed into a bit index that is used to
>>> clear the corresponding bit (virBitmapClearBit) in the respective bitmap.
>> This sounds fine, as long as 1) you recreate the bitmap whenever
>> libvirtd is restarted (while scanning through all the interfaces of
>> every domain; there is already code being executed in exactly the right
>> place - look for qemu_process.c:qemuProcessNotifyNets() and add
>> appropriate code inside the loop there), and 2) you retry some number of
>> times if a supposedly unused device name is actually in use (to account
>> for processes other than libvirt using the same naming convention).
> How about re-using the approach we have for virPortAllocator? We
> maintain a bitmap of ports. On acquiring a new port, we try to bind() to
> it. If we succeed, we set the corresponding bit in the bitmap. Of
> course, it may happen that a port in the host is already taken but our
> bitmap does not think so. That's okay. We just leave the corresponding
> bit alone: if we set it as used, nobody would ever unset it.
> Moreover, we will try the port again next time, and it may be free by then.
> Also, the bitmap is not saved anywhere, nor restored on daemon
> restart; this could be changed, though.
> So what I am saying is practically the same as Laine, just extending his
> thoughts and giving you an example of how to proceed further :)
I appreciate the input. This is similar to the first solution I 
proposed, which I actually implemented and tested. It is described above.
> Michal

More information about the libvir-list mailing list