[libvirt] RFC: put domain's interfaces into distinct namespaces

Daniel P. Berrangé berrange at redhat.com
Mon Nov 19 16:39:41 UTC 2018


On Wed, Nov 07, 2018 at 08:48:16AM +0000, Nikolay Shirokovskiy wrote:
> Hi, all!
> 
> There is a performance issue with network filters and broadcast ethernet traffic.
> If an L2 segment is large enough (several thousand VMs) then there is a lot of
> broadcast ARP traffic (about 100 frames/s). As a result, on a host with several hundred
> VMs (say 300) we have a kernel thread eating 100% of a CPU just for checking this traffic
> against firewall rules. The problem is that if there are rules in the ebtables POSTROUTING
> chain (clean-traffic is an example of such a filter), then every single broadcast frame turns
> into 300, one for every distinct bridge port, and each of these 300 is checked against
> 300 / 2 rules on average to find the chain for that port. As a result we have 100 * 300 * 300 / 2
> = 4.5 * 10^6 rule checks per second. The kernel does not spread this workload across
> different CPUs, and in any case this wastes CPU time!

Yes, this is a key limitation of the traditional ebtables/ip[6]tables commands.
There's no efficient way to associate rules with specific devices.
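
For reference, the arithmetic in the quoted message can be reproduced with
a small sketch. The 100 frames/s and 300 ports figures are taken from that
message; the function names and the rules_per_port parameter are purely
illustrative assumptions, not anything measured or anything libvirt provides.

```python
# Back-of-the-envelope model of the cost described in the quoted message.
# The per-port rule count used at the bottom is a hypothetical value.

def linear_scan_checks(frames_per_sec: int, ports: int) -> int:
    """Rule checks per second when every cloned frame walks a shared list
    of per-port jump rules and finds its own after ports / 2 on average."""
    clones_per_sec = frames_per_sec * ports   # one clone per bridge port
    avg_rules_walked = ports // 2             # average position of the match
    return clones_per_sec * avg_rules_walked


def per_device_checks(frames_per_sec: int, ports: int, rules_per_port: int) -> int:
    """Rule checks per second if each clone were evaluated only against
    the rules attached to its own device (assumed, idealised dispatch)."""
    clones_per_sec = frames_per_sec * ports
    return clones_per_sec * rules_per_port


if __name__ == "__main__":
    print(linear_scan_checks(100, 300))                   # figures from the quote
    print(per_device_checks(100, 300, rules_per_port=5))  # 5 is hypothetical
```

The first function grows quadratically with the number of ports, the second
only linearly, which is the gap that per-device rule association would close.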

This is apparently solved with nftables if you set up your chains to match on
the 'netdev' family.

> The simple solution is to put rules that ACCEPT ARP traffic into POSTROUTING
> itself, before any port-specific chains. But this will affect non-VM ports
> and the host itself too. So can we instead make a distinct network namespace for every
> VM and put the tap device there, then add a bridge in that namespace so we can apply
> ebtables rules there, and insert the tap into that bridge? Finally, connect the bridges
> in the root namespace and the VM namespace with a veth pair. As a result, in the situation
> described above each cloned frame will be checked only against the rules for this
> very VM. Regular TCP traffic will get the same benefit. On the other hand, we
> need a bridge and a veth pair for every VM and some CPU power to process this extra
> traffic path.

Yeah, I don't really like the idea of introducing extra devices into
the I/O path for every NIC, as it will burn extra CPU and introduce
latency.


I don't really have a particular suggestion for fixing the perf problem
offhand, other than my note about nftables supposedly allowing us to fix
this problem. RHEL-8 & Fedora 30 will both be nftables based, so it is
imminently available as a solution for libvirt, assuming it does in fact
let us solve the perf problem.

The hard thing is that we'll need some significant work in the nwfilter
driver to port it to native nft commands - just using the legacy iptables
compat tools uses nft, but not in a way that would let us get the perf
benefit IIUC.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



