[libvirt] [PATCHv2 0/9] Let libvirt manage a bridge's FDB

Wed Dec 3 17:02:11 UTC 2014

On Tue, Dec 02, 2014 at 12:08:30PM -0500, Laine Stump wrote:
> (Everyone - see the request for opinions/ideas towards the bottom)
> 
> The idea behind these patches is the following:
> 
> 1) most virtual machines only have a single MAC address behind each
> interface, and that MAC address is known by libvirt.
> 
> 2) If we (i.e. libvirt) manually add an entry to the bridge's
> forwarding database (fdb) for the MAC address associated with a port
> on the bridge, we can turn off learning and unicast_flooding for that
> port.
> 
> 3) kernels starting with 3.15 (and actually working correctly starting
> in kernel 3.17) will notice that all of a bridge's ports have flood
> and learning turned off, and in that case will turn off promiscuous
> mode on all ports. If all but one of the ports have flood/learning
> turned off, then promiscuous will be turned off on that port (and left
> on for all the other ports)
> 
> 4) When (4) can be done, there is a measurable performance
> advantage. It can also *kind of* help security, as it will prevent a
> guest from doing anything useful if it changes its MAC address (but
> won't prevent the guest from *sending* packets with a spoofed MAC
> address).
> 
> NB: These only work with a fixed MAC address, and no vlan tags set in
> the guest. Support for both of those will be coming.

So just to be clear on the functional differences between the
impls

With the current impl

 1. The bridge has a table mapping MAC <-> TAP devices
    which is initially empty
 2. Whenever a TAP device sends a packet, it causes an
    entry to be added to the MAC <-> TAP table
 3. When the bridge needs to forward a packet with a MAC
     - If MAC is unknown, it is sent to all TAP devices
     - Else it is send to the specific TAP device
 4. The physical device is always promiscuous to see all
    traffic on the LAN

An result of point 2, is that if the guest changes its
MAC address on the fly, it merely needs to send a packet
and the bridge will learn its new MAC address. That said
our firewall rules can potentially block this dynamic
change.

With the new impl

 1. The bridge has a table mapping MAC <-> TAP devices
    which is initially empty
 2. Libvirt tells the bridge what MAC address is associated
    with each TAP device added
 3. The bridge always forwards the packets to the correct
    TAP device
 4. The physical device is never promiscuous, it is
    programmed with MACs of all TAP devices

In both cases the guest can send packets with spoofed
MAC addresses. In the old code these packets would make
it onto the LAN and it can receive replies back. In the
new code these packets would make it onto the LAN but
it will never receive replies back.

At a high level the conceptual difference is whether the table
of MAC addresses & TAP devices is statically defined by libvirt
or dynamically defined on the fly.

> HERE IS THE REQUEST FOR OPINIONS/IDEAS:
> 
> This V2 of the patchset addresses several issues brought up by jferlan
> on the original series, and changes the name of the attribute from:
> 
>    promiscLinks='yes|no'
> 
> to
> 
>    fdb='learningWithFlood|managed'
> 
> I'm somewhat more happy with this new naming than the previous but
> still looking for better ideas. It is closer to describing what the
> new code really does, but "learningWithFlood" seems a bit long and
> awkward, while I have been told that "fdb" is too short and
> unrecognizeable (I will point out that 1) "fdb" is the same name used
> by iproute2's "bridge" command, and 2) another bridge option, "stp" is
> also a three letter acronym that will only be recognized by those
> familiar with configuring an L2 bridge device or watching NASCAR on
> Saturday afternoons (or whenever it's on - not a fan myself :-))

One thing to remember is that libvirt is supposed to be providing a
higher level configuration language that is independant of specific
implementations.  'fdb' is terminology specific to the Linux
bridge implementation, so I don't think that's appropriate to use
in the XML configuration.

> Here is a full list of every idea that either I or someone else has
> come up with since I started thinking about this:
> 
>  promiscLinks='yes|no'
> 
>    After initially going with this for the v1 of the patchset, I later
>    decided against it, because it doesn't describe what libvirt is
>    doing, but only a *possible* side effect on *some* of the ports
>    connected to the bridge (in practice, it only happens to the physdev
>    port).

Agreed, promiscLinks seems to be referring to only one small part
of the solution.

> 
>  fdb='auto|managed'
> 
>    I like "fdb" as the name of the attribute, because I think it really
>    gets at what libvirt is doing - it is taking over management of the
>    bridge's fdb (Forwarding Database), which ends up providing better
>    performance in several ways.
> 
>    The problem with this proposal is that the two values are kind of
>    ambiguous - it's not clear which one is using the bridge module's
>    built-in management (I had figured this would be "auto"), and which
>    is telling libvirt to manage it ("managed"). (on the other hand,
>    the first option is "ignore the issue, let the underlying system
>    handle it", vs. "libvirt should manage it", so maybe it *is* a
>    reasonable choice).
> 
>    Also, see the comment above about the perceived terseness and obscurity
>    of "fdb".
> 
>  fdb='learningWithFlood|managed'
> 
>    This alternate name was suggested by Michael Tsirkin as a way to
>    unambiguously indicate what was being done in the mode where libvirt
>    isn't involved in the fdb management. There was some criticism on
>    IRC that the name is *too* verbose, especially when contrasted with
>    "fdb".
> 
>  fdb='learning|managed'
> 
>    A suggested shortening of "learningWithFlood".
> 
>  forwardingDatabase='blah'
> 
>    A way to get around criticism of "fdb". I think this is too verbose,
>    but maybe I'm biased :-)

I don't like any of these really because they all refer to a specific
implementation choice.

> 
>  [specify each minor item that separately]
> 
>    In order to manage the fdb by itself, libvirt disables "learning"
>    and "unicast_flood" for each tap device, enables "vlan_filtering"
>    for the bridge itself, then adds an fdb entry for each tap to the
>    bridge. There was one suggestion that, rather than trying to come
>    up with a single option that says "do all of these things", we
>    should instead make each of them separately configured. The problem
>    with this is that it makes it too easy to screw up the
>    configuration such that it causes sub-par performance, or simply
>    doesn't work at all. Part of libvirt's job is making it difficult
>    to screw up (or at least easier to succeed); for example, libvirt's
>    virtual networks do a lot of things automatically - create a
>    bridge, add iptables rules for filtering and NAT, run an instance
>    of dnsmasq - over time we've offered the option of tweaking more
>    and more of the details of this setup, but the initial aim was to
>    provide something that worked with as few required (or even
>    permitted) tweaks as possible.
> 
>    I guess what I'm getting at is that I think it would be a mistake
>    to require turning on several different knobs (which individually
>    make little/no sense) in order to get the bridge into this higher
>    performing mode.

Agreed, this is even worse because it is exposing many internal
implementation details.

> 
> So - does anyone have an opinion of any of the options offered above,
> or any ideas for alternates?
>
> In the meantime, note that while the default is currently
> "learningWithFlood" (meaning that that name is never actually directly
> used/required anywhere, but is just in the RNG and the enum
> definition), the intent of the people who developed this functionality
> in the kernel is that eventually it will work so well that libvirt
> management of the fdb can silently become the default with no visible
> change in behavior.

There is a user visible change in behaviour because this new solution,
as implemented in this patch series, will no longer allow a guest to
change its MAC address on the fly.

Of course if QEMU can provide a notification to libvirt when the guest
changes its MAC address, then libvirt can update the MAC table and so
re-gain functional equivalance with the old solution. I think we would
want to be able to turn that on or off though.

This all leads me to suggest that we should use the following syntax
that is indepedant of the particular implementation choice, and instead
represents the logical behaviour of the feature.

  macTable=static|dynamic

The mode of 'static' means that the MAC address table will always
match MACs recorded in libvirt guest XML. This is the new mode
that your patches implement

The mode of 'dynamic' means that the MAC address table will be
able to dynamically update itself as the guest changes MAC addresses.
In the near term, this will use the traditional bridge learning
mode. In the long term, if we can get MAC change notifications from
QEMU, we can switch this over to use the new bridge features.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|