[libvirt RFC PATCH] network: NetworkManager script to monitor for conflicts with new interfaces
Daniel P. Berrangé
berrange at redhat.com
Tue May 5 09:09:49 UTC 2020
On Mon, May 04, 2020 at 12:04:45PM -0400, Laine Stump wrote:
> There has been a problem for several years with libvirt's default
> network conflicting with the host network connection on new installs,
> particularly when the "host" is actually virtual machine that is,
> itself, connected to the libvirt default network on its respective
> host. If the two default networks use the same subnet, and if the
> nested host's libvirtd happens to start its network before the system
> networking connects to the toplevel host, then network connectivity to
> the guest is just silently non-working.
>
> We've tried several things over the years to eliminate this problem,
> including:
>
> 1) Checking for conflicting routes/interfaces during installation of
> the libvirt-daemon-config-network package (the one containing the
> default network config file) which tries different subnets until it
> finds one that doesn't conflict. This helps in some cases, but fails
> on 2 points: a) if the installation environment is different from the
> environment where the system is actually run (e.g. a "live CD" image
> of a distro, which is built in a container by the distro maintainers,
> then could be run in any number of places, and b) it modifies the
> installed files during the rpm installation post_install script, which
> is now discouraged because people building container images don't like
> dealing with that.
>
> 2) When starting a libvirt network, we now check for any route or
> interface that conflicts with the new network's IP's and routes. This
> doesn't fix the problem, but does notify the user of the problem *as
> long as libvirt starts its networks after the host has started its
> system networks*.
>
> Neither of these help in the situation where libvirt's networks are
> started first. The script in this patch uses a NetworkManager facility
> (dispatcher.d scripts, which are run whenever any interface is
> brought up or down) to check for any libvirt networks that conflict
> with the new NetworkManager interface, and if it finds a conflict it
> logs a message and destroys the libvirt network. So as with case (2)
> above, it doesn't solve the problem, but it does make it more obvious,
> and also makes sure the host networking isn't borked, so you can still
> ssh into the machine and fix it.
>
> There are a few caveats/misgivings with the script I've come up with,
> which cause me to post it as just an RFC:
>
> 1) It's written in python, and uses libxml2
>
> When I first sat down to make this script (following a suggestion by
> danpb during an IRC discussion), I immediately put #!/bin/bash at the
> top of the file, because it's supposed to be a script. But then I
> remembered that we're trying to narrow down the usage of languages in
> libvirt, and anyway it would be nice to be able to properly parse the
> XML to get the IP addresses/netmasks/prefixes. So I instead wrote it
> in python. This makes for a less ugly program, but it does mean that
> the daemon-config-network package is now dependent on python3-libxml
> (and python3 of course), which pulls in a bunch of other packages.
> Everybody's dev systems already have these packages and their
> dependencies installed (since both are required to build) and many
> other users systems already have them (they are required by
> virt-install and virt-manager, among others), including the Fedora
> live CD, for example. But not *all* systems have them. There has been
> a lot of work to reduce package bloat caused by pulling in more and
> more packages, so I'm reluctant to contribute to that. On the other
> hand, someone who is looking to minimize their system footprint will
> also probably not be installing the libvirt default network, so maybe
> it's acceptable (I'm leaning towards "not", but want to know if anyone
> else has a different opinion)
I think this is a non-issue, since you've bundled the script into
the libvirt-daemon-config-network RPM. It would have been more of
an issue if in the libvirt-daemon-driver-network RPM as that would
have made it mandatory. Even then though, I really wouldn't be very
bothered by it.
> 2) As delivered, it checks for conflicts with *all* libvirt
> networks.
>
> While that was fun to write, and in theory is the right
> thing to do, in practice it may be / probably is overkill. As we
> discussed in IRC, the main time when this is a problem is just after
> the first boot of a new OS / libvirt installation, and the only
> libvirt network that's going to exist at that time is the default
> network. So while having a loop that scans all libvirt networks is
> more complete, in almost all cases it is just wasting time. (I did
> provide a bool at the top of the script that can be modified to tell
> it to only check the default network, but of course the script is
> installed in /usr/lib, and it's not proper to modify files in
> /usr/lib, so that isn't a real-world solution, but more a way to allow
> people (i.e. "me") right now to test out the different behaviors.
I figure checking all networks is fine.
> 3) This only works if NetworkManager is enabled.
>
> I brought this up when danpb first mentioned the idea, and he rightly
> pointed out that every report of this problem we've had has been from
> the first boot of a new install on a system that used Network Manager
> - anybody using any other method of networking config on their host
> has had to manually intervene to get it setup, but NM tries to do
> everything auto-magically. So while there may still be a very
> occasional rare occurence of a silent networking failure even with
> this script installed, this should eliminate 99% of the cases.
I guess there could be another tool that is not NetworkManager, but
does the same kind of job as NetworkManager which would want this
functionality.
> So, the questions I have are:
>
> 1) Do we want to allow adding the dependency on python3-libxml to the
> libvirt-daemon-config-network package in order to avoid adding a bash
> script. Or should I just redo it in bash
I think this is a non-issue
> 2) Do we want something that checks all active networks as this does,
> or just the default network (and if the latter, should we make it
> easily modifiable to check all networks, or just strip it down as much
> as possible)
We could have a config param in /etc/libvirt/something.conf. I'm
fine with checking all networks by default though.
> 3) I could eliminate the libxml2 dependency by just grepping the xml
> for "<address .*ip=". Is reduced correctness (which is probably 0%
> different in practice) worth the reduced package dependencies?
We're making functional changes to the host, so its important that
we are robust in our code. As such I strongly prefer that we're
correctly parsing XML instead of regex'ing data out of it.
> As a single question: on a sliding scale of "this script" ----->
> "simple bash script" where do we want to land?
The right answer to any problem is never a shell script :-)
> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index 6abf97df85..e40068b2be 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release}
> Requires: dnsmasq >= 2.41
> Requires: radvd
> Requires: iptables
> +Requires: python3-libxml2
>
> %description daemon-driver-network
> The network driver plugin for the libvirtd daemon, providing
> @@ -1630,6 +1631,7 @@ exit 0
> %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
> %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
> %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so
> +%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
I was sceptical about /usr/lib being right, but it seems that NM
really does want this dir.
>
> %if %{with_firewalld_zone}
> %{_prefix}/lib/firewalld/zones/libvirt.xml
> diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am
> index 196a30e16c..d58e09f88b 100644
> --- a/src/network/Makefile.inc.am
> +++ b/src/network/Makefile.inc.am
> @@ -170,6 +170,9 @@ install-data-network:
> ( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \
> rm -f default.xml && \
> $(LN_S) ../default.xml default.xml )
> + $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d"
> + $(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \
> + $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
> if WITH_FIREWALLD_ZONE
> $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones"
> $(INSTALL_DATA) $(srcdir)/network/libvirt.zone \
> @@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE
>
> endif WITH_NETWORK
>
> -EXTRA_DIST += network/default.xml network/libvirt.zone
> +EXTRA_DIST += \
> + network/default.xml \
> + network/nm-dispatcher-check-nets.py \
> + network/libvirt.zone
>
> .PHONY: \
> install-data-network \
> diff --git a/src/network/nm-dispatcher-check-nets.py b/src/network/nm-dispatcher-check-nets.py
> new file mode 100755
> index 0000000000..9225a08ea1
> --- /dev/null
> +++ b/src/network/nm-dispatcher-check-nets.py
> @@ -0,0 +1,182 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2012-2019 Red Hat, Inc.
2020
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library. If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +import libvirt
> +import sys
> +import os
> +import libxml2
> +from ipaddress import ip_network
> +
> +# This script should be installed in
> +# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
> +# called by NetworkManager every time a network interface is taken up
> +# or down. When a new network comes up, it checks the libvirt virtual
> +# networks to see if their IP address(es) (including any static
> +# routes) are in conflict with the IP address(es) (or static routes)
> +# of the newly added interface. If so, the libvirt network is
> +# disabled. It is assumed that the user will notice that their guests
> +# no longer have network connectvity (and/or the message logged by
> +# this script), see that the network has been disabled, and then
> +# realize the conflict when they try to restart it.
> +#
> +# set checkDefaultOnly=False to check *all* active libvirt networks
> +# for conflicts with the new interface. Set to True to check only the
> +# libvirt default network (since most networks other than the default
> +# network are added post-install at a time when all of the hosts other
> +# networks are already active, it may be overkill to check all of the
> +# libvirt networks for conflict here (and instead just add more
> +# needless overheard to bringing up a new host interface).
> +#
> +checkDefaultOnly = False
> +
> +# NB: since this file is installed in /usr/lib, it really shouldn't be
> +# modified by the user, but instead should be copied to
> +# /etc/NetworkManager/dispatcher.d, where it will override the copy in
> +# /usr/lib. Even that isn't a proper solution though - if we're going
> +# to actually have this config knob, perhaps we should check for it in
> +# the environment, and if someone wants to modify it they can put a
> +# short script in /etc that exports and environment variable and then
> +# calls this script? Just thinking out loud here...
> +
> +def checkconflict(conn, netname, hostnets, hostif):
> +
> + # ignore if the network has been brought down or removed since we
> + # got the list
> + try:
> + net = conn.networkLookupByName(netname)
> + except libvirt.libvirtError:
> + return
> +
> + if not net.isActive():
> + return
> +
> + xml = net.XMLDesc()
> + doc = libxml2.parseDoc(xml)
> + ctx = doc.xpathNewContext()
> +
> + # see if NetworkManager is informing us that this libvirt network
> + # itself is coming online
> + bridge = ctx.xpathEval("/network/bridge/@name");
> + if bridge and bridge[0].content == hostif:
> + return
Is it worth filtering based on the @type attribute too or are
we ok relying on ?
> +
> + # check *all* the addresses of this network
> + addrs = ctx.xpathEval("/network/*[@address]")
Shouldn't we match /network/ip/ here ? This wildcard also
matches /network/route/ - is that desirable ?
> + for ip in addrs:
> + ctx.setContextNode(ip)
> + address = ctx.xpathEval("@address")
> + prefix = ctx.xpathEval("@prefix")
> + netmask = ctx.xpathEval("@netmask")
> +
> + if not (address and len(address[0].content)):
> + continue
> +
> + addrstr = address[0].content
> + if not (prefix and len(prefix[0].content)):
> + # check for a netmask
> + if not (netmask and len(netmask[0].content)):
> + # this element has address, but no prefix or netmask
> + # probably it is <mac address ...> so we can ignore it
> + continue
> + # convert netmask to prefix
> + prefixstr = str(ip_network("0.0.0.0/%s" % netmask[0].content).prefixlen)
> + else:
> + prefixstr = prefix[0].content
> +
> + virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict = False)
> + # print("network %s address %s" % (netname, str(virtnetaddress)))
> + for hostnet in hostnets:
> + if virtnetaddress == hostnet:
> + # There is a conflict with this libvirt network and the specified
> + # net, so we need to disable the libvirt network
> + print("Conflict with host net %s when starting interface %s - bringing down libvirt network '%s'"
> + % (str(hostnet), hostif, netname))
> + try:
> + net.destroy()
> + except libvirt.libvirtError:
> + print("Failed to destroy network %s" % netname)
> + return
> + return
> +
> +
> +def addHostNets(hostnets, countenv, addrenv):
> +
> + count = os.getenv(countenv)
> + if not count or count == 0:
> + return
> +
> + for num in range(int(count)):
> + addrstr = os.getenv("%s_%d" % (addrenv, num))
> + if not addrstr or addrstr == "":
> + continue
> +
> + net = ip_network(addrstr.split()[0], strict=False)
> + if net:
> + hostnets.append(net)
> + return
> +
> +
> +############################################################
> +
> +if sys.argv[2] != "up":
> + sys.exit(0)
> +
> +hostif = sys.argv[1]
> +
> +try:
> + conn = libvirt.open(None)
> +except libvirt.libvirtError:
> + print('Failed to open connection to the hypervisor')
> + sys.exit(0)
> +
> +if checkDefaultOnly:
> + nets = []
> + net = conn.networkLookupByName("default")
> + if not (net and net.isActive()):
> + sys.exit(0)
> + nets.append(net)
> +else:
> + nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE)
> + if not nets:
> + sys.exit(0)
> +
> +# We have at least one active network. Build a list of all network
> +# routes added by the new interface, and compare that list to the list
> +# of all networks used by each active libvirt network. If any are an
> +# exact match, then we have a conflict and need to shut down the
> +# libvirt network to avoid killing host networking.
> +
> +# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d
> +# it will have all IP addresses and routes associated with the interface
> +# that is going up or down in the following environment variables:
> +#
> +# IP4_NUM_ADDRESSES - number of IPv4 addresses
> +# IP4_ADDRESS_N - one variable for each address, starting at _0
> +# IP4_NUM_ROUTES - number of IPv5 routes
> +# IP4_ROUTE_N - one for each route, starting at _0
> +# (replace "IP4" with "IP6" and repeat)
Oh that's very nice - much easier than probing for this info ourselves.
> +#
> +hostnets = []
> +addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS");
> +addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE");
> +addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS");
> +addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE");
> +
> +for net in nets:
> +
> + checkconflict(conn, net.name(), hostnets, hostif)
> --
> 2.25.4
>
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
More information about the libvir-list
mailing list