[libvirt] [PATCH 4/5] network: regain guest network connectivity after firewalld switch to nftables

Thu Jan 17 02:43:58 UTC 2019

On 1/15/19 12:54 PM, Daniel P. Berrangé wrote:
> On Tue, Jan 15, 2019 at 12:43:05PM -0500, Laine Stump wrote:
>> On 1/15/19 11:39 AM, Daniel P. Berrangé wrote:
>>> On Wed, Jan 09, 2019 at 09:57:36PM -0500, Laine Stump wrote:
>>>> From: Laine Stump<laine at redhat.com>
>>>>
>>>> In the past (when both libvirt and firewalld used iptables), if either
>>>> libvirt's rules *OR* firewalld's rules accepted a packet, it would be
>>>> accepted. This was because libvirt and firewalld rules were processed
>>>> by the same kernel hook.
>>>>
>>>> But now firewalld can use nftables for its backend, while libvirt's
>>>> firewall rules are still using iptables; iptables rules are still
>>>> processed, but at a different time during packet processing
>>>> (i.e. during a different hook) than the firewalld nftables rules. The
>>>> result is that a packet must be accepted by *BOTH* the libvirt
>>>> iptables rules *AND* the firewalld nftable rules in order to be
>>>> accepted.
>>>>
>>>> This causes pain because
>>>>
>>>> 1) libvirt always adds rules to permit DNS and DHCP (and sometimes
>>>> TFTP) from guests to the local host. But libvirt's bridges are in
>>>> firewalld's "default" zone (which is usually the zone called
>>>> "public"). The public zone allows ssh, but doesn't allow DNS, DHCP, or
>>>> TFTP. So even though libvirt's rules allow the DHCP and DNS traffic,
>>>> the firewalld rules dont, thus guests connected to libvirt's bridges
>>>> can't acquire an IP address from DHCP, nor can they make DNS queries
>>>> to the DNS server libvirt has setup on the host.
>>>>
>>>> 2) firewalld's higher level "rich rules" don't yet have the ability to
>>>> configure the acceptance of forwarded traffic (traffic that is going
>>>> somewhere beyond the host), so any traffic that needs to be forwarded
>>>> is rejected by the public zone's default "reject" policy (which
>>>> rejects all traffic in the zone not specifically allowed by the rules
>>>> in the zone, whether that traffic is destined to be forwarded or
>>>> locally received by the host).
>>>>
>>>> libvirt can't send "direct" nftables rules (firewalld only supports
>>>> that for iptables), so we can't solve this problem by just sending
>>>> explicit nftables rules instead of explicit iptables rules (which, if
>>>> it could be done, would place libvirt's rules in the same hook as
>>>> firewalld's native rules, and thus eliminate the need for packets to
>>>> be accepted by both libvirt's and firewalld's own rules).
>>>>
>>>> However, we can take advantage of a quirk in firewalld zones that have
>>>> a default policy of "accept" (meaning any packet that doesn't match a
>>>> specific rule in the zone will be *accepted*) - this default accept will
>>>> also accept forwarded traffic (not just traffic destined for the host).
>>>>
>>>> Putting each network's bridge in a new zone called "libvirt" which has
>>>> a default policy of accept will allow the forwarded traffic to pass,
>>>> but the same default accept policy that fixes forwarded traffic also
>>>> causes *all* traffic from guest to host to be accepted. To solve this
>>>> new problem, we can take advantage of a new feature in firewalld
>>>> (currently slated for firewalld-0.7.0) - priorities for rich rules -
>>>> to add a low priority rule that rejects all local traffic (but leaves
>>>> alone all forwarded traffic).
>>> Ok, so we depend on newer firewalld. Any older version using nftables
>>> will still be broken.
>>>
>>> I think it would be quite desirable if we queried the firewalld version
>>> and backend and reported a fatal error to the user if they try to
>>> start a virtual network when we know it will be broken.
>>>
>>> You can get version from /org/fedoraproject/FirewallD1
>>>
>>>       org.freedesktop.Properties.Get("org.fedoraproject.FirewallD1", "version")
>>>
>>> And backend from /org/fedoraproject/FirewallD1/config
>>>
>>>       org.freedesktop.Properties.Get("org.fedoraproject.FirewallD1.config", "FirewallBackend")
>> Depending on the firewalld version won't give us correct results, since the
>> patches to support rule priorities could be / will be backported to older
>> firewalld versions on some distros.
>>
>>
>> Of course we could make failure to set the zone to "libvirt" a fatal error -
>> that is really the most reliable proxy we have for the question "Does this
>> firewalld support rule priorities?" That's a bit draconian though, since it
>> means that "new" libvirt will fail to start networks on all distros that
>> haven't yet updated their firewalld version (or backported the necessary
>> patches). In particular, if they haven't updated firewalld, but they're
>> still using the iptables backend anyway, we shouldn't be whining (although
>> for consistency we should still set the zone to "libvirt" when possible,
>> even if the backend is iptables,)
> I feel we should enforce failure to set libvirt zone as a fatal
> error *if* we query that FirewallBackend property == nftables.
> That is a situation we know will never work correctly, without
> opening a security hole in the libvirt rules.
>
> We should also enforce failure if setting zone failed and
> version >= 0.7.0, regardless of what backend is set, as we
> expect that to always work.
>
> Essentially only case we don't want to report an error is
> in version < 0.7.0 and backend is iptables, as that is
> safe to use.
>
> This should avoid issues with distros that choose to backport.

Okay, those all make sense.

>
>>>> +<rule priority='127'>
>>>> +  <reject/>
>>>> +</rule>
>>>> +<service name='dhcp'/>
>>>> +<service name='dhcpv6'/>
>>>> +<service name='dns'/>
>>>> +<service name='ssh'/>
>>>> +<service name='tftp'/>
>>>> +</zone>
>>> I wonder if we should have an configure time check so that we do
>>> *not* install this file if built agains an old firewalld version.
>>> By unconditionally installing it we're probably causing firewalld
>>> to emit an error message in syslog which some sysadmin is almost
>>> certainly to report a bug about
>> 1) see the comment above about the unreliability of version checks.
>>
>>
>> 2) The version / build of firewalld present at build time is really
>> irrelevant. As a matter of fact, we currently don't require firewalld to be
>> present *at all* at build time (or at runtime - we just use it if it's there
>> and active, otherwise we go directly to iptables/ebtables).
> If configure installs it by default, but provides a switch, we can
> use  --with-firewall-zone / --without-firewalld-zone explicitly in
> the RPM specs. We know exactly which Fedora / RHEL versions need
> this in so can avoid installing it where we know if isn't wanted.

To make sure I'm understanding properly (because I think I misunderstood 
the 1st time I read it) you're suggesting that we not check the version 
of firewalld during configure/build, but have a configure option that 
always defaults to "with", and can be manually set to "without"?

Since the extraneous error message that you want to get rid of is one 
that's logged by firewalld when it fails to parse the libvirt zone file 
(I think we can avoid all other extraneous logs at runtime by checking 
the firewalld backend, and checking for existence of the libvirt zone), 
I guess this configure change would only be useful for indicating 
whether or not the libvirt zone file should be installed (and not really 
essential for any runtime code in  libvirt), but that's something that 
can't really be affected by configure (aside from having it modify the 
libvirt.spec file, but that's only useful for RHEL/Fedora, and I don't 
think RHEL or Fedora will be updating libvirt without updating 
firewalld). So I'm not sure there's a point to supporting 
--without_firewalld_zone in the configure.

>> 3) I actually *want* the bug reported to distro maintainers, to push them to
>> update their firewalld sooner rather than later.
> If we install it based on version where nftables supprot was first
> introduced in firewalld, rather than when it was fixed, we would
> still get useful bug reports without spamming people with errors on
> distros where nftables isn't even supported by firewalld.

Our aim is to avoid the error log caused by firewalld failing to parse 
the libvirt zone (if that failure is due to a lack of support for rule 
priorities). To do that, we need to make sure the libvirt zone file 
isn't installed on systems where firewalld lacks support for rule 
priorities. We can either do that by modifying the package contents to 
include or not include libvirt.zone, or we can do it by modifying the 
install-time scripts to copy/not copy libvirt.zone into its place.

If we modify the package contents at build time, then updating firewalld 
to a version supporting nftables+priorities would necessitate a new 
build *and* install of the libvirt packages.

If, on the other hand, we use logic in the install scripts, we wouldn't 
have to rebuild the libvirt packages, but it would still mean that an 
update of firewalld on the target host would require a reinstall of libvirt.

In both cases, if you didn't rebuild+reinstall libvirt (or reinstall in 
the latter case) after updating firewalld to a new version with 
nftables+priorities, you would end up with the situation we have now, 
where the backend is nftables and libvirt networks are broken, but there 
is no error message pointing that out. I think that is worse than 
(temporarily, until firewalld gets updated) logging one extra error log 
each time firewalld is restarted.

Hmm, but I guess if the *only* change when building/installing with 
--without_firewalld_zones was that libvirt.zone wasn't installed (i.e. 
we would still compile the code that logs a libvirt error if the backend 
is nftables but there is no libvirt zone), then I guess the failure 
wouldn't be silent. So I retract my statement about a silent failure, 
but it still would mean the libvirt packages would need to be 
rebuilt/reinstalled to get things working after a firewalld upgrade.

>> So here's an idea - how about if we do this:
>>
>>
>> Once when libvirtd is started:
>>
>>    retrieve a list of zones and check for presence of the libvirt zone. Save
>> this somewhere.
>>
>> Each time a network is started:
>>
>> if (libvirt zone doesn't exist) {
>>
>>      if (firewalldbackend == nftables) {
>>
>>         Error "Hey you!! Change the firewalld backend to nftables or update
>> firewalld!!"
>>
>>      }
>>
>> } else {
>>
>>      put the bridge in the libvirt zone (regardless of nftables/iptables)
>>
>> }
> That doesn't solve the issue with installing an invalid zone
> file for older firewalld which could generate syslog errors
> on firewalld startup.

s/could/would/ :-)

But on the positive side, it's only a single error message, logged just 
once per restart of firewalld. So it's got that going for it. Which is 
nice. </BillMurrayCaddyShack>

Since it's only a transient thing (it will only be a problem for 
somebody who upgrades to new libvirt, but keeps their firewalld at 
something older than 0.7.0), I don't think we should do anything at all 
complicated to prevent it - we'd just end up with a bunch of obscure 
code that won't be used any time after 6 months from now, and will 
continue confusing people long into the future. I think it's better in 
the end to just live with a few extra (and harmless) error messages 
during the short period when some distros have libvirt >= 5.1.0 and 
firewalld < 0.7.0, rather than having silent (i.e. no error message) 
failures if a distro updates firewalld without rebuilding libvirt.

>> (in the meantime, we haven't relied on version number checks, and a system
>> with old firewalld can still build a libvirt that supports new firewalld)
> As above I don't think we need to rely on version numbers for te compile
> time check, as long as we provide the --with/--without flags to let the
> distro vendors override it.

On one hand, it could be useful to add the 
--with/--without_firewalld_zones configure flags. On the other hand, 
that permits a distro to build a package that is guaranteed to cause a 
silent failure when firewalld is upgraded and has backend set to 
nftables, which I think is a much worse problem because it is silent. 
That makes me lean towards not even providing that option, just for the 
sake of safety.