[libvirt] dhcp6, radvd, ip6tables, etc. (update)

Tue Oct 30 18:28:51 UTC 2012

On 10/30/2012 11:58 AM, Gene Czarcinski wrote:
> On 10/29/2012 04:08 PM, Laine Stump wrote:
>> On 10/29/2012 08:26 AM, Gene Czarcinski wrote:
>>> On 10/27/2012 03:18 PM, Gene Czarcinski wrote:
>>>> OK, I have the basic implementation for libvirt support of dhcp6. Let
>>>> me say again that 98% of the work was already done.  There is still a
>>>> bunch of work today which includes writing some tests, understanding
>>>> how things such as bootp, dhcp-host, etc. should be supported with
>>>> dhcp6, as well as the items I discuss below.
>>>>
>>>> 1.  Right now, the only way that dhcp6 is in effect is if there is no
>>>> dhcp4 range definition.  This will be fixed/expanded so that, at a
>>>> minimum, you can have both a dhcp4 and dhcp6 on the same interface.
>>>> However, it appears to be easier to just pass to dnsmasq ANY/EVERY
>>>> dhcp4 range or dhcp6 range defined in the xml.
>>>>
>>>> Comments?  Any input on which approach to use or avoid?
>>> For the current situation, the implementation is for one (the first)
>>> IPv4 dhcp and one (the first) IPv6 dhcp.  This introduces enough
>>> little gotchas that need to be worked out.
>> I think that is the proper thing to do for now. As discussed earlier,
>> before supporting dhcp on multiple subnets of the same protocol (ipv6 vs
>> ipv4) we would need to decide why and how we want to do that - IPs
>> assigned from different subnets need to be matched with the IP address
>> of that subnet, and it will take a more complicated dnsmasq commandline
>> to do that, iirc.
> I cannot think of a good reason to have multiple IPv4 or IPv6
> dhcp-range specification.  Some day, someone will come up with a good
> reason but, for right now, I believe that one IPv4 and one IPv6
> dhcp-range specifications is one of those "good enough" answers.
>>
>>>> 2.  I have modified radvd so both stateful (dhcp6) and stateless
>>>> (SLAAC) addressing is supported with radvd for the default route.
>>>> This is done on an interface basis (that is the way it works).  So if
>>>> any dhcp6 range is specified, then stateful is used.  The way this is
>>>> implemented will make it easy to add some tests verifying that the
>>>> configuration parameters are working.  I intend this to be an
>>>> expansion to networkxml2argvtest since it has the xml specification
>>>> files which determine both dnsmasq and radvd configuration parameters.
>>>>
>>> NC ... working fine.
>>>> 3. After completing what I thought was code that should result in a
>>>> guest getting dhcp6 addresses, it was not working.  Once more it took
>>>> me a little time to realize that ip6tables rules were blocking it. [I
>>>> have been down this path before, you would think I would realize the
>>>> problem sooner.]
>>>>
>>>> 3a. In looking over the ip6tables rules, I saw a whole bunch of
>>>> additions at the top of the INPUT chain which were accepts for
>>>> udp/tcp port 53.  In looking at the code in bridge_driver.c, I found
>>>> that, every time a network device was started, 3 FORWARD rules and 2
>>>> INPUT rules were added, but, when the network device was destroyed,
>>>> only the 3 FORWARD rules were removed.  I believe this is a bug (but
>>>> not high priority) and I will be submitting a separate patch to fix
>>>> this.
>>>>
>>>> 3b. There are two different approaches for the rule which allows the
>>>> dhcp6 server to work.  I could add (actually insert) one rule to the
>>>> INPUT chain which accepted the packet if it is "-d ff02::1:2 "--dport
>>>> 547".  Or, I could add (insert) a rule specifying "-i virbr__" for
>>>> every IPv6 device which would be removed when the device was
>>>> destroyed.
>>> OBE - I chose the approach of adding (and removing) a rule per
>>> interface.  The rule adds "--dport 547" but does NOT specify "-d
>>> ff02::1:2".
>> I haven't looked at how dhcp6 works, but if its anything like dhcp4, the
>> IP address is irrelevant and shouldn't be included in the rule. As long
>> as your rule specifies both the interface and port, that should be fine
>> (take a look at the rules already being added for dhcp4) (and no, I have
>> absolutely no idea why we add a rule to allow *tcp* on the dhcp port.
>> It's just been that way since the first day I set eyes on the code).
> Well, ff02::1:2 does have some meaning in dhcp6.
>
> From what I have seen by "well behaved" clients is that the client
> always uses port 546 and the server always uses port 547.  But,
> dnsmasq have some comments/code which indicates that not all clients
> are "well behaved."
>
> In dhcp6, a little four dhcpv6 dance is performed to establish the
> clients address:
>
> 1. dhcpv6 solicit:  from=fe80::client:546  to=ff02::1:2:547
> 2. dhcpv6 advertise:  from=fe80::server:547  to=fe80::client:546
> 3. dhcpv6 request:  from=fe80::client:546  to=ff02::1:2:547
> 4. dhcpv6 reply:  from=fe80::server:547  to=fe80::client:546
>
> Or, in other words: (1) need dhcpv6, (2) I serve it, (3) OK, give me
> one, and (4) here it is.

Ah, so that address has a special meaning/function.

>
> Since dnsmasq does its own packet filtering and with bind-interfaces
> having a real meaning, it all works (assuming that radvd has the right
> configuration).
>>
>>
>>> This works With the radvd configuration and a dhcp-range specified for
>>> a ipv6 subnet, a guest will get a dhcp6 address and RA default route.
>> Interesting - so both radvd and dnsmasq are involved, correct?
> Yes, why not?  

No reason. Just curious.

> Sometime in the future this should be reconsidered and either
> everything is done by dnsmasq or it is at least an option.

Yeah, it couldn't be mandatory, at least not now, since there are still
lots of people using dnsmasq versions that don't have all the required
capabilities.

>   I must say that having dnsmasq do everything does have appeal ...
> one less dependency.
>>
>>>> 4.  After getting all of this working to my satisfaction, my next
>>>> mountain to climb is VM ... it really does not like network xml
>>>> definitions which include a dhcp-range for an ipv6 definition.
>>>>
>>>> Comments?
>>>>
>>>> NOTE:  I am implementing all of this assuming that my previous
>>>> patches have been accepted ... the ones for creating a dnsmasq
>>>> conf-file for parameters rather than using the dnsmasq command-line.
>> I have no problem with the "convert from long commandline to conf file"
>> patch except for the bit that points to a "conf directory" where user
>> supplied conf files can be added. Aside from that part needing to be in
>> a spearate patch, if we're going to add that kind of configurability, we
>> need to do it in a way that will allow us to easily see that the user is
>> playing outside the fence (otherwise we spend a lot of time chasing
>> "bugs" that end up being caused by user-supplied options).
> Originally, I wanted the conf-dir so that I could pop in/out some
> configuration changes that would happen when dnsmasq re-read the
> configuration.  Well, it does not work that way and dnsmasq has to be
> restarted for some of the more interesting changes.  Given this, I
> believe that conf-dir serves no useful purpose and should be removed
> and I will remove it and resubmit the patch.
>>
>> Because we're in freeze right now I haven't spent a lot of time
>> discussing that, but planned to send a message about it when I get a
>> minute.
> Right now I am working on getting dhcpv6 functional and, while what I
> have works, there is still more to do.
>>
>>>> I am sure that someone could spend the time refitting the dhcp6
>>>> patches to the old code but why get aggravated?  If you folks do not
>>>> want to do things that way, fine, please say so.  But if it is going
>>>> to be accepted, then I would like some indication of this.
>>> 5. As far as I can tell (or at least this is for dnsmasq),
>>> "dhcp-no-override", "enable-tftp", "tftp-root=", and "dhcp-boot=" are
>>> all IPv4 only and thus ignored for IPv6 in bridge_driver.  I have not
>>> looked to see what network_conf.c does.
>> "what network_conf.c does"? Well, it of course doesn't deal directly
>> with those options, but the config that feeds into some of those options
>> is parsed in virNetworkIPParseXML(), and is only done if the <ip>
>> element is ipv4. But then you've already seen that code if you have dhcp
>> working for ipv6 - the <dhcp> element is also only parsed for ipv6. The
>> format-side code doesn't have that extra check; I guess I figured that
>> if there was no way to configure ipv6 with dhcp or tftp, it was safe to
>> assume any ip element with dhcp or tftp info was ipv4 anyway.
> I have now dived into network_conf.c and a little into dnsmasq.c. Yet
> again I was surprised because most of what was needed for dhcpv6 was a
> little tweaking here and there.

I'd like to think that's because we've tried to do things in a manner
that anticipates future expansions, but some of it may be pure luck :-)

>
> To support dhcp-host for IPv6, I did assumed that for IPv6 no MAC
> address would be specified since it does not have a defined meaning in
> DHCPv6.  Therefore, in dnsmasq.c/hostsfileAdd(), if the mac==NULL, I
> use the IPv6 format of <hostname>,[<ipv6-addr>]  whereas for IPv4 it
> is either MAC,<ipv4-addr>,<hostname. or MAC,<ipv4-addr>.
>
> This way most of the code works as is.  Dnsmasq has lots of options
> and different ways that dhcp-host= can be specified, but this is
> simple and I know it works.
>>
>>> 6.  Handling of the info for addn-hosts file and the dhcp-hostsfile.
>>> This currently works because things are forced so that one and only
>>> one IPv4 dhcp definition will be handled.  With the addition of IPv6
>>> dhcp, things fall apart.
>>>
>>> 6a. addn-hosts:  The addn-hosts file is similar to the /etc/hosts file
>>> in both form and function.  The <dns>-<host> specification is done on
>>> an interface bases and, thus, the processing of the data and creation
>>> of the file should only be done once.
>>>
>>> 6b. dhcp-hostsfile (dhcp-host=):  This needs to be done at least for
>>> every ip definition that is processed for dhcp.  Initially, this will
>>> be for dhcp4 only until I can figure out how to do it for dhcp6.
>>>
>>> 6c. Thus, networkBuildDnsmasqHostsfile() needs to be split into two
>>> functions [one for addn-hosts and one for dhcp-hosts]. Additionally,
>>> all the functions which call dnsmasqSave() need to be reworked
>>> appropriately.
>>
>> I've actually never liked the "dnsmasqContext" concept, as it seems like
>> overkill and has conceptual problems such as what you've described. I
>> would be just as happy with replacements that were simpler and easier to
>> deal with. I think part of what complicates dnsmasq.[hc] is that it's in
>> the util directory, so it isn't allowed to understand the contents of
>> virNetworkDef, and must instead be sent the list of hosts in a simpler
>> format. If, instead, there was a file src/network/bridge_dnsmasq.[hc],
>> that could have functions that took virNetworkDef as an arg, and just
>> immediately return a string (or, in a separate function, write to a
>> file). That should simplify calling, and writing tests. And existing
>> dnsmasq-related functions in bridge_driver.c could be moved there as
>> well, reducing clutter in bridge_driver.c. (Of course I'm saying all
>> this without ever seriously considering it, just talking off the cuff,
>> so I may be completely wrong :-)
> Right now I have fixed things up so they work.  I would like to leave
> this as an exercise for someone else [or at least a later time].
>
> Besides, isn't bridge_driver.c pretty much for dnsmasq only?

Right now, yes. It's always possible that an alternative could come
along, of course...

>>> 7. So far, the only things I have done involving the xml specification
>>> is to enable <dhcp> for IPv6.  However, the  xml to specify a dns
>>> addn-hosts appears, IMHO, to be overly verbose and complicated.
>> It is made that way because a single IP address may have multiple
>> hostnames associated with it, and we want to avoid having multiple
>> methods of describing the same thing. What you propose in the next
>> couple sentences was already proposed, tried, and rejected when dns host
>> support was originally added.
> OK.
>>
>>>    So, while allowing the current xml to be valid, I suggest adding an
>>> alternate form for which is similar to that used for dhcp-host.  An
>>> example is "<host ip='1.2.3.4' name='one' />"
>>>
>> There are a few places in libvirt's XML where the same thing can be
>> expressed in two different ways, but that is only done when necessary
>> because the existing XML is unable to completely describe the new
>> functionality but backward compatibility is required. It creates all
>> sorts of problems when formatting the config back into XML though (which
>> of the two do you choose? Or do you do both? Either of these is a bad
>> answer), therefore we definitely don't want to do that except in cases
>> where it is absolutely necessary; this isn't one of those cases.
> This is why I have tried to tweak and bend things so it works (and
> because it mostly worked before).
>
> And now, as the saying goes, one more thing.
>
> I now realize that I am going to need to get into virsh net-update
> since I am adding things to the xml specification and net-update will
> need to differentiate between dhcp4 and dhcp6 changes.
>
> Another thought that occurs to me is whether there has any
> consideration been given having a "virsh net-restart" which would just
> restart dnsmasq and radvd.  Typing stuff in for the command line of
> net-update is a little prone to typos.

You can always put them in (temporary) files :-)

>   Wouldn't having net-edit and net-restart do what is intended for
> net-update.  Maybe there is a way to have net-update do the equivalent
> of net-edit/net-restart.  For example, if you only did "virsh
> net-update <network>" it would do it.

My first proposal for doing a live update of network config was to
simply enhance virNetworkDefine with a flag that meant "take effect
immediately", but this was voted down as being too open-ended and
difficult to keep track of what had and hadn't changed. I also proposed
an API that would use xpath expressions to specify which part of the
network was being updated, thus allowing arbitrary change of any piece
of the network config; also voted down for some of the same reasons (and
some others, e.g. we would have an API that *looked* like it could
specify any change, but it could actually encounter problems - if you're
interested you should look up the mail threads, just search for
virNetworkDefine and virNetworkUpdate.

What I would like to do, however, is to get the network code and the
domain interface code interconnected well enough that when a network is
destroyed, all the interfaces connected to it have their link set down,
and anytime a network is started, a search would be done through all
interfaces on all active domains to reconnect them to the network. This
would have the same effect.
>
> BTW, as I mentioned in another message, net-update for <dns> <host>
> does not work.

Yeah, that's because it wasn't on the "required" list for the initial
pass and got left out, and then I forgot that fact when I was telling
you about it. I have some other patches pending that will make it
simpler to implement, so I'll probably post a patch for it sometime
shortly after 1.0.0 is released.