[libvirt] [PATCHv3 1/2] network: added waiting for DAD to finish for bridge address.

Maxim Perevedentsev mperevedentsev at virtuozzo.com
Tue Aug 11 08:54:52 UTC 2015



On 08/11/2015 11:14 AM, Simon Kelley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
>
>
> On 10/08/15 22:29, Laine Stump wrote:
>> On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote:
>>> This is a fix for commit
>>> db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process
>>> exits without waiting for DAD, this is dnsmasq daemon's task. So
>>> we periodically poll the kernel using netlink and check whether
>>> there are any IPv6 addresses assigned to bridge which have
>>> 'tentative' state. After DAD is finished, execution continues. I
>>> guess that is what dnsmasq was assumed to do.
>> Since the comments in our code imply that dnsmasq should be waiting
>> for DAD to complete prior to daemonizing, before pushing a fix like
>> this I'd like to find out from the dnsmasq folks if we are
>> erroneously relying on nonexistent dnsmasq behavior, or if maybe
>> there is a bug in some version of dnsmasq.
>>
>> Simon (or other dnsmasq people) - when dnsmasq is run with
>> "enable-ra", does it make sure it completes DAD prior to
>> daemonizing? Or does libvirt need to do this extra polling to
>> assure that DAD has completed? (or maybe there's some other config
>> parameter we need to add?)
>>
>>
> Dnsmasq doesn't wait for DAD to complete before returning. Internally,
> it know is DAD is still happening on an interface, as it needs to
> delay calling bind() until after it's complete. It would, therefore be
> relatively simple to add this behaviour, but it's not a complete
> solution, since new interfaces can appear _after_ dnsmasq has
> completed start-up.
>
> Having libvirt do its own checks whenever it creates an interface
> might therefore be a cleaner way of doing things, but I don't have an
> objection to changing dnsmasq behaviour if there's a good reason why
> that's not sensible.
I think this behavior will be inconsistent with the behavior present in 
dnsmasq for long, so we can create conflicts with software relying on 
current dnsmasq logic. I think if libvirt needs dad to finish then it 
should wait itself instead of modifying third-party software.
>
> Cheers,
>
> Simon.
>
>
>
>>> --- Difference to v2: Moved to virnetdev.
>>>
>>> src/libvirt_private.syms    |   1 + src/network/bridge_driver.c |
>>> 35 +++++++++- src/util/virnetdev.c        | 160
>>> ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h
>>> |   2 + 4 files changed, 197 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
>>> index 0517c24..fa9e1c1 100644 --- a/src/libvirt_private.syms +++
>>> b/src/libvirt_private.syms @@ -1776,6 +1776,7 @@
>>> virNetDevSetRcvMulti; virNetDevSetupControl; virNetDevSysfsFile;
>>> virNetDevValidateConfig; +virNetDevWaitDadFinish;
>>>
>>>
>>> # util/virnetdevbandwidth.h diff --git
>>> a/src/network/bridge_driver.c b/src/network/bridge_driver.c index
>>> 3d6721b..2172a3d 100644 --- a/src/network/bridge_driver.c +++
>>> b/src/network/bridge_driver.c @@ -2026,6 +2026,33 @@
>>> networkAddRouteToBridge(virNetworkObjPtr network, }
>>>
>>> static int +networkWaitDadFinish(virNetworkObjPtr network) +{ +
>>> virNetworkIpDefPtr ipdef; +    virSocketAddrPtr *addrs = NULL; +
>>> size_t i; +    int ret; +    for (i = 0; +         (ipdef =
>>> virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); +
>>> i++) {} + +    if (i == 0) +        return 0; +    if
>>> (VIR_ALLOC_N(addrs, i)) +        return -1; + +    for (i = 0; +
>>> (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); +
>>> i++) { +        addrs[i] = &ipdef->address; +    } + +    ret =
>>> virNetDevWaitDadFinish(addrs, i); +    VIR_FREE(addrs); +
>>> return ret; +} + +static int
>>> networkStartNetworkVirtual(virNetworkDriverStatePtr driver,
>>> virNetworkObjPtr network) { @@ -2159,7 +2186,13 @@
>>> networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if
>>> (v6present && networkStartRadvd(driver, network) < 0) goto err4;
>>>
>>> -    /* DAD has happened (dnsmasq waits for it), dnsmasq is now
>>> bound to the +    /* dnsmasq main process does not wait for DAD
>>> to complete, +     * so we need to wait for it ourselves. +
>>> */ +    if (v6present && networkWaitDadFinish(network) < 0) +
>>> goto err4; + +    /* DAD has happened, dnsmasq is now bound to
>>> the * bridge's IPv6 address, so we can now set the dummy tun
>>> down. */ if (tapfd >= 0) { diff --git a/src/util/virnetdev.c
>>> b/src/util/virnetdev.c index 1e20789..c81342a 100644 ---
>>> a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -96,6 +96,7
>>> @@ VIR_LOG_INIT("util.netdev"); # define
>>> FEATURE_BIT_IS_SET(blocks, index, field)        \
>>> (FEATURE_WORD(blocks, index, field) & FEATURE_FIELD_FLAG(index))
>>> #endif +# define IP_BUF_SIZE 4096
>>>
>>> typedef enum { VIR_MCAST_TYPE_INDEX_TOKEN, @@ -1219,6 +1220,103
>>> @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
>>>
>>> +/* return whether there is a known address with 'tentative' flag
>>> set */ +static int +virNetDevParseDadStatus(struct nlmsghdr *nlh,
>>> int len, +                        virSocketAddrPtr *addrs, size_t
>>> count) +{ +    struct ifaddrmsg *ifaddrmsg_ptr; +    unsigned int
>>> ifaddrmsg_len; +    struct rtattr *rtattr_ptr; +    size_t i; +
>>> struct in6_addr *addr; +    for (; NLMSG_OK(nlh, len); nlh =
>>> NLMSG_NEXT(nlh, len)) { +        if (NLMSG_PAYLOAD(nlh, 0) <
>>> sizeof(struct ifaddrmsg)) { +            /* Message without
>>> payload is the last one. */ +            break; +        } + +
>>> ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); +        if
>>> (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) { +            /*
>>> Not tentative: we are not interested in this entry. */ +
>>> continue; +        } + +        ifaddrmsg_len =
>>> IFA_PAYLOAD(nlh); +        rtattr_ptr = (struct rtattr *)
>>> IFA_RTA(ifaddrmsg_ptr); +        for (; RTA_OK(rtattr_ptr,
>>> ifaddrmsg_len); +            rtattr_ptr = RTA_NEXT(rtattr_ptr,
>>> ifaddrmsg_len)) { +            if (RTA_PAYLOAD(rtattr_ptr) !=
>>> sizeof(struct in6_addr)) { +                /* No address:
>>> ignore. */ +                continue; +            } + +
>>> /* We check only known addresses. */ +            for (i = 0; i <
>>> count; i++) { +                addr =
>>> &addrs[i]->data.inet6.sin6_addr; +                if
>>> (!memcmp(addr, RTA_DATA(rtattr_ptr), +
>>> sizeof(struct in6_addr))) { +                    /* We found
>>> matching tentative address. */ +                    return 1; +
>>> } +            } +        } +    } +    return 0; +} + +/* return
>>> after DAD finishes for all known IPv6 addresses or an error */
>>> +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t
>>> count) +{ +    struct nl_msg *nlmsg = NULL; +    struct ifaddrmsg
>>> ifa; +    struct nlmsghdr *resp = NULL; +    unsigned int
>>> recvbuflen; +    int ret = -1, dad = 1; + +    if (!(nlmsg =
>>> nlmsg_alloc_simple(RTM_GETADDR, +
>>> NLM_F_REQUEST | NLM_F_DUMP))) { +        virReportOOMError(); +
>>> return -1; +    } + +    memset(&ifa, 0, sizeof(ifa)); +    /*
>>> DAD is for IPv6 adresses only. */ +    ifa.ifa_family =
>>> AF_INET6; +    if (nlmsg_append(nlmsg, &ifa, sizeof(ifa),
>>> NLMSG_ALIGNTO) < 0) { +
>>> virReportError(VIR_ERR_INTERNAL_ERROR, "%s", +
>>> _("allocated netlink buffer is too small")); +        goto
>>> cleanup; +    } + +    /* Periodically query netlink until DAD
>>> finishes on all known addresses. */ +    while (dad) { +
>>> if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, +
>>> NETLINK_ROUTE, 0) < 0) +            goto cleanup; + +        if
>>> (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { +
>>> virReportError(VIR_ERR_SYSTEM_ERROR, "%s", +
>>> _("error reading DAD state information")); +            goto
>>> cleanup; +        } + +        /* Parse response. */ +        dad
>>> = virNetDevParseDadStatus(resp, recvbuflen, addrs, count); +
>>> if (dad) +            usleep(1000 * 10); + +
>>> VIR_FREE(resp); +    } +    ret = 0; + + cleanup: +
>>> VIR_FREE(resp); +    nlmsg_free(nlmsg); +    return ret; +} +
>>> #else /* defined(__linux__) && defined(HAVE_LIBNL) */
>>>
>>> int virNetDevSetIPAddress(const char *ifname, @@ -1338,6 +1436,68
>>> @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
>>>
>>> +/* return whether there is a known address with 'tentative' flag
>>> set */ +static int +virNetDevParseDadStatus(char *outbuf, +
>>> virSocketAddrPtr *addrs, size_t count) +{ +    virSocketAddr
>>> sockaddr; +    size_t i, j; +    int ret = 0; +    char
>>> **addr_strings = virStringSplit(outbuf, "\n", 0); +    for (j =
>>> 0; addr_strings[j] != NULL; ++j) { +        if
>>> (virSocketParseAddr(strings[j], &sockaddr, AF_INET6) < 0) +
>>> continue; +        for (i = 0; i < count; i++) { +            if
>>> (virSocketAddrEqual(addrs[i], &sockaddr)) { +                ret
>>> = 1; +                goto cleanup; +            } +        } +
>>> } + + cleanup: +    virStringFreeList(addr_strings); +    return
>>> ret; +} + +/* return after DAD finishes for all known IPv6
>>> addresses or an error */ +int
>>> +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count)
>>> +{ +    int ret = -1, dad = 1; +    char *outbuf = NULL; +
>>> virCommandPtr cmd = NULL; + +# ifdef IP_PATH +    while (dad) { +
>>> cmd = virCommandNew(POSIX_SHELL); +
>>> virCommandAddArgList(cmd, "-c", +
>>> IP_PATH "-6 addr show tentative | \ +
>>> awk '/inet6/{split(\\$2, a, \\\"/\\\"); \ +
>>> print a[1]}'", NULL); +        virCommandSetOutputBuffer(cmd,
>>> &outbuf); +        if (virCommandRun(cmd, NULL)) +
>>> goto cleanup; +        virCommandFree(cmd); + +        dad =
>>> virNetDevParseDadStatus(outbuf, addrs, count); +        if (dad)
>>> +            usleep(1000 * 10); + +        VIR_FREE(outbuf); +
>>> } +    ret = 0; +# else +    virReportSystemError(ENOSYS, "%s",
>>> _("Unable to check IPv6 DAD")); +# endif + + cleanup: +
>>> virCommandFree(cmd); +    VIR_FREE(outbuf); +    return ret; +}
>>> + #endif /* defined(__linux__) && defined(HAVE_LIBNL) */
>>>
>>> /** diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h
>>> index fff881c..a09b3b2 100644 --- a/src/util/virnetdev.h +++
>>> b/src/util/virnetdev.h @@ -105,6 +105,8 @@ int
>>> virNetDevClearIPAddress(const char *ifname, ATTRIBUTE_NONNULL(1)
>>> ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; int
>>> virNetDevGetIPAddress(const char *ifname, virSocketAddrPtr addr)
>>> ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2)
>>> ATTRIBUTE_RETURN_CHECK; +int
>>> virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +
>>> ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
>>>
>>>
>>> int virNetDevSetMAC(const char *ifname, -- Sincerely, Maxim
>>> Perevedentsev
>>>
>>> -- libvir-list mailing list libvir-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/libvir-list
>>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQIcBAEBCAAGBQJVya7kAAoJEBXN2mrhkTWi8IYP/RNoeivBxzgpFxgsAVptURxE
> zQYzYYBQDZefdvaNX/Zg//OjUchek8bChaMEuk2dIKpA5HWHej4d9+fI3RQjkc2P
> 5FVv4ng1o8CfdM8MZpalFYkQXPZXIKfzrOpxfcCh9IigOlJWdCJLPn9VnAUAOAul
> ylxrqo2hedbnYisPcN3Wb5LRguN/RwpNddyLO8PSpP5gt96Z+ykY6F3PLFMOmXQz
> OFLDwfU6+ppx1vFBlI3wSlN6BG/i5y7m63TzOSUIwOEE9mw4cwoqEKoU8DpgsNEE
> rjpvNs63wED8kcqVHqxMVt0VHSeADPSlVB2E7CcjqqRWksbHJIInkVJ8adM/ibPK
> /CbbfHQpaGAH0H3ke+J5P70KA6Sfo8VlDcvo2iAOT6ENuPmrUi7zwdzxuwAXEXtP
> J/oCILD+FRY00mD2rkZyjdlvfX8zsfiuL6nhOTGmF+OrDDr+qR534NyDJBdvhCfJ
> Hc75ERfbhKa7yYlt0+pSN5wZtShbKAnjHDKxOKloAu8csakDE9B753PwUbfDQIam
> vN3chMENeogNiZqsctntA7csOU8IssqU5u2XWMrGhcV4onnbleiNG/Ue/v3CzEzx
> LlNSqXONETqeU+Z3qIU+rhy42DZL89rdavYR4a5T/asge1fiWjYVdyUg8atkBExt
> 2h6k2LPUVZNjVgD6p4f/
> =edDp
> -----END PGP SIGNATURE-----

-- 
Your sincerely,
Maxim Perevedentsev




More information about the libvir-list mailing list