[PATCH v2 2/2] util: Add phys_port_name support on virPCIGetNetName

Moshe Levi moshele at nvidia.com
Mon Jan 25 18:24:13 UTC 2021


Hi Laine, 
Sorry for the mistake in the split of the commit, and thank for fix it 😊. 
I review you change and it is better. 
I tested it on Mellanox card with kernel 5.9 and it work good with SR-IOV legacy and SR-IOV switchdev.
You have my LGTM for it. 



> -----Original Message-----
> From: Laine Stump <laine at redhat.com>
> Sent: Monday, January 25, 2021 7:23 AM
> To: libvir-list at redhat.com
> Cc: Moshe Levi <moshele at nvidia.com>; Adrian Chiris <adrianc at nvidia.com>
> Subject: Re: [PATCH v2 2/2] util: Add phys_port_name support on
> virPCIGetNetName
> 
> External email: Use caution opening links or attachments
> 
> 
> On 1/21/21 7:15 AM, Moshe Levi wrote:
> > From: Dmytro Linkin <dlinkin at nvidia.com>
> >
> > Current virPCIGetNetName() logic is to get net device name by checking
> > it's phys_port_id, if caller provide it, or by it's index (eg, by it's
> > position at sysfs net directory). This approach worked fine up until
> > linux kernel version 5.8, where NVIDIA Mellanox driver implemented
> > linking of VFs' representors to PF PCI address [1]
> >
> > This mean
> > that device's sysfs net directory will hold multiple net devices. Ex.:
> >
> > $ ls '/sys/bus/pci/devices/0000:82:00.0/net'
> > ens1f0  eth0  eth1
> >
> > In switchdev mode the PF and the VF represntors support
> phys_port_name
> > In that case there is only a single netdev on the PCI device which is
> > the PF. The other net devices are the VF representors which we want to
> > exclude. whereas in the case of using phys_port_id, there could be
> > multiple PFs, and so we have to match the phys_port_id of the VF
> >
> > virPCIGetNetName() will try to get PF name by it's index - 0. The
> > problem here is that the PF nedev entry may not be the first.
> >
> > To fix that, for switch devices, we introduce a new logic to select
> > the PF uplink netdev according to the content of phys_port_name.
> > Extend
> > virPCIGetNetName() logic work in following sequence:
> >   - filter by phys_port_id, if it's provided, (multi PF)
> >   or
> >   - filter by phys_port_name if exist (one PF)
> >   - get net device by it's index (position) in sysfs net directory.
> >
> > [1] - https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/
> >        commit/?id=123f0f53dd64b67e34142485fe866a8a581f12f1
> >
> > Co-Authored-by: Moshe Levi <moshele at nvidia.com>
> > Signed-off-by: Dmytro Linkin <dlinkin at nvidia.com>
> > Reviewed-by: Adrian Chiris <adrianc at nvidia.com>
> > ---
> >   src/util/virnetdev.c |  7 +++----
> >   src/util/virpci.c    | 38 +++++++++++++++++++++++++++++++++++++-
> >   src/util/virpci.h    |  5 +++++
> >   3 files changed, 45 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index
> > d41b967d6a..2485718b48 100644
> > --- a/src/util/virnetdev.c
> > +++ b/src/util/virnetdev.c
> > @@ -1263,7 +1263,7 @@ virNetDevGetVirtualFunctions(const char
> *pfname,
> >           }
> >
> >           if (virPCIGetNetName(pci_sysfs_device_link, 0,
> > -                             pfPhysPortID, NULL, &((*vfname)[i])) < 0) {
> > +                             pfPhysPortID, &((*vfname)[i])) < 0) {
> 
> 
> This change disappears once the incorrect chunks are removed from patch 1.
> 
> 
> >               goto cleanup;
> >           }
> >
> > @@ -1358,8 +1358,7 @@ virNetDevGetPhysicalFunction(const char
> *ifname, char **pfname)
> >           return -1;
> >
> >       if (virPCIGetNetName(physfn_sysfs_path, 0,
> > -                         vfPhysPortID,
> > -                         VIR_PF_PHYS_PORT_NAME_REGEX, pfname) < 0) {
> > +                         vfPhysPortID, pfname) < 0) {
> 
> 
> Likewise.
> 
> 
> >           return -1;
> >       }
> >
> > @@ -1422,7 +1421,7 @@ virNetDevPFGetVF(const char *pfname, int vf,
> char **vfname)
> >        * isn't bound to a netdev driver, it won't have a netdev name,
> >        * and vfname will be NULL).
> >        */
> > -    return virPCIGetNetName(virtfnSysfsPath, 0, pfPhysPortID, NULL,
> vfname);
> > +    return virPCIGetNetName(virtfnSysfsPath, 0, pfPhysPortID,
> > + vfname);
> 
> 
> Likewise
> 
> 
> >   }
> >
> >
> > diff --git a/src/util/virpci.c b/src/util/virpci.c index
> > 50fd5ef7ea..e20fb9e10b 100644
> > --- a/src/util/virpci.c
> > +++ b/src/util/virpci.c
> > @@ -2469,7 +2469,7 @@
> virPCIDeviceAddressGetSysfsFile(virPCIDeviceAddressPtr addr,
> >    * @idx: used to choose which netdev when there are several
> 
> 
> (side note: I was spelunking the code trying to figure out when this idx is
> actually set and used, and it seems like the answer might be "never".
> I should probably look into cleaning it up)
> 
> 
> >    *       (ignored if physPortID is set)
> >    * @physPortID: match this string in the netdev's phys_port_id
> > - *       (or NULL to ignore and use idx instead)
> > + *       (or NULL to ignore and use phys_port_name or idx instead)
> >    * @netname: used to return the name of the netdev
> >    *       (set to NULL (but returns success) if there is no netdev)
> >    *
> > @@ -2483,6 +2483,7 @@ virPCIGetNetName(const char
> *device_link_sysfs_path,
> >   {
> >       g_autofree char *pcidev_sysfs_net_path = NULL;
> >       g_autofree char *firstEntryName = NULL;
> > +    g_autofree char *thisPhysPortName = NULL;
> >       g_autoptr(DIR) dir = NULL;
> >       struct dirent *entry = NULL;
> >       size_t i = 0;
> > @@ -2522,7 +2523,42 @@ virPCIGetNetName(const char
> > *device_link_sysfs_path,
> >
> >                   continue;
> >               }
> > +
> >           } else {
> > +            /* Most switch devices use phys_port_name instead of
> > +             * phys_port_id.
> > +             * NOTE: VFs' representors net devices can be linked to PF's PCI
> > +             * device, which mean that there'll be multiple net devices
> > +             * instances and to get a proper net device need to match on
> > +             * specific regex.
> > +             * To get PF netdev, for ex., used following regex:
> > +             * "(p[0-9]+$)|(p[0-9]+s[0-9]+$)"
> > +             * or to get exact VF's netdev next regex is used:
> > +             * "pf0vf1$"
> > +             */
> > +            if (virNetDevGetPhysPortName(entry->d_name,
> &thisPhysPortName) < 0)
> > +                return -1;
> > +
> > +            if (thisPhysPortName) {
> > +                /* if this one doesn't match, keep looking */
> > +                if (!virStringMatch(thisPhysPortName,
> VIR_PF_PHYS_PORT_NAME_REGEX)) {
> > +                    VIR_FREE(thisPhysPortName);
> 
> 
> When I started using g_autofree and explicitly called VIR_FREE() on a
> g_autofree pointer so I could re-use it, I was told this is a bad practice and to
> avoid it. We can easily avoid it in this case by declaring thisPhysPortName
> inside the else clause where it is used (just before the line above us ^^^ that
> calls virNetDevGetPhysPortName()) - that way it will be autofreed each time
> through the loop.
> 
> 
> > +                    /* Save the first entry we find to use as a failsafe
> > +                     * in case we fail to match on regex.
> > +                     */
> > +                    if (!firstEntryName)
> > +                        firstEntryName = g_strdup(entry->d_name);
> 
> 
> All this stuff about setting firstEntryName is pointless in the else clause of "if
> (physPortID)": we will either get to the bottom of the
> while() loop and return without using firstEntryName, or we will continue
> back to the top of the loop until we run out of entries, drop out of the loop,
> and get to:
> 
> 
>    if (!physPortID)
> 
>        return 0;
> 
> Since we've already established that physPortID == NULL, that's going to take
> us out of this function before we can use firstEntryName.
> 
> 
> > +
> > +                    continue;
> > +                }
> > +            } else {
> > +                /* Save the first entry we find to use as a failsafe in case
> > +                 * phys_port_name is not supported.
> > +                 */
> > +                if (!firstEntryName)
> > +                    firstEntryName = g_strdup(entry->d_name);
> > +                continue;
> 
> 
> The else clause here is what will be taken for any sort of hardware that has
> only a single netdev listed in the "net" directory of the PCI device (so pretty
> much everything except a new generation Mellanox card, right?). I would
> have thought that in this case we should treat it as below (just increment i
> and continue)
> 
> 
> (BTW, if one of the netdevs in a PCI device's "net" directory has a
> phys_port_name, is it *always* the case that all the other netdevs listed in
> that directory also have a phys_port_name? (ie if the PF has a
> phys_port_name, will the VF representors always have a phys_port_name,
> and vice versa?) I'm assuming the answer is "yes".
> 
> 
> > +            }
> >               if (i++ < idx)
> >                   continue;
> 
> 
> After thinking about this a bit, I've come up with the attached diff (based on
> your patches - you can just add it to your local directory with "git am -3 0001-
> blah.patch"). Does that look right to you? If it does (and it works!) then I can
> squash it in and push. If you're not quite happy with it, then you can send
> another iteration. I promise I'll be much faster about responding now!! (be
> sure to Cc me so I notice it though, and maybe even ping me in IRC.)
> 
> 
> 
> 
> >           }
> > diff --git a/src/util/virpci.h b/src/util/virpci.h index
> > 43828b0a8a..9e89ede1d5 100644
> > --- a/src/util/virpci.h
> > +++ b/src/util/virpci.h
> > @@ -55,6 +55,11 @@ struct _virZPCIDeviceAddress {
> >
> >   #define VIR_PCI_DEVICE_ADDRESS_FMT "%04x:%02x:%02x.%d"
> >
> > +/* Represents format of PF's phys_port_name in switchdev mode:
> > + * 'p%u' or 'p%us%u'. New line checked since value is readed from sysfs
> file.
> > + */
> > +# define VIR_PF_PHYS_PORT_NAME_REGEX  "(p[0-9]+$)|(p[0-9]+s[0-
> 9]+$)"
> 
> 
> There should be no space between # and define here. I'm trying to
> remember if it's the cppi package that finds that, or something else.
> (that's also in my diff)
> 
> 
> > +
> >   struct _virPCIDeviceAddress {
> >       unsigned int domain;
> >       unsigned int bus;
> 





More information about the libvir-list mailing list