[PATCH 2/2] Avoid unnecessary error messages handling udev events

Mark Asselstine mark.asselstine at windriver.com
Mon Apr 20 13:54:40 UTC 2020


On Monday, April 20, 2020 9:30:56 A.M. EDT Michal Privoznik wrote:
> On 4/20/20 3:09 PM, Mark Asselstine wrote:
> > On Monday, April 20, 2020 6:11:25 A.M. EDT Michal Privoznik wrote:
> >> On 4/16/20 5:57 PM, Mark Asselstine wrote:
> >>> The udev monitor thread "udevEventHandleThread()" will lag the
> >>> actual/real view of devices in sysfs as it serially processes udev
> >>> monitor events. So for instance if you were to run the following cmd
> >>> to create a new veth pair and rename one of the veth endpoints
> >>> 
> >>> you might see the following monitor events and real world that looks
> >>> like
> >>> 
> >>>                                        time
> >>> 			              |    
> >>> 			              |    create v0 sysfs entry
> >>> 
> >>> wake udevEventHandleThread            |    create v1 sysfs entry
> >>> udev_monitor_receive_device(v1-add)   |    move v0 sysfs to v2
> >>> udevHandleOneDevice(v1)               |
> >>> udev_monitor_receive_device(v0-add)   |
> >>> udevHandleOneDevice(v0)               | <--- error msgs in
> >>> virNetDevGetLinkInfo() udev_monitor_receive_device(v2-move)  |      as
> >>> v0
> >>> no longer exists udevHandleOneDevice(v2)               |
> >>> 
> >>>                                        \/
> >>> 
> >>> As you can see the changes in sysfs can take place well before we get
> >>> to act on the events in the udevEventHandleThread(), so by the time we
> >>> get around to processing the v0 add event, the sysfs entry has been
> >>> moved to v2.
> >>> 
> >>> To work around this we check if the sysfs entry is valid before
> >>> attempting to read it and don't bother trying to read link info if
> >>> not. This is safe since we will never read sysfs entries earlier than
> >>> it existing, ie. if the entry is not there it has either been removed
> >>> in the time since we enumerated the device or something bigger is
> >>> busted, in either case, no sysfs entry, no link info. In the case
> >>> described above we will eventually get the link info as we work
> >>> through the queue of monitor events and get to the 'move' event.
> >>> 
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1557902
> >>> 
> >>> Signed-off-by: Mark Asselstine <mark.asselstine at windriver.com>
> >>> ---
> >>> 
> >>>    src/util/virnetdev.c | 11 +++++++++++
> >>>    1 file changed, 11 insertions(+)
> >>> 
> >>> diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c
> >>> index b465bdac2e..bf256cbe35 100644
> >>> --- a/src/util/virnetdev.c
> >>> +++ b/src/util/virnetdev.c
> >>> @@ -2438,6 +2438,17 @@ virNetDevGetLinkInfo(const char *ifname,
> >>> 
> >>>        if (virNetDevSysfsFile(&path, ifname, "operstate") < 0)
> >>>        
> >>>            goto cleanup;
> >>> 
> >>> +    /* The device may have been removed or moved by the time we got
> >>> here.
> >>> +     * Obviously attempting to get LinkInfo on a no longer existing
> >>> device
> >>> +     * is useless, so stop processing. If we got here via the udev
> >>> monitor
> >>> +     * a remove or move event will follow and we will be able to get
> >>> valid
> >>> +     * LinkInfo at that time */
> >>> +    if (!virFileExists(path)) {
> >>> +      VIR_WARN("The interface '%s' was removed before we could query
> >>> it.",
> >>> +	       ifname);
> >> 
> >> This is not aligned.
> > 
> > Should I send a V2 or can this be correct when merged?
> 
> No need, I've fixed it (also changed goto cleanup to return, since I've
> merged a patch that made 'cleanup' label go away this morning) and pushed.
> 
> >>> +        goto cleanup;
> >>> +    }
> >> 
> >> But I have another idea to consider. We could return a different error
> >> value here (say -2) and then have the caller decide what to do with it
> >> (e.g. print the warning).
> > 
> > I had thought about propagating the error so the caller could possibly
> > wrap
> > the warning message in more context but in the end I decided to call it
> > out at the issue site avoiding having each caller essentially reproducing
> > essentially the same code.
> 
> Yeah, what convinced me was that non-Linux version of this function
> simply prints a debug message and returns 0. I haven't realized that
> earlier, sorry.
> 
> Michal

No problem. Thanks for taking the patches and making the necessary tweaks.

Mark







More information about the libvir-list mailing list