[dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths

Benjamin Marzinski bmarzins at redhat.com
Mon Aug 22 17:46:32 UTC 2022


On Mon, Aug 22, 2022 at 04:15:01PM +0000, Martin Wilck wrote:
> On Wed, 2022-08-17 at 15:48 -0500, Benjamin Marzinski wrote:
> > If there are a very large number of paths that all get checked at the
> > same time, it is possible for the path checkers to starve out other
> > vecs->lock users, like uevent processing. To avoid this, if the path
> > checkers are taking a long time, checkerloop now occasionally unlocks
> > and allows other vecs->lock waiters to run.
> > 
> > In order to make sure that path checking is always making real
> > progress,
> > checkerloop() only checks if it should unlock every 128 path checks.
> > To
> > check, it sees if there are waiters and more than a second has passed
> > since it acquired the vecs->lock. If both of these are true, it drops
> > the lock and calls nanosleep() to schedule.
> > 
> > When checkerloop() reacquires the lock, it may no longer be at the
> > correct place in the vector. While paths can be deleted from any
> > point
> > of the pathvec, they can only be added at the end. This means that
> > the
> > next path to check must be between its previous location and the
> > start
> > of the vector. To help determine the correct starting place,
> > checkerloop() marks each path as not checked at the start of each
> > checker loop. New paths start in the unchecked state. As paths are
> > checked, they are marked as such. If the checker loop is interrupted,
> > when it reacquires the lock, it will iterate backwards from the last
> > location in checked to the start of the vector. The first path it
> > finds
> > that is marked as checked must be the last path it checked.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> > ---
> >  libmultipath/structs.h |  1 +
> >  multipathd/main.c      | 82 +++++++++++++++++++++++++++++++++-------
> > --
> >  2 files changed, 67 insertions(+), 16 deletions(-)
> > 
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index a6a09441..9d4fb9c8 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -351,6 +351,7 @@ struct path {
> >         int fast_io_fail;
> >         unsigned int dev_loss;
> >         int eh_deadline;
> > +       bool is_checked;
> >         /* configlet pointers */
> >         vector hwe;
> >         struct gen_path generic_path;
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 71079113..37c04018 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2548,6 +2548,11 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >         }
> >         return 1;
> >  }
> > +enum checker_state {
> > +       CHECKER_STARTING,
> > +       CHECKER_RUNNING,
> > +       CHECKER_FINISHED,
> > +};
> >  
> >  static void *
> >  checkerloop (void *ap)
> > @@ -2555,7 +2560,6 @@ checkerloop (void *ap)
> >         struct vectors *vecs;
> >         struct path *pp;
> >         int count = 0;
> > -       unsigned int i;
> >         struct timespec last_time;
> >         struct config *conf;
> >         int foreign_tick = 0;
> > @@ -2581,8 +2585,9 @@ checkerloop (void *ap)
> >  
> >         while (1) {
> >                 struct timespec diff_time, start_time, end_time;
> > -               int num_paths = 0, strict_timing, rc = 0;
> > +               int num_paths = 0, strict_timing, rc = 0, i = 0;
> >                 unsigned int ticks = 0;
> > +               enum checker_state checker_state = CHECKER_STARTING;
> >  
> >                 if (set_config_state(DAEMON_RUNNING) !=
> > DAEMON_RUNNING)
> >                         /* daemon shutdown */
> > @@ -2603,22 +2608,67 @@ checkerloop (void *ap)
> >                 if (use_watchdog)
> >                         sd_notify(0, "WATCHDOG=1");
> >  #endif
> > +               while (checker_state != CHECKER_FINISHED) {
> > +                       unsigned int paths_checked = 0;
> > +                       struct timespec chk_start_time;
> >  
> > -               pthread_cleanup_push(cleanup_lock, &vecs->lock);
> > -               lock(&vecs->lock);
> > -               pthread_testcancel();
> > -               vector_foreach_slot (vecs->pathvec, pp, i) {
> > -                       rc = check_path(vecs, pp, ticks);
> > -                       if (rc < 0) {
> > -                               condlog(1, "%s: check_path() failed,
> > removing",
> > -                                       pp->dev);
> > -                               vector_del_slot(vecs->pathvec, i);
> > -                               free_path(pp);
> > -                               i--;
> > -                       } else
> > -                               num_paths += rc;
> > +                       pthread_cleanup_push(cleanup_lock, &vecs-
> > >lock);
> > +                       lock(&vecs->lock);
> > +                       pthread_testcancel();
> > +                       get_monotonic_time(&chk_start_time);
> > +                       if (checker_state == CHECKER_STARTING) {
> > +                               vector_foreach_slot(vecs->pathvec,
> > pp, i)
> > +                                       pp->is_checked = false;
> > +                               i = 0;
> > +                               checker_state = CHECKER_RUNNING;
> > +                       } else {
> > +                               /*
> > +                                * Paths could have been removed
> > since we
> > +                                * dropped the lock. Find the path to
> > continue
> > +                                * checking at. Since paths can be
> > removed from
> > +                                * anywhere in the vector, but can
> > only be added
> > +                                * at the end, the last checked path
> > must be
> > +                                * between its old location, and the
> > start or
> > +                                * the vector.
> > +                                */
> > +                               if (i >= VECTOR_SIZE(vecs->pathvec))
> > +                                       i = VECTOR_SIZE(vecs-
> > >pathvec) - 1;
> 
> What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch that in
> the while () condition above?

I could for clarity if you want, but the code is correct as is. If
VECTOR_SIZE() is 0, then i will start at -1. This will cause the while()
loop to immediately exit, since VECTOR_SLOT() checks for i < 0. Right
after the while loop, i gets bumped up to start checking at the first
device (which of course isn't there).  It's the same logic as what
happens if the while() loop searches through the entire pathvec, and
doesn't find any checked paths.  Obviously, the empty vector case does a
bit of unnecessary work after finding out that the vector is empty, and
I could add something like

if (VECTOR_SIZE(vecs->pathvec) == 0) {
	checker_state = CHECKER_FINISHED;
	goto unlock;
}

If you'd prefer.

-Ben

> 
> Martin
> 
> 


More information about the dm-devel mailing list