[dm-devel] [RFC PATCH 0/6] allowing path checking to be interrupted.

Martin Wilck martin.wilck at suse.com
Fri Aug 12 15:44:51 UTC 2022


On Sat, 2022-07-30 at 00:12 -0500, Benjamin Marzinski wrote:
> When there are a huge number of paths (> 10000), the amount of time
> that the checkerloop can hold the vecs lock while checking the paths
> can get large enough that it starves other vecs lock users. If path
> checking takes long enough, it's possible that uxlsnr threads will
> never run. To deal with this, this patchset makes it possible to drop
> the vecs lock while checking the paths, and then reacquire it and
> continue with the next path to check.
> 
> My choices of only checking for waiters every 128 paths checked and
> only interrupting if path checking has taken more than a second are
> arbitrary. I didn't want to slow down path checking in the common
> case where this isn't an issue, and I wanted to avoid path checking
> getting starved by other vecs->lock users. Having the checkerloop
> wait for 10000 nsec was based on my own testing with a setup using 4K
> multipath devices with 4 paths each. This was almost always long
> enough for the uevent or uxlsnr client to grab the vecs lock, but I'm
> not sure how dependent this is on details of the system. For instance,
> with my setup it never took more than 20 seconds to check the paths,
> and usually looping through all the paths took well under 10 seconds,
> most often under 5. I would only occasionally run into situations
> where a uxlsnr client would time out.

Thank you for tackling this. 
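
As I read the cover letter, the interruption logic amounts to roughly
the following. This is only a sketch under my own assumptions, not the
code from the patches: struct tracked_lock, check_all_paths(),
fake_check() and the three constants are hypothetical names, and
resetting the timer after each yield is my guess.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Values taken from the cover letter; the names are hypothetical. */
#define CHECK_WAITERS_EVERY 128   /* look for waiters every 128 paths */
#define MAX_HOLD_SEC        1.0   /* only yield after holding this long */
#define YIELD_NSEC          10000 /* time spent with the lock dropped */

/* Hypothetical mutex wrapper that counts blocked threads. */
struct tracked_lock {
	pthread_mutex_t mutex;
	atomic_int waiters;
};

static void tracked_lock_acquire(struct tracked_lock *l)
{
	atomic_fetch_add(&l->waiters, 1);
	pthread_mutex_lock(&l->mutex);
	atomic_fetch_sub(&l->waiters, 1);
}

static void tracked_lock_release(struct tracked_lock *l)
{
	pthread_mutex_unlock(&l->mutex);
}

static double elapsed_sec(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) +
	       (now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Stand-in for a real path check; only counts invocations. */
static int checked;
static void fake_check(int idx) { (void)idx; checked++; }

/* Check npaths paths; return how often the lock was dropped. */
static int check_all_paths(struct tracked_lock *l, int npaths,
			   void (*check_one)(int))
{
	struct timespec start;
	int yields = 0;

	assert(npaths >= 0);
	tracked_lock_acquire(l);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < npaths; i++) {
		check_one(i);
		if ((i + 1) % CHECK_WAITERS_EVERY != 0)
			continue;
		if (atomic_load(&l->waiters) == 0 ||
		    elapsed_sec(&start) < MAX_HOLD_SEC)
			continue;
		/* Someone is waiting and we held the lock long enough. */
		tracked_lock_release(l);
		nanosleep(&(struct timespec){ .tv_nsec = YIELD_NSEC }, NULL);
		tracked_lock_acquire(l);
		clock_gettime(CLOCK_MONOTONIC, &start);
		yields++;
	}
	tracked_lock_release(l);
	return yields;
}
```

With no other thread waiting, the loop never drops the lock, so the
common case only pays for one atomic load per 128 paths.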

Side note: I have been pondering this, too, and my thoughts are
rather orthogonal to what you did. As we are using the async checkers
most of the time, my idea is to drop the 1ms synchronous wait after
starting the TUR checker. Obviously, waiting 1ms for every path in a
set of 10000 paths takes 10 seconds.

We could start the async checker for all paths first (not setting the
state to PATH_PENDING yet), then perhaps wait for a few ms, and then
poll the checker threads for all paths. That should speed up the
checker loop enormously.
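
To illustrate the arithmetic, here is a small simulation of both
orderings. It is a sketch only: the simulated clock, struct
sketch_path, and all function names are hypothetical and do not
correspond to the libmultipath checker API. Each simulated checker is
assumed to finish 1 ms after it was started.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states, local to this sketch. */
enum sketch_state { SKETCH_UNCHECKED, SKETCH_PENDING, SKETCH_UP };

struct sketch_path {
	long started_ms;          /* simulated start time of the checker */
	long latency_ms;          /* simulated time the check needs */
	enum sketch_state state;
};

static long sim_now_ms;           /* simulated clock, in milliseconds */

static void sim_wait(long ms) { sim_now_ms += ms; }

static void init_paths(struct sketch_path *p, size_t n, long latency_ms)
{
	for (size_t i = 0; i < n; i++)
		p[i] = (struct sketch_path){ .latency_ms = latency_ms };
}

static void checker_start(struct sketch_path *p)
{
	p->started_ms = sim_now_ms;
}

static enum sketch_state checker_poll(const struct sketch_path *p)
{
	return sim_now_ms - p->started_ms >= p->latency_ms ?
		SKETCH_UP : SKETCH_PENDING;
}

/* Start, wait 1 ms, poll, one path at a time. Returns elapsed ms. */
static long check_serial(struct sketch_path *paths, size_t n)
{
	long begin = sim_now_ms;

	assert(paths != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		checker_start(&paths[i]);
		sim_wait(1);
		paths[i].state = checker_poll(&paths[i]);
	}
	return sim_now_ms - begin;
}

/* Start all checkers, wait once, then poll all. Returns elapsed ms. */
static long check_two_phase(struct sketch_path *paths, size_t n,
			    long grace_ms)
{
	long begin = sim_now_ms;

	assert(paths != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		checker_start(&paths[i]);
	sim_wait(grace_ms);
	for (size_t i = 0; i < n; i++)
		paths[i].state = checker_poll(&paths[i]);
	return sim_now_ms - begin;
}
```

In this model the serial ordering spends one wait per path, while the
two-phase ordering spends a single grace period regardless of the
number of paths. Paths whose checker has not finished by then would
simply stay pending until the next poll.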

Your changes would be helpful nonetheless.

> 
> Benjamin Marzinski (6):
>   multipathd: Use regular pthread_mutex_t for waiter_lock
>   multipathd: track waiters for mutex_lock
>   multipathd: Occasionally allow waiters to interrupt checking paths
>   multipathd: allow uxlsnr clients to interrupt checking paths
>   multipathd: fix uxlsnr timeout
>   multipathd: Don't check if timespec.tv_sec is zero
> 
>  libmultipath/lock.h    |  16 +++++
>  libmultipath/structs.h |   1 +
>  multipathd/main.c      | 144 +++++++++++++++++++++++++----------------
>  multipathd/uxlsnr.c    |  23 +++++--
>  multipathd/uxlsnr.h    |   1 +
>  multipathd/waiter.c    |  14 ++--
>  6 files changed, 132 insertions(+), 67 deletions(-)
> 

For the series, except patches 2 and 3:
Reviewed-by: Martin Wilck <mwilck at suse.com>



