[dm-devel] [PATCH 31/35] multipathd: uxlsnr: add idle notification

Martin Wilck mwilck at suse.com
Thu Sep 16 15:54:14 UTC 2021


On Thu, 2021-09-16 at 10:06 -0500, Benjamin Marzinski wrote:
> On Thu, Sep 16, 2021 at 10:54:19AM +0200, Martin Wilck wrote:
> > On Wed, 2021-09-15 at 23:14 -0500, Benjamin Marzinski wrote:
> > > On Fri, Sep 10, 2021 at 01:41:16PM +0200, mwilck at suse.com wrote:
> > > > From: Martin Wilck <mwilck at suse.com>
> > > > 
> > > > The previous patches added the state machine and the timeout
> > > > handling, but there was no wakeup mechanism for the uxlsnr for
> > > > cases where client connections were waiting for the vecs lock.
> > > >
> > > > This patch uses the previously introduced wakeup mechanism of
> > > > struct mutex_lock for this purpose. Processes which unlock the
> > > > "global" vecs lock send an event on an eventfd which the uxlsnr
> > > > loop is polling for.
> > > >
> > > > As we are now woken up for servicing client handlers that don't
> > > > wait for input but for the lock, we need to set up the pollfds
> > > > differently, and iterate over all clients when handling events,
> > > > not only over the ones that are receiving. The hangup handling
> > > > is changed, too. We have to look at every client, even if one has
> > > > hung up. Note that I don't take client_lock for the loop in
> > > > uxsock_listen(); it's not necessary and will be removed elsewhere
> > > > in a follow-up patch.
> > > >
> > > > With this in place, the lock need not be taken in
> > > > execute_handler() any more. The uxlsnr only ever calls trylock()
> > > > on the vecs lock, avoiding any waiting for other threads to
> > > > finish.
> > > > 
> > > > Signed-off-by: Martin Wilck <mwilck at suse.com>
> > > > ---
> > > >  multipathd/uxlsnr.c | 211 ++++++++++++++++++++++++++++++--------------
> > > >  1 file changed, 143 insertions(+), 68 deletions(-)
> > > > 
> > 
> > > 
> > > I do worry that if there are, for instance, a lot of uevents
> > > coming in, this could starve the uxlsnr thread, since other
> > > threads could be grabbing and releasing the vecs lock, but if it's
> > > usually being held, then the uxlsnr thread might never try to grab
> > > it when it's free, and it will keep losing its place in line.
> > > Also, every time that the vecs lock is dropped between ppoll()
> > > calls, a wakeup will get triggered, even if the lock was grabbed
> > > by something else before the ppoll thread runs.
> > 
> > I've thought about this too. It's true that the ppoll ->
> > pthread_mutex_trylock() sequence will never acquire the lock if some
> > other thread calls lock() at the same time.
> >
> > If multiple threads call lock(), the "winner" of the lock is random.
> > Thus in a way this change actually adds some predictability: the
> > uxlsnr will step back if some other thread is actively trying to
> > grab the lock. IMO that's the right thing to do in almost all
> > situations.
> > 
> > We don't need to worry about "thundering herd" issues because the
> > number of threads that might wait on the lock is rather small. In the
> > worst case, 3 threads (checker, dmevents handler and uevent
> > dispatcher, plus the uxlsnr in ppoll()) wait for the lock at the same
> > time. Usually one of them will have it grabbed. On systems that lack
> > dmevent polling, the number of waiter threads may be higher, but
> > AFAICS it's a very rare condition to have hundreds of dmevents
> > delivered to different maps simultaneously, and if it happens, it's
> > probably correct to have them serviced quickly.
> > 
> > The uevent dispatcher doesn't hold the lock continuously; it's taken
> > and released for every event handled. Thus the uxlsnr has a real
> > chance to jump in between uevents. The same holds for the dmevents
> > thread, which takes the lock separately for every map affected. The
> > only piece of code that holds the lock for an extended period of time
> > (except reconfigure(), where it's unavoidable) is the path checker
> > (that's bad, and next on the todo list).
> > 
> > The really "important" commands (shutdown, reconfigure) don't take
> > the lock and return immediately; the lock is no issue for them. I
> > don't see any other CLI command that needs to be served before
> > uevents or dm events.
> > 
> > I haven't been able to test this on huge configurations with 1000s
> > of LUNs, but I tested with artificial delays in the checker loop,
> > uevent handlers, and dmevent handler, and with lots of clients
> > querying the daemon in parallel, and saw that clients were handled
> > very nicely. Some timeouts are inevitable (e.g. if the checker simply
> > holds the lock longer than the uxsock_timeout), but that is no
> > regression.
> > 
> > Bottom line: I believe that because this patch reduces the busy-wait
> > time, clients will be served more reliably and more quickly than
> > before (more precisely: both the average and the standard deviation
> > of the service delay will improve, and timeouts will occur less
> > frequently). I encourage everyone to experiment and see whether
> > reality proves me wrong.
> > 
> > > I suppose the only way to deal with that would be to move the
> > > locking commands to a list handled by a separate thread, so that it
> > > could block without stalling the non-locking commands.
> > 
> > Not sure if I understand correctly, but just in case: non-locking
> > commands are never stalled with my patch.
> 
> I realize that. I was saying that you could avoid starvation while
> still allowing non-locking commands to complete by moving the locking
> commands to a separate thread, which would block on the lock. I didn't
> consider a ticketing system. Ideally, the checker loop would have the
> lowest priority, since it isn't responding to any event and usually is
> just verifying that nothing has changed. But you do make a good point
> that when we are getting a lot of events, and the uxlsnr loop has a
> chance of getting starved, we probably want to prioritize the event
> handling anyway.
> 

I have also thought about using additional threads for handling CLI
commands. One could either use a single thread, similar to the udev
listener/dispatcher pair (your suggestion, IIUC), or one thread per
(blocking) client.

Moving client handling into separate thread(s) avoids the complexity of
the state machine and the eventfd-based wakeup. But on the flip side, it
introduces new multithreading-related complexity (of which we already
have our fair share). Client tasks calling lock(&vecs->lock) in order to
serve commands like "multipathd show paths" might then starve event
handling, which would be worse than vice versa, IMO.
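
To make the comparison concrete, a per-client worker would boil down to
something like the fragment below. This is only an illustration with
made-up names (struct client_req, format_reply(), send_reply()); it is
not code from this series or a concrete proposal:

/* Illustration only: a hypothetical per-client worker thread that
 * blocks on the vecs lock while serving a single command. */
#include <pthread.h>

extern pthread_mutex_t vecs_lock;                  /* the "global" lock */
extern char *format_reply(const char *cmd);        /* made-up helper */
extern void send_reply(int fd, const char *reply); /* made-up helper */

struct client_req {
        int fd;
        const char *cmd;
};

void *client_worker(void *arg)
{
        struct client_req *req = arg;
        char *reply;

        /* May sleep behind the checker / uevent / dmevents threads;
         * conversely, many such workers can delay the event handlers. */
        pthread_mutex_lock(&vecs_lock);
        reply = format_reply(req->cmd);
        pthread_mutex_unlock(&vecs_lock);

        send_reply(req->fd, reply);
        return NULL;
}

With one such thread per connection, a burst of lock-requiring clients
competes head-on with the checker and the event handlers for the vecs
lock.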

In the end, I found the idea of the poll/wakeup loop with no additional
threads more appealing and more suitable for the task. But I admit that
it's a matter of personal taste; I try to use pthreads as little as
possible ;-).
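
In case anyone wants to play with the wakeup idea outside of multipathd,
here is a small stand-alone toy that shows the pattern: whoever unlocks
writes to an eventfd, and the listener never blocks on the mutex, it
only polls and calls trylock(). This is a deliberately simplified sketch
with made-up names (wakeup_lock, unlock_and_notify(), plain poll()
instead of ppoll()), not the code from the patch:

/* Toy demonstration of the "wakeup on unlock" pattern, not multipathd
 * code.  Build with:  cc -o lockdemo lockdemo.c -lpthread */
#include <pthread.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* A mutex bundled with an eventfd, loosely resembling what the earlier
 * patches in this series add to struct mutex_lock. */
struct wakeup_lock {
        pthread_mutex_t mtx;
        int wakeup_fd;
};

static struct wakeup_lock vecs_lock = {
        .mtx = PTHREAD_MUTEX_INITIALIZER,
        .wakeup_fd = -1,
};

/* Unlock and tell the poll loop that the lock may now be free. */
static void unlock_and_notify(struct wakeup_lock *l)
{
        uint64_t one = 1;

        pthread_mutex_unlock(&l->mtx);
        if (write(l->wakeup_fd, &one, sizeof(one)) != sizeof(one))
                perror("eventfd write");
}

/* Stand-in for the checker / uevent / dmevents threads. */
static void *worker(void *arg)
{
        (void)arg;
        for (int i = 0; i < 5; i++) {
                pthread_mutex_lock(&vecs_lock.mtx);
                usleep(100000);         /* pretend to do some work */
                unlock_and_notify(&vecs_lock);
                usleep(10000);          /* give the listener a chance */
        }
        return NULL;
}

/* Stand-in for the uxlsnr loop: it never blocks on the mutex, it only
 * calls trylock() after being woken up through the eventfd. */
static void listener(void)
{
        struct pollfd pfd = {
                .fd = vecs_lock.wakeup_fd,
                .events = POLLIN,
        };
        int served = 0;

        while (served < 5) {
                uint64_t cnt;

                if (poll(&pfd, 1, 1000) <= 0)
                        continue;
                if (read(pfd.fd, &cnt, sizeof(cnt)) < 0)
                        continue;       /* nothing pending, poll again */
                if (pthread_mutex_trylock(&vecs_lock.mtx) != 0)
                        continue;       /* lost the race, wait for the next wakeup */
                printf("listener served a waiting client (%d)\n", ++served);
                unlock_and_notify(&vecs_lock);
        }
}

int main(void)
{
        pthread_t t;

        vecs_lock.wakeup_fd = eventfd(0, EFD_NONBLOCK);
        pthread_create(&t, NULL, worker, NULL);
        listener();
        pthread_join(t, NULL);
        return 0;
}

Even this toy shows the property that matters: the listener thread never
sleeps on the mutex itself, so a slow lock holder can only delay the
lock-requiring client commands, never the poll loop.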

So how do we proceed? 

Regards,
Martin