[dm-devel] [RFC PATCH 0/4] Make prio timeouts work like checkers

Martin Wilck martin.wilck at suse.com
Tue Aug 29 20:34:39 UTC 2023


On Fri, 2023-07-28 at 14:05 -0500, Benjamin Marzinski wrote:
> This patchset changes how prioritizers set their timeouts, to make
> them
> match how the checker functions work, and also cleans up some minor
> timeout issues. I did this to make out timeouts consistent, but if
> someone has a good reason to object to that, I don't feel
> strongly that it's necessary, and I can resend just the bugfixes.
> 

I don't object the idea, quite the contrary. But I would prefer a
different solution.

IMO we should treat the "io_timeout" as a path property, and add a
field  in "struct path" for it. It would be initialized from 
conf->checker_timeout, or if that's unset, from sysfs, and finally,
DEF_TIMEOUT. By reading the sysfs value, we'd be able to accomodate
different properties for different devices, but we'd not re-read this
value repeatedly like we're doing now. IMO that would be more
consistent with what we do for other device properties.

We currently pass the timeout value down the call stack all the way
from pathinfo() like this (for alua):

pathinfo()
  get_prio()
     prio_getprio()
       p->getprio()
         get_alua_info()
            get_target_port_group()
               do_inquiry() (*)
                 do_inquiry_sg()
            get_asymmetric_access_state()
               do_rtpg() (*)
               
With the exception of the functions marked by (*), all these functions
obtain a "struct path" argument, too. IIUC, the main reason we're doing
this is to avoid stalled getprio() calls for paths that are down
(bb935d4 ("libmultipath: change failed path prio timeout")).

IMO it would make more sense to remove the "timeout" arguments from
these functions, and just determine the timeout where it's needed. I
don't think that's a layering violation; functions that receive a
"struct path" can also handle PATH_DOWN. Thus we could write

int get_prio_timeout_ms(const struct path *pp) 
{
      if (pp->state == PATH_DOWN)
            return 10;
      else
            return pp->io_timeout * 1000;
}

and use this function as far down the stack as we can.

Furthermore, to improve code readability and avoid issues like in 3/4
and 4, I think we should call all variables and fields that take
millisecond values "timeout_ms".

Thoughts?

Martin



More information about the dm-devel mailing list