[dm-devel] RFC for multipath queue_if_no_path timeout.

Frank Mayhar fmayhar at google.com
Fri Sep 27 16:37:13 UTC 2013


On Fri, 2013-09-27 at 15:52 +0200, Hannes Reinecke wrote:
> On 09/27/2013 10:37 AM, Alasdair G Kergon wrote:
> > But this still dodges the fundamental problem:
> > 
> >   What is the right value to use for the timeout?
> >   - How long should you wait for a path to (re)appear?
> >     - In the current model, reinstating a path is a userspace 
> >       responsibility.
> > 
> And with my proposed patch it would still be userspace which is
> setting the timeout.
> Currently, no_path_retry is not a proper measure anyway, as it
> depends on the time multipathd takes to complete a path check
> round. Which depends on the number of devices, their state, etc.
> 
> > The timeout, as proposed, is being used in two conflicting ways:
> >   - How long to wait for path recovery when all paths went down
> 
> That would be set via the new 'no_path_timeout' feature, which would
> be set instead of the (multipath-internal) no_path_retry
> setting.

Yes, this matches our setup as well.
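
To illustrate the point about no_path_retry not being a fixed amount of
time, here is a minimal sketch in C. It is a simplified model, not
multipathd code: it assumes one retry is consumed per completed path-check
round, and the function and parameter names are made up for illustration.

```c
/*
 * Simplified model (an assumption, not multipathd's implementation):
 * one retry is consumed per completed path-check round, so the
 * wall-clock time spent queueing scales with how long a round takes.
 */
double no_path_retry_wall_clock(unsigned int no_path_retry,
                                double round_seconds)
{
    return (double)no_path_retry * round_seconds;
}
```

Under this model the same no_path_retry value of 12 yields 60 seconds of
queueing when a round takes 5 seconds and 240 seconds when a round takes
20 seconds, which is why a setting expressed directly in seconds is
easier to reason about.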

> >   - How long to wait when the system locks without enough free
> >     memory even to reinstate a path (because of broken userspace
> >     code) before having multipath fail queued I/O in a desperate
> >     attempt at releasing memory to assist recovery
> Do we even handle that case currently?

My understanding is that the current code doesn't, no, but if it does I
would love to know how.

> Methinks this is precisely the use-case this is supposed to address.

Yes, exactly.

> When currently 'no_path_retry' is set _and_ we're running under a
> low-mem condition there is a quite large likelihood that the
> multipath daemon will be killed by the OOM-killer or will not be
> able to send any dm messages down to the kernel, as the latter most
> likely requires some memory allocations.
> 
> So in the current 'no_path_retry' scenario the maps would have been
> created with 'queue_if_no_path', and the daemon would have to reset
> the 'queue_if_no_path' flag if the no_path_retry value expires.
> Which it might not be able to do due to the above scenario.
> 
> So with the proposed 'no_path_timeout' we would enable the dm-mpath
> module to terminate all outstanding I/O, irrespective of all
> userland conditions. Which seems like an improvement to me ...

And to me, which is why I went in this direction in the first place.  I
could see no dependable way to deal with it outside of the kernel; if I
had, I would have taken it, since userspace changes are _much_ easier
for us to deal with than kernel changes.
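
For anyone who wants the idea in concrete terms, the following is a rough
sketch of the behaviour under discussion, written as a standalone C model.
It is not the proposed patch and does not use the real dm-mpath
structures; struct mpath_model and the model_* functions are hypothetical
names, and time is a plain counter of seconds. The point is only that the
deadline is armed and enforced without any help from userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a multipath map; not the dm-mpath structures. */
struct mpath_model {
    bool queue_if_no_path;     /* queue I/O while no path is usable */
    unsigned int valid_paths;  /* number of currently usable paths */
    uint64_t no_path_timeout;  /* seconds; 0 means queue indefinitely */
    uint64_t deadline;         /* absolute time at which queueing stops */
    bool timer_armed;          /* true while the deadline is pending */
};

/* A path went down; arm the timer when the last path is lost. */
void model_path_failed(struct mpath_model *m, uint64_t now)
{
    if (m->valid_paths > 0)
        m->valid_paths--;

    if (m->valid_paths == 0 && m->queue_if_no_path &&
        m->no_path_timeout > 0 && !m->timer_armed) {
        m->deadline = now + m->no_path_timeout;
        m->timer_armed = true;
    }
}

/* A path came back; queued I/O can proceed, so cancel the timer. */
void model_path_reinstated(struct mpath_model *m)
{
    m->valid_paths++;
    m->timer_armed = false;
}

/*
 * Periodic check. Returns true when the deadline has passed and queued
 * I/O should be failed; queueing is switched off at the same time.
 */
bool model_timer_tick(struct mpath_model *m, uint64_t now)
{
    if (!m->timer_armed || now < m->deadline)
        return false;

    m->timer_armed = false;
    m->queue_if_no_path = false;
    return true;
}
```

In this model, losing the last path at t=100 with a 30 second timeout
arms a deadline at t=130; a tick at or after that time clears
queue_if_no_path and reports that queued I/O should be failed, whether or
not the daemon is still alive.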
-- 
Frank Mayhar
310-460-4042



