[dm-devel] 2.6.10-rc1-udm1: multipath work in progress

christophe varoqui christophe.varoqui at free.fr
Tue Nov 2 22:50:13 UTC 2004


> > Let the kernel fail them ... as soon as the primary PG paths are
> > exhausted, it will switch to the secondary PG and an event will cause
> > multipathd to reconfigure the table. The secondary will become primary,
> > and failed paths will come back up, grouped in a low prio PG.
> 
> But they are not failed! *whine* They'd very likely be usable again if
> we sent them an initialization command.
> 
> And this is what we'd have to do if we didn't have any healthy paths
> left in the other PG(s). But we couldn't do that if we had failed them.
> 
"unit not ready" is failed (F-path-state) from the device-mapper point
of view, as far as I can see.
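
To make this concrete, here is a rough sketch (not the actual dm error
path; the helper names are made up) of the classification I mean: a NOT
READY sense (key 0x2, ASC 0x04, "logical unit not ready") simply
surfaces as a failed path, whatever the reason the unit is not ready.

#include <stdio.h>

#define SENSE_KEY_NOT_READY 0x02
#define ASC_LU_NOT_READY    0x04

enum path_state { PATH_ACTIVE, PATH_FAILED };

/* Sketch only: map a "unit not ready" check condition to the F path
 * state; handling of every other sense is omitted here. */
static enum path_state classify_sense(unsigned char key, unsigned char asc)
{
    if (key == SENSE_KEY_NOT_READY && asc == ASC_LU_NOT_READY)
        return PATH_FAILED;     /* "F" from dm's point of view */
    return PATH_ACTIVE;
}

int main(void)
{
    printf("unit not ready -> %s\n",
           classify_sense(0x02, 0x04) == PATH_FAILED ? "F" : "A");
    return 0;
}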

> > We can failback already, with the current design.
> 
> No, because you just failed all paths.
> 
Not really: as I said above, "failed paths will come back up, grouped
in a low prio PG". So failback works.
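
Roughly, the regrouping step I mean looks like this (a sketch with
made-up types, not the multipathd code): paths currently in F state are
kept in the map and simply collected into a second, lower-priority
group when the table is rebuilt.

#include <stdio.h>

struct path {
    char dev[8];    /* e.g. "8:16" (major:minor) */
    int failed;     /* 1 = F state, 0 = A state */
};

/* Print two path groups: healthy paths first (high prio), paths that
 * just failed second (low prio), instead of dropping them. */
static void regroup(struct path *paths, int n)
{
    int pg, i;

    for (pg = 1; pg <= 2; pg++) {
        int want_failed = (pg == 2);

        printf("PG%d (%s prio):", pg, want_failed ? "low" : "high");
        for (i = 0; i < n; i++)
            if (paths[i].failed == want_failed)
                printf(" %s", paths[i].dev);
        printf("\n");
    }
}

int main(void)
{
    struct path p[] = {
        { "8:16", 1 }, { "8:32", 1 },   /* the PG we failed away from */
        { "8:48", 0 }, { "8:64", 0 },   /* the PG we switched to */
    };

    regroup(p, 4);
    return 0;
}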

In fact I'm not speculating here: I verified this in multiple scenarios
in the lab (controller power cycles, portdisable/portenable cycles, and
combinations of both), under heavy load (70 MB/s over 12 active paths,
12 ghosts, 4 LUs).

> I'm not saying anything about "predictive" behaviour. I'm saying we need
> to _react_ correctly. If we get a "unit not ready" response, we switch
> to the other PG. Why we got it, I'm not trying to predict; it's just
> that "something" switched the PG away from under us. And because we
> don't know what did it, we follow its lead.
> 
> Failing the paths would be wrong. They are not failed. They are healthy.
> They are just not used right now. We _could_ force a switch-back if we
> absolutely had to.
> 
> Whether this is expressed by switching the order of the PGs around or
> by having a bypassed flag, now that's something we could argue about,
> but I'm very much convinced of the need in principle for this
> distinction.
> 
I guess we agree indeed. I *am* arguing about the bypass flag.

> The switch-back (to the default PG, if one is so defined) should not
> automatically be initiated by the kernel, but by user-space (ie,
> multipathd) after a certain time of paths being available again and if
> we were the node which originally switched the PG. (ie, not the one
> which just followed a lead.) This would catch most scenarios.
> 
Agreed
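
Something like this is what I have in mind on the daemon side (a sketch
with invented helper names, not real multipathd code): arm the failback
only once every path in the default PG has stayed up for a settle
period, and only if this node is the one that originally caused the
switch.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define SETTLE_SECS 60

/* Stubs standing in for multipathd internals -- invented names. */
static bool default_pg_all_paths_up(void)  { return true; }
static bool we_initiated_the_switch(void)  { return true; }
static void switch_to_default_pg(void)     { puts("failing back to default PG"); }

/* Called from the daemon's periodic path-checker loop. */
static void maybe_failback(void)
{
    static time_t healthy_since;

    if (!default_pg_all_paths_up()) {
        healthy_since = 0;              /* restart the settle timer */
        return;
    }
    if (!healthy_since)
        healthy_since = time(NULL);

    if (we_initiated_the_switch() &&
        time(NULL) - healthy_since >= SETTLE_SECS)
        switch_to_default_pg();
}

int main(void)
{
    maybe_failback();   /* one checker tick */
    return 0;
}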

> In more complex scenarios, the switching of LUs from one PG to the other
> might even be coordinated by smarter cluster software.
> 
> In a single node scenario, it's easier. You can switch back to the
> default PG as soon as a path there is healthy again (but even then,
> giving it some time to settle may be wise).
> 
Yes
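
As for how user space would actually request the switch, something like
this could do it through libdevmapper (link with -ldevmapper). The
"switch_group" message name is an assumption on my part, and the
target-message ioctl is quite recent, so it may not be in the udm tree
yet; it is only meant to show the shape of the call, equivalent to
"dmsetup message <map> 0 switch_group <n>".

#include <stdio.h>
#include <libdevmapper.h>

/* Ask the multipath target of <map> to switch to path group <pg>,
 * assuming a "switch_group" target message exists on the kernel side. */
static int switch_pg(const char *map, int pg)
{
    struct dm_task *dmt;
    char msg[32];
    int r = 0;

    dmt = dm_task_create(DM_DEVICE_TARGET_MSG);
    if (!dmt)
        return 0;

    snprintf(msg, sizeof(msg), "switch_group %d", pg);

    if (dm_task_set_name(dmt, map) &&
        dm_task_set_sector(dmt, 0) &&
        dm_task_set_message(dmt, msg))
        r = dm_task_run(dmt);

    dm_task_destroy(dmt);
    return r;
}

int main(void)
{
    return switch_pg("mpath0", 1) ? 0 : 1;
}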

regards,
-- 
christophe varoqui <christophe.varoqui at free.fr>