[dm-devel] 2.6.10-rc1-udm1: multipath work in progress

Tue Nov 2 22:43:45 UTC 2004

On 2004-11-02T23:22:04, christophe varoqui <christophe.varoqui at free.fr> wrote:

> Does the following example illustrate what you have in mind?
> 
> | pg1 | pg2 |	pg1 maps paths to ctr1, pg2 - ctr2
> ====================================================================
> | A A | A A |	paths in pg2 are marked A but are unusable
> | F F | A A |	ctr1 shuts down, ctr2 takes over, now pg2 paths
> 		are really up, maybe with a little help from
> 		pg_init_fn. Event is caught by multipathd

This is not what would happen in the model proposed by Alasdair and me,
or at least not the case which we're discussing.

This is what would happen if all paths in PG1 were really _failed_. But
that's still an interesting scenario to discuss, obviously.

> |-A -A| A A |	now you want multipathd to disable pg1 and reinstate
> 		its paths

Yes, eventually multipathd would find those paths healthy again, and
then it should reinstate them, but likely w/o causing an immediate
switch-back to the PG (for the multi-node scenario at least). This
would, as you point out correctly, be implemented by disabling PG1.

> |-A -A| F F |	so that when ctr2 shuts, kernel can switch over to pg1
> 		and pray for its paths to be up

Yes. We could fail over ahead of time if we _had_ to, because the
paths in PG2 have now failed.

> | A A |-A -A|	then for multipathd to regularize.

Yes. This is what it would eventually stabilize at in this case, or what
would happen even after a timer elapsed and multipathd caused a
switch-back to the default PG.
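
To make the sequence above concrete, here is a minimal sketch of the
selection rule as I understand it from this thread: use the first PG that
is not bypassed and has a usable path, and fall back to a bypassed PG only
as a last resort. The class and function names (PriorityGroup, choose_pg,
walk_through) are made up for illustration and are not the kernel's data
structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PriorityGroup:
    name: str
    path_failed: List[bool]   # one flag per path: True = 'F', False = 'A'
    bypassed: bool = False    # the '-' prefix in the tables above

    def has_usable_path(self) -> bool:
        return not all(self.path_failed)


def choose_pg(pgs: List[PriorityGroup]) -> Optional[PriorityGroup]:
    # First choice: the first PG that is not bypassed and has a usable path.
    for pg in pgs:
        if not pg.bypassed and pg.has_usable_path():
            return pg
    # Last resort: a bypassed PG that still has a usable path.
    for pg in pgs:
        if pg.has_usable_path():
            return pg
    return None


def walk_through() -> List[str]:
    """Replay the five table rows and record which PG would carry I/O."""
    pg1 = PriorityGroup("pg1", [False, False])
    pg2 = PriorityGroup("pg2", [False, False])
    pgs = [pg1, pg2]
    used: List[str] = []

    def record() -> None:
        pg = choose_pg(pgs)
        used.append(pg.name if pg else "none")

    record()                          # | A A | A A |
    pg1.path_failed = [True, True]
    record()                          # | F F | A A |
    pg1.path_failed = [False, False]  # multipathd reinstates the paths...
    pg1.bypassed = True               # ...and disables pg1
    record()                          # |-A -A| A A |
    pg2.path_failed = [True, True]
    record()                          # |-A -A| F F |
    pg1.bypassed = False              # multipathd regularizes
    pg2.path_failed = [False, False]
    pg2.bypassed = True
    record()                          # | A A |-A -A|
    return used
```

Note that pg1 stays first in the list throughout, so the default PG never
has to be tracked separately.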

> The current model being :
> ====================================================================
> | A A | A A |	paths in pg2 are marked A but are unusable
> | F F | A A |	ctr1 shuts down, ctr2 takes over, now pg2 paths
> 		are really up, maybe with a little help from
> 		pg_init_fn. Event is caught by multipathd
> | A A | A A |	multipathd swaps pg1 and pg2, ctr1 paths are marked up
> 		by the table reload

The swapping has the problems Alasdair points out though: the low-memory
behaviour and the loss of all state. That we sent a superfluous pg_init
in that case wouldn't be too bad, I guess.

But it would imply that multipathd had to keep more state about which PG
is the default one (or parse more status from the DM tables). In the
model with the "bypassed" PGs, the default one would always be the first
PG, even if using the other one.

This would make the mapping more readable for the admins, too.

> > [Consider the primary pg_init_fn finds the paths would be OK but
> > aren't current, so fails them all so the currently-preferred secondary can
> > be used.  But the secondary paths turn out to have genuinely failed so you
> > *do* want to use the primary after all, but you can't now.  How do you tell
> > the primary to *forcibly* use the paths?  This method has effectively
> > transferred the pg_init_fn to userspace.  
> > 
> Note I did see pg_init_fn as a best effort fn to try to activate the
> paths in a PG that is going to be used as soon as the fn returns.
> Whatever the return value.

Not true. If we try to call the pg_init_fn at a bad time, the PG we
might want to use may actually say "No, activation failed" (think during
an update, or if a path failure occurred while executing the
pg_init_fn), in which case we'd have to reevaluate whether to fail the
path and try the next one, fail all paths in the PG, or switch over
to the other PG and try that one.
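
As a rough sketch of that re-evaluation, assuming one possible policy only
(fail the path on which activation failed, try the next path in the same
PG, and move on to the next PG once every path in the current one has
failed): the function activate, the pg_init callback signature, and the
device names in the usage note are hypothetical, not an existing
interface.

```python
from typing import Callable, Dict, List, Optional, Tuple


def activate(
    pgs: Dict[str, List[str]],
    pg_init: Callable[[str, str], bool],
) -> Tuple[Optional[Tuple[str, str]], List[str]]:
    """Return ((pg, path), failed_paths) for the first successful activation.

    pgs maps a PG name to its paths, in priority order.
    pg_init(pg, path) returns True if activation through that path worked.
    """
    failed_paths: List[str] = []
    for pg_name, paths in pgs.items():
        for path in paths:
            if pg_init(pg_name, path):
                return (pg_name, path), failed_paths
            # Activation failed: fail this path and try the next one.
            failed_paths.append(path)
        # Every path in this PG failed: switch over to the next PG.
    return None, failed_paths
```

For example, with pgs = {"pg1": ["sda", "sdb"], "pg2": ["sdc", "sdd"]} and
a pg_init that only succeeds for pg2, this returns ("pg2", "sdc") together
with the failed paths ["sda", "sdb"].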

> > [I see queue_if_no_paths very much as a last resort: it's there
> > as an option for not-so-good hardware.  In any decent system there should 
> > never be no paths without catastrophic hardware failure.]
> So what is wrong with letting it be the default if it is not used at all
> for sane hardware? Seems harmless.

Because sane scenarios will want to return errors immediately after all
paths have failed.
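
The difference between the two behaviours can be sketched as follows. This
is only an illustration of the policy choice, not the kernel code path; the
function submit_io, the exception NoUsablePathError, and the flag name
(taken from the option as spelled in this thread) are assumptions.

```python
from typing import List


class NoUsablePathError(Exception):
    """Raised when no path is usable and queueing is not enabled."""


def submit_io(request: str, usable_paths: List[str],
              queue_if_no_paths: bool, queued: List[str]) -> str:
    if usable_paths:
        return "dispatched"
    if queue_if_no_paths:
        # Hold the request until some path is reinstated.
        queued.append(request)
        return "queued"
    # Default for sane setups: report the error immediately.
    raise NoUsablePathError(f"no usable path for {request}")
```

With queueing as the default, the caller would block on a request that may
never complete; with the flag off, it sees the failure right away and can
react to it.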

> >  so pg_init_fn's have to be run again etc.
> Don't they run too when a disabled PG is used as a last resort ?

Yes.

> That I can't argue against.
> But in a low memory situation I feel your scheme won't bring much more
> guarantees: it relies on userspace too after all.

Userspace which is already running though and locked into memory, and
thus shouldn't be affected by the low memory condition either.

Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business