[dm-devel] RFC: multipath IO multiplex

Christophe Varoqui christophe.varoqui at gmail.com
Sun Nov 7 10:30:49 UTC 2010


Wouldn't it be practical to bypass MPIO completely and submit your I/O to the paths directly instead?
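
E.g., from userspace, open every path device underneath the map and
submit the sector to each of them directly. A rough sketch only (the
path names are placeholders for whatever "multipath -ll" lists for
the map):

  #define _GNU_SOURCE              /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  /* Write one 512-byte message to the same offset on every path
   * device; they all reach the same shared LUN, so a single success
   * already means the message is on disk. */
  static int write_all_paths(const char **paths, int npaths,
                             off_t off, const char *msg)
  {
      void *buf;
      int i, ok = 0;

      /* O_DIRECT requires a sector-aligned buffer */
      if (posix_memalign(&buf, 512, 512))
          return -1;
      memset(buf, 0, 512);
      strncpy(buf, msg, 512);      /* fixed-size slot, no NUL needed */

      for (i = 0; i < npaths; i++) {
          int fd = open(paths[i], O_WRONLY | O_DIRECT | O_SYNC);
          if (fd < 0)
              continue;            /* dead path: skip, try the next */
          if (pwrite(fd, buf, 512, off) == 512)
              ok++;
          close(fd);
      }
      free(buf);
      return ok ? 0 : -1;
  }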

Cheers,
cvaroqui

----- Original Message -----
> On 2010-11-06T11:51:02, Alasdair G Kergon <agk at redhat.com> wrote:
> 
> Hi Neil, Alasdair,
> 
> thanks for the feedback. Answering your points in reverse order -
> 
> > > Might it make sense to configure a range of the device where writes
> > > always went down all paths?   That would seem to fit with your
> > > problem description and might be easiest??
> > Indeed - a persistent property of the device (even another interface
> > with a different minor number), not the I/O.
> 
> I'm not so sure that would be required, though. The equivalent of our
> "mkfs" tool wouldn't need this. Also, typically, this would be a
> partition (kpartx) on top of a regular MPIO mapping (that we want to
> be managed by multipathd).
> 
> Handling this completely differently would complicate setup, no?
> 
> > And what is the nature of the data being written, given that I/O to
> > one path might get delayed and arrive long after it was sent,
> > overwriting data sent later.   Successful stale writes will always be
> > recognised as such by readers - how?
> 
> The very particular use case I am thinking of is the "poison pill" for
> node-level fencing. Nodes constantly monitor their slot (using direct
> IO, bypassing all caching, etc.), and either successfully read it or
> commit suicide (assisted by a hardware watchdog to protect against
> stalls).
> 
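> Something like the following, as a rough sketch only (SLOT_SIZE, the
> offsets and the device path are placeholders, not actual code from
> the implementation):
> 
>   #define _GNU_SOURCE             /* for O_DIRECT */
>   #include <fcntl.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <unistd.h>
> 
>   #define SLOT_SIZE 512           /* one sector per node slot */
> 
>   /* Read my slot with O_DIRECT, bypassing the page cache entirely.
>    * Returns 0 and copies the slot into msg on success; the caller
>    * commits suicide on failure or timeout (the hardware watchdog
>    * covers the case where we stall before even getting here). */
>   static int check_slot(const char *dev, off_t slot_off, char *msg)
>   {
>       void *buf;
>       int fd, rc = -1;
> 
>       /* O_DIRECT requires a sector-aligned buffer */
>       if (posix_memalign(&buf, SLOT_SIZE, SLOT_SIZE))
>           return -1;
> 
>       fd = open(dev, O_RDONLY | O_DIRECT);
>       if (fd >= 0) {
>           if (pread(fd, buf, SLOT_SIZE, slot_off) == SLOT_SIZE) {
>               memcpy(msg, buf, SLOT_SIZE);
>               rc = 0;
>           }
>           close(fd);
>       }
>       free(buf);
>       return rc;
>   }
> 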
> The writer knows that, once the message has been successfully written,
> the target node will either have read it (and committed suicide), or
> have been self-fenced because of a timeout/read error.
> 
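> In sketch form (post_pill() is an assumed helper, the O_DIRECT writer
> mirroring check_slot() above; msgwait is an assumed upper bound on the
> target's read interval plus its watchdog timeout):
> 
>   /* Post the pill, then simply wait out the target's window;
>    * afterwards the node is gone either way. */
>   static int fence_node(const char *dev, off_t slot_off, int msgwait)
>   {
>       if (post_pill(dev, slot_off) < 0)  /* hypothetical writer */
>           return -1;
>       sleep(msgwait);   /* target read the pill and committed
>                          * suicide, or its watchdog fired on the
>                          * stalled/failed read */
>       return 0;
>   }
> 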
> Allowing for the additional timeouts incurred by MPIO here really slows
> this mechanism down to the point of being unusable.
> 
> Now, even if a write was delayed - which is not very likely; it's more
> likely that some of the IO will simply fail if one of the paths does
> go down, and this mode would not resubmit it to the other paths - the
> worst that could happen would be a double fence. (That would require
> the write to land after the node has cycled once and cleared its
> message slot, which would imply a significant delay already, since
> servers take a while to boot.)
> 
> For the 'heartbeat' mechanism and others (if/when we get around to
> adding them), we could ignore the exact contents that have been
> written and just watch for changes; at worst, node death detection
> will take a bit longer.
> 
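> Roughly, reusing check_slot() from the sketch above (deadtime is an
> assumed detection window; what "declare dead" means is left to the
> caller):
> 
>   #include <time.h>
> 
>   /* Poll the peer's slot; any change in content counts as a sign of
>    * life. Returns once no change was seen for deadtime seconds. */
>   static void watch_peer(const char *dev, off_t peer_off, int deadtime)
>   {
>       char prev[SLOT_SIZE] = "", cur[SLOT_SIZE];
>       time_t last_seen = time(NULL);
> 
>       while (time(NULL) - last_seen <= deadtime) {
>           if (check_slot(dev, peer_off, cur) == 0 &&
>               memcmp(prev, cur, SLOT_SIZE) != 0) {
>               memcpy(prev, cur, SLOT_SIZE);
>               last_seen = time(NULL);   /* slot changed: alive */
>           }
>           sleep(1);
>       }
>       /* no change for deadtime seconds: declare the peer dead */
>   }
> 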
> Basically, the thing we need to get around is the possible IO latency
> in MPIO, for things like poison pill fencing ("storage-based death")
> or qdisk-style plugins. I'm open to other suggestions as well.
> 
> 
> 
> Regards,
>         Lars
> 
> -- 
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
