<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="generator" content="Osso Notes">
<title></title></head>
<body>
<p>Wouldn't it be practical to bypass MPIO completely and submit your IO directly to the individual paths instead?
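<br>
<br>A rough sketch of what I mean, not a tested implementation: open each underlying path device yourself and write the same block to every one of them, treating the write as delivered if at least one path accepts it. The function name and the device names in the usage note are hypothetical, and the loop is sequential only to keep the sketch short.

```python
import os

def write_to_all_paths(paths, data, offset):
    """Write the same block to every path; return the paths that accepted it.

    Assumption: `paths` lists the individual path devices behind the
    multipath map, opened directly so that MPIO is not involved at all.
    """
    succeeded = []
    for path in paths:
        try:
            # O_SYNC so a successful return means the write reached the device.
            fd = os.open(path, os.O_WRONLY | os.O_SYNC)
        except OSError:
            continue  # path is gone; try the remaining ones
        try:
            written = os.pwrite(fd, data, offset)
            if written == len(data):
                succeeded.append(path)
        except OSError:
            pass  # a failing path is skipped, nothing is resubmitted
        finally:
            os.close(fd)
    return succeeded
```

<br>Usage would be something like write_to_all_paths(["/dev/sdb", "/dev/sdc"], block, offset), with the message counted as written when the returned list is non-empty. A stalled path would still block this loop, so real code would have to submit to the paths in parallel with its own short timeout.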
<br>
<br>Cheers,
<br>cvaroqui
<br>
<br>----- Original message -----
<br>> On 2010-11-06T11:51:02, Alasdair G Kergon &lt;<a href="mailto:agk@redhat.com">agk@redhat.com</a>&gt; wrote:
<br>>
<br>> Hi Neil, Alasdair,
<br>>
<br>> thanks for the feedback. Answering your points in reverse order -
<br>>
<br>> > > Might it make sense to configure a range of the device where writes
<br>> > > always went down all paths? That would seem to fit with your
<br>> > > problem description and might be easiest??
<br>> > Indeed - a persistent property of the device (even another interface
<br>> > with a different minor number) not the I/O.
<br>>
<br>> I'm not so sure that would be required though. The equivalent of our
<br>> "mkfs" tool wouldn't need this. Also, typically, this would be a
<br>> partition (kpartx) on top of a regular MPIO mapping (that we want to be
<br>> managed by multipathd).
<br>>
<br>> Handling this completely differently would complicate setup, no?
<br>>
<br>> > And what is the nature of the data being written, given that I/O to
<br>> > one path might get delayed and arrive long after it was sent,
<br>> > overwriting data sent later. Successful stale writes will always be
<br>> > recognised as such by readers - how?
<br>>
<br>> The very particular use case I am thinking of is the "poison pill" for
<br>> node-level fencing. Nodes constantly monitor their slot (using direct
<br>> IO, bypassing all caching, etc), and either can successfully read it or
<br>> commit suicide (assisted by a hardware watchdog to protect against
<br>> stalls).
<br>>
<br>> The writer knows that, once the message has been successfully written,
<br>> the target node will either have read it (and committed suicide), or
<br>> been self-fenced because of a timeout/read error.
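<br>
<br>[Inline note] To check that I read the mechanism correctly, here is a minimal sketch of the per-node slot check as described above. The slot size, the message marker and the function names are my assumptions, not taken from the actual tool, and the sketch uses plain reads where the real thing would use direct IO with aligned buffers plus the hardware watchdog.

```python
import os

SLOT_SIZE = 512      # assumption: one sector per node slot
POISON = b"poison"   # assumption: hypothetical marker for the fencing message

def check_slot(device, slot_index):
    """Return "ok", "poisoned" or "read_error" for this node's slot."""
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError:
        return "read_error"
    try:
        block = os.pread(fd, SLOT_SIZE, slot_index * SLOT_SIZE)
    except OSError:
        return "read_error"
    finally:
        os.close(fd)
    if len(block) != SLOT_SIZE:
        return "read_error"  # a short read counts as a failed read
    if block.startswith(POISON):
        return "poisoned"
    return "ok"

def must_self_fence(status):
    """The node may keep running only if it read a clean slot."""
    return status != "ok"
```

<br>The monitoring loop would call check_slot() periodically and stop feeding the watchdog as soon as must_self_fence() returns True, which covers both the poison pill and the read error case.
<br>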
<br>>
<br>> Allowing for the additional timeouts incurred by MPIO here really slows
<br>> this mechanism down to the point of being unusable.
<br>>
<br>> Now, even if a write was delayed, the worst that could happen would be
<br>> a double fence. (A delay is not very likely: if one of the paths
<br>> happens to go down, it is more likely that some of the IO will just
<br>> fail, and this would not resubmit it to other paths. A double fence
<br>> requires the write to land after the node has cycled once and cleared
<br>> its message slot; that would imply a significant delay already,
<br>> since servers take a bit to boot.)
<br>>
<br>> For the 'heartbeat' mechanism and others (if/when we get around to
<br>> adding them), we could ignore the exact contents that have been written
<br>> and just watch for changes; at worst, the node death detection will take a
<br>> bit longer.
<br>>
<br>> Basically, the thing we need to get around is the possible IO latency in
<br>> MPIO, for things like poison pill fencing ("storage-based death") or
<br>> qdisk-style plugins. I'm open to other suggestions as well.
<br>>
<br>>
<br>>
<br>> Regards,
<br>> Lars
<br>>
<br>> --
<br>> Architect Storage/HA, OPS Engineering, Novell, Inc.
<br>> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
<br>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
<br>>
<br>> --
<br>> dm-devel mailing list
<br>> <a href="mailto:dm-devel@redhat.com">dm-devel@redhat.com</a>
<br>> <a href="https://www.redhat.com/mailman/listinfo/dm-devel">https://www.redhat.com/mailman/listinfo/dm-devel</a>
<br><br></p>
</body>
</html>