[dm-devel] RFC: multipath IO multiplex

Neil Brown neilb at suse.de
Sat Nov 6 09:32:03 UTC 2010


On Fri, 5 Nov 2010 19:39:46 +0100
Lars Marowsky-Bree <lmb at novell.com> wrote:

> Hi all,
> 
> this is a topic that came up during our HA miniconference at LPC. I
> inherited the action item to code this, but before coding it, I thought
> I'd get some validation on the design.
> 
> In a cluster environment, we occasionally have time critical IO - both
> reads and writes, for a mix of via-disk heartbeating and the exchange of
> poison pills.
> 
> MPIO plays hell with this, since an IO could potentially experience very
> high latency during a path switch. Extending the timeouts to allow for
> this is reasonably impractical.
> 
> However, our IO has certain properties that make it special - we have
> rather careful patterns, they don't overlap, they are effectively single
> page/single atomic write unit, and each node effectively writes to its
> own area.
> 
> So the idea would be, instead of relying on the active/passive access
> pattern, to send the IO down all paths in parallel - and report
> either the first success or the last failure.

Hi Lars,
 the only issue that occurs to me is that if you want to report the first
 success, then you need to copy the data to a private buffer before
 submitting the write, and then wait for all writes to complete before
 freeing the buffer.  If you just returned on the first completed write, the
 page would be unlocked and so could be changed while another path was still
 writing it out.
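
 To make the buffering point concrete, here is a minimal user-space sketch in
 Python.  It is not dm-multipath code: the class name `MultiplexedWrite` and
 its attributes are made up for illustration, and each "path" is assumed to be
 just an already-open file descriptor.  It snapshots the data, writes it to
 every path in parallel, reports on the first success or the last failure, and
 only drops the snapshot once every path has finished.

```python
import os
import threading


class MultiplexedWrite:
    """Hypothetical helper: fan one write out to several path fds."""

    def __init__(self, fds, data, offset):
        if not fds:
            raise ValueError("need at least one path")
        # Private snapshot: the caller may modify `data` as soon as we return.
        self._buf = bytes(data)
        self._lock = threading.Lock()
        self._pending = len(fds)
        self.succeeded = False
        self.last_error = None
        self.first_done = threading.Event()  # first success, or last failure
        self.all_done = threading.Event()    # every path has completed
        for fd in fds:
            threading.Thread(target=self._write_one, args=(fd, offset)).start()

    def _write_one(self, fd, offset):
        try:
            written = os.pwrite(fd, self._buf, offset)
            err = None if written == len(self._buf) else OSError("short write")
        except OSError as exc:
            err = exc
        with self._lock:
            self._pending -= 1
            if err is None:
                self.succeeded = True
            else:
                self.last_error = err
            last = self._pending == 0
            if err is None or last:
                self.first_done.set()
            if last:
                # No writer can still be reading the snapshot, so drop it now.
                self._buf = None
                self.all_done.set()

    def wait_result(self):
        """Return True after the first success, or raise the last failure."""
        self.first_done.wait()
        if self.succeeded:
            return True
        raise self.last_error
```

 A caller would use `wait_result()` for the time-critical answer and could
 ignore `all_done`; the snapshot stays alive internally until the slowest path
 has completed.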

 Finding a way to signal 'write all paths' sounds tricky.  This flag needs to
 be state of the file descriptor, not the whole device, so it would need to be
 an fcntl rather than an ioctl.  And defining new fcntls is a lot harder
 because they need to be more generic - you cannot really make them device
 specific...
 Might it make sense to configure a range of the device where writes always
 go down all paths?  That would seem to fit with your problem description
 and might be easiest.
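
 As a rough illustration of the range idea, here is a small Python sketch of
 the path-selection decision.  The names `BroadcastRange` and `select_paths`
 are hypothetical, as is the choice of sectors as the unit.  It also assumes
 that only writes lying entirely inside the configured range are sent down
 all paths; how to treat a write that straddles the boundary is an open
 policy question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastRange:
    """Hypothetical per-device setting: sectors whose writes go to all paths."""
    start: int   # first sector of the range (inclusive)
    length: int  # number of sectors in the range

    def covers(self, sector, nr_sectors):
        # True only if the whole request lies inside the range (assumed policy).
        end = self.start + self.length
        return self.start <= sector and sector + nr_sectors <= end


def select_paths(paths, active, rng, sector, nr_sectors, is_write):
    """Return the paths a request should be submitted on."""
    if is_write and rng.covers(sector, nr_sectors):
        return list(paths)   # multiplex: every known path
    return [active]          # normal behaviour: the current path only


# Example: a 64-sector heartbeat area starting at sector 2048.
heartbeat = BroadcastRange(start=2048, length=64)
```

 With this, a node's heartbeat write inside the area is multiplexed without
 the application passing any per-IO flag, while reads and all other writes
 keep the usual single-path behaviour.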

NeilBrown


> 
> (Clearly, this only works for active/active arrays; active/passive
> setups still may have problems.)
> 
> Doing this in user-space is somewhat icky; short of scanning the devices
> ourselves, or asking multipathd for each IO for the current list, we
> have no good way to do that. But the kernel obviously has the correct
> list at all times.
> 
> So, I think a special IO flag for block IO (ioctl, open() flag on the
> device, whatever) that would cause dm-multipath to send the IO down all
> paths (and, as mentioned, report either the last failure or first
> success), seems to be the easiest way.
> 
> How would you prefer such a flag to be implemented and passed in, and
> what do you think of the general use case?
> 
> 
> Regards,
>     Lars
> 



