[dm-devel] [LSF/MM ATTEND][LSF/MM TOPIC] Multipath redesign
Hannes Reinecke
hare at suse.de
Fri Jan 15 07:12:02 UTC 2016
On 01/14/2016 08:09 PM, Bart Van Assche wrote:
> On 01/13/2016 11:25 PM, Hannes Reinecke wrote:
>> On 01/13/2016 06:52 PM, Benjamin Marzinski wrote:
>>> On Wed, Jan 13, 2016 at 10:10:43AM +0100, Hannes Reinecke wrote:
>>>> c) implement block or scsi events whenever a remote port becomes
>>>> unavailable. This removes the need of the 'path_checker'
>>>> functionality in multipath-tools.
>>>
>>> I'm not convinced that we will be able to find out when paths come
>>> back online in all cases without some sort of actual polling. Again,
>>> I'd love this to be simpler, but asking all the types of storage we
>>> plan to support to notify us when they are up and down may not be
>>> realistic.
>>
>> Currently we have three main transports: FC, iSCSI, and SAS.
>
> Hello Hannes,
>
> Since several years the Linux SRP initiator driver also has reliable
> and efficient H.A. support. The IB spec supports port state change
> notifications. But whether or not port state information affects the
> path state should be configurable. Several IB users wouldn't like it
> if port state information would affect the path state because the
> time during which a port is down can be shorter than the time during
> which an IB HCA keeps retrying to send a packet.
>
Oooh, but of course I've forgotten SRP. Sorry, Bart; it's just not
on my radar (what with me having no Infiniband equipment to speak of
...)
But the above really sounds similar to the dev_loss_tmo mechanism we
have on FC. Maybe it's worth looking into whether we could have a
similar mechanism on SRP.
The point here is that (on FC) we have the following flow of events:
Path loss
-> start dev_loss_tmo
-> rport set to 'blocked'
-> RSCN received
-> move to final rport state (online or gone)
-> unblock rport
-> stop dev_loss_tmo (if rport is online) or
-> dev_loss_tmo fires and removes rport
ATM we're only being notified once the port has moved to its final
state, as that's when I/O continues or is aborted and we get the I/O
completion back.
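
As a rough illustration of the flow above, here is a toy Python model of it. This is not kernel code: the class, method names and the explicit `now` timestamps are all made up for the sketch, and it assumes that an RSCN reporting the port as absent simply leaves the rport blocked until dev_loss_tmo fires.

```python
from enum import Enum


class RportState(Enum):
    ONLINE = "online"
    BLOCKED = "blocked"
    GONE = "gone"


class Rport:
    """Toy model of an FC remote port; names and structure are illustrative only."""

    def __init__(self, dev_loss_tmo):
        self.dev_loss_tmo = dev_loss_tmo  # seconds
        self.state = RportState.ONLINE
        self.deadline = None              # None: dev_loss_tmo timer not running
        self.notifications = []           # (time, event) pairs seen by multipath

    def path_loss(self, now):
        # Path loss: start dev_loss_tmo and block the rport.
        # Nothing is reported upwards at this point.
        if self.state is not RportState.ONLINE:
            return
        self.deadline = now + self.dev_loss_tmo
        self.state = RportState.BLOCKED

    def rscn(self, now, port_present):
        # RSCN received: if the port is back, unblock it and stop the timer.
        # Assumption: if the port is absent we keep waiting for the timer.
        self.tick(now)
        if self.state is RportState.BLOCKED and port_present:
            self.deadline = None
            self.state = RportState.ONLINE
            self.notifications.append((now, "online"))

    def tick(self, now):
        # dev_loss_tmo fires: the rport is removed and pending I/O is aborted.
        if self.state is RportState.BLOCKED and now >= self.deadline:
            self.deadline = None
            self.state = RportState.GONE
            self.notifications.append((now, "removed"))


rport = Rport(dev_loss_tmo=30)
rport.path_loss(now=0)                 # path drops, rport becomes blocked
rport.rscn(now=5, port_present=True)   # port comes back after 5 seconds
print(rport.state, rport.notifications)
```

In this model the only entry in `notifications` is created when the rport reaches its final state; `path_loss()` itself records nothing, which is the gap a path event would close.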
With path events we could react to the actual path loss, and
redirect I/O to another path directly when the path loss occurs.
But this really is a matter of policy; it might be that the path
switch takes longer than the path interruption.
So this needs to be evaluated properly.
But at least we'll be notified, allowing us to _do_ these kinds of tests.
ATM we don't really have a chance to do that.
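
To make the policy question concrete, a minimal sketch of one possible rule follows. Everything in it is hypothetical: the function name, the idea of keeping a history of outage durations (measured from a path-loss event to the final rport state), the use of the median, and the default of switching when there is no history are assumptions for illustration, not something multipath-tools does today.

```python
from statistics import median


def should_switch_path(recent_outages_s, switch_cost_s):
    """Decide whether to redirect I/O as soon as a path-loss event arrives.

    recent_outages_s: durations (seconds) of earlier interruptions on this path
    switch_cost_s:    estimated time (seconds) a path switch takes
    """
    if not recent_outages_s:
        # Assumption: with no history, fail over rather than wait.
        return True
    # Switch only if a typical interruption outlasts the switch itself.
    return median(recent_outages_s) > switch_cost_s


# Short link flaps: waiting is cheaper than switching.
print(should_switch_path([0.2, 0.3, 0.25], switch_cost_s=1.0))
# Long outages: switching pays off.
print(should_switch_path([5.0, 8.0, 30.0], switch_cost_s=1.0))
```

Without path events the inputs to such a rule cannot be measured at all, since the start of the interruption is never reported.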
I'm very willing to look at SRP to see if we can improve things there.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)