[dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.

Hannes Reinecke hare at suse.de
Wed Mar 31 07:25:57 UTC 2021


Hi Erwin,

On 3/31/21 2:22 AM, Erwin van Londen wrote:
> Hello Muneendra, Benjamin,
> 
> The FPIN options that have been developed come with a whole plethora
> of options and do not merely flag paths as being in a marginal state.
> The MPIO layer could utilise the various triggers, like congestion and
> latency, and not just use a marginal state as the decisive point. If a
> path is somewhat congested, the number of I/Os dispersed over it could
> simply be reduced by a flexible margin, depending on how often and
> which FPINs are actually received. If, for instance, an FPIN is
> received saying that an upstream port is throwing physical errors, you
> may exclude that path entirely from queueing I/Os. If it is a
> latency-related problem where credit shortages come into play, you may
> just need to queue very small I/Os to it; the SCSI CDB will tell the
> size of the I/O. Congestion notifications may just be used to add an
> artificial delay, reducing the workload on these paths and scheduling
> the I/O on others.
> 
As correctly noted, FPINs come with a variety of options.
And I'm not certain we can handle everything correctly; a degraded
path is simple, but for congestion there is only _so_ much we can do.
The typical cause of congestion is, say, a 32G host port talking to a
16G (or even 8G) target port _and_ a 32G target port.

So the host cannot 'tune down' its link to 8G; doing so would impact
performance on the 32G target port.
(And we would suffer reverse congestion whenever that target port sends
frames.)
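
To put some (entirely made-up) numbers on that, here is a
back-of-the-envelope sketch in C, assuming a 2k frame payload and 64
buffer credits; real fabrics negotiate these values, so take it as an
illustration only:

#include <stdio.h>

/*
 * Illustration only: frame size and credit count are assumptions,
 * and line rates are taken as raw gigabits without encoding overhead.
 */
int main(void)
{
	const double host_gbps   = 32.0;	/* host port line rate */
	const double target_gbps = 16.0;	/* slow target port    */
	const double frame_bytes = 2048.0;	/* assumed frame size  */
	const int    credits     = 64;		/* assumed credits     */

	/* Data arriving per second that the slow port cannot drain. */
	double excess = (host_gbps - target_gbps) * 1e9 / 8.0;

	/* Time until the assumed credit pool is exhausted. */
	double stall = credits * frame_bytes / excess;

	printf("excess %.0f MB/s, credits gone after ~%.1f us\n",
	       excess / 1e6, stall * 1e6);
	return 0;
}

The exact numbers don't matter; the point is that the buffers back up
within microseconds, far faster than the SCSI or block layer could
usefully react.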

And throttling things at the SCSI layer only helps _so_ much, as the
real congestion is due to the speed with which the frames are sequenced
onto the wire, which is not something we can control from the OS.

From another POV this is arguably a fabric mis-design; so it _could_ be
alleviated by separating out the ports with lower speeds into their own
zone (or even onto a separate SAN); that would trivially make the
congestion go away.

But for that the admin first needs to be _alerted_, and this really is
my primary goal: having FPINs show up in the message log, to alert the
admin that his fabric is not performing well.

A second step will be to feed FPINs into DM multipath, and have them
influence the path priority or path status. But it is still under
discussion how this could best be integrated.
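
Just to sketch one possible shape (this is not how multipath-tools does
or will do it; the structures, names, and thresholds below are invented
for the example), the daemon could count FPINs per path within a
sampling window and demote the path once a threshold is crossed:

#include <time.h>

/* Hypothetical per-path bookkeeping; nothing here mirrors multipathd. */
enum path_state { PATH_NORMAL, PATH_MARGINAL, PATH_FAILED };

struct path_fpin_stats {
	unsigned int li_count;		/* link integrity FPINs in window   */
	unsigned int congestion_count;	/* congestion/peer-congestion FPINs */
	time_t       window_start;	/* start of the sampling window     */
};

#define FPIN_WINDOW_SECS	60	/* made-up sampling window          */
#define FPIN_LI_THRESHOLD	3	/* made-up link-integrity threshold */
#define FPIN_CONG_THRESHOLD	10	/* made-up congestion threshold     */

/* Decide a path state from the FPINs seen in the current window. */
static enum path_state evaluate_path(struct path_fpin_stats *st, time_t now)
{
	if (now - st->window_start > FPIN_WINDOW_SECS) {
		/* Window expired: forget old events, path recovers. */
		st->li_count = st->congestion_count = 0;
		st->window_start = now;
		return PATH_NORMAL;
	}
	/* Physical errors are the strong signal: stop using the path. */
	if (st->li_count >= FPIN_LI_THRESHOLD)
		return PATH_FAILED;
	/* Congestion only makes the path less attractive, not unusable. */
	if (st->congestion_count >= FPIN_CONG_THRESHOLD)
		return PATH_MARGINAL;
	return PATH_NORMAL;
}

Whether 'marginal' should then translate into a lower priority, a
separate path group, or just a different selector is exactly the part
that is still open.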

> Not really sure what the possibilities are from a DM-Multipath
> viewpoint, but I feel that if the OS options are not properly aligned
> with what the FC protocol and HBA drivers are able to provide, we may
> miss a good opportunity to optimize the dispersion of I/Os and improve
> overall performance.
> 
Looking at the size of the commands is one possibility, but at this
point it presumes too much about how we _think_ FPINs will be
generated. I'd rather run some more tests to figure out under which
circumstances we can expect which type of FPIN, and then start looking
at ways to integrate them.
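
For reference, the decision space Erwin describes above would roughly
map the four FPIN notification types onto different path treatments.
The enum and the actions below are invented for illustration; which
mapping is actually sensible is precisely what the testing should tell
us:

/*
 * The four FPIN notification descriptor types; enum names and the
 * actions below are invented for illustration only.
 */
enum fpin_kind {
	FPIN_LINK_INTEGRITY,	/* physical errors on an upstream port */
	FPIN_DELIVERY,		/* frames were discarded               */
	FPIN_PEER_CONGESTION,	/* a peer port reports congestion      */
	FPIN_CONGESTION,	/* the fabric reports congestion to us */
};

enum path_action {
	ACTION_FAIL_PATH,	/* stop queueing I/O to the path       */
	ACTION_SMALL_IO_ONLY,	/* keep large transfers off the path   */
	ACTION_THROTTLE,	/* add an artificial dispatch delay    */
	ACTION_LOG_ONLY,	/* just alert the admin                */
};

/* One possible mapping; nothing here is a proposal for actual code. */
static enum path_action fpin_to_action(enum fpin_kind kind)
{
	switch (kind) {
	case FPIN_LINK_INTEGRITY:
		return ACTION_FAIL_PATH;
	case FPIN_PEER_CONGESTION:
		return ACTION_SMALL_IO_ONLY;
	case FPIN_CONGESTION:
		return ACTION_THROTTLE;
	case FPIN_DELIVERY:
	default:
		return ACTION_LOG_ONLY;
	}
}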

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare at suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer




