[dm-devel] Fwd: how do I push my changes in the dm layer to mainstream

muneendra kumar muneendra737 at gmail.com
Thu Dec 15 09:39:34 UTC 2016


Hello,

I'm working on device-mapper multipath (dm-multipath).

This patch set adds a new hook for device-mapper to decide the health of the
paths of a multipath device, which helps in achieving deterministic application
IO throughput.



This patch set has been preliminarily tested on active-active storage with 2 paths.

But the patch set still needs work and is not ready for inclusion.

I'm posting it because I'd like to get comments about the high-level
design before going further into the details.







This patch set should be applied on top of 3.10.0 #18





====================================================================

Background

=-=-=-=-=-=



•        “Sick but not Dead” MPIO Path

‒       The path goes into the Failed state because of a path IO error, as
seen by the DM driver

‒       When the multipath daemon issues a TUR command and finds that the
health of the failed path is good, it moves the same path back into the
Active state

‒       The path repeatedly toggles between the Failed and Active states

•        DM IO is retried on a path where we are hitting multiple errors.

•        This causes erratic (non-deterministic) application IO throughput



The existing DM layer doesn't consider the number of errors when deciding
the health of a path.

Since a failed path becomes active again as soon as the TUR command
succeeds, the end user is left with the assumption that all the paths of
the multipath device are in a good state.

When we ran some field tests with this scenario, we saw non-deterministic
IO throughput.









=====================================================================

Design Overview

=-=-=-=-=-=-=-=-=



•        Deterministically bring the path to “Faulty” state

‒       Configure per-DM device data (see the sketch after this list) with

•        an IO error threshold and a time window within which the error
threshold must be hit

‒       Declare a path Faulty when the error threshold is hit within the
configured time window

‒       Place the path in the Failed state for a predefined time configured
by the administrator via the config file

‒       Even though the multipath daemon validates the path with a TUR
command (which succeeds) and tries to reinstate the path, ignore the
reinstatement of the path for a predefined time if the error threshold has
been hit.

•        Give the administrator time to correct the “Sick but not Dead” path
and bring it back to Active

•        Auto-enablement of a Faulty path back to the Active state after a
fixed time duration (given as config data for each DM device)

‒       The admin can set the deterministic MPIO behavior on a per-DM-device
basis

‒       This implies the failed path will be reinstated either by the admin
or when the timeout expires.

•        The above configs will be made persistent across server reboots
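
To make the per-DM device data above concrete (see "Configure per-DM device
data" earlier in this list), here is a minimal sketch of the kind of per-path
state and threshold check the design implies. This is not the actual patch:
the structure, field and function names are hypothetical and only illustrate
the idea of an error threshold within a time window plus a hold-down period.

/* Illustrative only -- hypothetical per-path error accounting. */
#include <linux/jiffies.h>
#include <linux/types.h>

struct path_err_ctl {
	unsigned int	err_threshold;	/* errors tolerated within the window */
	unsigned long	err_window;	/* window length, in jiffies */
	unsigned long	faulty_hold;	/* hold-down time for a Faulty path, in jiffies */

	unsigned int	err_count;	/* errors seen in the current window */
	unsigned long	window_start;	/* jiffies when the current window opened */
	unsigned long	faulty_until;	/* jiffies until which reinstates are ignored */
};

/*
 * Called on every path IO error (e.g. from fail_path()).  Returns true
 * when the error threshold is hit within the configured time window,
 * i.e. the path should be declared Faulty and held down.
 */
static bool path_err_hit_threshold(struct path_err_ctl *c, unsigned long now)
{
	if (time_after(now, c->window_start + c->err_window)) {
		/* The previous window expired: open a new one with this error. */
		c->window_start = now;
		c->err_count = 1;
		return false;
	}

	if (++c->err_count >= c->err_threshold) {
		c->faulty_until = now + c->faulty_hold;
		return true;
	}

	return false;
}

Locking, the multipath.conf plumbing and the interaction with the existing
fail_path()/reinstate_path() logic are deliberately left out of this sketch.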



Expected benefits:

- Deterministic application IO throughput.

- The administrator gets time to analyze the path failure and recover the
path.

- User-space tools need minimal changes.



The above feature will be enabled only if the corresponding variables
are defined in multipath.conf.
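
For illustration only, the configuration could look something like the
fragment below. The threshold/window/recovery attribute names are
placeholders (the final keywords would be decided during review), where they
live in multipath.conf (defaults, devices or multipaths section) is an open
question, and the vendor/product values are dummies.

devices {
	device {
		vendor			"SOMEVENDOR"
		product			"SOMEPRODUCT"
		# errors tolerated within the window before the path is
		# declared Faulty (hypothetical keyword)
		path_io_err_threshold	10
		# length of the error window, in seconds (hypothetical keyword)
		path_io_err_window	60
		# hold-down time before a Faulty path may be reinstated,
		# in seconds (hypothetical keyword)
		path_io_err_recovery	300
	}
}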



These changes are independent of the underlying algorithms used in the DM
layer; the changes are applied in dm.c and dm-mpath.c.



alloc_dev(), reinstate_path(), parse_path() and fail_path() are the
functions that are going to be changed.
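
As an illustration of the kind of check reinstate_path() in dm-mpath.c would
gain, a hold-down test along the following lines could be consulted before a
path is re-activated; the helper name is hypothetical, and faulty_until would
come from per-path data such as the struct sketched in the Design Overview.

#include <linux/jiffies.h>
#include <linux/types.h>

/*
 * Hypothetical helper: may a failed path be reinstated now?
 * faulty_until is the jiffies value recorded when the error threshold
 * was hit (0 means no threshold hit has been recorded for this path).
 */
static bool path_may_be_reinstated(unsigned long faulty_until,
				   unsigned long now)
{
	return !faulty_until || time_after_eq(now, faulty_until);
}

reinstate_path() would simply return early (leaving the path failed) whenever
this check refuses, and fail_path() would record the hold-down deadline when
the threshold check above fires.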





We need more comments on this; we have started testing and the results
look deterministic.



Regards,

Muneendra.


On Thu, Dec 15, 2016 at 3:00 PM, muneendra kumar <muneendra737 at gmail.com>
wrote:

> On Mon, Dec 5, 2016 at 9:35 PM, muneendra kumar <muneendra737 at gmail.com>
> wrote:
>
>> Thanks a lot for sharing the info.
>> I will discuss the problem in detail in a follow-up mail.
>>
>> Regards,
>> Muneendra.
>>
>> On Mon, Dec 5, 2016 at 5:45 PM, Zdenek Kabelac <zkabelac at redhat.com>
>> wrote:
>>
>>> On 5 Dec 2016 at 07:29, muneendra kumar wrote:
>>>
>>>> Hi,
>>>> This is a general question.
>>>> If I make changes in both the multipath tool and the dm driver (kernel),
>>>> how do I push my changes into the mainstream?
>>>> Can someone explain the process to me? That would help me a lot.
>>>>
>>>>
>>>
>>> Hi
>>>
>>> You propose your changes here on the list - you get a review, and
>>> if the patches are found useful, the maintainer of the dm subsystem
>>> will accept them.
>>>
>>> Note - it's usually better to ask and discuss ahead of time what your
>>> problem is and how you want to improve/fix it.
>>> That way you avoid losing time implementing an unacceptable patch.
>>>
>>> Regards
>>>
>>> Zdenek
>>>
>>>
>>>
>>>
>>
>