[dm-devel] [PATCH] multipathd: check and cleanup zombie paths

Chongyun Wu wu.chongyun at h3c.com
Tue Mar 20 03:19:38 UTC 2018


On 2018/3/20 5:42, Martin Wilck wrote:
> On Fri, 2018-03-09 at 10:22 -0600, Benjamin Marzinski wrote:
>> On Fri, Mar 09, 2018 at 06:47:30AM +0000, Chongyun Wu wrote:
>>> On 2018/3/8 23:45, Benjamin Marzinski wrote:
>>>>
>>>> If there are multiple routes to the storage, some of them can be down
>>>> even if everything is fine on the storage.  This will cause some paths
>>>> to be up and some to be down, regardless of the state of the LUN.  In
>>>> every other multipath case but this one, there is just one LUN, and not
>>>> all the paths have the same state.
>>>>
>>>> Ideally, there would be a way to determine if a path is a zombie simply
>>>> by looking at it alone.  The additional sense code "LOGICAL UNIT NOT
>>>> SUPPORTED" that you posted earlier isn't one that I recall seeing for
>>>> failed multipathd paths.  I'll check around more, but a quick look makes
>>>> it appear that this code is only used when you are accessing a LUN that
>>>> really isn't there.  It's possible that the TUR checker could return a
>>>> special path state for this, which would cause multipathd to remove the
>>>> device.  Also, even if that additional sense code is only supposed to be
>>>> used for this condition, we should still make removing a device that
>>>> returns it configurable, because I can almost guarantee that there will
>>>> be a SCSI device that doesn't follow the standard for this.
>>>>
>>>
>>> Hi Ben,
>>> You just mentioned *the TUR checker could return a special path state
>>> for this*.  What is the special path state?  Thanks~
>>>
>>
>> We would have to add a new state, like PATH_NOT_SUPPORTED, that the TUR
>> checker could return in this case.  multipathd could be configured to
>> remove the path if it returned this state.  If it wasn't configured to do
>> so, multipathd would just change the state to PATH_DOWN.
> 
> Is it really multipathd's job to remove devices that return "LOGICAL
> UNIT NOT SUPPORTED"? To me it sounds like a misconfiguration on the
> SCSI/storage level, and I'm unsure if that's a thing multipathd should
> mess with.
> 
> Martin
> 
Actually there are two scenarios:
(1) The LUN is exported to a server under different LUN numbers at the
same time.  As you mentioned, this scenario can be considered a
misconfiguration, which we might not need to care about.
(2) The LUN is exported to a server under different LUN numbers, but not
at the same time.  This operation may be legitimate: the customer just
wants to reassign the export relations on the storage.  But the former
export leaves a residual device in the system, which is then adopted by
the multipath map of the newly exported device.  There are also lots of
syslog messages for the former device, which no longer exists (at least
the customer doesn't think it exists; the customer wants only the newly
exported device to exist).
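
To make the idea discussed above concrete, here is a rough sketch of how a
checker could recognize "LOGICAL UNIT NOT SUPPORTED" in the sense data and
how a configurable policy could act on it.  This is not multipath-tools
code.  The enums, the function names and the remove_unsupported switch are
all hypothetical, and PATH_NOT_SUPPORTED is only the state name proposed in
this thread.  The sense key (ILLEGAL REQUEST, 0x05) and the ASC/ASCQ pair
(0x25/0x00) are the standard SCSI values for this condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative states only; not the real multipath-tools enum values. */
enum path_state {
	PATH_DOWN,
	PATH_UP,
	PATH_NOT_SUPPORTED,	/* proposed in this thread, hypothetical */
};

/* Hypothetical actions the daemon could take for a checked path. */
enum path_action {
	ACTION_NONE,
	ACTION_FAIL_PATH,
	ACTION_REMOVE_PATH,
};

#define SK_ILLEGAL_REQUEST	0x05
#define ASC_LUN_NOT_SUPPORTED	0x25
#define ASCQ_LUN_NOT_SUPPORTED	0x00

/*
 * Return true if the sense buffer reports ILLEGAL REQUEST with
 * ASC/ASCQ 0x25/0x00.  Handles both fixed (0x70/0x71) and
 * descriptor (0x72/0x73) sense data formats.
 */
bool sense_is_lun_not_supported(const uint8_t *sense, size_t len)
{
	uint8_t key, asc, ascq;

	assert(sense != NULL || len == 0);
	if (len < 1)
		return false;

	switch (sense[0] & 0x7f) {
	case 0x70:
	case 0x71:
		/* fixed format: key in byte 2, ASC/ASCQ in bytes 12/13 */
		if (len < 14)
			return false;
		key = sense[2] & 0x0f;
		asc = sense[12];
		ascq = sense[13];
		break;
	case 0x72:
	case 0x73:
		/* descriptor format: key in byte 1, ASC/ASCQ in bytes 2/3 */
		if (len < 4)
			return false;
		key = sense[1] & 0x0f;
		asc = sense[2];
		ascq = sense[3];
		break;
	default:
		return false;
	}

	return key == SK_ILLEGAL_REQUEST &&
	       asc == ASC_LUN_NOT_SUPPORTED &&
	       ascq == ASCQ_LUN_NOT_SUPPORTED;
}

/*
 * Policy from the thread: remove the path only if the admin opted in,
 * otherwise treat PATH_NOT_SUPPORTED like PATH_DOWN.
 */
enum path_action decide_action(enum path_state state, bool remove_unsupported)
{
	switch (state) {
	case PATH_NOT_SUPPORTED:
		return remove_unsupported ? ACTION_REMOVE_PATH
					  : ACTION_FAIL_PATH;
	case PATH_DOWN:
		return ACTION_FAIL_PATH;
	default:
		return ACTION_NONE;
	}
}
```

With the switch off, a path reporting this sense code is handled exactly
like any other failed path, so a device that misuses the code would not
lose paths unless the admin opted in.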

Regards,
Chongyun





