[dm-devel] Improve processing efficiency for addition and deletion of multipath devices

Peter Rajnoha prajnoha at redhat.com
Mon Nov 28 12:23:55 UTC 2016


On 11/28/2016 01:08 PM, Hannes Reinecke wrote:
> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
>> On 11/28/2016 11:42 AM, Hannes Reinecke wrote:
>>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>>> On 11/28/2016 03:19 AM, tang.junhui at zte.com.cn wrote:
>>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>>> I am a member of the host-side software development team of the
>>>>> ZXUSP storage project at ZTE Corporation. To meet market demand, our
>>>>> team has decided to start writing code next month to improve
>>>>> multipath processing efficiency. The whole idea is in the mail
>>>>> below. We hope to participate in and make progress with the open
>>>>> source community, so any suggestions and comments would be welcome.
>>>>>
>>>>
>>>>
>>>> Hi
>>>>
>>>> First, we are aware of these issues.
>>>>
>>>> The solution proposed in this mail would surely help, but there is
>>>> likely a bigger issue to be solved first.
>>>>
>>>> The core trouble is avoiding the 'blkid' disk identification being
>>>> executed at all.
>>>> Recent versions of multipath already mark a plain table 'RELOAD'
>>>> operation (which should not change disk content) with an extra DM
>>>> flag, so the udev rules currently skip 'pvscan' - we would also like
>>>> to extend this to skip more of the rules and to reimport the existing
>>>> 'symlinks' from the udev database (so they would not get deleted).
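>>>>
>>>> In rule terms, the reload marking shows up to rules as
>>>> DM_SUBSYSTEM_UDEV_FLAG0; a rough sketch (the db reimport lines are
>>>> the part we would still like to add, so take them as illustrative):
>>>>
>>>>   # 69-dm-lvm-metad.rules: a flagged multipath reload skips pvscan
>>>>   ENV{DM_SUBSYSTEM_UDEV_FLAG0}=="1", GOTO="lvm_end"
>>>>
>>>>   # sketch for 13-dm-disk.rules: on such a reload, reimport the
>>>>   # previously scanned values from the udev database instead of
>>>>   # rescanning, so the symlinks built from them are preserved
>>>>   ENV{DM_SUBSYSTEM_UDEV_FLAG0}=="1", IMPORT{db}="ID_FS_TYPE", \
>>>>       IMPORT{db}="ID_FS_UUID_ENC", GOTO="dm_end"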
>>>>
>>>> I believe the processing of udev rules is 'relatively' quick as long
>>>> as it does not need to read or write ANYTHING on the real disks.
>>>>
>>> Hmm. You sure this is an issue?
>>> We definitely need to skip uevent handling when a path goes down (but I
>>> think we do that already), but for 'add' events we absolutely need to
>>> call blkid to figure out if the device has changed.
>>> There are storage arrays out there that use a 'path down/path up'
>>> cycle to inform initiators about any device layout change.
>>> So we wouldn't be able to handle those properly if we don't call blkid
>>> here.
>>
>> The core trouble is -
>>
>>
>> With a multipath device, you ONLY want to 'scan' the device (with
>> blkid) when the initial, first member device of the multipath map
>> comes in.
>>
>> So you start the multipath device (resume -> CHANGE event) - that
>> should be the ONLY place to run the 'blkid' test (which really goes
>> through over 3/4 MB of disk reads, just to check that there is no ZFS
>> somewhere).
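>>
>> In rule terms, roughly (a sketch; 13-dm-disk.rules does approximately
>> this for dm devices already):
>>
>>   # run the content scan only on the assembled multipath device,
>>   # on the CHANGE event that follows the resume
>>   ACTION=="change", KERNEL=="dm-*", ENV{DM_UUID}=="mpath-*", \
>>       IMPORT{builtin}="blkid"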
>>
>> Then any further disk that is a member of the multipath map
>> (recognized by 'multipath -c') should NOT be scanned - but as far as I
>> can tell the current order is the opposite: first there is 'blkid'
>> (rule 60) and only then does rule (62) recognize an mpath_member.
>>
>> Thus every added disk fires a very lengthy blkid scan.
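>>
>> Paraphrasing the shipped rules (the exact content varies between
>> distributions, so take these lines as illustrative):
>>
>>   # 60-persistent-storage.rules: full content scan on every 'add'
>>   IMPORT{builtin}="blkid"
>>
>>   # 62-multipath.rules: only afterwards is the path classified
>>   PROGRAM=="/sbin/multipath -c $env{DEVNAME}", \
>>       ENV{DM_MULTIPATH_DEVICE_PATH}="1", ENV{ID_FS_TYPE}="mpath_member"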
>>
>> Of course, I'm not an expert on the dm multipath rules here, so I'm
>> passing this on to prajnoha@ - but I'd guess this is the primary
>> source of the slowdowns.
>>
>> There should be exactly ONE blkid run for a single multipath device -
>> as long as a 'RELOAD' only adds/removes paths, there is no reason to
>> scan the component devices.
>>
> ATM 'multipath -c' is just a simple test of whether the device is
> supposed to be handled by multipath.
> 
> And the number of bytes read by blkid shouldn't be _that_ large; a
> simple 'blkid' on my device caused it to read 35k ...
> 
> Also udev will become very unhappy if we're not calling blkid for every
> device; you'd have a hard time reconstructing the events for those
> devices.

What do you mean by event reconstruction?

I don't think we really need to call blkid for every device. If we have
configured that a certain device is surely an mpath component (based on
the WWN in the mpath config), I think we don't need to call blkid at all
- it's an mpath component and the top-level device should simply be used
for any scanning.

I mean, I still don't see why we need to call blkid and then overwrite
the ID_FS_TYPE variable right away based on the result of 'multipath
-c'. If we reversed this order, we could save the extra blkid call
that's not actually needed.
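
As a sketch of what the reversed ordering could look like (the
'multipath -c' call and the mpath_member convention are the existing
ones; the rule file placement and exact match keys below are just
illustrative):

  # hypothetical rules file sorted before 60-persistent-storage.rules:
  ACTION=="add|change", KERNEL=="sd*", ENV{DEVTYPE}!="partition", \
      PROGRAM=="/sbin/multipath -c $env{DEVNAME}", \
      ENV{DM_MULTIPATH_DEVICE_PATH}="1", \
      ENV{ID_FS_TYPE}="mpath_member", ENV{SYSTEMD_READY}="0"

  # and inside 60-persistent-storage.rules, before the blkid import:
  ENV{ID_FS_TYPE}=="mpath_member", GOTO="persistent_storage_end"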

-- 
Peter



