[dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Hannes Reinecke
hare at suse.de
Mon Nov 28 15:37:27 UTC 2016
On 11/28/2016 04:25 PM, Benjamin Marzinski wrote:
> On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui at zte.com.cn wrote:
>> Hello Christophe, Ben, Hannes, Martin, Bart,
>> I am a member of host-side software development team of ZXUSP storage
>> project
>> in ZTE Corporation. Facing the market demand, our team decides to write
>> code to
>> promote multipath efficiency next month. The whole idea is in the mail
>> below.We
>> hope to participate in and make progress with the open source community,
>> so any
>> suggestion and comment would be welcome.
>
> Like I mentioned before, I think this is a good idea in general, but the
> devil is in the details here.
>
>>
>> Thanks,
>> Tang
>>
>> ------------------------------------------------------------------------------------------------------------------------------
>> ------------------------------------------------------------------------------------------------------------------------------
>> 1. Problem
>> In these scenarios, multipath processing efficiency is low:
>> 1) Many paths exist in each multipath device,
>> 2) Devices addition or deletion during iSCSI login/logout or FC link
>> up/down.
>
> <snip>
>
>> 4. Proposal
>> Other than processing uevents one by one, uevents which coming from the
>> same LUN devices can be mergered to one, and then uevent processing
>> thread only needs to process it once, and it only produces one DM addition
>> uevent which could reduce system resource consumption.
>>
>> The example in Chapter 2 is continued to use to explain the proposal:
>> 1) Multipath receives block device addition uevents from udev:
>> UDEV [89068.806214] add
>> /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
>> UDEV [89068.909457] add
>> /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
>> UDEV [89068.944956] add
>> /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
>> UDEV [89068.959215] add
>> /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
>> UDEV [89068.978558] add
>> /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
>> UDEV [89069.004217] add
>> /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
>> UDEV [89069.486361] add
>> /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
>> UDEV [89069.495194] add
>> /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
>> UDEV [89069.511628] add
>> /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
>> UDEV [89069.716292] add
>> /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
>> UDEV [89069.748456] add
>> /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
>> UDEV [89069.789662] add
>> /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)
>>
>> 2) Multipath merges these 12 uevents to 3 internal uvents
>> UEVENT add sdc sdh sdd sdl
>> UEVENT add sde sdj sdf sdm
>> UEVENT add sdg sdk sdi sdn
>>
>> 3) Multipath process these 3 uevents one by one, and only produce 3
>> addition
>> DM uvents, no dm change uevent exists.
>> KERNEL[89068.899614] add /devices/virtual/block/dm-2 (block)
>> KERNEL[89068.955364] add /devices/virtual/block/dm-3 (block)
>> KERNEL[89069.018903] add /devices/virtual/block/dm-4 (block)
>
> Just because I'm pedantic: There will, of cource, be dm change events.
> Without them, you couldn't have a multipath device. Whenever you load a
> table in a dm device (including during the initial creation), you get a
> change event.
>
>> 4) Udev process these uevents above, and transfer it to multipath
>> UDEV [89068.926428] add /devices/virtual/block/dm-2 (block)
>> UDEV [89069.007511] add /devices/virtual/block/dm-3 (block)
>> UDEV [89069.098054] add /devices/virtual/block/dm-4 (block)
>
> multipathd ignores add events for dm devices (look at uev_trigger). A dm
> device isn't set up until it's initial change event happens.
>
>> 5) Multipath processes these uevents above, and finishes the creation of
>> multipath
>> devices.
>>
>> 5. Coding
>> After taking over uevents form uevent listening thread, uevent processing
>> thread can
>> merger these uevents before processing��
>> int uevent_dispatch(int (*uev_trigger)(struct uevent *, void *
>> trigger_data),
>> void * trigger_data)
>> {
>> ...
>> while (1) {
>> ...
>> list_splice_init(&uevq, &uevq_tmp);
>> ...
>> list_merger_uevents(&uevq_tmp);
>> service_uevq(&uevq_tmp);
>> }
>> ...
>> }
>> In structure of ��struct uevent�� , an additional member of ��char
>> wwid[WWID_SIZE]�� is
>> added to record each device WWID for addition or change uevent to identify
>> whether
>> these uevents coming from the same LUN, and an additional member of
>> ��struct list_head merger_node�� is added to record the list of uevents
>> which having been
>> merged with this uevent:
>> struct uevent {
>> struct list_head node;
>> struct list_head merger_node;
>> char wwid[WWID_SIZE]
>> struct udev_device *udev;
>> ...
>> };
>
> You can't just get the wwid with no work (look at all work uev_add_path
> does, specifically alloc_path_with_pathinfo). Now you could reorder
> this, but there isn't much point, since it is doing useful things, like
> checking if this is a spurious uevent, and necessary things, like
> figuring out the device type and using that that the configuration to
> figure out HOW to get the wwid. It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all that paths.
>
Which is kinda what I've proposed; split off uev_add_path() and
ev_add_path().
Then uev_add_path could generate a list of fully-formed paths which
ev_add_path() would process.
IE generalize coalesce_paths() to work on a passed-in list rather than
the normal vecs->pathvec.
Or consider using vecs->pathvec to hold 'orphan' paths when processing
uevents; then we could add the paths to the pathvec in uev_add_path()
and have another thread going through the list of paths and trying to
call coalesce_paths() for the changed mpps.
Will be a tad tricky, as we'd need to identify which mpps needs to be
updated; but nothing too awkward.
Meanwhile I'm working on converting vecs->pathvec and vecs->mpvec to RCU
lists; that should get rid of the need for most locks, which I suspect
being the main reason for the scaling issues.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
More information about the dm-devel
mailing list