[dm-devel] Re: [k-ueda at ct.jp.nec.com: Re: request-based dm-multipath]

Hannes Reinecke hare at suse.de
Thu Apr 16 07:29:22 UTC 2009


Mikulas Patocka wrote:
> On Wed, 15 Apr 2009, Mike Snitzer wrote:
> 
>> On Wed, Apr 15 2009 at  3:09pm -0400,
>> Mikulas Patocka <mpatocka at redhat.com> wrote:
>>
>>> On Fri, 10 Apr 2009, Mike Snitzer wrote:
>>>
>>>> Hi Mikulas,
>>>>
>>>> Figured I'd give you this heads-up on the request-based multipath
>>>> patches too, considering your recent "bottom-layer barrier support"
>>>> patchset (where you said multipath support is coming later).
>>>>
>>>> We likely want to coordinate with the NEC guys so as to make sure things
>>>> are in order for the request-based patches to get merged along with your
>>>> remaining barrier work for 2.6.31.
>>>>
>>>> Mike
>>>>
>>>> p.s. below you can see I mistakenly said to Kiyoshi that the recent
>>>> barrier patches that got merged upstream were "the last of the DM
>>>> barrier support"...
>>> Hi
>>>
>>> I would say one thing about the request-based patches --- don't do this.
>>>
>>> Your patch adds an alternate I/O path to request processing in the 
>>> device mapper.
>>>
>>> So, with your patch, there will be two I/O request paths. It means that 
>>> any work on generic device-mapper code that has to be done in the future 
>>> (such as, for example, the barrier support that I did) will be twice as 
>>> hard. It will take twice the time to understand request processing, 
>>> twice the brain capacity to remember it, twice the time for coding, 
>>> twice the time for code review, twice the time for testing.
>>>
>>> If the patch goes in, it will make a lot of things twice as hard. And 
>>> once the patch is in production kernels, there will be very little 
>>> possibility to pull it out.
>>>
>>> What is the exact reason for your patch? I suppose that it's some 
>>> performance degradation caused by the fact that dm-multipath doesn't 
>>> distribute requests optimally across the paths. dm-multipath has 
>>> pluggable path selectors, so you could improve dm-round-robin.c (or 
>>> write an alternate path-selector module) and you wouldn't have to touch 
>>> the generic dm code to solve this problem.
>>>
>>> The point is that improving the dm-multipath target with a better path 
>>> selector is much less intrusive than patching the device mapper core. 
>>> If you improve the dm-multipath target, only people hacking on 
>>> dm-multipath will have to learn about your code. If you modify the 
>>> generic dm.c file, anyone doing anything on device mapper must learn 
>>> about your code --- so the human time consumed is much worse in this case.
>>>
>>> So, try the alternate solution (write a new path selector for 
>>> dm-multipath) and then you can compare them and see the result --- and 
>>> then it can be considered whether the high human cost of patching dm.c 
>>> is worth the performance improvement.
>> Mikulas,
>>
>> Section 3.1 of the following 2007 Linux Symposium paper answers the
>> "why?" of request-based dm-multipath:
>> http://ols.108.redhat.com/2007/Reprints/ueda-Reprint.pdf
>>
>> In summary:
>> With request-based multipath, both performance and path error handling
>> are improved.
>>
>> Performance:
>> The I/O scheduler is leveraged to merge bios into requests; these
>> requests can then be balanced more evenly across the available paths
>> (with no need to starve other paths, as bio-based multipath is prone
>> to do).
> 
> So you can improve the bio-based selector. You can count the number and 
> size of outstanding requests on each path and select the least loaded path.
> 
Which is exactly what you _cannot_ do.
You have _no_ idea at all how the bios are merged into requests.
And as you make scheduling decisions based on the _bios_, you will
interfere with the elevator. Hence you always have to select large
scheduling intervals to keep the disturbance of the elevator as small
as possible.

Just decrease the 'rr_min_io' setting and watch the performance drop.
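
For concreteness, the "least loaded path" selector proposed above would
look roughly like the following --- a minimal user-space sketch only, not
the in-kernel dm path-selector interface; every name in it is invented
for illustration:

/*
 * Sketch of the "count outstanding I/O per path" idea: track in-flight
 * bytes per path and pick the least loaded one.  NOT real dm code.
 */
#include <stdio.h>
#include <stdint.h>

#define NR_PATHS 2

struct path_stats {
	uint64_t inflight_bytes;   /* submitted but not yet completed */
	unsigned inflight_ios;     /* outstanding bios on this path */
};

static struct path_stats paths[NR_PATHS];

/* Pick the path with the fewest outstanding bytes. */
static int select_least_loaded(void)
{
	int i, best = 0;

	for (i = 1; i < NR_PATHS; i++)
		if (paths[i].inflight_bytes < paths[best].inflight_bytes)
			best = i;
	return best;
}

/* Account a bio on submission ... */
static void start_io(int path, uint64_t bytes)
{
	paths[path].inflight_bytes += bytes;
	paths[path].inflight_ios++;
}

/* ... and un-account it on completion. */
static void end_io(int path, uint64_t bytes)
{
	paths[path].inflight_bytes -= bytes;
	paths[path].inflight_ios--;
}

int main(void)
{
	/* Two adjacent 4 KiB bios land on *different* paths ... */
	int p1 = select_least_loaded();
	start_io(p1, 4096);
	int p2 = select_least_loaded();
	start_io(p2, 4096);
	printf("bio 1 -> path %d, bio 2 -> path %d\n", p1, p2);

	end_io(p1, 4096);
	end_io(p2, 4096);
	return 0;
}

Note that select_least_loaded() knows nothing about merging: the two
adjacent bios in main() land on different paths, so neither underlying
elevator can coalesce them.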

> You can remember several end positions of the last requests and, when a 
> new request matches one of them, send it to the appropriate path, 
> assuming that the lower device's scheduler will merge it. Or --- another 
> solution is to access the queues of the underlying devices and ask them 
> if there's anything to merge --- and then send the request down the path 
> that has some adjacent request.
> 
But this would duplicate the elevator's merging logic, wouldn't it?
You would have to out-guess the elevator as to which requests it would
merge next ...
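
The "remember end positions" variant would be something like this ---
again an invented user-space sketch, not real dm code:

/*
 * Keep the end sector of the last bio sent down each path and route a
 * new bio to the path where it would be adjacent, hoping the lower
 * device's elevator merges it.  Invented names, illustration only.
 */
#include <stdint.h>

#define NR_PATHS 2

static uint64_t last_end_sector[NR_PATHS]; /* per-path last bio end */
static int rr_next;                        /* round-robin fallback */

static int route_bio(uint64_t start_sector, uint32_t nr_sectors)
{
	int i;

	/* Adjacent to the last bio on some path?  Stay on that path. */
	for (i = 0; i < NR_PATHS; i++) {
		if (last_end_sector[i] == start_sector) {
			last_end_sector[i] = start_sector + nr_sectors;
			return i;
		}
	}

	/* Otherwise fall back to plain round robin. */
	i = rr_next;
	rr_next = (rr_next + 1) % NR_PATHS;
	last_end_sector[i] = start_sector + nr_sectors;
	return i;
}

Even here the guesswork shows: adjacency to the one remembered end
sector per path is only an approximation of what the elevator would
actually merge.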

> I know that the round-robin selector is silly, but you haven't even 
> tried to improve it.
> 
> If there is a non-intrusive solution (improving the path selector), it 
> should be tried first, before resorting to an intrusive solution (an 
> alternate request path in the dm core).
> 
>> Error handling:
>> Finer-grained error statistics are available when interfacing more
>> directly with the hardware, as request-based multipath does.
> 
> You can signal it via flags in the bios. No need to rewrite the dm core.
> 
But this makes failover costly, as you have to fail over each
individual bio. Hence you cannot (by design) have real load
balancing, as the cost of failing over a single path is prohibitively
high.

With request-based multipathing, OTOH, the cost of failover becomes
really small and you can do real load balancing.
You can set rr_min_io to '1' and not suffer any performance
drawback.
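
For example, an illustrative multipath.conf fragment (the attribute
names are the real multipath-tools keywords; the values are just an
example --- with bio-based multipath the rr_min_io default is
deliberately large, on the order of 1000, precisely to amortize the
per-bio scheduling cost):

defaults {
	path_grouping_policy	multibus
	path_selector		"round-robin 0"
	rr_min_io		1
}

With bio-based multipathing such a setting scatters adjacent bios
across the paths and defeats merging; with request-based multipathing
the switch happens per already-merged request, so it costs almost
nothing.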

>> NEC may already have comparative performance data that will help
>> illustrate the improvement associated with request-based multipath?
>> They apparently have dynamic load balancing patches that they developed
>> for use with the current bio-based multipath.
> 
> So where is it better and why? Does it save CPU time or improve disk 
> throughput? How? On which workload?
> 
> Did they really try to implement some smart path balancing that takes 
> into account merging?
> 
No, they didn't, because of the points mentioned above.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)



