[dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu

Brian King brking at linux.vnet.ibm.com
Thu Jun 29 12:59:14 UTC 2017


On 06/28/2017 05:19 PM, Jens Axboe wrote:
> On 06/28/2017 04:07 PM, Brian King wrote:
>> On 06/28/2017 04:59 PM, Jens Axboe wrote:
>>> On 06/28/2017 03:54 PM, Jens Axboe wrote:
>>>> On 06/28/2017 03:12 PM, Brian King wrote:
>>>>> -static inline int part_in_flight(struct hd_struct *part)
>>>>> +static inline unsigned long part_in_flight(struct hd_struct *part)
>>>>>  {
>>>>> -	return atomic_read(&part->in_flight[0]) + atomic_read(&part->in_flight[1]);
>>>>> +	return part_stat_read(part, in_flight[0]) + part_stat_read(part, in_flight[1]);
>>>>
>>>> One obvious improvement would be to not do this twice, but only have to
>>>> loop once. Instead of making this an array, make it a structure with a
>>>> read and write count.
>>>>
>>>> It still doesn't really fix the issue of someone running on a kernel
>>>> with a ton of possible CPUs configured. But it does reduce the overhead
>>>> by 50%.
>>>
>>> Or something as simple as this:
>>>
>>> #define part_stat_read_double(part, field1, field2)			\
>>> ({									\
>>> 	typeof((part)->dkstats->field1) res = 0;			\
>>> 	unsigned int _cpu;						\
>>> 	for_each_possible_cpu(_cpu) {					\
>>> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field1;	\
>>> 		res += per_cpu_ptr((part)->dkstats, _cpu)->field2;	\
>>> 	}								\
>>> 	res;								\
>>> })
>>>
>>> static inline unsigned long part_in_flight(struct hd_struct *part)
>>> {
>>> 	return part_stat_read_double(part, in_flight[0], in_flight[1]);
>>> }
>>>
>>
>> I'll give this a try and also see about running some more exhaustive
>> runs to see if there are any cases where we go backwards in performance.
>>
>> I'll also run with partitions and see how that impacts this.
> 
> And do something nuts, like setting NR_CPUS to 512 or whatever. What do
> distros ship with?

Both RHEL and SLES set NR_CPUS=2048 for the Power architecture. I can easily
switch the SMT mode of the machine I used for this from 4 to 8 to have a total
of 160 online logical CPUs and see how that affects the performance. I'll
see if I can find a larger machine as well.
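
For reference, the difference between the two read paths quoted above can be
modelled in userspace. This is only a sketch under stated assumptions, not the
kernel code: struct slot_stats, sum_two_passes() and sum_one_pass() are
hypothetical names, and a plain array stands in for the per-CPU dkstats that
for_each_possible_cpu() would walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one CPU's slice of the disk stats. */
struct slot_stats {
	unsigned long in_flight[2];	/* [0] = reads, [1] = writes */
};

/* Two walks over the slots, one per field: the shape of calling
 * part_stat_read() once for in_flight[0] and once for in_flight[1]. */
static unsigned long sum_two_passes(const struct slot_stats *slots,
				    size_t nr_slots)
{
	unsigned long res = 0;
	size_t i;

	assert(slots != NULL);
	for (i = 0; i < nr_slots; i++)
		res += slots[i].in_flight[0];
	for (i = 0; i < nr_slots; i++)
		res += slots[i].in_flight[1];
	return res;
}

/* One walk that adds both fields per slot: the shape of the quoted
 * part_stat_read_double() macro. */
static unsigned long sum_one_pass(const struct slot_stats *slots,
				  size_t nr_slots)
{
	unsigned long res = 0;
	size_t i;

	assert(slots != NULL);
	for (i = 0; i < nr_slots; i++) {
		res += slots[i].in_flight[0];
		res += slots[i].in_flight[1];
	}
	return res;
}
```

Both functions return the same total. The single-pass form visits each slot
once instead of twice, so the loop overhead halves while the number of counter
loads stays the same. If an increment and the matching decrement land on
different slots, one slot holds a wrapped value such as (unsigned long)-1, and
the unsigned sum still comes out right.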

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



