[dm-devel] blk-mq request allocation stalls

Jens Axboe axboe at kernel.dk
Mon Jan 12 17:09:02 UTC 2015


On 01/12/2015 10:04 AM, Bart Van Assche wrote:
> On 01/12/15 17:34, Jens Axboe wrote:
>> On 01/12/2015 09:12 AM, Bart Van Assche wrote:
>>> On 01/12/15 16:42, Jens Axboe wrote:
>>>> On 01/12/2015 07:46 AM, Bart Van Assche wrote:
>>>>> bt_get: __bt_get() returned -1
>>>>> queue_num=2, nr_tags=62, reserved_tags=0, bits_per_word=3
>>>>> nr_free=62, nr_reserved=0, may_queue=0
>>>>> active_queues=8
>>>>
>>>> Can you add dumping of hctx->nr_active when this fails? Your case is that
>>>> the may_queue logic says no-can-do, so it smells like the nr_active
>>>> accounting is wonky since you have supposedly no allocated tags, yet it
>>>> clearly thinks that you do.
>>>
>>> Hello Jens,
>>>
>>> The requested output is as follows:
>>>
>>> bt_get: __bt_get() returned -1
>>> queue_num=0, nr_tags=62, reserved_tags=0, bits_per_word=3
>>> nr_free=62, nr_reserved=0, hctx->tags->active_queues=7,
>>> hctx->nr_active=9, hctx_may_queue()=0
>>> active_queues=7
>>
>> So that does look a bit off: we have (supposedly) 9 active requests, but
>> nothing allocated. When the mkfs is done and things are idle, can you
>> try to cat the 'active' file in the mq directory? I want to see if it
>> drops to zero or stays elevated.
>
> Hello Jens,
>
> Sorry that I hadn't been clearer, but the __bt_get() data in the
> previous e-mail was gathered when multipathd was started, not when
> mkfs was started. That was the time at which Mike's debug patch first
> reported that __bt_get() returned -1. What is also notable is that
> all "__bt_get() returned -1" reports that I checked referred to the
> /dev/dm-0 device and none to any of the underlying devices
> (/dev/sd[cd]).
>
> The tag state after having stopped multipathd (systemctl stop
> multipathd) is as follows:
> # dmsetup table /dev/dm-0
> 0 256000 multipath 3 queue_if_no_path pg_init_retries 50 0 1 1
> service-time 0 2 2 8:48 1 1 8:32 1 1
> # ls -l /dev/sd[cd]
> brw-rw---- 1 root disk 8, 32 Jan 12 17:47 /dev/sdc
> brw-rw---- 1 root disk 8, 48 Jan 12 17:47 /dev/sdd
> # for d in sdc sdd dm-0; do echo ==== $d; (cd /sys/block/$d/mq &&
>    find|cut -c3-|grep active|xargs grep -aH ''); done
> ==== sdc
> 0/active:10
> 1/active:14
> 2/active:7
> 3/active:13
> 4/active:6
> 5/active:10
> ==== sdd
> 0/active:17
> 1/active:8
> 2/active:9
> 3/active:13
> 4/active:5
> 5/active:10
> ==== dm-0
> -bash: cd: /sys/block/dm-0/mq: No such file or directory

OK, so it's definitely leaking, but only partially - the requests are
freed, yet the active count isn't decremented. I wonder if we're losing
the flag that marks the request as counted in nr_active somewhere along
the way. It's numbered high enough that a cast to int will drop it -
perhaps cmd_flags is being copied or passed around as an int and not the
appropriate u64? We've had bugs like that before.
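
To illustrate the kind of bug I mean - purely a standalone sketch, with
the flag name and bit position made up for the example rather than taken
from the real blk-mq definitions - a u64 flag above bit 31 silently
disappears once the flags word passes through an int:

#include <stdio.h>
#include <stdint.h>

/* Made-up stand-in for a request flag that lives above bit 31, like the
 * flag that marks a request as counted in hctx->nr_active. */
#define EXAMPLE_MQ_INFLIGHT	(1ULL << 36)

/* Buggy helper: takes the flags as int, so bits 32..63 are truncated. */
static void complete_path_buggy(int cmd_flags)
{
	if (cmd_flags & EXAMPLE_MQ_INFLIGHT)
		printf("flag seen, nr_active would be decremented\n");
	else
		printf("flag lost, nr_active stays elevated\n");
}

int main(void)
{
	uint64_t cmd_flags = EXAMPLE_MQ_INFLIGHT;

	/* Implicit u64 -> int conversion drops the high bit, so the
	 * completion/free path never sees it. */
	complete_path_buggy(cmd_flags);
	return 0;
}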
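
And for reference, Bart's numbers are consistent with the shared-tag
fair-share check doing exactly what it should, given the bogus nr_active.
Roughly speaking (this is a paraphrase of the hctx_may_queue() logic, not
a verbatim copy, and the exact rounding/minimum may differ), each active
queue gets its share of nr_tags, so 62 tags over 7 active queues allows
about 9 per queue, and with nr_active stuck at 9 the allocation is
refused:

#include <stdbool.h>
#include <stdio.h>

/* Paraphrase of the shared-tag fair-share check; not the literal
 * blk-mq-tag.c code. */
static bool may_queue(unsigned int nr_tags, unsigned int active_queues,
		      unsigned int nr_active)
{
	unsigned int depth;

	if (!active_queues)
		return true;

	/* Give each active queue its share of the tag space, rounded up. */
	depth = (nr_tags + active_queues - 1) / active_queues;
	if (depth < 4)
		depth = 4;

	return nr_active < depth;
}

int main(void)
{
	/* Bart's dump: nr_tags=62, active_queues=7, nr_active=9. */
	printf("may_queue = %d\n", may_queue(62, 7, 9));	/* prints 0 */
	return 0;
}

So once nr_active stops being decremented, it only takes a handful of
leaked increments per hctx before every allocation on that queue stalls.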

-- 
Jens Axboe



