[dm-devel] [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues
Bart Van Assche
bart.vanassche at sandisk.com
Thu Sep 1 20:15:17 UTC 2016
On 09/01/2016 12:05 PM, Mike Snitzer wrote:
> On Thu, Sep 01 2016 at 1:59pm -0400,
> Bart Van Assche <bart.vanassche at sandisk.com> wrote:
>> On 09/01/2016 09:12 AM, Mike Snitzer wrote:
>>> Please see/test the dm-4.8 and dm-4.9 branches (dm-4.9 being rebased
>>> on top of dm-4.8):
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8
>>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.9
>>
>> Hello Mike,
>>
>> The result of my tests of the dm-4.9 branch is as follows:
>> * With patch "dm mpath: check if path's request_queue is dying in
>> activate_path()" I still see every now and then that CPU usage of
>> one of the kworker threads jumps to 100%.
>
> So you're saying that the dying queue check is still needed in the path
> selector? Would be useful to know why the 100% is occurring. Can you
> get a stack trace during this time?
Hello Mike,
A few days ago I had already tried to obtain a stack trace with perf but
the information reported by perf wasn't entirely accurate. What I know
about that 100% CPU usage is as follows:
* "dmsetup table" showed three SRP SCSI device nodes but these SRP SCSI
device nodes were not visible in /sys/block. This means that
scsi_remove_host() had already removed these from sysfs.
* hctx->run_work kept being requeued over and over again on the kernel
thread with name "kworker/3:1H". I assume this means that
blk_mq_run_hw_queue() was called with the second argument (async) set
to true. This probably means that the following dm-rq code was
triggered:
	if (map_request(tio, rq, md) == DM_MAPIO_REQUEUE) {
		/* Undo dm_start_request() before requeuing */
		rq_end_stats(md, rq);
		rq_completed(md, rq_data_dir(rq), false);
		return BLK_MQ_RQ_QUEUE_BUSY;
	}
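
To illustrate the suspected loop, here is a minimal user-space sketch. It is not kernel code: every name in it (toy_path, toy_map_request, toy_queue_rq, toy_run_hw_queue) is hypothetical, and it assumes, as guessed above, that a "busy" result causes the hardware queue to be run again. Under that assumption a path whose queue is dying never lets the request through, so the worker keeps rerunning until an artificial cap stops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the kernel definitions. */
enum toy_map_result { TOY_MAPIO_REMAPPED, TOY_MAPIO_REQUEUE };
enum toy_queue_result { TOY_QUEUE_OK, TOY_QUEUE_BUSY };

/* Toy path model: a path whose queue is dying never accepts a request. */
struct toy_path {
	bool dying;
};

/* Assumption: mapping asks for a requeue whenever the only path is unusable. */
static enum toy_map_result toy_map_request(const struct toy_path *p)
{
	return p->dying ? TOY_MAPIO_REQUEUE : TOY_MAPIO_REMAPPED;
}

/* Same shape as the quoted snippet: a requeue is reported as "busy". */
static enum toy_queue_result toy_queue_rq(const struct toy_path *p)
{
	if (toy_map_request(p) == TOY_MAPIO_REQUEUE)
		return TOY_QUEUE_BUSY;
	return TOY_QUEUE_OK;
}

/*
 * Toy worker: reruns the queue for as long as dispatch reports busy.
 * max_runs is an artificial cap so that the simulation terminates.
 * Returns the number of dispatch attempts that were made.
 */
static unsigned int toy_run_hw_queue(const struct toy_path *p,
				     unsigned int max_runs)
{
	unsigned int runs = 0;

	assert(max_runs > 0);
	while (runs < max_runs) {
		runs++;
		if (toy_queue_rq(p) != TOY_QUEUE_BUSY)
			break;
	}
	return runs;
}
```

With a healthy path the worker dispatches once and stops. With a dying path it uses up the whole cap, which in this toy model corresponds to the work item being requeued over and over.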
Bart.