[dm-devel] [PATCH 8/9] dm: Fix two race conditions related to stopping and starting queues

Mike Snitzer snitzer at redhat.com
Thu Sep 1 23:47:54 UTC 2016


On Thu, Sep 01 2016 at  7:17pm -0400,
Bart Van Assche <bart.vanassche at sandisk.com> wrote:

> On 09/01/2016 03:27 PM, Mike Snitzer wrote:
> >On Thu, Sep 01 2016 at  6:22pm -0400,
> >Bart Van Assche <bart.vanassche at sandisk.com> wrote:
> >
> >>On 09/01/2016 03:18 PM, Mike Snitzer wrote:
> >>>FYI I get the same 'dmsetup suspend --nolockfs --noflush mp' hang,
> >>>running mptest's test_02_sdev_delete, when I try your unmodified
> >>>patchset, see:
> >>>
> >>>http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=devel.bart
> >>
> >>Hello Mike,
> >>
> >>Are you aware that the code on that branch is a *modified* version
> >>of my patch series? The following patch is not present on that
> >>branch: "dm path selector: Avoid that device removal triggers an
> >>infinite loop". There are also other (smaller) differences.
> >
> >No, you're obviously talking about the 'devel' branch and not the
> >'devel.bart' branch I pointed to.  The 'devel.bart' branch is the
> >_exact_ patchset you sent.  It has the same problem as the 'devel'
> >branch.
> 
> Hello Mike,
> 
> Sorry that I misread your previous e-mail. After I received your
> latest e-mail I rebased my tree on top of the devel.bart branch
> mentioned above. My tests still pass. The only two patches in my
> tree that are relevant and that are not in the devel.bart branch
> have been attached to this e-mail. Did your test involve the sd
> driver? If so, do the attached two patches help? If the sd driver
> was not involved, can you provide more information about the hang
> you ran into? The output and log messages generated by the following
> commands after the hang has been reproduced would be very welcome:
> * echo w > /proc/sysrq-trigger
> * (cd /sys/block && grep -a '' dm*/mq/*/{pending,cpu*/rq_list})
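
A note on the second command above: the {pending,cpu*/rq_list} part relies on
bash brace expansion, so it will not expand under a plain POSIX sh. Below is a
minimal sketch of the same collection step as a function with the globs
spelled out. The function name dump_dm_mq_state and its optional directory
argument are made up for illustration (the argument only exists so the
function can be pointed at a copy of the sysfs tree), and -H is assumed to be
supported by the installed grep, as it is with GNU grep.

```shell
# Hypothetical helper, not part of mptest or the patch series.
# Prints every line of the blk-mq "pending" and per-CPU "rq_list" files
# for all dm devices, each line prefixed with the file it came from.
dump_dm_mq_state() {
    sysblock=${1:-/sys/block}    # assumption: optional override of the sysfs root
    (
        cd "$sysblock" || exit 1
        # An empty pattern matches every line; -a treats the files as text,
        # -H forces the file name prefix even if only one file matches.
        # Errors from globs that match nothing are discarded.
        grep -aH '' dm*/mq/*/pending dm*/mq/*/cpu*/rq_list 2>/dev/null
    )
}
```

Once the hang has been reproduced, something like
"dump_dm_mq_state > mq-state.txt" would capture the same information as the
one-liner quoted above.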

Yes, sd is used.  I'll apply those patches and test tomorrow, but I'm pretty
skeptical.

Haven't had any problems with these tests for quite a while.  The tests
I'm running are just those in the mptest testsuite, see:
https://github.com/snitm/mptest

Running them should be as simple as:

git clone git://github.com/snitm/mptest.git
cd mptest
./runtest

The default is to use dm-mq on scsi-mq on top of tcmloop.

multipath -ll shows:

mp () dm-4 LIO-ORG ,rd
size=1.0G features='4 queue_if_no_path retain_attached_hw_handler queue_mode mq' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=-1 status=active
| |- 7:0:1:0  sdj   8:144 active ready running
| `- 8:0:1:0  sdk   8:160 active ready running
`-+- policy='queue-length 0' prio=-1 status=enabled
  |- 9:0:1:0  sdl   8:176 active ready running
  `- 10:0:1:0 sdm   8:192 active ready running

[ 4839.452237] scsi host7: TCM_Loopback
[ 4839.472788] scsi host8: TCM_Loopback
[ 4839.492867] scsi host9: TCM_Loopback
[ 4839.512841] scsi host10: TCM_Loopback
[ 4839.549430] scsi 7:0:1:0: Direct-Access     LIO-ORG  rd               4.0  PQ: 0 ANSI: 5
[ 4839.570556] scsi 7:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.577562] scsi 7:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 1
[ 4839.587810] sd 7:0:1:0: [sdj] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.587830] sd 7:0:1:0: Attached scsi generic sg10 type 0
[ 4839.593569] sd 7:0:1:0: alua: transition timeout set to 60 seconds
[ 4839.593572] sd 7:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.608254] scsi 8:0:1:0: Direct-Access     LIO-ORG  rd               4.0  PQ: 0 ANSI: 5
[ 4839.626620] sd 7:0:1:0: [sdj] Write Protect is off
[ 4839.631974] sd 7:0:1:0: [sdj] Mode Sense: 43 00 00 08
[ 4839.631999] sd 7:0:1:0: [sdj] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.642209] loopback/naa.50014056fcae4fb4: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.652646] sd 7:0:1:0: [sdj] Attached SCSI disk
[ 4839.673568] scsi 8:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.680573] scsi 8:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 2
[ 4839.690814] sd 8:0:1:0: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.690888] sd 8:0:1:0: Attached scsi generic sg11 type 0
[ 4839.696543] sd 8:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.711419] scsi 9:0:1:0: Direct-Access     LIO-ORG  rd               4.0  PQ: 0 ANSI: 5
[ 4839.722730] sd 8:0:1:0: [sdk] Write Protect is off
[ 4839.728076] sd 8:0:1:0: [sdk] Mode Sense: 43 00 00 08
[ 4839.728094] sd 8:0:1:0: [sdk] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.738298] loopback/naa.500140553365fbe6: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.748700] sd 8:0:1:0: [sdk] Attached SCSI disk
[ 4839.771561] scsi 9:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.778567] scsi 9:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 3
[ 4839.788794] sd 9:0:1:0: [sdl] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.788823] sd 9:0:1:0: Attached scsi generic sg12 type 0
[ 4839.794546] sd 9:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.809308] scsi 10:0:1:0: Direct-Access     LIO-ORG  rd               4.0  PQ: 0 ANSI: 5
[ 4839.820806] sd 9:0:1:0: [sdl] Write Protect is off
[ 4839.826161] sd 9:0:1:0: [sdl] Mode Sense: 43 00 00 08
[ 4839.826181] sd 9:0:1:0: [sdl] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.836379] loopback/naa.5001405631dca816: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.846762] sd 9:0:1:0: [sdl] Attached SCSI disk
[ 4839.856572] scsi 10:0:1:0: alua: supports implicit and explicit TPGS
[ 4839.863673] scsi 10:0:1:0: alua: device naa.600140559050dd34f6e46deb7e0e9f24 port group 0 rel port 4
[ 4839.874002] sd 10:0:1:0: [sdm] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 4839.874033] sd 10:0:1:0: Attached scsi generic sg13 type 0
[ 4839.879549] sd 10:0:1:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 4839.897162] sd 10:0:1:0: [sdm] Write Protect is off
[ 4839.902613] sd 10:0:1:0: [sdm] Mode Sense: 43 00 00 08
[ 4839.902632] sd 10:0:1:0: [sdm] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 4839.912935] loopback/naa.5001405afca06b48: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 4839.923291] sd 10:0:1:0: [sdm] Attached SCSI disk
[ 4841.065972] device-mapper: multipath queue-length: version 0.2.0 loaded



