[dm-devel] dm-mq and end_clone_request()

Bart Van Assche bart.vanassche at sandisk.com
Thu Jul 28 15:23:47 UTC 2016


On 07/28/2016 06:33 AM, Mike Snitzer wrote:
> On Wed, Jul 27 2016 at  7:05pm -0400,
> Bart Van Assche <bart.vanassche at sandisk.com> wrote:
>> Thanks again for having made this patch available. I will test it as
>> soon as I have the time. BTW, in the meantime I ran a few tests with
>> DM_MQ_DEFAULT=n since until now I ran all tests with
>> DM_MQ_DEFAULT=y. The result of these tests is as follows:
>> * v4.6.0, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=y: first simulated
>> path removal triggers I/O errors.
>> * v4.6.4, v4.6.5 and v4.7.0 with DM_MQ_DEFAULT=n: test passes more
>> than 100 iterations.
>
> I think this may point to an SRP issue then.  Is the synthetic "cable
> pull" (by writing to /sys/class/srp_remote_ports/port-*/delete)
> representative of what actually happens if a cable is physically pulled?
>
> Or is your synthetic method hitting the device way harder than would
> happen with an actual production fault?
>
> Again, there hasn't been any report of failures (EIO or otherwise) with
> extensive scsi-mq and dm-mq testing on a larger FC testbed.

Hello Mike,

Sorry, but I disagree that the ib_srp driver is causing the EIO
errors, because:
* All tests, including the tests that pass, were run with
   CONFIG_SCSI_MQ_DEFAULT=y in the kernel config. The same code paths
   were triggered in the ib_srp driver by all the tests
   (CONFIG_DM_MQ_DEFAULT=y and CONFIG_DM_MQ_DEFAULT=n).
* In my previous e-mails I have shown that the EIO error code is
   generated by the dm-mpath driver after all (SRP) paths have gone. So
   how could the ib_srp driver be involved?
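For anyone who wants to confirm which combination a given boot is actually using, here is a minimal sketch that reads the runtime blk-mq switches. It assumes that the CONFIG_*_MQ_DEFAULT options only set the defaults of the dm_mod and scsi_mod use_blk_mq module parameters, and that these are exposed under /sys/module/. The paths and the read_mq_params() helper are my illustration and have not been checked against every kernel version mentioned in this thread.

```python
from pathlib import Path

# Assumed sysfs locations of the runtime switches (illustrative, not taken
# from this thread; they may differ or be absent on other kernel versions).
MQ_PARAMS = {
    "dm_mod": "module/dm_mod/parameters/use_blk_mq",
    "scsi_mod": "module/scsi_mod/parameters/use_blk_mq",
}


def read_mq_params(sysfs_root="/sys"):
    """Return {module: value-or-None} for the assumed use_blk_mq parameters."""
    result = {}
    for module, rel in MQ_PARAMS.items():
        path = Path(sysfs_root) / rel
        try:
            result[module] = path.read_text().strip()
        except OSError:
            # Parameter file absent or unreadable on this system.
            result[module] = None
    return result


if __name__ == "__main__":
    for module, value in read_mq_params().items():
        print(f"{module}.use_blk_mq = {value}")
```

A value of None only means the file could not be read; it says nothing about whether blk-mq is in use.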

There is an important difference between the SCSI FC drivers and ib_srp:
after dev_loss_tmo expires, FC drivers call scsi_remove_target(), whereas
the SRP transport layer triggers a call to scsi_remove_host().

Both writing into /sys/class/srp_remote_ports/*/delete and pulling a
cable make the ib_srp driver call scsi_remove_host(). The only
difference is the timing. With the former method, the interval between
submitting I/O and the scsi_remove_host() call is more likely to be short.
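The simulated path removal can be sketched roughly as follows. This is only an illustration and not the code from my test scripts: the delete_srp_rports() helper, its sysfs_root and dry_run parameters, and the choice of writing "1" are assumptions made for the example.

```python
import glob
import os


def delete_srp_rports(sysfs_root="/sys", dry_run=False):
    """Write to the delete attribute of every SRP remote port found.

    Returns the list of attribute paths that matched. The value written
    ("1") is an assumption made for this sketch.
    """
    pattern = os.path.join(
        sysfs_root, "class", "srp_remote_ports", "port-*", "delete"
    )
    paths = sorted(glob.glob(pattern))
    for path in paths:
        if not dry_run:
            with open(path, "w") as attr:
                attr.write("1\n")
    return paths


if __name__ == "__main__":
    # Dry run by default so that running the file does not remove any paths.
    for path in delete_srp_rports(dry_run=True):
        print("would write to", path)
```

Calling delete_srp_rports() without dry_run while I/O is in flight is what makes the window between I/O submission and host removal small.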

>> I have not yet run any tests with kernel v4.5.x because in the test
>> I ran the ib_srp and ib_srpt drivers are loaded on the same system
>> and because I need five v4.7 LIO patches to make this test pass but
>> unfortunately these patches do not apply cleanly on the v4.5.x code
>> base.
>>
>> Please let me know if you need more information.
>
> Can the target core be made to use SRP in loopback (local test machine)
> mode?  The mptest harness currently defaults to using tcmloop.  Would be
> great if I could somehow exercise the SRP code without needing a
> full-blown IB setup.
>
> But if there isn't a way to achieve that test coverage I can
> probably/hopefully get access to a subset of a larger IB/SRP testbed.

All InfiniBand HCAs that I have encountered so far support loopback as
long as at least one HCA port is up (either connected to a switch, or
connected to another HCA port with opensm running against one of these
two ports).

The scripts I used to test the ib_srp driver are available at 
https://github.com/bvanassche/srp-test.

Thanks,

Bart.



