[dm-devel] dm-mq and end_clone_request()

Laurence Oberman loberman at redhat.com
Thu Aug 11 16:51:47 UTC 2016



----- Original Message -----
> From: "Laurence Oberman" <loberman at redhat.com>
> To: "Bart Van Assche" <bart.vanassche at sandisk.com>
> Cc: dm-devel at redhat.com, linux-scsi at vger.kernel.org, "Mike Snitzer" <snitzer at redhat.com>, "Johannes Thumshirn"
> <jthumshirn at suse.de>
> Sent: Wednesday, August 10, 2016 5:38:16 PM
> Subject: Re: [dm-devel] dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman at redhat.com>
> > To: "Bart Van Assche" <bart.vanassche at sandisk.com>
> > Cc: dm-devel at redhat.com, linux-scsi at vger.kernel.org, "Mike Snitzer"
> > <snitzer at redhat.com>, "Johannes Thumshirn"
> > <jthumshirn at suse.de>
> > Sent: Tuesday, August 9, 2016 1:21:15 PM
> > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Bart Van Assche" <bart.vanassche at sandisk.com>
> > > To: "Laurence Oberman" <loberman at redhat.com>
> > > Cc: dm-devel at redhat.com, linux-scsi at vger.kernel.org, "Mike Snitzer"
> > > <snitzer at redhat.com>, "Johannes Thumshirn"
> > > <jthumshirn at suse.de>
> > > Sent: Tuesday, August 9, 2016 1:16:52 PM
> > > Subject: Re: [dm-devel] dm-mq and end_clone_request()
> > > 
> > > On 08/09/2016 10:12 AM, Laurence Oberman wrote:
> > > > I was talking about this patch
> > > >
> > > > --- a/drivers/scsi/scsi_scan.c
> > > > +++ b/drivers/scsi/scsi_scan.c
> > > > @@ -1890,10 +1890,11 @@ void scsi_forget_host(struct Scsi_Host *shost)
> > > >   restart:
> > > >          spin_lock_irqsave(shost->host_lock, flags);
> > > >          list_for_each_entry(sdev, &shost->__devices, siblings) {
> > > > -                if (sdev->sdev_state == SDEV_DEL)
> > > > +                if (sdev->sdev_state == SDEV_DEL ||
> > > > scsi_device_get(sdev)
> > > > < 0)
> > > >                          continue;
> > > >                  spin_unlock_irqrestore(shost->host_lock, flags);
> > > >                  __scsi_remove_device(sdev);
> > > > +                scsi_device_put(sdev);
> > > >                  goto restart;
> > > >          }
> > > >          spin_unlock_irqrestore(shost->host_lock, flags);
> > > 
> > > Hello Laurence,
> > > 
> > > Did you run your tests with that patch applied? If so, it would help if
> > > you could rerun your tests without that patch. If the above patch makes
> > > a difference it means that it can happen that __scsi_remove_device()
> > > does not change the device state into SDEV_DEL. That's a bug and we need
> > > to know whether or not __scsi_remove_device() behaves correctly.
> > > 
> > > Thanks,
> > > 
> > > Bart.
> > > 
> > Yes Sir, I ran all yesterdays tests on your kernel with that patch applied.
> > Of course it may well just be luck/coincidence that the host delete race is
> > no longer happening
> > so I agree we need to re-run the tests so I will revert and re-run.
> > I will probably only get back to you tomorrow with the results.
> > 
> > Thanks
> > Laurence
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to majordomo at vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> Hello Bart
> 
> I only just got time now to revert that patch and build a kernel.
> Will test this tonight and let you know if I am back to seeing panics
> sporadically without the patch.
> As already mentioned, this is a different configuration to what I had when I
> was able to reproduce the panic.
> This means the lack of hitting this stack trace and panic may turn out to
> have nothing to do with the patch I applied.
> 
> Thanks
> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Hello Bart

I can no longer reproduce the stack even with the patch reverted so its behaving as you expected and the patch is as you already said, not valid.
I ran about 30 fio tests with your kernel and multiple host deletions and and did experience only one hard fio error.
My tests now produce the same results as you are seeing.

The single fio errors was with many more executions of the test so its not easy to get these fio errors.

Away from tomorrow on vacation for 10 days

Thanks
Laurence




More information about the dm-devel mailing list