[dm-devel] [PATCH 0/6] scsi_dh : Couple of fixes for scsi device handlers

Moger, Babu Babu.Moger at lsi.com
Sat Jul 31 05:02:09 UTC 2010



> -----Original Message-----
> From: Shyam Iyer [mailto:shyam_iyer at dell.com]
> Sent: Friday, July 30, 2010 7:22 PM
> To: sekharan at us.ibm.com
> Cc: Moger, Babu; device-mapper development; linux-scsi at vger.kernel.org;
> Qi, Yanling; Chauhan, Vijay; Dachepalli, Sudhir; Stankey, Robert
> Subject: Re: [PATCH 0/6] scsi_dh : Couple of fixes for scsi device
> handlers
> 
> On 07/30/2010 07:56 PM, Chandra Seetharaman wrote:
> > On Fri, 2010-07-30 at 12:12 -0600, Moger, Babu wrote:
> >
> >
> >>>>    Yes, We can do that.  Problem is I am hitting the issue with
> BUG_ON
> >>>>   in get_rdac_data which is there in the beginning of rdac_activate.
> >>>>
> >>> I do not understand the problem.
> >>>
> >>> If the BUG_ON on get_rdac_data() is being triggered, that means
> >>> sdev->scsi_dh_data is NULL, if that is the case, how can you access
> >>> sdev->scsi_dh_data->kref in scsi_dh_activate (in patch 2/6) ? Wouldn't
> >>> it trip a oops ?
> >>>
> >> Test case is deleting both active and passive paths almost together
> during
> >>   the multipath testing.  Looks like DM picked up the active path
> failure
> >> first. Then failed the active path and started scheduling activate_path
> to
> >> failover to passive path. Passive path is also about to go down pretty
> soon.
> >> When the control was in scsi_dh_activate, Looks like scsi_dh_data was
> still
> >> valid because I did not see panic here.  But scsi_dh_data became NULL
> when
> >> control went to rdac_activate. That is when I hit the bug on.
> >>
> >> kernel BUG at /usr/src/packages/BUILD/lsi-scsi_dh_rdac-
> 01.00/obj/default/scsi_dh_rdac.c:232!
> >> RIP: 0010:  rdac_activate+0x257/0x387 [scsi_dh_rdac]
> >>
> >> My understanding is someone triggered scsi_dh->detach for passive path
> during
> >>   this small window.  Only way I could see problem go away is holding
> reference
> >> counts between these calls.  Did i miss anything here? See the code
> snippet below..
> >>
> > So, basically, the new patch just reduces the window of race. IOW, if it
> > just spins for few seconds (or preempted) just before calling the
> > kref_get() in the new patch, it will generate an oops as scsi_dh_data
> > would be NULL.

  That is correct. 

> >
> > Looks like you have to create a lock to protect scsi_dh_data and then
> > call kref_get() with the protection of lock.
> >
> >
> Second that.. Sounds like each kref_get/kref_put .. needs a lock
> protection not just this scenario.

  I agree. I will attempt one more set of patches. This time I will try to match get and put sequence in a one file. Please feel free to comment.





More information about the dm-devel mailing list