[dm-devel] [BUG] Oops when SCSI device under multipath is removed

Alan Stern stern at rowland.harvard.edu
Thu Aug 11 15:16:17 UTC 2011


On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem.  The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference.  Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > > 
> > > This is essentially unfixable with function calls.  The only way to fix
> > > it is to have a callback model for freeing the external lock.
> > 
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed.  Then the lock could safely be freed at the same time as the 
> > device.
> 
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true.  Why wasn't it done that way originally?  Are there queues 
that aren't associated with devices?
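
The lifetime rule proposed in the quoted text above (the queue pins its device, and the lock is freed together with the device) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (model_device, model_queue, model_demo and the get/put helpers) is hypothetical and is not a block-layer or SCSI API, plain ints stand in for atomic reference counts, and an int stands in for the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_device {
    int refcount;       /* plain int here; a real counter would be atomic */
    int *lock;          /* stands in for the externally supplied queue lock */
    bool *released;     /* set when the release step runs (inspection only) */
};

struct model_queue {
    int refcount;
    struct model_device *dev;   /* pinned for the queue's whole lifetime */
    int *lock;                  /* borrowed from dev, never freed by the queue */
};

static struct model_device *model_device_create(bool *released)
{
    struct model_device *dev = malloc(sizeof(*dev));
    if (!dev)
        return NULL;
    dev->lock = malloc(sizeof(*dev->lock));
    if (!dev->lock) {
        free(dev);
        return NULL;
    }
    *dev->lock = 0;
    dev->refcount = 1;
    dev->released = released;
    *released = false;
    return dev;
}

static void model_device_get(struct model_device *dev)
{
    assert(dev->refcount > 0);
    dev->refcount++;
}

/* The last put is the device's release step: the lock dies with the device. */
static void model_device_put(struct model_device *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        *dev->released = true;
        free(dev->lock);
        free(dev);
    }
}

/* Creating the queue takes a device reference on the queue's behalf. */
static struct model_queue *model_queue_create(struct model_device *dev)
{
    struct model_queue *q = malloc(sizeof(*q));
    if (!q)
        return NULL;
    model_device_get(dev);
    q->refcount = 1;
    q->dev = dev;
    q->lock = dev->lock;
    return q;
}

static void model_queue_get(struct model_queue *q)
{
    assert(q->refcount > 0);
    q->refcount++;
}

/* Freeing the queue drops the device reference it has held since creation. */
static void model_queue_put(struct model_queue *q)
{
    assert(q->refcount > 0);
    if (--q->refcount == 0) {
        struct model_device *dev = q->dev;
        free(q);
        model_device_put(dev);
    }
}

/* Returns 0 if the lock stayed valid until the last queue reference went away. */
int model_demo(void)
{
    bool released = false;
    struct model_device *dev = model_device_create(&released);
    if (!dev)
        return -1;
    struct model_queue *q = model_queue_create(dev);
    if (!q) {
        model_device_put(dev);
        return -1;
    }
    model_queue_get(q);         /* a second holder of the queue */
    model_device_put(dev);      /* the owner drops its device reference */
    model_queue_put(q);         /* the owner drops its queue reference */
    if (released)
        return 1;               /* lock would have been freed too early */
    *q->lock = 1;               /* the remaining holder can still use the lock */
    model_queue_put(q);         /* last queue put releases the device */
    return released ? 0 : 2;
}
```

In this model the device cannot be released while any queue reference is outstanding, so a late holder of the queue never touches a freed lock. It says nothing about whether every real queue has an associated device, which is the open question here.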

>  Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
> 
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 came from someone catching something
> still holding a reference to a SCSI queue after the device release
> function had been called.

Not according to your commit log.  You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern



