[dm-devel] raid1d crash at boot

NeilBrown neilb at suse.de
Tue Nov 22 01:26:57 UTC 2011


On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux at rere.qmqm.pl>
wrote:

> On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> > On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> > <James.Bottomley at HansenPartnership.com> wrote:
> > > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > > Thanks for the report.
> > > > However, as this crash is clearly in the SCSI layer it makes sense to report
> > > > it to linux-scsi - so I have cc:ed this reply there.
> > > > 
> > > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux at rere.qmqm.pl>
> > > > wrote:
> > > > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > > > Debian's initrd. This started to happen for kernels since sometime
> > > > > during 3.1-rcX.
> > > > > 
> > > > > [    6.246170] ------------[ cut here ]------------
> > > > > [    6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > > 
> > > I can tell you what it is:
> > > 
> > >         /*
> > >          * Filesystem requests must transfer data.
> > >          */
> > >         BUG_ON(!req->nr_phys_segments);
> > > 
> > > But the fault is in the layer above SCSI.  It means something sent a
> > > request with REQ_TYPE_FS but no actual data attached ... this is
> > > supposed to be impossible, hence the bug on.
> > 
> > Thanks.... that sounds strangely familiar, but I cannot be sure and google
> > doesn't help.
> > 
> > Michał: what are you using on the RAID1 - some filesystem (which one) or swap or something else?
> 
> The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> drives.  The boot doesn't survive to the point where the initrd script asks
> for dm-crypt's key password.
>

That gives us lots of room for pointing the finger of blame, doesn't it?
I think it is -> his problem. :-)

From the md part of the stack trace it looks most like a write request.  It
could be a retried read, but that is extremely unlikely that early in boot.

So presumably it is some sort of zero-length REQ_FLUSH or something like that.
md/raid1 will just pass those unchanged down. 
My guess is that ext4 is generating this and something in the stack is
stripping the REQ_FLUSH .... though why it even tries before asking for a
password is beyond me.
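
To make that guess concrete, here is a small userspace sketch of the failure
mode I have in mind. It is only a model, not kernel code: struct model_bio,
the MODEL_REQ_* bits, strip_flush() and would_trip_bug_on() are all made-up
names for illustration, and the assert() merely stands in for the BUG_ON in
scsi_lib.c. The assumption is that a zero-length flush is harmless as long as
it still carries its flush flag, and becomes a data-less plain write once some
layer drops that flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real REQ_* values live in the kernel headers. */
#define MODEL_REQ_WRITE (1u << 0)
#define MODEL_REQ_FLUSH (1u << 1)
#define MODEL_REQ_FUA   (1u << 2)

/* Hypothetical stand-in for the two bio fields that matter here. */
struct model_bio {
	unsigned int rw;         /* request flags */
	unsigned int nr_sectors; /* payload size in sectors */
};

/* Models a layer that drops the flush flags but still forwards the bio. */
static struct model_bio strip_flush(struct model_bio bio)
{
	bio.rw &= ~(MODEL_REQ_FLUSH | MODEL_REQ_FUA);
	return bio;
}

/* True if the bio looks like an ordinary request that carries no data. */
static bool would_trip_bug_on(struct model_bio bio)
{
	bool is_flush = (bio.rw & (MODEL_REQ_FLUSH | MODEL_REQ_FUA)) != 0;

	return !is_flush && bio.nr_sectors == 0;
}

/* Models the lowest layer: aborts where the real code would hit BUG_ON. */
static void lowest_layer_submit(struct model_bio bio)
{
	assert(!would_trip_bug_on(bio));
}
```

In this model a bio with MODEL_REQ_WRITE | MODEL_REQ_FLUSH and nr_sectors of 0
passes lowest_layer_submit(), while the same bio run through strip_flush()
first aborts. A bio with a non-zero nr_sectors passes either way.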

Maybe someone on dm-devel can help?

If not we might need to try a debugging patch like this:


diff --git a/block/blk-core.c b/block/blk-core.c
index f43c8a5..59cb2ad 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
 			goto end_io;
 		}
 	}
-
+	WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
 	if ((bio->bi_rw & REQ_DISCARD) &&
 	    (!blk_queue_discard(q) ||
 	     ((bio->bi_rw & REQ_SECURE) &&


NeilBrown
