[dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

Chris Mason chris.mason at oracle.com
Tue Dec 7 14:21:45 UTC 2010


On Sun, Dec 05, 2010 at 12:47:11AM +0100, Matt wrote:
> > OK.
> 
> meanwhile I think I got some interesting news:
> 
> after some time of running (around 1 to 1.5 hours) I noticed the
> following BUG with ext4:
> 
> [ 4421.503477] ------------[ cut here ]------------
> [ 4421.503482] kernel BUG at fs/ext4/inode.c:2714!
>
> kernel compiled was from sources checked out at
> 1de3e3df917459422cb2aecac440febc8879d410

Looking at 1de3e3df917459422cb2aecac440febc8879d410:

Line 2714 in fs/ext4/inode.c is this:

        /*
         * If the page does not have buffers (for whatever reason),
         * try to create them using block_prepare_write.  If this
         * fails, redirty the page and move on.
         */
        if (!page_buffers(page)) {
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                if (block_prepare_write(page, 0, len,
                                        noalloc_get_block_write)) {
                redirty_page:
                        redirty_page_for_writepage(wbc, page);
                        unlock_page(page);
                        return 0;
                }
                commit_write = 1;
        }

Which means we're really hitting this:

/* If we *know* page->private refers to buffer_heads */
#define page_buffers(page)                                      \
        ({                                                      \
                BUG_ON(!PagePrivate(page));                     \
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                ((struct buffer_head *)page_private(page));     \
        })
#define page_has_buffers(page)  PagePrivate(page)

Looks like Ted fixed it here:

commit b1142e8fec6a594723e5054055a7b53379b90490
Author: Theodore Ts'o <tytso at mit.edu>
Date:   Thu Oct 28 17:33:57 2010 -0400

    ext4: BUG_ON fix: check if page has buffers before calling page_buffers()
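
Going only by that commit subject, the idea is to test for buffers with the
non-asserting check before touching page_buffers().  A minimal userspace
sketch of that pattern follows.  Everything named model_* or MODEL_*, and the
two struct definitions, are hypothetical stand-ins invented for illustration;
they are not the kernel definitions and this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
struct buffer_head { int dirty; };

struct page {
    bool private_flag;            /* models PagePrivate(page) */
    struct buffer_head *private;  /* models page_private(page) */
};

/* Models BUG_ON(): the kernel oopses, this sketch aborts the process. */
#define MODEL_BUG_ON(cond)                                        \
    do {                                                          \
        if (cond) {                                               \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
            abort();                                              \
        }                                                         \
    } while (0)

/* Models page_has_buffers(): a plain test with no assertion. */
static bool model_page_has_buffers(const struct page *page)
{
    return page->private_flag;
}

/* Models page_buffers(): only valid when the page is known to have buffers. */
static struct buffer_head *model_page_buffers(const struct page *page)
{
    MODEL_BUG_ON(!page->private_flag);
    return page->private;
}

/*
 * Guarded version of the check in the writepage excerpt: ask whether
 * buffers exist first, attach them if not, and only then dereference.
 * Returns 1 if buffers had to be attached (commit_write in the excerpt),
 * 0 if they were already present.
 */
static int model_writepage_prepare(struct page *page, struct buffer_head *fresh)
{
    int commit_write = 0;

    if (!model_page_has_buffers(page)) {
        page->private = fresh;
        page->private_flag = true;
        commit_write = 1;
    }
    assert(model_page_has_buffers(page));
    (void)model_page_buffers(page);   /* cannot trip MODEL_BUG_ON here */
    return commit_write;
}
```

Calling model_page_buffers() directly on a page with private_flag clear would
abort, which is the userspace analogue of the oops quoted above.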


Basically, once you hit this oops, ext4 is done.  No files you created
after the oops will be there when you reboot, and the rest of your
lockups etc. happen because the jbd process was holding some locks when
it crashed.

Was there also a report of corruption w/dm-crypt and XFS?

Last night I ran dm-crypt + the cpu scalability patch + ext4 +
2.6.37-rc3 in a long stress run, and it passed without any problems.  If
dm-crypt were not doing the IO properly, this test probably would have
found it (+/- strange block sizes, races with O_DIRECT and other exotic
fun).
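
For anyone who wants a quick check of their own, here is a minimal sketch of
the write-then-verify idea with an odd block size.  It is not the stress test
described above; the function names, the pattern formula and the file path are
all made up for illustration.  It uses buffered stdio only, so it does not
exercise O_DIRECT, and unless the page cache is dropped between the two passes
the read-back may never reach the device.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pattern depends on the block index, so a misplaced block is detectable. */
static void fill_block(unsigned char *buf, size_t len, unsigned long idx)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)((idx * 131u + i * 7u) & 0xff);
}

/*
 * Write nblocks blocks of block_size bytes to path, read them back and
 * compare.  Returns the number of mismatching blocks, or -1 on I/O or
 * allocation error.
 */
static long write_and_verify(const char *path, size_t block_size,
                             unsigned long nblocks)
{
    unsigned char *want, *got;
    long bad = -1;
    FILE *f = NULL;

    assert(block_size > 0);
    want = malloc(block_size);
    got = malloc(block_size);
    if (!want || !got)
        goto out;

    /* Pass 1: write the pattern. */
    f = fopen(path, "wb");
    if (!f)
        goto out;
    for (unsigned long idx = 0; idx < nblocks; idx++) {
        fill_block(want, block_size, idx);
        if (fwrite(want, 1, block_size, f) != block_size)
            goto out;
    }
    if (fclose(f) != 0) {
        f = NULL;
        goto out;
    }

    /* Pass 2: read back and count blocks that differ. */
    f = fopen(path, "rb");
    if (!f)
        goto out;
    bad = 0;
    for (unsigned long idx = 0; idx < nblocks; idx++) {
        fill_block(want, block_size, idx);
        if (fread(got, 1, block_size, f) != block_size) {
            bad = -1;
            goto out;
        }
        if (memcmp(want, got, block_size) != 0)
            bad++;
    }

out:
    if (f)
        fclose(f);
    free(want);
    free(got);
    return bad;
}
```

Pointing path at a file on the dm-crypt mount and using a block size such as
4097 gives the "strange block sizes" flavour; any nonzero return would mean
data came back different from what was written.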

-chris



