(large, external) data journal BUG (Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL")

Matt Bernstein mb/ext3 at dcs.qmul.ac.uk
Wed Nov 16 08:52:44 UTC 2005


Two of our important servers, both running FC4 (one i386, one x86_64), 
have been crashing recently. Both run ext3 with data=journal, large 
external journals and high commit intervals. Both machines also use the 
gdth driver for their hardware RAID sets, in case that is relevant. I 
believe the hardware is sound in both cases.

I hope someone finds this data useful enough to be able to fix the bug.

IMAP server crash (x86_64; once only, thus far):

Assertion failure in __journal_drop_transaction() at 
fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at "fs/jbd/checkpoint.c":626
invalid operand: 0000 [1] SMP
Modules linked in: loop iptable_nat ip_conntrack_amanda ipt_ULOG 
ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables w83627hf 
eeprom lm85 i2c_sensor i2c_isa md5 ipv6 video button battery ac ohci_hcd 
i2c_amd8111 i2c_amd756 i2c_core shpchp e100 mii tg3 floppy sg 
dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod gdth sata_sil libata 
sd_mod scsi_mod
Pid: 1485, comm: kjournald Not tainted 2.6.12-1.1398_FC4smp
RIP: 0010:[<ffffffff8807d56f>] 
RSP: 0018:ffff8100fade9de8  EFLAGS: 00010292
RAX: 0000000000000074 RBX: ffff8100c5f0ea80 RCX: ffffffff8042d908
RDX: ffffffff8042d908 RSI: 0000000000000296 RDI: ffffffff8042d900
RBP: ffff8100f8b55000 R08: ffff81008234c040 R09: 0000000000000030
R10: 0000000000000000 R11: ffffffff8011d680 R12: ffff81003b333080
R13: ffff8100c5f0ea80 R14: ffff8100f8b55000 R15: 0000000000000000
FS:  00002aaaaadfcf00(0000) GS:ffffffff8050d780(0000) knlGS:00000000f7ff16c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaab51a0000 CR3: 00000000e2980000 CR4: 00000000000006e0
Process kjournald (pid: 1485, threadinfo ffff8100fade8000, task 
Stack: ffff8100020ba898 ffff81008caebce8 0000000000000000 ffffffff8807c9d2
        ffff8100f8b55024 0000000000000cf7 ffff8100f8b5515c 0000000000000000
        0000000000000000 0000000000000000
Call Trace:<ffffffff8807c9d2>{:jbd:journal_commit_transaction+4194}
        <ffffffff8010f76b>{child_rip+8} <ffffffff8807f3c0>{:jbd:kjournald+0}

Code: 0f 0b fe 15 08 88 ff ff ff ff 72 02 48 83 7b 50 00 74 34 49
RIP <ffffffff8807d56f>{:jbd:__journal_drop_transaction+319} RSP 
  <3>Debug: sleeping function called from invalid context at 
in_atomic():0, irqs_disabled():1

Call Trace:<ffffffff8013abd5>{profile_task_exit+21} 
        <ffffffff8022178d>{vgacon_cursor+221} <ffffffff8011066d>{die+77}
        <ffffffff8010f76b>{child_rip+8} <ffffffff8807f3c0>{:jbd:kjournald+0}

File server crash (i386; has happened a few times now):

Assertion failure in __journal_drop_transaction() at 
fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:626!
invalid operand: 0000 [#1]
Modules linked in: loop nfsd exportfs lockd nfs_acl sunrpc autofs4 ipv6 
ip_conntrack_amanda ipt_REJECT ipt_state ip_conntrack iptable_filter 
ip_tables dm_mod video button battery ac ohci_hcd i2c_amd756 i2c_core 
3c59x mii ns83820 floppy sg ext3 jbd gdth sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f88a997c>]    Not tainted VLI
EFLAGS: 00010296   (2.6.13-1.1526_FC4smp)
EIP is at __journal_drop_transaction+0x117/0x2fa [jbd]
eax: 00000074   ebx: f064d2e0   ecx: c036fbf4   edx: 00000286
esi: f699a200   edi: c2f50000   ebp: e775df84   esp: c2f50ec4
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1168, threadinfo=c2f50000 task=c2e64020)
Stack: f88acfa8 f88b2e92 f88ada14 00000272 f88ada7c f064d2e0 f699a200 
        c2f50000 d142414c e775df84 f88a8f61 e775df84 f88a9700 c2f50000 
        f064d2e0 000000f5 e85cc160 defc4598 f699a200 00000000 defc4560 
Call Trace:
  [<f88a9781>] __journal_remove_checkpoint+0x56/0x75 [jbd]
  [<f88a8f61>] __try_to_free_cp_buf+0x31/0x68 [jbd]
  [<f88a9700>] __journal_clean_checkpoint_list+0x6f/0x9a [jbd]
  [<f88a7846>] journal_commit_transaction+0x147/0xff1 [jbd]
  [<c01295f7>] lock_timer_base+0x15/0x2f
  [<c0129803>] try_to_del_timer_sync+0x45/0x4d
  [<f88aa68b>] kjournald+0xc5/0x20d [jbd]
  [<f88aa5c0>] commit_timeout+0x0/0x5 [jbd]
  [<c01347c2>] autoremove_wake_function+0x0/0x37
  [<f88aa5c6>] kjournald+0x0/0x20d [jbd]
  [<c0101ca1>] kernel_thread_helper+0x5/0xb
Code: 44 24 10 7c da 8a f8 c7 44 24 0c 72 02 00 00 c7 44 24 08 14 da 8a 
f8 c7 44 24 04 92 2e 8b f8 c7 04 24 a8 cf 8a f8 e8 cb 7c 87 c7 <0f> 0b 
72 02 14 da 8a f8 8b 4b 2c 85 c9 74 34 c7 44 24 10 c4 d0
