Oops at __journal_drop_transaction

Jure Pečar pegasus at nerv.eu.org
Wed Feb 16 01:36:58 UTC 2005

One of our MX servers started misbehaving today (postfix would time out
internally, with load rising), and after I tried to reboot it, I got this:

Assertion failure in __journal_drop_transaction() at
fs/jbd/checkpoint.c:613: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
Modules linked in: ipv6 aic79xx serverworks eepro100 sworks_agp agpgart
floppy evdev pcspkr ohci_hcd usbcore e100 mii capability commoncap ide_cd
ide_core cdrom rtc ext2 ext3 jbd mbcache sd_mod aic7xxx scsi_mod raid1 md
unix font vesafb cfbcopyarea cfbimgblt cfbfillrect
CPU:    0
EIP:    0060:[<f88a77a0>]    Not tainted
EFLAGS: 00010286   (2.6.8-1-686-smp)
EIP is at __journal_drop_transaction+0x350/0x3f7 [jbd]
eax: 00000071   ebx: c86f5620   ecx: c02d9fbc   edx: c02d9fbc
esi: f7af5800   edi: e9d5583c   ebp: c86f5620   esp: f70e3d58
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 523, threadinfo=f70e2000 task=f70a37f0)
Stack: f88ac120 f88ab380 f88acefc 00000265 f88acf4c c86f5620 f7af5800
       f7af5800 c86f5620 dbc24db0 00000000 f88a66e6 e9d5583c e9d5583c
       f88a72c8 e9d5583c e9d5583c 000000e1 c86f55c0 d865c500 f70e2000
Call Trace:
 [<f88a731a>] __journal_remove_checkpoint+0x4a/0xa0 [jbd]
 [<f88a66e6>] __try_to_free_cp_buf+0x76/0xc0 [jbd]
 [<f88a72c8>] __journal_clean_checkpoint_list+0xa8/0xb0 [jbd]
 [<f88a4958>] journal_commit_transaction+0x2b8/0x1690 [jbd]
 [<c011e420>] autoremove_wake_function+0x0/0x60
 [<c02347ca>] netif_receive_skb+0x1ba/0x230
 [<c011e420>] autoremove_wake_function+0x0/0x60
 [<c023450a>] net_tx_action+0x5a/0x160
 [<c0119f78>] recalc_task_prio+0xa8/0x1a0
 [<c029c9b7>] schedule+0x4b7/0x8a0
 [<c01296da>] del_timer_sync+0x9a/0xe0
 [<f88a8642>] kjournald+0xf2/0x2e0 [jbd]
 [<c011e420>] autoremove_wake_function+0x0/0x60
 [<c011e420>] autoremove_wake_function+0x0/0x60
 [<c01060d2>] ret_from_fork+0x6/0x14
 [<f88a8530>] commit_timeout+0x0/0x10 [jbd]
 [<f88a8550>] kjournald+0x0/0x2e0 [jbd]
 [<c01042c5>] kernel_thread_helper+0x5/0x10
Code: 0f 0b 65 02 fc ce 8a f8 e9 6e fd ff ff 8d 76 00 c7 04 24 20
 <6>note: kjournald[523] exited with preempt_count 3

The machine is still alive and kicking (and would not reboot even with
reboot -f), most probably because the system is on one disk and the postfix
spool is on another, which seems to be the cause of the problem. Relevant
/etc/fstab line:

/dev/sdc1 /var/spool/postfix ext3 rw,noatime,nodiratime,data=journal,commit=60 0 0

It also has a large journal, 400MB if I remember correctly.
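
To make the mount options in the fstab entry above easier to read, here is a
minimal Python sketch that splits the line into its six standard fields. The
helper name parse_fstab_line is hypothetical and only for illustration; it is
not part of the setup described here.

```python
# Sketch: split the fstab entry quoted above into its six standard fields.
# parse_fstab_line is a hypothetical helper, not an existing tool.
FSTAB_LINE = (
    "/dev/sdc1 /var/spool/postfix ext3 "
    "rw,noatime,nodiratime,data=journal,commit=60 0 0"
)


def parse_fstab_line(line):
    """Return a dict describing one non-comment fstab entry."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for opt in options.split(","):
        # Options are either bare flags (noatime) or key=value (commit=60).
        key, sep, value = opt.partition("=")
        opts[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
        "dump": int(dump),
        "passno": int(passno),
    }


entry = parse_fstab_line(FSTAB_LINE)
# data=journal: file data as well as metadata is written through the journal.
print("data mode:", entry["options"]["data"])
# commit=60: the journal is committed every 60 seconds (ext3 default is 5).
print("commit interval (s):", entry["options"]["commit"])
# passno 0: this filesystem is not checked by fsck at boot.
print("fsck pass:", entry["passno"])
```

With data=journal and a 60-second commit interval, each transaction on the
spool filesystem can carry a large amount of data, which fits with the large
journal mentioned above.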

Debian Sarge.


