[Linux-cachefs] Kernel traceback on 3.10.10

Mark Moseley moseleymark at gmail.com
Wed Sep 18 03:59:33 UTC 2013


I thought I'd be smart and give 3.11.1 a whirl instead, but got the same
error. Stack trace is almost identical (gdb output at the end).

kernel: [11242.664430] BUG: unable to handle kernel NULL pointer
dereference at 0000000000000010
kernel: [11242.715321] IP: [<ffffffff812a1903>]
__fscache_uncache_page+0x23/0x160
kernel: [11242.722639] PGD 0
kernel: [11242.724908] Thread overran stack, or stack corrupted
kernel: [11242.730464] Oops: 0000 [#1] SMP
kernel: [11242.734111] Modules linked in: dell_rbu wmi sb_edac edac_core
hid_generic usbhid hid
kernel: [11242.742907] CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
kernel: [11242.749336] Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS
1.3.5 08/21/2012
kernel: [11242.757710] task: ffff8804203460c0 ti: ffff880420346640 task.ti:
ffff880420346640
kernel: [11242.766082] RIP: 0010:[<ffffffff812a1903>]  [<ffffffff812a1903>]
__fscache_uncache_page+0x23/0x160
kernel: [11242.776126] RSP: 0018:ffff8801053af878  EFLAGS: 00210286
kernel: [11242.782092] RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX:
ffff88022ffae5e8
kernel: [11242.790087] RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI:
ffff8800be2f8780
kernel: [11242.798090] RBP: ffff8801053af898 R08: 0000000000000001 R09:
0000000000000003
kernel: [11242.806092] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffea00055ff440
kernel: [11242.814093] R13: 0000000000001000 R14: ffff8800c50be538 R15:
0000000000000000
kernel: [11242.822097] FS:  0000000000000000(0000)
GS:ffff88042fc60000(0063) knlGS:00000000e439c700
kernel: [11242.831188] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
kernel: [11242.837640] CR2: 0000000000000010 CR3: 0000000001d8f000 CR4:
00000000000607f0
kernel: [11242.845643] Stack:
kernel: [11242.847914]  ffffea00055ff440 ffffea00055ff440 ffff8800be2f8780
0000000000001000
kernel: [11242.856298]  ffff8801053af8b8 ffffffff81365a72 ffffea00055ff440
0000000000000000
kernel: [11242.864680]  ffff8801053af8e8 ffffffff813553d5 000000000000000e
ffffea00055ff440
kernel: [11242.873061] Call Trace:
kernel: [11242.875827]  [<ffffffff81365a72>]
__nfs_fscache_invalidate_page+0x42/0x70
kernel: [11242.883448]  [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
kernel: [11242.890098]  [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
kernel: [11242.896748]  [<ffffffff811b90ad>]
truncate_inode_pages_range.part.12+0x14d/0x620
kernel: [11242.905072]  [<ffffffff81d6387d>] ?
__mutex_lock_slowpath+0x1fd/0x2e0
kernel: [11242.912302]  [<ffffffff811b95d3>]
truncate_inode_pages_range+0x53/0x70
kernel: [11242.919625]  [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
kernel: [11242.926364]  [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
kernel: [11242.932915]  [<ffffffff81356840>]
nfs_setattr_update_inode+0xa0/0x120
kernel: [11242.940148]  [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
kernel: [11242.946605]  [<ffffffff81357f78>] nfs_setattr+0xc8/0x150
kernel: [11242.952574]  [<ffffffff8122d95b>] notify_change+0x1cb/0x390
kernel: [11242.958836]  [<ffffffff8120a55b>] do_truncate+0x7b/0xc0
kernel: [11242.964708]  [<ffffffff8121f96c>] do_last+0xa4c/0xfd0
kernel: [11242.970385]  [<ffffffff8121ffbc>] path_openat+0xcc/0x670
kernel: [11242.976356]  [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
kernel: [11242.982325]  [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
kernel: [11242.988392]  [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
kernel: [11242.994658]  [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
kernel: [11243.001013] Code: e8 03 67 f1 ff eb 84 90 66 66 66 66 90 55 48
89 e5 48 83 ec 20 48 89 5d e8 48 89 fb 4c 89 65 f0 4c 89 6d f8 49 89 f4 48
8b 43 48 <80> 78 10 00 0f 84 c0 00 00 00 4d 85 e4 0f 84 ed 00 00 00 f0 ff
kernel: [11243.023181] RIP  [<ffffffff812a1903>]
__fscache_uncache_page+0x23/0x160
kernel: [11243.030623]  RSP <ffff8801053af878>
kernel: [11243.034548] CR2: 0000000000000010
kernel: [11243.038929] ---[ end trace 1eecf1a047ce4413 ]---




This next part seems to be related. It happened on the previous stacktrace
too but I chalked it up to coincidence. I had run that same 3.10.10 (but
having rebooted without fscache turned on) over the weekend without any
error like this. Could still be coincidental but so far I've only seen it
after (though not immediately after) the fscache traceback. The load on the
box skyrockets and an 8 core box is up near 60% in sys (usually quite a bit
less, like in the 20%-40% range at most).

kernel: [11607.613260] ------------[ cut here ]------------
kernel: [11607.618538] WARNING: CPU: 0 PID: 6824 at
net/sched/sch_generic.c:260 dev_watchdog+0x267/0x270()
kernel: [11607.628403] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed
out
kernel: [11607.635526] Modules linked in: dell_rbu wmi sb_edac edac_core
hid_generic usbhid hid
kernel: [11607.645083] CPU: 0 PID: 6824 Comm: /usr/sbin/apach Tainted:
G      D      3.11.1 #1
kernel: [11607.653777] Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS
1.3.5 08/21/2012
kernel: [11607.662272]  ffffffff8112c2ac 0000000000000000 0000000000000104
ffff880227c03d38
kernel: [11607.671013]  ffffffff81d59846 000000000000257e ffff880227c03d88
ffff880227c03d78
kernel: [11607.679758]  ffffffff810bafbc ffff880227c03d88 ffffffff82607450
0000000000000104
kernel: [11607.688501] Call Trace:
kernel: [11607.691332]  <IRQ>  [<ffffffff8112c2ac>] ?
print_modules+0x6c/0xd0
kernel: [11607.698526]  [<ffffffff81d59846>] dump_stack+0x46/0x58
kernel: [11607.704377]  [<ffffffff810bafbc>] warn_slowpath_common+0x8c/0xc0
kernel: [11607.711196]  [<ffffffff810bb111>] warn_slowpath_fmt+0x61/0x70
kernel: [11607.717728]  [<ffffffff81bf7376>] ? destroy_conntrack+0xa6/0xe0
kernel: [11607.724454]  [<ffffffff81bb27b7>] dev_watchdog+0x267/0x270
kernel: [11607.730687]  [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0
kernel: [11607.737508]  [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0
kernel: [11607.744331]  [<ffffffff810ca788>] call_timer_fn+0x38/0x140
kernel: [11607.750563]  [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0
kernel: [11607.757382]  [<ffffffff810cbaec>] run_timer_softirq+0x25c/0x2b0
kernel: [11607.764099]  [<ffffffff810c07ff>] __do_softirq+0xcf/0x270
kernel: [11607.770237]  [<ffffffff810c0b30>] ? irq_exit+0xb0/0xd0
kernel: [11607.776087]  [<ffffffff81d71f8c>] call_softirq+0x1c/0x30
kernel: [11607.782120]  <EOI>  [<ffffffff81055e9d>] do_softirq+0x4d/0x80
kernel: [11607.788821]  [<ffffffff810c0724>] local_bh_enable+0x84/0x90
kernel: [11607.795157]  [<ffffffff81cbb25f>] ipt_do_table+0x20f/0x5b0
kernel: [11607.801400]  [<ffffffff81bff1f6>] ?
__nf_ct_ext_add_length+0x156/0x2a0
kernel: [11607.808806]  [<ffffffff81cbde45>] iptable_filter_hook+0x55/0xa0
kernel: [11607.815533]  [<ffffffff81beff5f>] nf_iterate+0x8f/0xd0
kernel: [11607.821379]  [<ffffffff81c62150>] ?
ip_forward_options+0x200/0x200
kernel: [11607.828395]  [<ffffffff81bf001d>] nf_hook_slow+0x7d/0x150
kernel: [11607.834537]  [<ffffffff81c62150>] ?
ip_forward_options+0x200/0x200
kernel: [11607.841548]  [<ffffffff81c64312>] __ip_local_out+0x72/0x80
kernel: [11607.847786]  [<ffffffff81c64339>] ip_local_out+0x19/0x40
kernel: [11607.853820]  [<ffffffff81c644aa>] ip_queue_xmit+0x14a/0x3d0
kernel: [11607.860151]  [<ffffffff81c7dcff>] tcp_transmit_skb+0x44f/0x950
kernel: [11607.866779]  [<ffffffff81b75679>] ? __alloc_skb+0x89/0x2a0
kernel: [11607.873016]  [<ffffffff81c7e3c8>] tcp_write_xmit+0x118/0xaa0
kernel: [11607.879445]  [<ffffffff812140a3>] ?
__check_object_size+0x53/0x140
kernel: [11607.886460]  [<ffffffff81c7edfc>]
__tcp_push_pending_frames+0x4c/0xd0
kernel: [11607.893768]  [<ffffffff81c6f29d>] tcp_sendmsg+0x15d/0x1a30
kernel: [11607.900005]  [<ffffffff81c9c6eb>] inet_sendmsg+0x7b/0xc0
kernel: [11607.906051]  [<ffffffff81572b48>] ?
security_socket_sendmsg+0x38/0x50
kernel: [11607.913360]  [<ffffffff81b6ab05>] sock_aio_write+0x165/0x190
kernel: [11607.919787]  [<ffffffff81b6a9a0>] ?
__sock_recv_ts_and_drops+0x160/0x160
kernel: [11607.927387]  [<ffffffff8120beba>] do_sync_readv_writev+0x9a/0xe0
kernel: [11607.934210]  [<ffffffff8120ddef>] do_readv_writev+0xdf/0x310
kernel: [11607.940638]  [<ffffffff8120e07c>] vfs_writev+0x5c/0x80
kernel: [11607.946485]  [<ffffffff8120e247>] SyS_writev+0x57/0xd0
kernel: [11607.952335]  [<ffffffff81d706f0>] tracesys+0xe0/0xe5
kernel: [11607.957987] ---[ end trace 1eecf1a047ce4414 ]---
kernel: [11607.963255] tg3 0000:02:00.0 eth0: transmit timed out, resetting
kernel: [11609.228275] tg3 0000:02:00.0 eth0: 0x00000000: 0x165f14e4,
0x00100406, 0x02000000, 0x00800010
kernel: [11609.237934] tg3 0000:02:00.0 eth0: 0x00000010: 0xd90a000c,
0x00000000, 0xd90b000c, 0x00000000
kernel: [11609.247595] tg3 0000:02:00.0 eth0: 0x00000020: 0xd90c000c,
0x00000000, 0x00000000, 0x04f81028
kernel: [11609.257256] tg3 0000:02:00.0 eth0: 0x00000030: 0xdd000000,
0x00000048, 0x00000000, 0x0000010f
kernel: [11609.266918] tg3 0000:02:00.0 eth0: 0x00000040: 0x00000000,
0x1c000000, 0xc8035001, 0x64002008
kernel: [11609.276580] tg3 0000:02:00.0 eth0: 0x00000050: 0x818c5803,
0x78000000, 0x0086a005, 0x00000000
.... lots more ....
kernel: [11612.376448] tg3 0000:02:00.0 eth0: 0: Host status block
[00000001:0000009c:(0000:01d6:0000):(0000:0113)]
kernel: [11612.387179] tg3 0000:02:00.0 eth0: 0: NAPI info
[0000001c:0000001c:(0113:0026:01ff):0000:(04de:0000:0000:0000)]
kernel: [11612.398590] tg3 0000:02:00.0 eth0: 1: Host status block
[00000001:00000019:(0000:0000:0000):(00fd:0000)]
kernel: [11612.409322] tg3 0000:02:00.0 eth0: 1: NAPI info
[00000019:00000019:(0000:0000:01ff):00fd:(00fd:00fd:0000:0000)]
kernel: [11612.420732] tg3 0000:02:00.0 eth0: 2: Host status block
[00000001:00000006:(0214:0000:0000):(0000:0000)]
kernel: [11612.431462] tg3 0000:02:00.0 eth0: 2: NAPI info
[00000006:00000006:(0000:0000:01ff):0214:(0214:0214:0000:0000)]
kernel: [11612.442874] tg3 0000:02:00.0 eth0: 3: Host status block
[00000001:000000c1:(0000:0000:0000):(0000:0000)]
kernel: [11612.453606] tg3 0000:02:00.0 eth0: 3: NAPI info
[000000a4:000000a4:(0000:0000:01ff):0246:(0246:0246:0000:0000)]
kernel: [11612.465017] tg3 0000:02:00.0 eth0: 4: Host status block
[00000001:000000ee:(0000:0000:0ebf):(0000:0000)]
kernel: [11612.475748] tg3 0000:02:00.0 eth0: 4: NAPI info
[000000ee:000000ee:(0000:0000:01ff):0ebf:(06bf:06bf:0000:0000)]
kernel: [11612.530306] tg3 0000:02:00.0 eth0: Link is down
kernel: [11616.446912] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full
duplex
kernel: [11616.454423] tg3 0000:02:00.0 eth0: Flow control is on for TX and
on for RX
kernel: [11616.462213] tg3 0000:02:00.0 eth0: EEE is disabled


This is from the 3.11.1 vmlinux:

(gdb) disas __fscache_uncache_page
Dump of assembler code for function __fscache_uncache_page:
   0xffffffff812a18e0 <+0>:    callq  0xffffffff81d6ff00
   0xffffffff812a18e5 <+5>:    push   %rbp
   0xffffffff812a18e6 <+6>:    mov    %rsp,%rbp
   0xffffffff812a18e9 <+9>:    sub    $0x20,%rsp
   0xffffffff812a18ed <+13>:    mov    %rbx,-0x18(%rbp)
   0xffffffff812a18f1 <+17>:    mov    %rdi,%rbx
   0xffffffff812a18f4 <+20>:    mov    %r12,-0x10(%rbp)
   0xffffffff812a18f8 <+24>:    mov    %r13,-0x8(%rbp)
   0xffffffff812a18fc <+28>:    mov    %rsi,%r12
   0xffffffff812a18ff <+31>:    mov    0x48(%rbx),%rax
   0xffffffff812a1903 <+35>:    cmpb   $0x0,0x10(%rax)
   0xffffffff812a1907 <+39>:    je     0xffffffff812a19cd
<__fscache_uncache_page+237>
   0xffffffff812a190d <+45>:    test   %r12,%r12
   0xffffffff812a1910 <+48>:    je     0xffffffff812a1a03
<__fscache_uncache_page+291>
   0xffffffff812a1916 <+54>:    lock incl 0x19e8a07(%rip)        #
0xffffffff82c8a324
   0xffffffff812a191d <+61>:    mov    (%r12),%rax
   0xffffffff812a1921 <+65>:    test   $0x10,%ah
   0xffffffff812a1924 <+68>:    jne    0xffffffff812a1938
<__fscache_uncache_page+88>
   0xffffffff812a1926 <+70>:    mov    -0x18(%rbp),%rbx
   0xffffffff812a192a <+74>:    mov    -0x10(%rbp),%r12
   0xffffffff812a192e <+78>:    mov    -0x8(%rbp),%r13
   0xffffffff812a1932 <+82>:    leaveq
   0xffffffff812a1933 <+83>:    retq
   0xffffffff812a1934 <+84>:    nopl   0x0(%rax)
   0xffffffff812a1938 <+88>:    lea    0x10(%rbx),%r13
   0xffffffff812a193c <+92>:    mov    %r13,%rdi
   0xffffffff812a193f <+95>:    callq  0xffffffff81d66ef0 <_raw_spin_lock>
   0xffffffff812a1944 <+100>:    mov    0x40(%rbx),%rdi
   0xffffffff812a1948 <+104>:    test   %rdi,%rdi
   0xffffffff812a194b <+107>:    je     0xffffffff812a19b0
<__fscache_uncache_page+208>
   0xffffffff812a194d <+109>:    lock andb $0xfd,0x70(%rbx)
   0xffffffff812a1952 <+114>:    lock btrl $0xc,(%r12)
   0xffffffff812a1959 <+121>:    sbb    %eax,%eax
   0xffffffff812a195b <+123>:    test   %eax,%eax
   0xffffffff812a195d <+125>:    je     0xffffffff812a19b7
<__fscache_uncache_page+215>
   0xffffffff812a195f <+127>:    mov    0x10(%rdi),%rax
   0xffffffff812a1963 <+131>:    mov    (%rax),%rdx
   0xffffffff812a1966 <+134>:    cmpq   $0x0,0x98(%rdx)
   0xffffffff812a196e <+142>:    je     0xffffffff812a19b7
<__fscache_uncache_page+215>
   0xffffffff812a1970 <+144>:    lock incl 0x19e88f1(%rip)        #
0xffffffff82c8a268
   0xffffffff812a1977 <+151>:    jno    0xffffffff812a1982
<__fscache_uncache_page+162>
   0xffffffff812a1979 <+153>:    lock decl 0x19e88e8(%rip)        #
0xffffffff82c8a268
   0xffffffff812a1980 <+160>:    int    $0x4
   0xffffffff812a1982 <+162>:    mov    (%rax),%rax
   0xffffffff812a1985 <+165>:    add    $0xffffffffffffff80,%rdi
   0xffffffff812a1989 <+169>:    mov    %r12,%rsi
   0xffffffff812a198c <+172>:    callq  *0x98(%rax)
   0xffffffff812a1992 <+178>:    lock decl 0x19e88cf(%rip)        #
0xffffffff82c8a268
   0xffffffff812a1999 <+185>:    jno    0xffffffff812a19a4
<__fscache_uncache_page+196>
   0xffffffff812a199b <+187>:    lock incl 0x19e88c6(%rip)        #
0xffffffff82c8a268
   0xffffffff812a19a2 <+194>:    int    $0x4
   0xffffffff812a19a4 <+196>:    jmp    0xffffffff812a1926
<__fscache_uncache_page+70>
   0xffffffff812a19a6 <+198>:    nopw   %cs:0x0(%rax,%rax,1)
---Type <return> to continue, or q <return> to quit---
   0xffffffff812a19b0 <+208>:    lock andb $0xef,0x1(%r12)
   0xffffffff812a19b7 <+215>:    mov    %r13,%rdi
   0xffffffff812a19ba <+218>:    callq  0xffffffff81d66e40
<_raw_spin_unlock>
   0xffffffff812a19bf <+223>:    mov    -0x18(%rbp),%rbx
   0xffffffff812a19c3 <+227>:    mov    -0x10(%rbp),%r12
   0xffffffff812a19c7 <+231>:    mov    -0x8(%rbp),%r13
   0xffffffff812a19cb <+235>:    leaveq
   0xffffffff812a19cc <+236>:    retq
   0xffffffff812a19cd <+237>:    mov    $0xffffffff8250dc56,%rdi
   0xffffffff812a19d4 <+244>:    xor    %eax,%eax
   0xffffffff812a19d6 <+246>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a19db <+251>:    mov    $0xffffffff8250dc5a,%rdi
   0xffffffff812a19e2 <+258>:    xor    %eax,%eax
   0xffffffff812a19e4 <+260>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a19e9 <+265>:    mov    0x48(%rbx),%rax
   0xffffffff812a19ed <+269>:    xor    %edx,%edx
   0xffffffff812a19ef <+271>:    mov    $0xffffffff8250de53,%rdi
   0xffffffff812a19f6 <+278>:    movzbl 0x10(%rax),%esi
   0xffffffff812a19fa <+282>:    xor    %eax,%eax
   0xffffffff812a19fc <+284>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a1a01 <+289>:    ud2
   0xffffffff812a1a03 <+291>:    mov    $0xffffffff8250dc56,%rdi
   0xffffffff812a1a0a <+298>:    xor    %eax,%eax
   0xffffffff812a1a0c <+300>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a1a11 <+305>:    mov    $0xffffffff8250dc5a,%rdi
   0xffffffff812a1a18 <+312>:    xor    %eax,%eax
   0xffffffff812a1a1a <+314>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a1a1f <+319>:    xor    %edx,%edx
   0xffffffff812a1a21 <+321>:    xor    %esi,%esi
   0xffffffff812a1a23 <+323>:    mov    $0xffffffff8250de53,%rdi
   0xffffffff812a1a2a <+330>:    xor    %eax,%eax
   0xffffffff812a1a2c <+332>:    callq  0xffffffff81d4fb02 <printk>
   0xffffffff812a1a31 <+337>:    ud2
End of assembler dump.


On Wed, Sep 11, 2013 at 3:56 PM, David Howells <dhowells at redhat.com> wrote:

> Mark Moseley <moseleymark at gmail.com> wrote:
>
> > What gdb commands should I use, to be of most use?
>
> Either:
>
>         gdb fscache.ko
>
> or:
>
>         gdb vmlinux
>
> as appropriate.  Then:
>
>         disas __fscache_uncache_page
>
> Note that the code may no longer match after you've recompiled, though it's
> worth a look anyway:-/
>
> David
>



More information about the Linux-cachefs mailing list