[Crash-utility] debug 3rd party module which oopses the system
Han Pingtian
hanpt at linux.vnet.ibm.com
Thu Jul 4 01:52:35 UTC 2013
On Wed, Jul 03, 2013 at 09:04:50AM -0400, Dave Anderson wrote:
>
>
> ----- Original Message -----
> > Hey there,
> >
> > I'm trying to analyse a vmcore from an oops caused by a module. The module
> > comes from here:
> >
> > http://www.linuxforu.com/2011/01/understanding-a-kernel-oops
> >
> > This web page teaches how to analyse a kernel oops. It provides a
> > module named 'oops', which triggers a NULL pointer dereference in its
> > init function.
> >
> > The problem is that I cannot figure out how to use crash to analyse the vmcore:
> >
> > GNU gdb (GDB) 7.0
> > Copyright (C) 2009 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "powerpc64-unknown-linux-gnu"...
> >
> > KERNEL: /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux
> > DUMPFILE: /var/crash/127.0.0.1-2013-07-01-04:43/vmcore
> > CPUS: 20
> > DATE: Mon Jul 1 04:38:49 2013
> > UPTIME: 00:33:44
> > LOAD AVERAGE: 0.22, 0.18, 0.07
> > TASKS: 482
> > NODENAME: lawlp3.upt.austin.ibm.com
> > RELEASE: 2.6.18-348.el5
> > VERSION: #1 SMP Wed Nov 28 21:23:52 EST 2012
> > MACHINE: ppc64 (3550 Mhz)
> > MEMORY: 3.2 GB
> > PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
> > PID: 5402
> > COMMAND: "insmod"
> > TASK: c0000000cfa35150 [THREAD_INFO: c0000000ce5d0000]
> > CPU: 15
> > STATE: TASK_RUNNING (PANIC)
> >
> > crash> log
> > ... ....
> > oops: module license 'unspecified' taints kernel.
> > oops from the module
> > Unable to handle kernel paging request for data at address 0x00000000
> > Faulting instruction address: 0xd000000001460060
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > SMP NR_CPUS=128 NUMA
> > Modules linked in: oops(PU) nfsd exportfs auth_rpcgss autofs4 hidp nfs
> > nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp
> > ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_addr ib_cm
> > ib_sa ib_mad iw_cm iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio
> > cxgb3i libcxgbi libiscsi_tcp libiscsi2 scsi_transport_iscsi2
> > scsi_transport_iscsi dm_multipath scsi_dh snd_powermac snd_seq_dummy
> > snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> > snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore i2c_core
> > parport_pc lp parport sg iw_cxgb3 ib_core cxgb3 ibmveth 8021q dm_raid45
> > dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log
> > dm_mod lpfc ibmvfc scsi_transport_fc ibmvscsic sd_mod scsi_mod ext3 jbd
> > uhci_hcd ohci_hcd ehci_hcd
> > NIP: D000000001460060 LR: D000000001460050 CTR: 0000000000000004
> > REGS: c0000000ce5d39b0 TRAP: 0300 Tainted: P ---- (2.6.18-348.el5)
> > MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022482 XER: 00000006
> > DAR: 0000000000000000, DSISR: 0000000042000000
> > TASK = c0000000cfa35150[5402] 'insmod' THREAD: c0000000ce5d0000 CPU: 15
> > GPR00: D000000001460050 C0000000CE5D3C30 D00000000146C930 0000000000000000
> > GPR04: 8000000000001032 0000000000000000 0000000000000000 0000000000000000
> > GPR08: 0000000000000000 0000000000000000 C0000000015FBB68 0000000000000000
> > GPR12: 0000000000000000 C000000000570B80 0000000000000000 D0000000012B1850
> > GPR16: D0000000012B1810 D0000000014601B0 0000000000000000 0000000000000000
> > GPR20: 0000000000000028 D0000000012B0CE9 C0000000005A12E8 0000000000000029
> > GPR24: D0000000012A0000 000000000000002A C0000000CD6F5A80 C0000000CD6F5AB0
> > GPR28: C0000000005A18C8 D000000001460680 D00000000146C900 D000000001460680
> > NIP [D000000001460060] .my_oops_init+0x2c/0xd4 [oops]
> > LR [D000000001460050] .my_oops_init+0x1c/0xd4 [oops]
> > Call Trace:
> > [C0000000CE5D3C30] [C000000000098944] .sys_init_module+0x1a88/0x1d18 (unreliable)
> > [C0000000CE5D3E30] [C0000000000086A4] syscall_exit+0x0/0x40
> > Instruction dump:
> > 4e800020 7c0802a6 fbc1fff0 ebc28000 f8010010 f821ff81 e87e8008 4800002d
> > e8410028 39200000 38210080 38600000 <91290000> e8010010 ebc1fff0 7c0803a6
> > <0>Sending IPI to other cpus...
> > crash> whatis my_oops_init
> > whatis: gdb request failed: whatis my_oops_init
> > crash> mod -s oops
> > MODULE NAME SIZE OBJECT FILE
> > d000000001460680 oops 18752 /lib/modules/2.6.18-348.el5/kernel/oops.ko
> > crash> whatis my_oops_init
> > int my_oops_init(void);
> > crash> dis -l .my_oops_init
> > <no output>
> > crash> sym -m oops
> > d000000001460000 MODULE START: oops
> > d000000001460000 (t) .my_oops_exit
> > d000000001460000 (t) .cleanup_module
> > d000000001460034 (t) .my_oops_init
> > d000000001460034 (t) .init_module
> > d000000001460130 (r) ____versions
> > d000000001460130 (r) __versions
> > d000000001460680 (D) __this_module
> > d000000001464910 (D) cleanup_module
> > d000000001464910 (d) my_oops_exit
> > d000000001464920 (D) init_module
> > d000000001464920 (d) my_oops_init
> > d000000001464940 MODULE END: oops
> > crash> bt
> > PID: 5402 TASK: c0000000cfa35150 CPU: 15 COMMAND: "insmod"
> >
> > R0: d000000001460050 R1: c0000000ce5d3c30 R2: d00000000146c930
> > R3: 0000000000000000 R4: 8000000000001032 R5: 0000000000000000
> > R6: 0000000000000000 R7: 0000000000000000 R8: 0000000000000000
> > R9: 0000000000000000 R10: c0000000015fbb68 R11: 0000000000000000
> > R12: 0000000000000000 R13: c000000000570b80 R14: 0000000000000000
> > R15: d0000000012b1850 R16: d0000000012b1810 R17: d0000000014601b0
> > R18: 0000000000000000 R19: 0000000000000000 R20: 0000000000000028
> > R21: d0000000012b0ce9 R22: c0000000005a12e8 R23: 0000000000000029
> > R24: d0000000012a0000 R25: 000000000000002a R26: c0000000cd6f5a80
> > R27: c0000000cd6f5ab0 R28: c0000000005a18c8 R29: d000000001460680
> > R30: d00000000146c900 R31: d000000001460680
> > NIP: d000000001460060 MSR: 8000000000009032 OR3: c0000000005a13c0
> > CTR: 0000000000000004 LR: d000000001460050 XER: 0000000000000006
> > CCR: 0000000024022482 MQ: c0000000cd6f5ab0 DAR: 0000000000000000
> > DSISR: 0000000042000000 Syscall Result: 0000000000000000
> > NIP [d000000001460060] .init_module
> > LR [d000000001460050] .init_module
> >
> > #0 [c0000000ce5d3c30] .sys_init_module at c000000000098944
> > #1 [c0000000ce5d3e30] syscall_exit at c0000000000086a4
> > syscall [c00] exception frame:
> > R0: 0000000000000080 R1: 00000000ff91fb60 R2: 000000000fff8eb0
> > R3: 0000000010020028 R4: 000000000001caf8 R5: 0000000010020018
> > R6: 000000000000002d R7: fffffffffeff0000 R8: 000000000002ffe0
> > R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
> > R12: 0000000000000000 R13: 000000001001959c R14: 0000000000000000
> > R15: 0000000000000000 R16: 0000000000000000 R17: 0000000000000000
> > R18: 0000000000000000 R19: 0000000000000000 R20: 0000000000000000
> > R21: 0000000000000000 R22: 0000000000000000 R23: 0000000000000000
> > R24: 000000000ffbf280 R25: 00000000ff91fdf0 R26: 0000000010020018
> > R27: 00000000ff91ff05 R28: 0000000000020000 R29: 000000000001caf8
> > R30: 0000000010020028 R31: 0000000000000003
> > NIP: 000000000ff0496c MSR: 000000000000d032 OR3: 0000000010020028
> > CTR: 000000000ff04964 LR: 0000000010000bf8 XER: 0000000000000000
> > CCR: 0000000044000484 MQ: 0000000002756c28 DAR: 000000001004002c
> > DSISR: 0000000042000000 Syscall Result: 0000000000000000
> >
> > crash>
> >
> > As you can see, the 'bt' command says the problem is at '.init_module',
> > but in fact it should be at '.my_oops_init'. Yet 'dis -l .my_oops_init'
> > shows nothing, so I cannot use crash to figure out which line of source
> > code caused the oops. Using gdb as described on the web page, though, I
> > can find the code line easily.
> >
> > Please help. Thanks.
>
> I'm not well-versed in ppc64, but the issue seems to be related
> to the fact that .my_oops_init and .init_module are both being
> assigned the same virtual address:
>
> d000000001460034 (t) .my_oops_init
> d000000001460034 (t) .init_module
>
> If you do an "nm -Bn" on the oops.ko file, do they show the same
> offset value?
Thanks, Dave. Looks like they have the same offset, both are zero:
$ nm -Bn oops.ko
U .printk
0000000000000000 T .cleanup_module
0000000000000000 T .init_module
0000000000000000 t .my_oops_exit
0000000000000000 t .my_oops_init
0000000000000000 r ____versions
0000000000000000 r __mod_srcversion29
0000000000000000 D __this_module
0000000000000000 D cleanup_module
0000000000000000 d my_oops_exit
0000000000000010 D init_module
0000000000000010 d my_oops_init
0000000000000028 r __module_depends
0000000000000038 r __mod_vermagic5
But why isn't gdb affected by the same offset?
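
To make the aliasing concrete, here is a small Python sketch that works on an
excerpt of the 'sym -m oops' output quoted above. It is not part of crash, and
the helper names parse_symbols and resolve are my own (hypothetical). It only
illustrates that an address-to-name lookup has two equally valid answers at
d000000001460034, which would explain why the oops log prints .my_oops_init
while 'bt' prints .init_module. It does not filter by symbol type, which is
fine for this excerpt but would need care on a full symbol list.

```python
import re
from collections import defaultdict

# Excerpt of the `sym -m oops` output quoted earlier in this message.
SYM_OUTPUT = """\
d000000001460000 (t) .my_oops_exit
d000000001460000 (t) .cleanup_module
d000000001460034 (t) .my_oops_init
d000000001460034 (t) .init_module
d000000001460130 (r) ____versions
d000000001460130 (r) __versions
d000000001460680 (D) __this_module
"""

# One line looks like: <hex address> (<type letter>) <symbol name>
LINE_RE = re.compile(r"^([0-9a-f]+)\s+\((\w)\)\s+(\S+)$")


def parse_symbols(text):
    """Map each address to the list of symbol names found at it."""
    by_addr = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            by_addr[int(m.group(1), 16)].append(m.group(3))
    return dict(by_addr)


def resolve(by_addr, addr):
    """Return (names, offset) for the highest symbol address <= addr."""
    candidates = [a for a in by_addr if a <= addr]
    if not candidates:
        return [], None
    base = max(candidates)
    return by_addr[base], addr - base


symbols = parse_symbols(SYM_OUTPUT)

# NIP taken from the oops log above.
names, offset = resolve(symbols, 0xD000000001460060)
print(names, hex(offset))
```

Both names come back with offset 0x2c, which matches the
'.my_oops_init+0x2c' in the oops log. Which of the two names a tool reports
then depends only on which alias it happens to pick.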