[Crash-utility] debug third-party module which oopses the system

Han Pingtian hanpt at linux.vnet.ibm.com
Tue Jul 9 03:31:42 UTC 2013


On Mon, Jul 08, 2013 at 10:59:53AM -0400, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> > On Wed, Jul 03, 2013 at 09:04:50AM -0400, Dave Anderson wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > Hey there,
> > > > 
> > > > I'm trying to analyse the vmcore from an oops caused by a module.
> > > > The module comes from here:
> > > > 
> > > >     http://www.linuxforu.com/2011/01/understanding-a-kernel-oops
> > > > 
> > > > This web page teaches how to analyse a kernel oops. It provides a
> > > > module named 'oops', which triggers a NULL pointer dereference in its
> > > > init function.
> > > > 
> > > > The problem is that I cannot figure out how to use crash to analyse the vmcore:
> > > > 
> > > > GNU gdb (GDB) 7.0
> > > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > > This is free software: you are free to change and redistribute it.
> > > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > > and "show warranty" for details.
> > > > This GDB was configured as "powerpc64-unknown-linux-gnu"...
> > > > 
> > > >       KERNEL: /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux
> > > >     DUMPFILE: /var/crash/127.0.0.1-2013-07-01-04:43/vmcore
> > > >         CPUS: 20
> > > >         DATE: Mon Jul  1 04:38:49 2013
> > > >       UPTIME: 00:33:44
> > > > LOAD AVERAGE: 0.22, 0.18, 0.07
> > > >        TASKS: 482
> > > >     NODENAME: lawlp3.upt.austin.ibm.com
> > > >      RELEASE: 2.6.18-348.el5
> > > >      VERSION: #1 SMP Wed Nov 28 21:23:52 EST 2012
> > > >      MACHINE: ppc64  (3550 Mhz)
> > > >       MEMORY: 3.2 GB
> > > >        PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
> > > >          PID: 5402
> > > >      COMMAND: "insmod"
> > > >         TASK: c0000000cfa35150  [THREAD_INFO: c0000000ce5d0000]
> > > >          CPU: 15
> > > >        STATE: TASK_RUNNING (PANIC)
> > > > 
> > > > crash> log
> > > > ... ....
> > > > oops: module license 'unspecified' taints kernel.
> > > > oops from the module
> > > > Unable to handle kernel paging request for data at address 0x00000000
> > > > Faulting instruction address: 0xd000000001460060
> > > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > > SMP NR_CPUS=128 NUMA
> > > > Modules linked in: oops(PU) nfsd exportfs auth_rpcgss autofs4 hidp nfs
> > > > nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp
> > > > ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_addr ib_cm
> > > > ib_sa ib_mad iw_cm iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio
> > > > cxgb3i libcxgbi libiscsi_tcp libiscsi2 scsi_transport_iscsi2
> > > > scsi_transport_iscsi dm_multipath scsi_dh snd_powermac snd_seq_dummy
> > > > snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> > > > snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore i2c_core
> > > > parport_pc lp parport sg iw_cxgb3 ib_core cxgb3 ibmveth 8021q dm_raid45
> > > > dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log
> > > > dm_mod lpfc ibmvfc scsi_transport_fc ibmvscsic sd_mod scsi_mod ext3 jbd
> > > > uhci_hcd ohci_hcd ehci_hcd
> > > > NIP: D000000001460060 LR: D000000001460050 CTR: 0000000000000004
> > > > REGS: c0000000ce5d39b0 TRAP: 0300   Tainted: P     ----  (2.6.18-348.el5)
> > > > MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24022482  XER: 00000006
> > > > DAR: 0000000000000000, DSISR: 0000000042000000
> > > > TASK = c0000000cfa35150[5402] 'insmod' THREAD: c0000000ce5d0000 CPU: 15
> > > > GPR00: D000000001460050 C0000000CE5D3C30 D00000000146C930 0000000000000000
> > > > GPR04: 8000000000001032 0000000000000000 0000000000000000 0000000000000000
> > > > GPR08: 0000000000000000 0000000000000000 C0000000015FBB68 0000000000000000
> > > > GPR12: 0000000000000000 C000000000570B80 0000000000000000 D0000000012B1850
> > > > GPR16: D0000000012B1810 D0000000014601B0 0000000000000000 0000000000000000
> > > > GPR20: 0000000000000028 D0000000012B0CE9 C0000000005A12E8 0000000000000029
> > > > GPR24: D0000000012A0000 000000000000002A C0000000CD6F5A80 C0000000CD6F5AB0
> > > > GPR28: C0000000005A18C8 D000000001460680 D00000000146C900 D000000001460680
> > > > NIP [D000000001460060] .my_oops_init+0x2c/0xd4 [oops]
> > > > LR [D000000001460050] .my_oops_init+0x1c/0xd4 [oops]
> > > > Call Trace:
> > > > [C0000000CE5D3C30] [C000000000098944] .sys_init_module+0x1a88/0x1d18 (unreliable)
> > > > [C0000000CE5D3E30] [C0000000000086A4] syscall_exit+0x0/0x40
> > > > Instruction dump:
> > > > 4e800020 7c0802a6 fbc1fff0 ebc28000 f8010010 f821ff81 e87e8008 4800002d
> > > > e8410028 39200000 38210080 38600000 <91290000> e8010010 ebc1fff0 7c0803a6
> > > >  <0>Sending IPI to other cpus...
> > > > crash> whatis my_oops_init
> > > > whatis: gdb request failed: whatis my_oops_init
> > > > crash> mod -s oops
> > > >      MODULE       NAME                     SIZE  OBJECT FILE
> > > > d000000001460680  oops                    18752  /lib/modules/2.6.18-348.el5/kernel/oops.ko
> > > > crash> whatis my_oops_init
> > > > int my_oops_init(void);
> > > > crash> dis -l .my_oops_init
> > > > <no output>
> > > > crash> sym -m oops
> > > > d000000001460000 MODULE START: oops
> > > > d000000001460000 (t) .my_oops_exit
> > > > d000000001460000 (t) .cleanup_module
> > > > d000000001460034 (t) .my_oops_init
> > > > d000000001460034 (t) .init_module
> > > > d000000001460130 (r) ____versions
> > > > d000000001460130 (r) __versions
> > > > d000000001460680 (D) __this_module
> > > > d000000001464910 (D) cleanup_module
> > > > d000000001464910 (d) my_oops_exit
> > > > d000000001464920 (D) init_module
> > > > d000000001464920 (d) my_oops_init
> > > > d000000001464940 MODULE END: oops
> > > > crash> bt
> > > > PID: 5402   TASK: c0000000cfa35150  CPU: 15  COMMAND: "insmod"
> > > > 
> > > >  R0:  d000000001460050    R1:  c0000000ce5d3c30    R2:  d00000000146c930
> > > >  R3:  0000000000000000    R4:  8000000000001032    R5:  0000000000000000
> > > >  R6:  0000000000000000    R7:  0000000000000000    R8:  0000000000000000
> > > >  R9:  0000000000000000    R10: c0000000015fbb68    R11: 0000000000000000
> > > >  R12: 0000000000000000    R13: c000000000570b80    R14: 0000000000000000
> > > >  R15: d0000000012b1850    R16: d0000000012b1810    R17: d0000000014601b0
> > > >  R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000028
> > > >  R21: d0000000012b0ce9    R22: c0000000005a12e8    R23: 0000000000000029
> > > >  R24: d0000000012a0000    R25: 000000000000002a    R26: c0000000cd6f5a80
> > > >  R27: c0000000cd6f5ab0    R28: c0000000005a18c8    R29: d000000001460680
> > > >  R30: d00000000146c900    R31: d000000001460680
> > > >  NIP: d000000001460060    MSR: 8000000000009032    OR3: c0000000005a13c0
> > > >  CTR: 0000000000000004    LR:  d000000001460050    XER: 0000000000000006
> > > >  CCR: 0000000024022482    MQ:  c0000000cd6f5ab0    DAR: 0000000000000000
> > > >  DSISR: 0000000042000000     Syscall Result: 0000000000000000
> > > >  NIP [d000000001460060] .init_module
> > > >  LR  [d000000001460050] .init_module
> > > > 
> > > >  #0 [c0000000ce5d3c30] .sys_init_module at c000000000098944
> > > >  #1 [c0000000ce5d3e30] syscall_exit at c0000000000086a4
> > > >  syscall  [c00] exception frame:
> > > >  R0:  0000000000000080    R1:  00000000ff91fb60    R2:  000000000fff8eb0
> > > >  R3:  0000000010020028    R4:  000000000001caf8    R5:  0000000010020018
> > > >  R6:  000000000000002d    R7:  fffffffffeff0000    R8:  000000000002ffe0
> > > >  R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
> > > >  R12: 0000000000000000    R13: 000000001001959c    R14: 0000000000000000
> > > >  R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000
> > > >  R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000
> > > >  R21: 0000000000000000    R22: 0000000000000000    R23: 0000000000000000
> > > >  R24: 000000000ffbf280    R25: 00000000ff91fdf0    R26: 0000000010020018
> > > >  R27: 00000000ff91ff05    R28: 0000000000020000    R29: 000000000001caf8
> > > >  R30: 0000000010020028    R31: 0000000000000003
> > > >  NIP: 000000000ff0496c    MSR: 000000000000d032    OR3: 0000000010020028
> > > >  CTR: 000000000ff04964    LR:  0000000010000bf8    XER: 0000000000000000
> > > >  CCR: 0000000044000484    MQ:  0000000002756c28    DAR: 000000001004002c
> > > >  DSISR: 0000000042000000     Syscall Result: 0000000000000000
> > > > 
> > > > crash>
> > > > 
> > > > As you can see, the 'bt' command says the problem is at '.init_module',
> > > > but it actually comes from '.my_oops_init'. However,
> > > > 'dis -l .my_oops_init' shows nothing, so I cannot use crash to figure
> > > > out which line of source code caused the oops. Using gdb as described
> > > > on the web page, I can easily find the line.
> > > > 
> > > > Please help. Thanks.
> > > 
> > > I'm not well-versed in ppc64, but the issue seems to be related
> > > to the fact that .my_oops_init and .init_module are both being
> > > assigned the same virtual address:
> > > 
> > > d000000001460034 (t) .my_oops_init
> > > d000000001460034 (t) .init_module
> > > 
> > > If you do an "nm -Bn" on the oops.ko file, do they show the same
> > > offset value?
> > 
> > Thanks, Dave. It looks like they have the same offset; both are zero:
> > 
> > $ nm -Bn oops.ko
> >                  U .printk
> > 0000000000000000 T .cleanup_module
> > 0000000000000000 T .init_module
> > 0000000000000000 t .my_oops_exit
> > 0000000000000000 t .my_oops_init
> > 0000000000000000 r ____versions
> > 0000000000000000 r __mod_srcversion29
> > 0000000000000000 D __this_module
> > 0000000000000000 D cleanup_module
> > 0000000000000000 d my_oops_exit
> > 0000000000000010 D init_module
> > 0000000000000010 d my_oops_init
> > 0000000000000028 r __module_depends
> > 0000000000000038 r __mod_vermagic5
> > 
> > But why isn't gdb affected by the same offset?
> 
> There is some confusion with the ppc64 usage of the symbol name with
> and without the "." preceding the name, i.e. the actual (t) text symbol
> of .my_oops_init versus the (d) data symbol of my_oops_init.
> 
>  $ gdb /root/oops.ko
>  GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
>  Copyright (C) 2010 Free Software Foundation, Inc.
>  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>  This is free software: you are free to change and redistribute it.
>  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>  and "show warranty" for details.
>  This GDB was configured as "ppc64-redhat-linux-gnu".
>  For bug reporting instructions, please see:
>  <http://www.gnu.org/software/gdb/bugs/>...
>  Reading symbols from /root/oops.ko...done.
>  (gdb) disassemble .my_oops_init
>  A syntax error in expression, near `.my_oops_init'.
>  (gdb) disassemble my_oops_init
>  Dump of assembler code for function my_oops_init:
>    0x0000000000000034 <+0>:	mflr    r0
>    0x0000000000000038 <+4>:	std     r30,-16(r1)
>    0x000000000000003c <+8>:	ld      r30,0(r2)
>    0x0000000000000040 <+12>:	std     r0,16(r1)
>    0x0000000000000044 <+16>:	stdu    r1,-128(r1)
>    0x0000000000000048 <+20>:	ld      r3,-32760(r30)
>    0x000000000000004c <+24>:	bl      0x4c <my_oops_init+24>
>    0x0000000000000050 <+28>:	nop
>    0x0000000000000054 <+32>:	li      r9,0
>    0x0000000000000058 <+36>:	addi    r1,r1,128
>    0x000000000000005c <+40>:	li      r3,0
>    0x0000000000000060 <+44>:	stw     r9,0(r9)
>    0x0000000000000064 <+48>:	ld      r0,16(r1)
>    0x0000000000000068 <+52>:	ld      r30,-16(r1)
>    0x000000000000006c <+56>:	mtlr    r0
>    0x0000000000000070 <+60>:	blr
>  End of assembler dump.
>  (gdb) 
> 
> Anyway, the crash utility "dis .my_oops_init" convenience command stops
> immediately because it sees that it has already reached the "next" symbol
> value of .init_module.  You could add an instruction count to force it 
> to continue:
> 
>  crash> dis .my_oops_init
>  crash> dis .my_oops_init 20
>  0xd0000000046f0034 <.init_module>:      mflr    r0
>  0xd0000000046f0038 <.init_module+4>:    std     r30,-16(r1)
>  0xd0000000046f003c <.init_module+8>:    ld      r30,-32768(r2)
>  0xd0000000046f0040 <.init_module+12>:   std     r0,16(r1)
>  0xd0000000046f0044 <.init_module+16>:   stdu    r1,-128(r1)
>  0xd0000000046f0048 <.init_module+20>:   ld      r3,-32760(r30)
>  0xd0000000046f004c <.init_module+24>:   bl      0xd0000000046f0078
>  0xd0000000046f0050 <.init_module+28>:   ld      r2,40(r1)
>  0xd0000000046f0054 <.init_module+32>:   li      r9,0
>  0xd0000000046f0058 <.init_module+36>:   addi    r1,r1,128
>  0xd0000000046f005c <.init_module+40>:   li      r3,0
>  0xd0000000046f0060 <.init_module+44>:   stw     r9,0(r9)
>  0xd0000000046f0064 <.init_module+48>:   ld      r0,16(r1)
>  0xd0000000046f0068 <.init_module+52>:   ld      r30,-16(r1)
>  0xd0000000046f006c <.init_module+56>:   mtlr    r0
>  0xd0000000046f0070 <.init_module+60>:   blr
>  0xd0000000046f0074 <.init_module+64>:   .long 0x0
>  0xd0000000046f0078 <.init_module+68>:   addis   r12,r2,-1
>  0xd0000000046f007c <.init_module+72>:   addi    r12,r12,32544
>  0xd0000000046f0080 <.init_module+76>:   std     r2,40(r1)
>  crash> 
> 
> Or just force it to stop at the instruction that caused the crash:
> 
>  crash> dis -r d0000000046f0060
>  0xd0000000046f0034 <.init_module>:      mflr    r0
>  0xd0000000046f0038 <.init_module+4>:    std     r30,-16(r1)
>  0xd0000000046f003c <.init_module+8>:    ld      r30,-32768(r2)
>  0xd0000000046f0040 <.init_module+12>:   std     r0,16(r1)
>  0xd0000000046f0044 <.init_module+16>:   stdu    r1,-128(r1)
>  0xd0000000046f0048 <.init_module+20>:   ld      r3,-32760(r30)
>  0xd0000000046f004c <.init_module+24>:   bl      0xd0000000046f0078
>  0xd0000000046f0050 <.init_module+28>:   ld      r2,40(r1)
>  0xd0000000046f0054 <.init_module+32>:   li      r9,0
>  0xd0000000046f0058 <.init_module+36>:   addi    r1,r1,128
>  0xd0000000046f005c <.init_module+40>:   li      r3,0
>  0xd0000000046f0060 <.init_module+44>:   stw     r9,0(r9)
>  crash>
> 
> Dave
> 
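The bracketed word in the oops log's "Instruction dump", <91290000>, can be decoded by hand to confirm the instruction that 'dis -r' stops at. Below is a minimal Python sketch that decodes only the PowerPC D-form stw (store word) encoding: primary opcode 36 in the top 6 bits, then the 5-bit RS and RA fields and a signed 16-bit displacement. It is not a general disassembler, and the helper names decode_stw and effective_address are illustrative, not part of crash or gdb.

```python
"""Decode a PowerPC D-form 'stw' word (a sketch, not a full disassembler)."""

STW_OPCODE = 36  # primary opcode of stw (store word) in the PowerPC ISA


def decode_stw(word):
    """Split a 32-bit stw instruction into (rs, ra, d); reject other opcodes."""
    opcode = (word >> 26) & 0x3F  # top 6 bits: primary opcode
    if opcode != STW_OPCODE:
        raise ValueError(f"opcode {opcode} is not stw ({STW_OPCODE})")
    rs = (word >> 21) & 0x1F  # next 5 bits: register being stored
    ra = (word >> 16) & 0x1F  # next 5 bits: base register
    d = word & 0xFFFF         # low 16 bits: displacement
    if d & 0x8000:            # sign-extend the 16-bit displacement
        d -= 0x10000
    return rs, ra, d


def effective_address(ra, d, gprs):
    """D-form EA = (RA|0) + D: RA == 0 means a literal base of 0, not r0."""
    base = 0 if ra == 0 else gprs[ra]
    return (base + d) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits


if __name__ == "__main__":
    faulting_word = 0x91290000  # the <...> word from the oops "Instruction dump"
    rs, ra, d = decode_stw(faulting_word)
    print(f"stw r{rs},{d}(r{ra})")
    # r8..r11 as printed on the GPR08 line of the oops log.
    gprs = {8: 0x0, 9: 0x0, 10: 0xC0000000015FBB68, 11: 0x0}
    print(hex(effective_address(ra, d, gprs)))
```

Run as a script, it should print stw r9,0(r9) and then 0x0: the store goes through r9, which the GPR08 line shows is 0, so the effective address of 0 matches the DAR value 0000000000000000 in the log.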
Thanks, Dave. But how can we get 'dis -l' to work here? It still prints no source line information:

crash> dis -l .my_oops_init 20
0xd000000001460034 <.init_module>:      mflr    r0
0xd000000001460038 <.init_module+4>:    std     r30,-16(r1)
0xd00000000146003c <.init_module+8>:    ld      r30,-32768(r2)
0xd000000001460040 <.init_module+12>:   std     r0,16(r1)
0xd000000001460044 <.init_module+16>:   stdu    r1,-128(r1)
0xd000000001460048 <.init_module+20>:   ld      r3,-32760(r30)
0xd00000000146004c <.init_module+24>:   bl      0xd000000001460078
0xd000000001460050 <.init_module+28>:   ld      r2,40(r1)
0xd000000001460054 <.init_module+32>:   li      r9,0
0xd000000001460058 <.init_module+36>:   addi    r1,r1,128
0xd00000000146005c <.init_module+40>:   li      r3,0
0xd000000001460060 <.init_module+44>:   stw     r9,0(r9)
0xd000000001460064 <.init_module+48>:   ld      r0,16(r1)
0xd000000001460068 <.init_module+52>:   ld      r30,-16(r1)
0xd00000000146006c <.init_module+56>:   mtlr    r0
0xd000000001460070 <.init_module+60>:   blr
0xd000000001460074 <.init_module+64>:   .long 0x0
0xd000000001460078 <.init_module+68>:   addis   r12,r2,-1
0xd00000000146007c <.init_module+72>:   addi    r12,r12,14152
0xd000000001460080 <.init_module+76>:   std     r2,40(r1)
crash>
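The thread does not settle how to make 'dis -l' print source lines here. One possible workaround, offered as an assumption rather than something confirmed above, is to turn the faulting NIP into an offset and hand it to gdb on oops.ko, which did resolve my_oops_init in Dave's session. The sketch below takes the text symbols from the 'sym -m oops' output in the original report, reports every alias at the nearest preceding address (so both .my_oops_init and .init_module appear), and computes module-relative and symbol-relative offsets. The function name locate is hypothetical.

```python
"""Map a faulting NIP to module- and symbol-relative offsets (a sketch)."""

# Text symbols from 'sym -m oops' in this thread; .cleanup_module and
# .init_module are aliases sharing the addresses of .my_oops_exit/.my_oops_init.
MODULE_START = 0xD000000001460000
TEXT_SYMBOLS = [
    (0xD000000001460000, ".my_oops_exit"),
    (0xD000000001460000, ".cleanup_module"),
    (0xD000000001460034, ".my_oops_init"),
    (0xD000000001460034, ".init_module"),
]


def locate(nip, symbols, module_start):
    """Return (module offset, alias names at nearest symbol, offset from it)."""
    candidates = [addr for addr, _ in symbols if addr <= nip]
    if not candidates:
        raise ValueError(f"{nip:#x} is below every symbol")
    base = max(candidates)
    # Keep every name at that address instead of picking one arbitrarily.
    names = [name for addr, name in symbols if addr == base]
    return nip - module_start, names, nip - base


if __name__ == "__main__":
    nip = 0xD000000001460060  # NIP from the oops log and 'bt'
    mod_off, names, sym_off = locate(nip, TEXT_SYMBOLS, MODULE_START)
    print(f"module offset {mod_off:#x}")
    print(" / ".join(f"{n}+{sym_off:#x}" for n in names))
```

With these values it should print module offset 0x60 and .my_oops_init+0x2c / .init_module+0x2c, agreeing with the oops log's .my_oops_init+0x2c/0xd4. In Dave's gdb session on oops.ko, my_oops_init starts at 0x34 and stw r9,0(r9) sits at 0x60, so gdb's "list *0x60" or "info line *0x60" should point at the faulting source line, provided oops.ko was built with line-number debug info. That the .ko offsets line up with the in-memory module offsets is only observed in this thread's outputs; the module loader is free to lay sections out differently, so the symbol-relative offset (0x2c past my_oops_init) is the safer figure to carry over.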
