[Crash-utility] debug 3th part module which oops the system

Dave Anderson anderson at redhat.com
Mon Jul 8 14:59:53 UTC 2013



----- Original Message -----
> On Wed, Jul 03, 2013 at 09:04:50AM -0400, Dave Anderson wrote:
> > 
> > 
> > ----- Original Message -----
> > > Hey there,
> > > 
> > > I'm trying to analyse a vmcore that came from an oops caused by a module.
> > > The module comes from here:
> > > 
> > >     http://www.linuxforu.com/2011/01/understanding-a-kernel-oops
> > > 
> > > This web page teaches how to analyse a kernel oops. It provides a
> > > module named 'oops', which triggers a NULL pointer dereference in its
> > > init function.
> > > 
> > > The problem is I cannot figure out how to use crash to analyse the vmcore:
> > > 
> > > GNU gdb (GDB) 7.0
> > > Copyright (C) 2009 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "powerpc64-unknown-linux-gnu"...
> > > 
> > >       KERNEL: /usr/lib/debug/lib/modules/2.6.18-348.el5/vmlinux
> > >     DUMPFILE: /var/crash/127.0.0.1-2013-07-01-04:43/vmcore
> > >         CPUS: 20
> > >         DATE: Mon Jul  1 04:38:49 2013
> > >       UPTIME: 00:33:44
> > > LOAD AVERAGE: 0.22, 0.18, 0.07
> > >        TASKS: 482
> > >     NODENAME: lawlp3.upt.austin.ibm.com
> > >      RELEASE: 2.6.18-348.el5
> > >      VERSION: #1 SMP Wed Nov 28 21:23:52 EST 2012
> > >      MACHINE: ppc64  (3550 Mhz)
> > >       MEMORY: 3.2 GB
> > >        PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
> > >          PID: 5402
> > >      COMMAND: "insmod"
> > >         TASK: c0000000cfa35150  [THREAD_INFO: c0000000ce5d0000]
> > >          CPU: 15
> > >        STATE: TASK_RUNNING (PANIC)
> > > 
> > > crash> log
> > > ... ....
> > > oops: module license 'unspecified' taints kernel.
> > > oops from the module
> > > Unable to handle kernel paging request for data at address 0x00000000
> > > Faulting instruction address: 0xd000000001460060
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > SMP NR_CPUS=128 NUMA
> > > Modules linked in: oops(PU) nfsd exportfs auth_rpcgss autofs4 hidp nfs
> > > nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp
> > > ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_addr ib_cm
> > > ib_sa ib_mad iw_cm iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio
> > > cxgb3i libcxgbi libiscsi_tcp libiscsi2 scsi_transport_iscsi2
> > > scsi_transport_iscsi dm_multipath scsi_dh snd_powermac snd_seq_dummy
> > > snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> > > snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore i2c_core
> > > parport_pc lp parport sg iw_cxgb3 ib_core cxgb3 ibmveth 8021q dm_raid45
> > > dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log
> > > dm_mod lpfc ibmvfc scsi_transport_fc ibmvscsic sd_mod scsi_mod ext3 jbd
> > > uhci_hcd ohci_hcd ehci_hcd
> > > NIP: D000000001460060 LR: D000000001460050 CTR: 0000000000000004
> > > REGS: c0000000ce5d39b0 TRAP: 0300   Tainted: P     ----  (2.6.18-348.el5)
> > > MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24022482  XER: 00000006
> > > DAR: 0000000000000000, DSISR: 0000000042000000
> > > TASK = c0000000cfa35150[5402] 'insmod' THREAD: c0000000ce5d0000 CPU: 15
> > > GPR00: D000000001460050 C0000000CE5D3C30 D00000000146C930 0000000000000000
> > > GPR04: 8000000000001032 0000000000000000 0000000000000000 0000000000000000
> > > GPR08: 0000000000000000 0000000000000000 C0000000015FBB68 0000000000000000
> > > GPR12: 0000000000000000 C000000000570B80 0000000000000000 D0000000012B1850
> > > GPR16: D0000000012B1810 D0000000014601B0 0000000000000000 0000000000000000
> > > GPR20: 0000000000000028 D0000000012B0CE9 C0000000005A12E8 0000000000000029
> > > GPR24: D0000000012A0000 000000000000002A C0000000CD6F5A80 C0000000CD6F5AB0
> > > GPR28: C0000000005A18C8 D000000001460680 D00000000146C900 D000000001460680
> > > NIP [D000000001460060] .my_oops_init+0x2c/0xd4 [oops]
> > > LR [D000000001460050] .my_oops_init+0x1c/0xd4 [oops]
> > > Call Trace:
> > > [C0000000CE5D3C30] [C000000000098944] .sys_init_module+0x1a88/0x1d18 (unreliable)
> > > [C0000000CE5D3E30] [C0000000000086A4] syscall_exit+0x0/0x40
> > > Instruction dump:
> > > 4e800020 7c0802a6 fbc1fff0 ebc28000 f8010010 f821ff81 e87e8008 4800002d
> > > e8410028 39200000 38210080 38600000 <91290000> e8010010 ebc1fff0 7c0803a6
> > >  <0>Sending IPI to other cpus...
> > > crash> whatis my_oops_init
> > > whatis: gdb request failed: whatis my_oops_init
> > > crash> mod -s oops
> > >      MODULE       NAME                     SIZE  OBJECT FILE
> > > d000000001460680  oops                    18752  /lib/modules/2.6.18-348.el5/kernel/oops.ko
> > > crash> whatis my_oops_init
> > > int my_oops_init(void);
> > > crash> dis -l .my_oops_init
> > > <no output>
> > > crash> sym -m oops
> > > d000000001460000 MODULE START: oops
> > > d000000001460000 (t) .my_oops_exit
> > > d000000001460000 (t) .cleanup_module
> > > d000000001460034 (t) .my_oops_init
> > > d000000001460034 (t) .init_module
> > > d000000001460130 (r) ____versions
> > > d000000001460130 (r) __versions
> > > d000000001460680 (D) __this_module
> > > d000000001464910 (D) cleanup_module
> > > d000000001464910 (d) my_oops_exit
> > > d000000001464920 (D) init_module
> > > d000000001464920 (d) my_oops_init
> > > d000000001464940 MODULE END: oops
> > > crash> bt
> > > PID: 5402   TASK: c0000000cfa35150  CPU: 15  COMMAND: "insmod"
> > > 
> > >  R0:  d000000001460050    R1:  c0000000ce5d3c30    R2:  d00000000146c930
> > >  R3:  0000000000000000    R4:  8000000000001032    R5:  0000000000000000
> > >  R6:  0000000000000000    R7:  0000000000000000    R8:  0000000000000000
> > >  R9:  0000000000000000    R10: c0000000015fbb68    R11: 0000000000000000
> > >  R12: 0000000000000000    R13: c000000000570b80    R14: 0000000000000000
> > >  R15: d0000000012b1850    R16: d0000000012b1810    R17: d0000000014601b0
> > >  R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000028
> > >  R21: d0000000012b0ce9    R22: c0000000005a12e8    R23: 0000000000000029
> > >  R24: d0000000012a0000    R25: 000000000000002a    R26: c0000000cd6f5a80
> > >  R27: c0000000cd6f5ab0    R28: c0000000005a18c8    R29: d000000001460680
> > >  R30: d00000000146c900    R31: d000000001460680
> > >  NIP: d000000001460060    MSR: 8000000000009032    OR3: c0000000005a13c0
> > >  CTR: 0000000000000004    LR:  d000000001460050    XER: 0000000000000006
> > >  CCR: 0000000024022482    MQ:  c0000000cd6f5ab0    DAR: 0000000000000000
> > >  DSISR: 0000000042000000     Syscall Result: 0000000000000000
> > >  NIP [d000000001460060] .init_module
> > >  LR  [d000000001460050] .init_module
> > > 
> > >  #0 [c0000000ce5d3c30] .sys_init_module at c000000000098944
> > >  #1 [c0000000ce5d3e30] syscall_exit at c0000000000086a4
> > >  syscall  [c00] exception frame:
> > >  R0:  0000000000000080    R1:  00000000ff91fb60    R2:  000000000fff8eb0
> > >  R3:  0000000010020028    R4:  000000000001caf8    R5:  0000000010020018
> > >  R6:  000000000000002d    R7:  fffffffffeff0000    R8:  000000000002ffe0
> > >  R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
> > >  R12: 0000000000000000    R13: 000000001001959c    R14: 0000000000000000
> > >  R15: 0000000000000000    R16: 0000000000000000    R17: 0000000000000000
> > >  R18: 0000000000000000    R19: 0000000000000000    R20: 0000000000000000
> > >  R21: 0000000000000000    R22: 0000000000000000    R23: 0000000000000000
> > >  R24: 000000000ffbf280    R25: 00000000ff91fdf0    R26: 0000000010020018
> > >  R27: 00000000ff91ff05    R28: 0000000000020000    R29: 000000000001caf8
> > >  R30: 0000000010020028    R31: 0000000000000003
> > >  NIP: 000000000ff0496c    MSR: 000000000000d032    OR3: 0000000010020028
> > >  CTR: 000000000ff04964    LR:  0000000010000bf8    XER: 0000000000000000
> > >  CCR: 0000000044000484    MQ:  0000000002756c28    DAR: 000000001004002c
> > >  DSISR: 0000000042000000     Syscall Result: 0000000000000000
> > > 
> > > crash>
> > > 
> > > As you can see, the 'bt' command says the problem is at '.init_module',
> > > but in fact it should come from '.my_oops_init'. And 'dis -l
> > > .my_oops_init' shows nothing, so I cannot use crash to figure out which
> > > line of source code caused the oops. But using gdb as described on the
> > > web page, I can find the code line easily.
> > > 
> > > Please help. Thanks.
> > 
> > I'm not well-versed in ppc64, but the issue seems to be related
> > to the fact that .my_oops_init and .init_module are both being
> > assigned the same virtual address:
> > 
> > d000000001460034 (t) .my_oops_init
> > d000000001460034 (t) .init_module
> > 
> > If you do an "nm -Bn" on the oops.ko file, do they show the same
> > offset value?
> 
> Thanks, Dave. Looks like they have the same offset, both are zero:
> 
> $ nm -Bn oops.ko
>                  U .printk
> 0000000000000000 T .cleanup_module
> 0000000000000000 T .init_module
> 0000000000000000 t .my_oops_exit
> 0000000000000000 t .my_oops_init
> 0000000000000000 r ____versions
> 0000000000000000 r __mod_srcversion29
> 0000000000000000 D __this_module
> 0000000000000000 D cleanup_module
> 0000000000000000 d my_oops_exit
> 0000000000000010 D init_module
> 0000000000000010 d my_oops_init
> 0000000000000028 r __module_depends
> 0000000000000038 r __mod_vermagic5
> 
> But why isn't gdb affected by the same offset?

There is some confusion with the ppc64 usage of the symbol name with
and without the "." preceding the name, i.e. the actual (t) text symbol
of .my_oops_init versus the (d) data symbol of my_oops_init, which is
the function descriptor rather than the code itself.

 $ gdb /root/oops.ko
 GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
 Copyright (C) 2010 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "ppc64-redhat-linux-gnu".
 For bug reporting instructions, please see:
 <http://www.gnu.org/software/gdb/bugs/>...
 Reading symbols from /root/oops.ko...done.
 (gdb) disassemble .my_oops_init
 A syntax error in expression, near `.my_oops_init'.
 (gdb) disassemble my_oops_init
 Dump of assembler code for function my_oops_init:
   0x0000000000000034 <+0>:	mflr    r0
   0x0000000000000038 <+4>:	std     r30,-16(r1)
   0x000000000000003c <+8>:	ld      r30,0(r2)
   0x0000000000000040 <+12>:	std     r0,16(r1)
   0x0000000000000044 <+16>:	stdu    r1,-128(r1)
   0x0000000000000048 <+20>:	ld      r3,-32760(r30)
   0x000000000000004c <+24>:	bl      0x4c <my_oops_init+24>
   0x0000000000000050 <+28>:	nop
   0x0000000000000054 <+32>:	li      r9,0
   0x0000000000000058 <+36>:	addi    r1,r1,128
   0x000000000000005c <+40>:	li      r3,0
   0x0000000000000060 <+44>:	stw     r9,0(r9)
   0x0000000000000064 <+48>:	ld      r0,16(r1)
   0x0000000000000068 <+52>:	ld      r30,-16(r1)
   0x000000000000006c <+56>:	mtlr    r0
   0x0000000000000070 <+60>:	blr
 End of assembler dump.
 (gdb) 
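
As a cross-check, the faulting instruction in the oops log can be matched
against the gdb listing above by subtracting the symbol address from the
NIP. Here is a minimal Python sketch of that arithmetic, using only values
quoted in this thread; the names NIP, MY_OOPS_INIT, GDB_LISTING and
faulting_offset are purely illustrative and not part of crash or gdb:

```python
# Values copied from the oops log and the "sym -m oops" output quoted above.
NIP = 0xd000000001460060            # "Faulting instruction address"
MY_OOPS_INIT = 0xd000000001460034   # (t) .my_oops_init / .init_module

# Offsets and instructions copied from the gdb "disassemble my_oops_init"
# listing (only the instructions around the fault are reproduced here).
GDB_LISTING = {
    32: "li      r9,0",
    36: "addi    r1,r1,128",
    40: "li      r3,0",
    44: "stw     r9,0(r9)",
    48: "ld      r0,16(r1)",
}


def faulting_offset(nip: int, sym_start: int) -> int:
    """Return the byte offset of the faulting instruction within its function."""
    return nip - sym_start


offset = faulting_offset(NIP, MY_OOPS_INIT)
print(f"offset = {offset:#x} ({offset})")
print(f"instruction: {GDB_LISTING[offset]}")
```

The offset comes out as 0x2c (44), which agrees with the
".my_oops_init+0x2c/0xd4" in the oops log and lands on the
"stw r9,0(r9)" at <+44>. Since r9 was loaded with 0 at <+32>, that store
goes to address 0, which is consistent with the DAR of 0 in the log.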

Anyway, the crash utility "dis .my_oops_init" convenience command stops
immediately because it sees that it has already reached the "next" symbol
value of .init_module.  You could add an instruction count to force it 
to continue:

 crash> dis .my_oops_init
 crash> dis .my_oops_init 20
 0xd0000000046f0034 <.init_module>:      mflr    r0
 0xd0000000046f0038 <.init_module+4>:    std     r30,-16(r1)
 0xd0000000046f003c <.init_module+8>:    ld      r30,-32768(r2)
 0xd0000000046f0040 <.init_module+12>:   std     r0,16(r1)
 0xd0000000046f0044 <.init_module+16>:   stdu    r1,-128(r1)
 0xd0000000046f0048 <.init_module+20>:   ld      r3,-32760(r30)
 0xd0000000046f004c <.init_module+24>:   bl      0xd0000000046f0078
 0xd0000000046f0050 <.init_module+28>:   ld      r2,40(r1)
 0xd0000000046f0054 <.init_module+32>:   li      r9,0
 0xd0000000046f0058 <.init_module+36>:   addi    r1,r1,128
 0xd0000000046f005c <.init_module+40>:   li      r3,0
 0xd0000000046f0060 <.init_module+44>:   stw     r9,0(r9)
 0xd0000000046f0064 <.init_module+48>:   ld      r0,16(r1)
 0xd0000000046f0068 <.init_module+52>:   ld      r30,-16(r1)
 0xd0000000046f006c <.init_module+56>:   mtlr    r0
 0xd0000000046f0070 <.init_module+60>:   blr
 0xd0000000046f0074 <.init_module+64>:   .long 0x0
 0xd0000000046f0078 <.init_module+68>:   addis   r12,r2,-1
 0xd0000000046f007c <.init_module+72>:   addi    r12,r12,32544
 0xd0000000046f0080 <.init_module+76>:   std     r2,40(r1)
 crash> 

Or just force it to stop at the instruction that caused the crash:

 crash> dis -r d0000000046f0060
 0xd0000000046f0034 <.init_module>:      mflr    r0
 0xd0000000046f0038 <.init_module+4>:    std     r30,-16(r1)
 0xd0000000046f003c <.init_module+8>:    ld      r30,-32768(r2)
 0xd0000000046f0040 <.init_module+12>:   std     r0,16(r1)
 0xd0000000046f0044 <.init_module+16>:   stdu    r1,-128(r1)
 0xd0000000046f0048 <.init_module+20>:   ld      r3,-32760(r30)
 0xd0000000046f004c <.init_module+24>:   bl      0xd0000000046f0078
 0xd0000000046f0050 <.init_module+28>:   ld      r2,40(r1)
 0xd0000000046f0054 <.init_module+32>:   li      r9,0
 0xd0000000046f0058 <.init_module+36>:   addi    r1,r1,128
 0xd0000000046f005c <.init_module+40>:   li      r3,0
 0xd0000000046f0060 <.init_module+44>:   stw     r9,0(r9)
 crash>
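
To illustrate why a symbol-bounded disassembly comes up empty when two
symbols share an address, here is a rough Python model. It is only a
sketch of the behavior described above, not the crash utility's actual
code; naive_range, alias_aware_range and insn_count are hypothetical
helpers, and the symbol list is taken from the "sym -m oops" output
quoted earlier in the thread:

```python
INSN_SIZE = 4  # ppc64 instructions are a fixed 4 bytes

# Symbols from the quoted "sym -m oops" output, in listing order.
SYMBOLS = [
    (0xd000000001460000, ".my_oops_exit"),
    (0xd000000001460000, ".cleanup_module"),
    (0xd000000001460034, ".my_oops_init"),
    (0xd000000001460034, ".init_module"),
    (0xd000000001460130, "____versions"),
]


def naive_range(name, symbols, count=None):
    """Model: stop at the next entry in the symbol list, unless an
    explicit instruction count is supplied."""
    idx = next(i for i, (_, n) in enumerate(symbols) if n == name)
    start = symbols[idx][0]
    if count is not None:
        return start, start + count * INSN_SIZE
    return start, symbols[idx + 1][0]


def alias_aware_range(name, symbols):
    """Alternative: stop at the next symbol with a strictly higher address."""
    start = next(addr for addr, n in symbols if n == name)
    end = min(addr for addr, _ in symbols if addr > start)
    return start, end


def insn_count(addr_range):
    """Number of fixed-size instructions covered by a (start, end) range."""
    start, end = addr_range
    return (end - start) // INSN_SIZE


print(insn_count(naive_range(".my_oops_init", SYMBOLS)))            # empty range
print(insn_count(naive_range(".my_oops_init", SYMBOLS, count=20)))  # forced count
print(insn_count(alias_aware_range(".my_oops_init", SYMBOLS)))      # skips the alias
```

In this model the plain lookup yields an empty range because the "next"
symbol, .init_module, sits at the same address, while an explicit count
sidesteps the symbol boundary entirely. Note that the alias-aware bound
runs up to the next distinct symbol, so it may include padding or data
past the end of the function.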

Dave