[Crash-utility] Crash, won't read my vmcore "crash: page excluded: kernel virtual address:"

Tue Feb 11 06:35:39 UTC 2014

Dave Anderson wrote:

----- Original Message -----
> [root kvm7 127.0.0.1-2014-02-07-19:17:09]# crash /boot/System.map-2.6.32-220.el6.x86_64.debug /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux vmcore
>
> crash 5.1.8-1.el6
> Copyright (C) 2002-2011 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
> GNU gdb (GDB) 7.0
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> crash: page excluded: kernel virtual address: ffffffff81542000 type: "cpu_possible_mask"
>
> I can go into minimal mode.
>
>
> nm -Bn /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux | grep _stext
> ffffffff81000198 T _stext
>
> cat /proc/kallsyms | grep _stext
> ffffffff81000198 T _stext
>
> If I use the System.map parameter I get this warning:
>
> WARNING: kernels compiled by different gcc versions:
> /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64.debug/vmlinux: 4.4.5
> vmcore kernel: 4.4.6
>
>
> Would really like to understand why this system crashed. I know I'm a bit
> behind on my kernel versions, but I should still be able to look at this
> kernel, shouldn't I?
>
> Thanks
> Tory

It looks like the vmcore and vmlinux files don't match. Maybe the crashing
system was running the standard 2.6.32-220.el6.x86_64 kernel, and you're
trying to debug it using the 2.6.32-220.el6.x86_64.debug kernel variant?

First thing -- *never* use a System.map file unless for some reason you
don't have the original kernel's vmlinux available *and* you feel that the
vmlinux file you have is very close to the crashing kernel's vmlinux.  But
with any RHEL standard (unmodified) vmlinux/vmcore setup, the System.map is
completely useless.

So the first question is: what kernel generated the vmcore?

Do this:

 $ strings vmcore | grep '2.6.32'

Dave

--
Dave, you are right. I thought I had to use the debug kernel variant, but my
system is not running that; it crashed with the standard
2.6.32-220.el6.x86_64 kernel.

[tblue at kvm7 127.0.0.1-2014-02-07-19:17:09]$ sudo strings vmcore | grep '2.6.32'

2.6.32-220.el6.x86_64
OSRELEASE=2.6.32-220.el6.x86_64
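For anyone repeating this check, the strings/grep step above can be sketched in Python. This is only a rough equivalent, assuming the dump carries a plain-text `OSRELEASE=` entry as it does here; the function name `find_osrelease` and the chunk and overlap sizes are made up for illustration.

```python
import re

# Printable, non-space ASCII following "OSRELEASE=" (stops at the newline).
OSRELEASE_RE = re.compile(rb"OSRELEASE=([\x21-\x7e]+)")


def find_osrelease(path, chunk_size=1 << 20, overlap=256):
    """Return the first OSRELEASE value found in the file, or None.

    The file is read in chunks so that a multi-gigabyte vmcore is never
    loaded into memory at once. `overlap` bytes are carried over between
    chunks so a value split across a chunk boundary is still found.
    """
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            window = tail + chunk
            m = OSRELEASE_RE.search(window)
            # A match touching the end of the window may be cut off by the
            # chunk boundary, so accept it only at EOF or when it ends early.
            if m and (not chunk or m.end() < len(window)):
                return m.group(1).decode("ascii")
            if not chunk:
                return None
            tail = window[-overlap:]
```

The returned string can then be compared against the release embedded in the vmlinux path (here `2.6.32-220.el6.x86_64` versus `2.6.32-220.el6.x86_64.debug`) before starting crash.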

But it won't take my vmlinuz from /boot:

crash: /boot/vmlinuz-2.6.32-220.el6.x86_64: not a supported file format
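That error is consistent with /boot/vmlinuz being a compressed boot image rather than the uncompressed ELF vmlinux that crash wants (the one under /usr/lib/debug, as in the KERNEL: line further down). A quick sanity check, sketched in Python under the assumption that testing the four-byte ELF magic is enough to tell the two apart; `looks_like_elf` is a made-up helper name, and an ELF header alone does not guarantee the file has debug information.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def looks_like_elf(path):
    """Return True if the file starts with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC
```

A file that fails this check will not work as the kernel namelist argument, whatever its name suggests.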

Yes sir, you were correct, I was using the wrong kernel! With the matching
vmlinux from /usr/lib/debug it loads:

please wait... (determining panic task)
WARNING: multiple active tasks have called die

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.x86_64/vmlinux
    DUMPFILE: /libvirt/crash/127.0.0.1-2014-02-07-19:17:09/vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Fri Feb  7 18:16:05 2014
      UPTIME: 226 days, 21:36:13
LOAD AVERAGE: 2.42, 2.68, 2.69
       TASKS: 816
    NODENAME: kvm7.domain.com
     RELEASE: 2.6.32-220.el6.x86_64
     VERSION: #1 SMP Tue Dec 6 19:48:22 GMT 2011
     MACHINE: x86_64  (2200 Mhz)
      MEMORY: 88 GB
       PANIC: ""
         PID: 0
     COMMAND: "swapper"
        TASK: ffff881665514b40  (1 of 32)  [THREAD_INFO: ffff880c6124e000]
         CPU: 19
       STATE: TASK_RUNNING (PANIC)

Nothing stands out as a bug or reason to fail:

divide error: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu31/cache/index2/shared_cpu_map
CPU 19
Modules linked in: ext3 jbd ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables
sunrpc bridge stp llc bonding ipv6 vhost_net macvtap macvlan tun kvm_intel
kvm cdc_ether usbnet mii microcode i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support shpchp igb ioatdma dca ses enclosure sg ext4 mbcache
jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1 IBM System x3650 M4 -[7915AC1]-/00J6528
RIP: 0010:[<ffffffff81054ad5>]  [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
RSP: 0018:ffff880028363c40  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880028363e64 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8800282cf540 RDI: ffff8800282d5fc0
RBP: ffff880028363dd0 R08: ffff8800282cf860 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffff01
R13: 0000000000015fc0 R14: ffffffffffffffff R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028360000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4e5215c000 CR3: 00000011bea54000 CR4: 00000000000426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880c6124e000, task ffff881665514b40)
Stack:
 ffff880028363d70 ffff880028363ce0 ffff880028363ca0 000000000000024d
<0> ffff8800282cf860 ffff880028363e58 0101881664b121a8 0000000600000000
<0> 0000000600000000 ffff8800282cf540 0000000123386cc0 0000000000000008
Call Trace:
 <IRQ>
 [<ffffffffa02e4669>] ? br_handle_frame_finish+0x179/0x2a0 [bridge]
 [<ffffffff8105fc52>] rebalance_domains+0x1a2/0x5b0
 [<ffffffff81060153>] run_rebalance_domains+0xf3/0x160
 [<ffffffff8107c4f0>] ? get_next_timer_interrupt+0x1b0/0x250
 [<ffffffff81072161>] __do_softirq+0xc1/0x1d0
 [<ffffffff81097e0a>] ? sched_clock_idle_wakeup_event+0x1a/0x20
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [<ffffffff8100de85>] do_softirq+0x65/0xa0
 [<ffffffff81071f45>] irq_exit+0x85/0x90
 [<ffffffff8102a255>] smp_call_function_single_interrupt+0x35/0x40
 [<ffffffff8100bdb3>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff812c4a5e>] ? intel_idle+0xde/0x170
 [<ffffffff812c4a41>] ? intel_idle+0xc1/0x170
 [<ffffffff813f9f47>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 [<ffffffff814e5f23>] start_secondary+0x202/0x245
Code: d0 b8 01 00 00 00 48 c1 ea 0a 48 85 d2 0f 45 c2 41 89 40 08 66 90 4c
8b 85 e0 fe ff ff 48 8b 45 a8 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48
8b 4d b0 48 89 45 a0 31 c0 48 85 c9 74 0c 48 8b 45
RIP  [<ffffffff81054ad5>] find_busiest_group+0x5c5/0xb20
 RSP <ffff880028363c40>
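
One thing the oops itself does show, as a reading of the dump rather than a diagnosis: the byte marked `<48>` in the Code: line is where RIP points, and `48 f7 f1` is the x86-64 encoding of `div %rcx`. The register dump has RCX at 0, which fits the "divide error" message. A minimal Python sketch of that decoding follows; the helper names are hypothetical, the `CODE` string is an excerpt of the bytes above, and the decoder deliberately handles only this one instruction form.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Excerpt of the oops "Code:" line; <48> marks the byte RIP points at.
CODE = "Code: 31 d2 41 8b 48 08 48 c1 e0 0a <48> f7 f1 48 8b 4d b0"


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker onward as a list of ints."""
    tokens = code_line.replace("Code:", "").split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok.strip("<>"), 16)
            return [first] + [int(t, 16) for t in tokens[i + 1:]]
    return []


def decode_div_reg(b):
    """Decode only 'REX.W F7 /6' with a register operand (div r64).

    Returns an AT&T-style string such as 'div %rcx', or None for
    anything else. This is not a general disassembler.
    """
    if len(b) < 3 or b[0] != 0x48 or b[1] != 0xF7:
        return None
    modrm = b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg != 6:  # mod 3 = register operand, /6 = DIV
        return None
    return "div %" + REG64[rm]


print(decode_div_reg(faulting_bytes(CODE)))
```

Inside crash, `dis -l ffffffff81054ad5` should give the authoritative disassembly along with source line information, which is a better starting point than hand-decoding.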

Is there a forum that could help me figure out what exactly caused this
crash? It's not the first time it has happened across this series of servers
running KVM.

Thank you sir,

Tory