[Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

Dave Anderson anderson at redhat.com
Thu Oct 27 18:16:22 UTC 2005


Badari Pulavarty wrote:

> On Thu, 2005-10-27 at 13:17 -0400, Dave Anderson wrote:
> > Badari Pulavarty wrote:
> > > > That debug output certainly seems to pinpoint the issue at hand,
> > > > doesn't it?  Very interesting...
> > > >
> > > > What's strange is that the usage of the cpu_pda[i].data_offset by
> > > > the per_cpu() macro in "include/asm-x86_64/percpu.h" is unchanged.
> > > >
> > > > It's probably something very simple going on here, but I don't
> > > > have any more ideas at this point.
> > >
> > > This is the reply I got from Andi Kleen..
> > >
> > > -------- Forwarded Message --------
> > > From: Andi Kleen <ak at suse.de>
> > > To: Badari Pulavarty <pbadari at us.ibm.com>
> > > Subject: Re: cpu_pda->data_offset changed recently ?
> > > Date: Thu, 27 Oct 2005 16:58:54 +0200
> > > On Thursday 27 October 2005 16:53, Badari Pulavarty wrote:
> > > > Hi Andi,
> > > >
> > > > I am trying to fix the "crash" utility to make it work on
> > > > 2.6.14-rc5 (it's running fine on 2.6.10). It looks like the crash
> > > > utility reads and uses cpu_pda->data_offset values, and that some
> > > > change between 2.6.10 & 2.6.14-rc5 is causing "data_offset" to
> > > > hold huge values - which is causing "crash" to break.
> > > >
> > > > I added a printk() to find out why. As you can see from the
> > > > following, something changed - is this expected? Please let me
> > > > know.
> > >
> > > bootmem used to allocate from the end of the direct mapping on NUMA
> > > systems. Now it starts at the beginning, often before the
> > > kernel .text.
> > > This means it is negative. Perfectly legitimate. crash just has to
> > > handle it.
> > >
> > > -Andi
> > >
> > > --
> > >
> > That's what I thought it looked like, although the
> > x8664_pda.data_offset field is an "unsigned long".  Anyway, if you
> > take any of the per_cpu__xxx symbols from the 2.6.14 kernel and
> > subtract a cpu's data_offset, does it come up with a legitimate
> > virtual address?
>
> Unfortunately, I don't know x86-64 kernel virtual address space
> well enough to answer your question.
>
> My understanding is that x86-64 kernel addresses look something like:
>
> addr: ffffffff80101000
>
> But now (2.6.14-rc5) I do see address like:
>
> pgdat: 0xffff81000000e000
>
> which are causing read problems.
>
> crash: read error: kernel virtual address: ffff81000000fa90  type:
> "pglist_data node_next"
>
> I am not sure what these addresses are or whether they are valid.
> Is there a way to verify them, through gdb or /dev/kmem
> or something like that?
>
> Thanks,
> Badari

> Here is the bottom line we need to understand to fix
> the problem.
>
> 2.6.10:
> pgdat: 0x1000000e000
>
> 2.6.14-rc5:
> pgdat: 0xffff81000000e000
>

Exactly.

On a 2.6.9 kernel, if you do an nm -Bn on the vmlinux file, you'll first
see a bunch of "A" type absolute symbols, followed by the text
symbols, then readonly data, data, and so on.  Eventually you'll
bump into the per-cpu symbols:

$ nm -Bn vmlinux
0000000000088861 A __crc_dev_mc_delete
000000000014bfd1 A __crc_smp_call_function
00000000002de2e0 A __crc___skb_linearize
0000000000442f14 A __crc_tty_register_device
000000000060e766 A __crc_tty_termios_baud_rate
0000000000712c54 A __crc_remove_inode_hash
00000000007f8e0b A __crc_xfrm_policy_alloc
0000000000801678 A __crc_flush_scheduled_work
0000000000a64d75 A __crc_neigh_changeaddr
...  <snip>
00000000ffdf0b3d A __crc_usb_driver_release_interface
00000000ffe031fc A __crc_udp_proc_unregister
00000000ffead192 A __crc_cdrom_number_of_slots
00000000fff9536b A __crc_sock_no_recvmsg
00000000fffb8df8 A __crc_device_unregister
ffffffff80100000 t startup_32
ffffffff80100000 A _text
ffffffff80100081 t reach_compatibility_mode
ffffffff8010008e t second
ffffffff80100100 t reach_long64
ffffffff8010013d T initial_code
ffffffff80100145 T init_rsp
ffffffff80100150 T no_long_mode
ffffffff80100f00 T pGDT32
ffffffff80100f10 t ljumpvector
ffffffff80100f18 T stext
ffffffff80100f18 T _stext
ffffffff80101000 T init_level4_pgt
ffffffff80102000 T level3_ident_pgt
...  <snip>
ffffffff80502100 D per_cpu__init_tss
ffffffff80502200 d per_cpu__prof_old_multiplier
ffffffff80502204 d per_cpu__prof_multiplier
ffffffff80502208 d per_cpu__prof_counter
ffffffff80502220 D per_cpu__mmu_gathers
ffffffff80503280 D per_cpu__kstat
ffffffff80503680 d per_cpu__runqueues
ffffffff805048e0 d per_cpu__cpu_domains
ffffffff80504940 d per_cpu__phys_domains
ffffffff805049a0 d per_cpu__node_domains
ffffffff805049f8 D per_cpu__process_counts
ffffffff80504a00 d per_cpu__tasklet_hi_vec
ffffffff80504a08 d per_cpu__tasklet_vec
ffffffff80504a10 d per_cpu__ksoftirqd
ffffffff80504a80 d per_cpu__tvec_bases
ffffffff80506b00 D per_cpu__rcu_bh_data
ffffffff80506b60 D per_cpu__rcu_data
ffffffff80506bc0 d per_cpu__rcu_tasklet
...

So for any data that was specifically created per-cpu, the
symbol address above is only the starting point: to get to a
given CPU's copy of the structure, the offset value from that
CPU's cpu_pda[].data_offset needs to be applied.

What I don't understand is where the 0xffff810000000000
addresses come into play.  Are you seeing them as actual
symbols?

Dave


