[Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

Thu Oct 27 17:28:28 UTC 2005

On Thu, 2005-10-27 at 13:17 -0400, Dave Anderson wrote:
> Badari Pulavarty wrote: 
> > > That debug output certainly seems to pinpoint the issue at hand,
> > > doesn't it? Very interesting...
> > >
> > > What's strange is that the usage of the cpu_pda[i].data_offset by
> > > the per_cpu() macro in "include/asm-x86_64/percpu.h" is unchanged.
> > >
> > > It's probably something very simple going on here, but I don't
> > > have any more ideas at this point.
> > 
> > This is the reply I got from Andi Kleen.. 
> > 
> > -------- Forwarded Message -------- 
> > From: Andi Kleen <ak at suse.de> 
> > To: Badari Pulavarty <pbadari at us.ibm.com> 
> > Subject: Re: cpu_pda->data_offset changed recently ? 
> > Date: Thu, 27 Oct 2005 16:58:54 +0200 
> > On Thursday 27 October 2005 16:53, Badari Pulavarty wrote: 
> > > Hi Andi, 
> > > 
> > > I am trying to fix the "crash" utility to make it work on 2.6.14-rc5.
> > > (It's running fine on 2.6.10.) It looks like the crash utility reads
> > > and uses the cpu_pda->data_offset values. It looks like there is a
> > > change between 2.6.10 and 2.6.14-rc5 that causes "data_offset"
> > > to hold huge values, which causes "crash" to break.
> > >
> > > I added printk() calls to find out why. As you can see from the
> > > following, the values changed. Is this expected? Please let me know.
> > 
> > bootmem used to allocate from the end of the direct mapping on NUMA
> > systems. Now it starts at the beginning, often before the kernel .text.
> > This means it is negative. Perfectly legitimate. crash just has to
> > handle it.
> > 
> > -Andi 
> > 
> > --
> > 
> That's what I thought it looked like, although the
> x8664_pda.data_offset field is an "unsigned long".  Anyway, if you
> take any of the per_cpu__xxx symbols from the 2.6.14 kernel, subtract
> a cpu data_offset, does it come up with a legitimate virtual address?

Unfortunately, I don't know x86-64 kernel virtual address space
well enough to answer your question.
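
For what it's worth, the arithmetic behind that question can be sketched
with 64-bit modular math. This is only an illustration: the symbol and
target addresses below are hypothetical, not values read from my machine,
and it assumes the offset is added to the symbol address the way a
per_cpu()-style macro would do it. It shows how a "negative" offset
stored in an unsigned long looks huge, yet still wraps around to a
lower address.

```python
# Sketch only: every address below is hypothetical, not taken from a real dump.
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def per_cpu_address(symbol, data_offset):
    """Add the offset to the symbol address with unsigned long wraparound.

    Assumption: the offset is added, as a per_cpu()-style macro would do.
    """
    return (symbol + data_offset) & MASK64


# Hypothetical per_cpu__xxx symbol address in the kernel text mapping.
symbol = 0xffffffff80500000
# Hypothetical per-cpu copy allocated low in the direct mapping.
target = 0xffff810001000000

# The offset as it would be stored in an unsigned long field.
data_offset = (target - symbol) & MASK64

print(f"data_offset as unsigned: {data_offset:#018x}")
print(f"data_offset as signed:   {to_signed64(data_offset):#x}")
print(f"symbol + data_offset:    {per_cpu_address(symbol, data_offset):#018x}")
```

If crash treats data_offset as a large positive number without letting
the sum wrap at 64 bits, the result would not be a usable address.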

My understanding is x86-64 kernel addresses look something like:

addr: ffffffff80101000 

But now (2.6.14-rc5) I do see addresses like:

pgdat: 0xffff81000000e000

which are causing read problems.

crash: read error: kernel virtual address: ffff81000000fa90  type: "pglist_data node_next"

I am not sure what these addresses are or whether they are valid.
Is there a way to verify these addresses, through gdb or /dev/kmem
or something like that?
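
As a rough sanity check, the addresses above can be sorted into regions
with a few range comparisons. This is a sketch under assumptions: the
constants are what I believe the x86-64 layout to be for kernels of this
era (kernel text mapping from 0xffffffff80000000, direct mapping of
physical memory from 0xffff810000000000 spanning 46 bits), and they
should be checked against the kernel tree in use. The function names are
made up for illustration.

```python
# Assumed layout constants for x86-64 kernels of this era; verify against your tree.
START_KERNEL_MAP = 0xffffffff80000000  # assumed base of the kernel text mapping
PAGE_OFFSET = 0xffff810000000000       # assumed base of the direct mapping
DIRECT_MAP_SIZE = 1 << 46              # assumed size of the direct mapping


def classify(addr):
    """Coarsely classify a kernel virtual address by region."""
    if addr >= START_KERNEL_MAP:
        return "kernel text mapping"
    if PAGE_OFFSET <= addr < PAGE_OFFSET + DIRECT_MAP_SIZE:
        return "direct mapping"
    return "other"


def direct_map_to_phys(addr):
    """Translate a direct-mapped virtual address to a physical address."""
    if classify(addr) != "direct mapping":
        raise ValueError(f"{addr:#x} is not in the direct mapping")
    return addr - PAGE_OFFSET


for addr in (0xffffffff80101000, 0xffff81000000e000, 0xffff81000000fa90):
    region = classify(addr)
    if region == "direct mapping":
        print(f"{addr:#018x}: {region}, phys {direct_map_to_phys(addr):#x}")
    else:
        print(f"{addr:#018x}: {region}")
```

Under those assumptions the pgdat address would sit in the direct
mapping at a very low physical address, which would fit with bootmem
now allocating from the beginning of memory.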

Thanks,
Badari