[Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

Wed Oct 26 17:23:45 UTC 2005

This message was bounced due to the size of its attachment;
I've since bumped up the maximum allowable message size:

         Re: [Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)
       Date: Wed, 26 Oct 2005 09:15:47 -0700
       From: Badari Pulavarty <pbadari at us.ibm.com>
         To: <crash-utility at redhat.com>
 References: 1, 2, 3, 4

On Wed, 2005-10-26 at 11:51 -0400, Dave Anderson wrote:

> >
> > crash: read error: kernel virtual address: ffff8100050eb084  type: "tss_struct ist array"
> >
>
> I see that the 2.6.13 kernel defines its init_tss
> array like so:
>
> DEFINE_PER_CPU(struct tss_struct, init_tss) ____cacheline_maxaligned_in_smp;
>
> whereas, the earlier 2.6 kernels do it like this:
>
> DECLARE_PER_CPU(struct tss_struct,init_tss);
>
> If this change modifies the way that per-cpu variable addresses
> are laid out, then I can't tell you what to do without significant
> further investigation. But until proven otherwise, let's presume
> that the calculation of the per-cpu data addresses is done the same way.
>
> There are two places where that error message comes from, both
> in x86_64_ist_init(), but given that the above per-cpu declarations
> are functionally equivalent, there would be the following
> kernel symbol in your vmlinux, verifiable like so:
>
> $ nm -Bn vmlinux | grep per_cpu__init_tss
> ffffffff80502100 D per_cpu__init_tss
> $
>
> If it's not there, crash is hosed, and significant work needs
> to be done to find it.  But if the symbol is still intact in
> the 2.6.14 kernel, the failure should have come from an incorrect
> calculation of the vaddr of the init_tss below:

None of the above stuff changed, so we are fine.

> static void
> x86_64_ist_init(void)
> {
>                ...
>
>                 } else if (symbol_exists("per_cpu__init_tss")) {
>                 for (c = 0; c < NR_CPUS; c++) {
>                         if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
>                                 if (kt->__per_cpu_offset[c] == 0)
>                                         break;
>                                 vaddr = symbol_value("per_cpu__init_tss") +
>                                         kt->__per_cpu_offset[c];
>                         } else
>                                 vaddr = symbol_value("per_cpu__init_tss");
>
>                         vaddr += OFFSET(tss_struct_ist);
>
>                         readmem(vaddr, KVADDR, &ms->stkinfo.ebase[c][0],
>                                 sizeof(ulong) * 7, "tss_struct ist array",
>                                 FAULT_ON_ERROR);
>

Yes. I realized that the problem is due to a messed-up
kt->__per_cpu_offset[c] value. These should be offsets into the array,
so they should be small values, but I see huge numbers.

per-cpu offset: 84afdf60

I also realized that this gets set at the lines I touched earlier :(
I can't seem to find out what I screwed up. We are just reading a value
from the kernel structure and setting it.
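
For reference, the address arithmetic in the quoted loop can be sketched in
isolation. This is only an illustration under stated assumptions, not the
crash source: tss_ist_vaddr(), its parameters, and the DEMO_* flag values
are hypothetical stand-ins for symbol_value("per_cpu__init_tss"),
kt->__per_cpu_offset[c], OFFSET(tss_struct_ist), and the kt->flags bits.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real SMP and PER_CPU_OFF values are defined by crash. */
#define DEMO_FLAG_SMP          0x1u
#define DEMO_FLAG_PER_CPU_OFF  0x2u

/*
 * Compute the address of one CPU's tss_struct ist array.
 * Returns 0 when the per-cpu path is active but this CPU's offset is 0,
 * which the quoted loop treats as "stop iterating".
 */
uint64_t tss_ist_vaddr(uint64_t sym, uint64_t percpu_off,
                       uint64_t ist_off, unsigned flags)
{
    uint64_t vaddr = sym;

    if ((flags & DEMO_FLAG_SMP) && (flags & DEMO_FLAG_PER_CPU_OFF)) {
        if (percpu_off == 0)
            return 0;
        vaddr += percpu_off;    /* per-cpu copy = symbol + this CPU's offset */
    }
    return vaddr + ist_off;     /* step to the ist member inside tss_struct */
}
```

Any error in the per-cpu offset shifts the final address by the same amount,
so a readmem() failure on the ist array is at least consistent with a wrong
__per_cpu_offset[c] value, although it does not prove that is the cause.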

>                         if (ms->stkinfo.ebase[c][0] == 0)
>                                 break;
>                 }
>         }
>
> I'm also presuming your test kernel is SMP.  But I'm wondering whether
> the SMP and PER_CPU_OFF flags are set?

Yes.
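
One way to confirm the flag state is a small debug helper that formats the
two bits of interest. The sketch below is an assumption-laden illustration:
describe_flags() and the DEMO_* bit values are hypothetical and do not come
from the crash source, where the real SMP and PER_CPU_OFF definitions would
be used instead.

```c
#include <stdio.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define DEMO_SMP          0x1ul
#define DEMO_PER_CPU_OFF  0x2ul

/*
 * Write a short description of the flags into buf and return the
 * number of characters snprintf() would have written.
 */
int describe_flags(unsigned long flags, char *buf, size_t len)
{
    int smp = (flags & DEMO_SMP) != 0;
    int pco = (flags & DEMO_PER_CPU_OFF) != 0;

    return snprintf(buf, len, "flags: %lx (%s%s%s)", flags,
                    smp ? "SMP" : "",
                    (smp && pco) ? "|" : "",
                    pco ? "PER_CPU_OFF" : "");
}
```

The resulting string could then be printed with fprintf() from inside the
function being debugged.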

> The SMP flag should have been pre-set in kernel_init(), but the
> PER_CPU_OFF flag gets set in x86_64_cpu_pda_init(), which you
> have modified.
>
> You can display the kt->flags contents with a printk in
> x86_64_ist_init().
> If PER_CPU_OFF is not set, then that's probably the issue here.
>
> Can you show your new versions of  x86_64_cpu_pda_init() and
> x86_64_get_smp_cpus()?

Here are the new versions of the x86_64 functions for your review.

Thanks,
Badari

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20051026/c82bdd98/attachment.htm>