<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3086" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2>Dave,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2>I agree that it is up to us on this side to figure out where
the rest of the dump went.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2>Thank you again for your help,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=405052019-14052007><FONT face=Arial
color=#0000ff size=2><BR>Frank</FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> crash-utility-bounces@redhat.com
[mailto:crash-utility-bounces@redhat.com] <B>On Behalf Of </B>Dave
Anderson<BR><B>Sent:</B> Monday, May 14, 2007 3:16 PM<BR><B>To:</B> Discussion
list for crash utility usage, maintenance and development<BR><B>Subject:</B>
Re: [Crash-utility] Seek error type: "tss_struct ist array" problem on 8-CPU AMD
system<BR></FONT><BR></DIV>
<DIV></DIV><TT>"Jansen, Frank" wrote:</TT>
<BLOCKQUOTE TYPE="CITE"><TT>> -----Original Message-----</TT>
<BR><TT>> From: crash-utility-bounces@redhat.com</TT> <BR><TT>> [<A
href="mailto:crash-utility-bounces@redhat.com">mailto:crash-utility-bounces@redhat.com</A>]
On Behalf Of Dave Anderson</TT> <BR><TT>> Sent: Monday, May 14, 2007
12:22 PM</TT> <BR><TT>> To: Discussion list for crash utility usage,
maintenance and</TT> <BR><TT>> development</TT> <BR><TT>> Subject: Re:
[Crash-utility] Seek error type: "tss_struct ist</TT> <BR><TT>> array"
problem on 8-CPU AMD system</TT> <BR><TT>></TT> <BR><TT>> "Jansen,
Frank" wrote:</TT> <BR><TT>></TT> <BR><TT>> > Looking through the
changelog, I saw that the 'tss_struct ist array'</TT> <BR><TT>> >
problem on 8-CPU systems had been addressed previously.</TT> <BR><TT>>
However, I'm</TT> <BR><TT>> > running into this issue on an AMD server
with crash 4.0-4.1</TT> <BR><TT>> and RHEL4</TT> <BR><TT>> > Update
5 (2.6.9-55.ELsmp).</TT> <BR><TT>> ></TT> <BR><TT>> > The output
from the crash invocation is the following:</TT> <BR><TT>> > +++</TT>
<BR><TT>> > [root@well-rhel4564-ps3 dump]# /fpj/crash
System_map.2.6.9-55.ELsmp</TT> <BR><TT>> >
vmlinux.debug.2.6.9-55.ELsmp ap3.1178895173.dmp</TT> <BR><TT>> ></TT>
<BR><TT>> > crash 4.0-4.1</TT> <BR><TT>> > Copyright (C) 2002,
2003, 2004, 2005, 2006, 2007 Red Hat, Inc.</TT> <BR><TT>> >
Copyright (C) 2004, 2005, 2006 IBM Corporation Copyright (C)</TT>
<BR><TT>> > 1999-2006 Hewlett-Packard Co Copyright (C) 2005,
2006 Fujitsu</TT> <BR><TT>> > Limited Copyright (C) 2006,
2007 VA Linux Systems Japan K.K.</TT> <BR><TT>> > Copyright (C)
2005 NEC Corporation</TT> <BR><TT>> > Copyright (C) 1999,
2002 Silicon Graphics, Inc.</TT> <BR><TT>> > Copyright (C) 1999,
2000, 2001, 2002 Mission Critical Linux, Inc.</TT> <BR><TT>> >
This program is free software, covered by the GNU General Public</TT>
<BR><TT>> > License, and you are welcome to change it and/or
distribute</TT> <BR><TT>> copies of</TT> <BR><TT>> > it under
certain conditions. Enter "help copying" to see the</TT> <BR><TT>>
> conditions.</TT> <BR><TT>> > This program has absolutely no
warranty. Enter "help warranty" for</TT> <BR><TT>> >
details.</TT> <BR><TT>> ></TT> <BR><TT>> > GNU gdb 6.1</TT>
<BR><TT>> > Copyright 2004 Free Software Foundation, Inc.</TT>
<BR><TT>> > GDB is free software, covered by the GNU General
Public</TT> <BR><TT>> License, and</TT> <BR><TT>> > you are welcome
to change it and/or distribute copies of it under</TT> <BR><TT>> >
certain conditions.</TT> <BR><TT>> > Type "show copying" to see the
conditions.</TT> <BR><TT>> > There is absolutely no warranty for
GDB. Type "show warranty" for</TT> <BR><TT>> > details.</TT>
<BR><TT>> > This GDB was configured as
"x86_64-unknown-linux-gnu"...</TT> <BR><TT>> ></TT> <BR><TT>> >
crash: seek error: kernel virtual address: 10408119e84 type:</TT>
<BR><TT>> > "tss_struct ist array"</TT> <BR><TT>> > ---</TT>
<BR><TT>> ></TT> <BR><TT>> > The server is a 4 dual-core AMD
(2.8GHz) with 64GB.</TT> <BR><TT>> ></TT> <BR><TT>> > Any
insights into how best to troubleshoot this are much</TT> <BR><TT>>
appreciated.</TT> <BR><TT>> ></TT> <BR><TT>> > Thanks,</TT>
<BR><TT>> ></TT> <BR><TT>> > Frank Jansen</TT> <BR><TT>></TT>
<BR><TT>> I doubt this has anything to do with the 8-cpu issue.</TT>
<BR><TT>></TT> <BR><TT>I think that you are right, as the crash -d7 output seems
to indicate that the</TT> <BR><TT>dump may be incomplete (cf. the attached crash
-d7 output).</TT><TT></TT>
<P><TT>> A few questions:</TT> <BR><TT>></TT> <BR><TT>> Is this an
RHEL4 derivative kernel of some kind? I ask</TT> <BR><TT>> because
you're using a system.map file as an argument.</TT> <BR><TT>></TT>
<BR><TT>It's a standard kernel, to which we add a couple of our
(Egenera)</TT> <BR><TT>drivers. I can read the dump without the system
map argument, but was</TT> <BR><TT>just going off the data provided to me by
the person that ran into the</TT> <BR><TT>problem.</TT><TT></TT>
<P><TT>> Anyway, this dumpfile is Egenera's LKCD off-shoot, correct?</TT>
<BR><TT>> Since you got an "lseek" error, the question is whether
(1)</TT> <BR><TT>> the virtual address of 10408119e84 is legitimate, and
(2)</TT> <BR><TT>> whether it is included in your dumpfile.</TT><TT></TT>
<P><TT>I think that the virtual address is legitimate, but that the dump
is</TT> <BR><TT>incomplete at this point.</TT><TT></TT>
<P><TT>></TT> <BR><TT>> What does "crash -d7 ..." show?</TT><TT></TT>
<P><TT>See attached output</TT><TT></TT>
<P><TT>></TT> <BR><TT>> Does crash work on the live system?</TT>
<BR><TT>Yes, it works</TT></P></BLOCKQUOTE><TT>Right -- if it works on the
live system, there's a good chance that</TT> <BR><TT>the page is simply missing
from the dumpfile. The tss_struct for each</TT> <BR><TT>cpu is located
in each cpu's per-cpu data area. I have seen the</TT> <BR><TT>exact same
problem with x86_64 netdump "vmcore-incomplete" dumpfiles,</TT> <BR><TT>where
the per-cpu data areas, allocated with alloc_bootmem_node(),</TT>
<BR><TT>would tend to be located in very high physical memory (beyond the</TT>
<BR><TT>end of the vmcore-incomplete contents).</TT><TT></TT>
<P><TT>On a 64GB system, the virtual address of 10408119e84 (~16GB
physical)</TT> <BR><TT>would certainly not be out of the question. And
if it can be read</TT> <BR><TT>on the live machine (crash -d7 will show the
same address access</TT> <BR><TT>sequence), then it's probably not included in
the dumpfile for</TT> <BR><TT>whatever reason.</TT><TT></TT>
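<P><TT>As a rough sketch of the arithmetic behind the "~16GB physical" estimate: the code below assumes that this 2.6.9-based x86_64 kernel maps physical memory linearly starting at a base of 0x10000000000. That base and the helper names are assumptions made for illustration; they are not taken from the crash sources or from the -d7 output.</TT>
<PRE>
```c
#include <inttypes.h>
#include <stdio.h>

/* ASSUMPTION: base of the kernel direct mapping on 2.6.9-era x86_64. */
#define SKETCH_PAGE_OFFSET 0x0000010000000000ULL

/* Translate a direct-mapped kernel virtual address to a physical address. */
uint64_t direct_map_to_phys(uint64_t vaddr)
{
        return vaddr - SKETCH_PAGE_OFFSET;
}

/* Number of whole GiB that lie below the given physical address. */
uint64_t phys_to_gib(uint64_t paddr)
{
        return paddr >> 30;
}

/* Hypothetical helper: print the translation for one address. */
void show_translation(uint64_t vaddr)
{
        uint64_t paddr = direct_map_to_phys(vaddr);

        printf("vaddr %" PRIx64 " -> paddr %" PRIx64 " (above %" PRIu64 " GiB)\n",
               vaddr, paddr, phys_to_gib(paddr));
}
```
</PRE>
<P><TT>Calling show_translation(0x10408119e84ULL) under that assumption places the failing address just past the 16 GiB mark, well inside a 64GB machine.</TT>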
<P><TT>In fact, looking at the -d7 output, the level4_pgt pagetable
pointers</TT> <BR><TT>for each non-cpu0 cpu_pda get allocated with
__get_free_pages() -- and</TT> <BR><TT>a couple of them come from the 10408xxxxxx
virtual address range:</TT><TT></TT>
<P><TT>...</TT> <BR><TT><readmem: ffffffff804ed700, KVADDR, "cpu_pda
entry", 128, (FOE), 930580></TT> <BR><TT>CPU0: level4_pgt: ffffffff80101000
data_offset: 10087adef60</TT> <BR><TT><readmem: ffffffff804ed780, KVADDR,
"cpu_pda entry", 128, (FOE), 930580></TT> <BR><TT>CPU1: level4_pgt:
1040802c000 data_offset: 10487bf8d60</TT> <BR><TT><readmem:
ffffffff804ed800, KVADDR, "cpu_pda entry", 128, (FOE), 930580></TT>
<BR><TT>CPU2: level4_pgt: 10408008000 data_offset: 10887bf8d60</TT>
<BR><TT><readmem: ffffffff804ed880, KVADDR, "cpu_pda entry", 128, (FOE),
930580></TT> <BR><TT>CPU3: level4_pgt: 10bf9ff2000 data_offset:
10c87bfbf60</TT> <BR><TT><readmem: ffffffff804ed900, KVADDR, "cpu_pda
entry", 128, (FOE), 930580></TT> <BR><TT>CPU4: level4_pgt: 10008028000
data_offset: 10087ae6f60</TT> <BR><TT><readmem: ffffffff804ed980, KVADDR,
"cpu_pda entry", 128, (FOE), 930580></TT> <BR><TT>CPU5: level4_pgt:
10bf9f8a000 data_offset: 10487c00d60</TT> <BR><TT><readmem:
ffffffff804eda00, KVADDR, "cpu_pda entry", 128, (FOE), 930580></TT>
<BR><TT>CPU6: level4_pgt: 100f7f08000 data_offset: 10887c00d60</TT>
<BR><TT><readmem: ffffffff804eda80, KVADDR, "cpu_pda entry", 128, (FOE),
930580></TT> <BR><TT>CPU7: level4_pgt: 107f9f8e000 data_offset:
10c87c03f60</TT> <BR><TT><readmem: 10008000084, KVADDR, "tss_struct ist
array", 56, (FOE), 90c5b0></TT> <BR><TT><readmem: 10408119e84, KVADDR,
"tss_struct ist array", 56, (FOE), 90c5e8></TT> <BR><TT>crash: seek error:
kernel virtual address: 10408119e84 type: "tss_struct ist
array"</TT><TT></TT>
<P><TT>They weren't *read* from there at that point, but it shows that</TT>
<BR><TT>there was memory in that neighborhood. Anyway, the "seek
error"</TT> <BR><TT>from LKCD means that the physical page couldn't be found
in the</TT> <BR><TT>dumpfile by lkcd_lseek():</TT><TT></TT>
<P><TT>/*</TT> <BR><TT> * Read from an LKCD formatted
dumpfile.</TT> <BR><TT> */</TT> <BR><TT>int</TT>
<BR><TT>read_lkcd_dumpfile(int fd, void *bufptr, int cnt, ulong addr,
physaddr_t paddr)</TT> <BR><TT>{</TT>
<BR><TT>        set_lkcd_fp(fp);</TT><TT></TT>
<P><TT>        if (!lkcd_lseek(paddr))</TT>
<BR><TT>                return SEEK_ERROR;</TT><TT></TT>
<P><TT>        if (lkcd_read((void *)bufptr, cnt) != cnt)</TT>
<BR><TT>                return READ_ERROR;</TT><TT></TT>
<P><TT>        return cnt;</TT>
<BR><TT>}</TT><TT></TT>
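<P><TT>To illustrate why a missing page surfaces as a seek error rather than a read error, here is a minimal sketch of a page lookup. It is not how lkcd_lseek() is implemented: the dump_index structure, the sorted list of saved page frame numbers, and the function names are all hypothetical, chosen only to show the idea that the seek step fails when the page was never written to the dumpfile.</TT>
<PRE>
```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12    /* 4 KiB pages, as on x86_64 */

/* Hypothetical index: sorted page frame numbers saved in a dumpfile. */
struct dump_index {
        const uint64_t *pfns;
        size_t count;
};

/* Comparison callback for bsearch() over page frame numbers. */
static int cmp_pfn(const void *a, const void *b)
{
        uint64_t x = *(const uint64_t *)a;
        uint64_t y = *(const uint64_t *)b;

        return (x > y) - (x < y);
}

/*
 * Return 1 if the page containing paddr is present in the index,
 * 0 otherwise. A caller would report a seek error on 0.
 */
int sketch_page_in_dump(const struct dump_index *idx, uint64_t paddr)
{
        uint64_t pfn = paddr >> SKETCH_PAGE_SHIFT;

        return bsearch(&pfn, idx->pfns, idx->count,
                       sizeof(pfn), cmp_pfn) != NULL;
}
```
</PRE>
<P><TT>With an index that stops short of the higher physical pages, a lookup for a low address succeeds while a lookup for a page beyond the saved range returns 0, which matches the pattern in the -d7 output where the first "tss_struct ist array" read works and the second one fails.</TT>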
<P><TT>I can't really help you from that point on,
though...</TT><TT></TT>
<P><TT>Dave</TT> <BR><TT></TT> </P></BLOCKQUOTE></BODY></HTML>