[Crash-utility] dom0 analysis for IA64

Itsuro ODA oda at valinux.co.jp
Fri May 11 00:59:21 UTC 2007


Hi Dave,

On Thu, 10 May 2007 15:27:34 -0400
Dave Anderson <anderson at redhat.com> wrote:

> Itsuro ODA wrote:
> 
> > Hi Dave,
> >
> > The attached patch makes it possible to analyze dom0 Linux from a
> > whole-memory dump on IA64 (for crash-4.0-4.1).
> > It is just a quick hack.
> > (IA64 Xen developers asked me for it, so I made it.)
> >
> > On IA64, each domain manages its own machine memory through
> > domain.arch.mm.pgd. It is a 3-level page table.
> > I thought the mfn of domain.arch.mm.pgd could be regarded as
> > p2m_mfn.
> >
> > I intended to modify as little existing code as possible,
> > but this patch is a bit tricky, and the memory usage is
> > large if the machine memory layout is sparse.
> > (Maybe xen_kdump_p2m should be prepared for each arch?)
> >
> > Would you consider to support dom0 analysis for IA64 ?
> >
> > I prepared two sample dumps. Please find from the following
> > URLs.
> >
> > 1) http://people.valinux.co.jp/~oda/20070510-sample-dump-1.tar
> >   contents:
> >   - vmcore.gz
> >     This is taken by a hard assist dump. netdump style ELF vmcore.
> >     So XEN_ELFNOTE_CRASH_INFO does not exist.
> >   - vmcore.ka.gz
> >     It is converted to kdump style, with XEN_ELFNOTE_CRASH_INFO
> >     added manually.
> >   - vmlinux.debug.gz
> >     for dom0 analysis
> >   - xen-syms-2.6.18-8.el5.gz
> >     for xencrash
> >
> >   To get p2m_mfn, xencrash's doms command is useful.
> > --------------------------------------------------------------------------
> > # crash xen-syms-2.6.18-8.el5 vmcore
> > ...
> > crash> doms
> >    DID       DOMAIN      ST T  MAXPAGE  TOTPAGE VCPU     SHARED_I          P2M_MFN
> >   32753 f000000007ac8080 RU O     0        0      0          0              ----
> >   32754 f000000007acc080 RU X     0        0      0          0              ----
> > > 32767 f000000007ff8080 RU I     0        0      4          0              ----
> >       0 f000000007aa4080 RU 0   10000    fc28     1  f000000007a88000       1abb7
> > >*    1 f000000007a78080 RU U   10603    10603    3  f000000007a5c000       1a909
> > crash>
> > ----------------------------------------------------------------------------
> >
> >   Then normal crash session with --p2m_mfn option.
> > ----------------------------------------------------------------------------
> > # crash --p2m_mfn=1abb7 vmlinux.debug vmcore
> > ...
> > ----------------------------------------------------------------------------
> >
> >   vmcore.ka has XEN_ELFNOTE_CRASH_INFO, so the --p2m_mfn option is not needed.
> > ----------------------------------------------------------------------------
> > # crash vmlinux.debug vmcore.ka
> > ...
> > ----------------------------------------------------------------------------
> >
> >   Currently the --p2m_mfn option is effective only if the vmcore has
> >   XEN_ELFNOTE_CRASH_INFO.
> >   I think specifying the --p2m_mfn option should mean that the vmcore
> >   is treated as XEN_CORE_DUMPFILE(). The patch supports this.
> >   I think it is necessary for dumps which do not have
> >   XEN_ELFNOTE_CRASH_INFO, such as the above sample.
> >
> 
> OK, I finally got these all downloaded.  However, the xen-syms
> binary in the "sample-1" directory has no debug data:
> 
> # file xen-syms-2.6.18-8.el5
> xen-syms-2.6.18-8.el5: ELF 64-bit LSB executable, IA-64, version 1 (SYSV), statically linked, stripped
> #

Sorry.
(I had forgotten that I put the .debug file into /usr/lib/debug/boot
in my environment.)

I have attached the .debug file to this mail.
(I also put it into 20070510-sample-dump-1.tar.)

> And I see that check_netdump_xen() is only called if the
> netdump (?) vmcore is used, since it needs the --p2m_mfn
> argument.  I have no idea where check_kdump_xen() would
> apply?

Right, there is no caller for it.

> In any case, I really prefer not to support whatever that
> first "hard assist dump. netdump style ELF" vmcore file.
> (What is that???)

The sample vmcore was taken by Fujitsu's "sadump" tool, which
is a hardware-assisted dump tool.
Pushing the dump button causes an INIT interrupt; the
registers are then saved and the CPUs stop. The hardware dumps all
memory after that. Finally, the memory image is converted to an ELF
vmcore (netdump style).

It is hard to add XEN_ELFNOTE_CRASH_INFO notes in this
procedure (I think).
(I think it would be easy to change the conversion to produce kdump style.)

I will ask the Fujitsu people whether that is possible.

It is for IA64 machines only at present. I don't know whether
x86/x86_64 machines will be supported.

> I don't see why the support for dom0 ia64 kdumps should
> be any different than for x86 and x86_64, both of which
> have XEN_ELFNOTE_CRASH_INFO notes containing the p2m mfn
> value.
> 
> Therefore, the check_netdump_xen() and check_kdump_xen()
> can be thrown out, and all that is really required is the
> implementation of ia64_xen_kdump_p2m_create() for the vmlinux
> side.  But it will still need a fix to deal with that
> over-sized (?) 512k p2m_frame list.  Can you look into fixing
> that?
> 
> Also, I don't quite understand the changes to xen_kdump_p2m().
> The first (generic) part is probably a safe thing to do:
> 
> +       if (mfn_idx >= xkd->p2m_frames)
> +               return P2M_FAILURE;

Yes. Without this check, illegal input to "rd" causes SIGSEGV.
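
To illustrate why the range check matters, here is a minimal sketch of a bounds-checked frame lookup. Only the names mfn_idx, p2m_frames and P2M_FAILURE come from the patch; the struct, the function name and the sentinel value are hypothetical stand-ins, not the actual crash code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinel value, for illustration only. */
#define P2M_FAILURE ((unsigned long)-1)

/* Hypothetical stand-in for the p2m bookkeeping structure;
 * only the p2m_frames field name appears in the patch. */
struct p2m_table {
    unsigned long *frame_list;  /* one entry per p2m frame */
    unsigned long  p2m_frames;  /* number of valid entries in frame_list */
};

/* Return the frame at mfn_idx, or P2M_FAILURE if the index is out of
 * range. Without the check, an arbitrary address typed into "rd" could
 * index far past the end of frame_list and fault. */
unsigned long lookup_frame(const struct p2m_table *t, unsigned long mfn_idx)
{
    assert(t != NULL);
    if (mfn_idx >= t->p2m_frames)
        return P2M_FAILURE;
    return t->frame_list[mfn_idx];
}
```

An out-of-range index then comes back as an ordinary failure that the caller can report, instead of a read past the end of the list.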

> But if the above code is put into place, how would it
> be possible for the resultant mfn_frame to be 0?
> 
> + #ifdef IA64
> +         if (mfn_frame == 0)
> +                 return P2M_FAILURE;
> + #endif
> 
> And I don't understand this part at all:
> 
> + #ifdef IA64
> +         if (!(*mfnptr & 0x1))
> +                 return P2M_FAILURE;
> +         paddr = *mfnptr & _PFN_MASK;
> + #else
> +         paddr = (physaddr_t)PTOB((ulonglong)(*mfnptr));
> + #endif
> 
> Although, after putting in a debug printf of what the mfns
> actually look like on an ia64, I guess I see why it's
> necessary.
> 
> On x86 and x86_64, the mfnptr points to a simple mfn value.
> 
> But on the ia64, I see mfns that look like 81000007bf3c761,
> where the "1-bit" is always set.  And you don't shift the
> mfn value like x86/x86_64 do.  Can you help me understand
> the format of the ia64 mfns?  In other words, what part of a
> value such as 81000007bf3c761 is the actual mfn?  Are there
> page flags or something in the lower bits of the number?

This is a page table. The 3rd-level table contains page
table entries.
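
As a rough sketch of how such an entry decodes: bit 0 is the present bit tested in the patch, and the machine address is obtained by masking. The mask value below is an assumption (physical page number in bits 12..49, which is what I believe _PFN_MASK expands to on ia64); the macro and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   UINT64_C(0x1)                 /* bit 0, as tested in the patch */
#define PTE_PPN_MASK  UINT64_C(0x0003fffffffff000)  /* ASSUMED: bits 12..49 */

/* Decode one 3rd-level entry. Returns false if the entry is not
 * present; otherwise stores the machine address of the page. */
bool pte_to_paddr(uint64_t pte, uint64_t *paddr_out)
{
    assert(paddr_out != NULL);
    if (!(pte & PTE_PRESENT))
        return false;
    /* No shift is needed: the entry already holds an address,
     * with flag bits in the low 12 bits and above bit 49. */
    *paddr_out = pte & PTE_PPN_MASK;
    return true;
}
```

Under that assumed mask, the value 81000007bf3c761 from the question decodes to machine address 0x7bf3c000; the remaining low bits (0x761) and the high bits are flags, not part of the address.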

IA64 manages the phys-to-machine mapping in a different way from
x86/x86_64. It is a 3-level page table.
The p2m_mfn contained in XEN_ELFNOTE_CRASH_INFO is the mfn of
the top-level page table (i.e. the pgd).

The only thing in common is "3-level".
(That is why I said "tricky".)
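
The walk described above can be sketched as follows. This is a toy model, not the patch: it assumes 16 KB pages with 8-byte entries (2048 per table), assumes the upper two levels hold the address of the next table under the same mask as the last level, and reads from a flat array standing in for machine memory. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   14                              /* ASSUMED: 16 KB pages */
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define IDX_BITS     (PAGE_SHIFT - 3)                /* 8-byte entries per page */
#define IDX_MASK     ((UINT64_C(1) << IDX_BITS) - 1)
#define PTE_PRESENT  UINT64_C(0x1)
#define PPN_MASK     UINT64_C(0x0003fffffffff000)    /* ASSUMED: bits 12..49 */

/* Read one 8-byte word at machine address paddr from the toy memory. */
static bool read_word(const uint64_t *mem, size_t mem_words,
                      uint64_t paddr, uint64_t *out)
{
    uint64_t word = paddr / 8;
    if (word >= mem_words)
        return false;
    *out = mem[word];
    return true;
}

/* Translate a pseudo-physical address to a machine address by walking
 * pgd -> pmd -> pte, starting from the pgd's machine address. */
bool p2m_walk(const uint64_t *mem, size_t mem_words, uint64_t pgd_paddr,
              uint64_t pseudo_paddr, uint64_t *machine_paddr)
{
    uint64_t table = pgd_paddr;
    uint64_t entry = 0;

    assert(mem != NULL && machine_paddr != NULL);
    for (int level = 2; level >= 0; level--) {
        uint64_t idx = (pseudo_paddr >> (PAGE_SHIFT + level * IDX_BITS)) & IDX_MASK;
        if (!read_word(mem, mem_words, table + idx * 8, &entry))
            return false;
        if (level > 0) {
            /* Upper levels: an empty entry means nothing is mapped here. */
            if ((entry & PPN_MASK) == 0)
                return false;
            table = entry & PPN_MASK;
        }
    }
    /* Last level: a page table entry with a present bit. */
    if (!(entry & PTE_PRESENT))
        return false;
    *machine_paddr = (entry & PPN_MASK) | (pseudo_paddr & (PAGE_SIZE - 1));
    return true;
}
```

A real implementation would read each table from the dump file rather than from an array, and a sparse layout means many upper-level entries are simply empty.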

> Thanks,
>   Dave
-- 
Itsuro ODA <oda at valinux.co.jp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xen-syms-2.6.18-8.el5.debug.gz
Type: application/octet-stream
Size: 919237 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20070511/2e163739/attachment.obj>

