[Crash-utility] crash version 4.0-3.15 is available
Dave Anderson
anderson at redhat.com
Thu Dec 21 13:54:23 UTC 2006
Isaku Yamahata wrote:
> On Wed, Dec 20, 2006 at 10:15:32AM -0500, Dave Anderson wrote:
>
> > - Introduced support for xendumps of para-virtualized ia64 kernels.
> > It should be noted that currently the ia64 Xen kernel does not
> > lay down a switch_stack for the panic task, so only raw "bt -t"
> > backtraces can be done on the panic task. (anderson at redhat.com)
>
> Hi Dave.
>
> The current "xm dump-core" on ia64 loses some register information
> which is saved on the Xen register stack,
> e.g. r33, ... aren't saved in the domU xendump file.
> Probably ia64-specific code would be necessary for it.
> This will be addressed as a post-3.0.4 effort and the format will be changed.
>
> --
> yamahata
I'm not sure exactly what the ramifications are of an ia64 "xm dump-core"
on a paravirtualized kernel. It would seem to depend upon what, if anything,
was "active" at the time.
My reference to the switch_stack above was for an ia64 kernel that panicked
on its own account; the test dump I used was killed with a write to
/proc/sysrq-trigger:
crash> bt
PID: 1554 TASK: e000000000988000 CPU: 0 COMMAND: "bash"
bt: xendump: switch_stack possibly not saved -- try "bt -t"
#0 [BSP:e000000000988f00] schedule at a0000001005e0420
crash>
The plain "bt" uses the stale information from the last time the task
called schedule(), so the backtrace fails.
Using "bt -t" walks the process stack for kernel return addresses,
and the "reverse" BSP information just above the task_struct shows
the path taken:
crash> bt -t
PID: 1554 TASK: e000000000988000 CPU: 0 COMMAND: "bash"
START: schedule at a0000001005e0420
[e000000000989238] xen_trace_syscall at a000000100065020
[e000000000989288] sys_write at a000000100155d30
[e0000000009892b8] vfs_write at a0000001001551e0
[e000000000989308] write_sysrq_trigger at a0000001001e3250
[e000000000989320] __handle_sysrq at a00000010039bca0
[e000000000989380] sysrq_handle_crashdump at a00000010039c460
[e0000000009893e8] do_wait at a00000010008b080
[e000000000989438] schedule_timeout at a0000001005e22a0
[e000000000989450] ext3_lookup at a0000002001eb7d0
[e000000000989460] cleanup_module at a000000200202a10
[e000000000989470] ext3_find_entry at a0000002001e77c0
[e0000000009894b0] __wait_on_buffer at a00000010015b580
[e0000000009894c0] ll_rw_block at a00000010015c2b0
[e000000000989500] out_of_line_wait_on_bit at a0000001005e2940
[e000000000989520] __wait_on_bit at a0000001005e27d0
[e000000000989540] sync_buffer at a00000010015b7c0
[e000000000989558] io_schedule at a0000001005e2170
[e000000000989588] __delayacct_blkio_start at a0000001000fa4b0
[e000000000989608] io_schedule at a0000001005e21a0
[e000000000989630] __do_IRQ at a0000001000f2450
[e000000000989660] do_softirq at a000000100093100
[e000000000989698] blkif_int at a000000200152070
[e000000000989710] end_that_request_first at a00000010027ae30
[e000000000989748] __end_that_request_first at a00000010027a460
[e000000000989778] bio_endio at a0000001001622b0
[e00000000098fca0] schedule at a0000001005e0420
[e00000000098fd10] vhpt_miss at a000000100000002
[e00000000098fd60] vhpt_miss at a000000100000002
[e00000000098fdc8] dummycon_dummy at a0000001002dd380
[e00000000098fdd0] vhpt_miss at a000000100000003
crash>
Well, it at least shows it going as far as sysrq_handle_crashdump(),
and any further addresses of function calls were never pushed into
the BSP. (?)
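For anyone unfamiliar with that kind of raw scan, here is a rough user-space sketch of the general idea: read the stack one 64-bit word at a time and report every word that falls inside the kernel text range. This is not the crash utility's actual code; the text_range structure, the function names and the bounds are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bounds of kernel text. A real tool would take these
 * from the kernel's symbol information, not from hard-coded values. */
struct text_range {
        uint64_t start;
        uint64_t end;           /* exclusive */
};

static int in_text(const struct text_range *r, uint64_t value)
{
        return value >= r->start && value < r->end;
}

/*
 * Scan a copy of a task's stack and print each word that looks like
 * a kernel text address, together with the stack address it was
 * found at.  Returns the number of hits.
 */
static size_t scan_stack_for_text(const uint64_t *stack, size_t nwords,
                                  uint64_t stack_base,
                                  const struct text_range *text)
{
        size_t hits = 0;

        for (size_t i = 0; i < nwords; i++) {
                if (!in_text(text, stack[i]))
                        continue;
                printf("[%016llx] %016llx\n",
                       (unsigned long long)(stack_base + i * sizeof(uint64_t)),
                       (unsigned long long)stack[i]);
                hits++;
        }
        return hits;
}
```

A scan like this cannot tell a live return address from a stale one left behind by an earlier call, so its output has to be read with some care.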
Anyway, the problem is that the shutdown path in a para-virtualized ia64
kernel does not lay down a switch_stack, as is done by the netdump,
diskdump and kdump facilities. Without a switch_stack register dump,
a normal backtrace of the panic task is impossible.
It's a simple thing to do -- at some point during the shutdown
path, presumably xen_panic_event(), the panicking process would
need to make a call to the unw_init_running() function, which lays
down a switch_stack on the kernel stack, and then continues on to
the next function in the shutdown path. For example, the kdump
facility for ia64 does this:
[ system crashes ]
crash_kexec()
machine_kexec()
...
The ia64 version of machine_kexec() does this:
void machine_kexec(struct kimage *image)
{
        unw_init_running(ia64_machine_kexec, image);
        for(;;);
}
The call to unw_init_running() never returns, but rather
it lays down a switch_stack on the kernel stack, and then
calls the ia64_machine_kexec() function:
extern void *efi_get_pal_addr(void);

static void ia64_machine_kexec(struct unw_frame_info *info, void *arg)
{
        struct kimage *image = arg;
        relocate_new_kernel_t rnk;
        void *pal_addr = efi_get_pal_addr();
        unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
        unsigned long vector;
        int ii;

        if (image->type == KEXEC_TYPE_CRASH) {
                crash_save_this_cpu();
                current->thread.ksp = (__u64)info->sw - 16;
        }
        ... (continue shutdown path)
The address of the switch_stack is found in the unw_frame_info
structure passed in, and gets stored (less 16 bytes) in the
current->thread.ksp of the panicking task. With that simple procedure,
the crash utility will then have all that it needs to do a backtrace
of the panicking task.
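On the reading side, the "- 16" in that assignment implies that the switch_stack sits 16 bytes above the saved ksp. A minimal sketch of the arithmetic, using hypothetical helper names (neither function exists in the kernel or in crash):

```c
#include <stdint.h>

/* Offset taken from the assignment in ia64_machine_kexec():
 *     current->thread.ksp = (__u64)info->sw - 16;
 */
#define KSP_SWITCH_STACK_OFFSET 16

/* What the panicking kernel stores in thread.ksp. */
static uint64_t ksp_from_switch_stack(uint64_t sw_addr)
{
        return sw_addr - KSP_SWITCH_STACK_OFFSET;
}

/* What a dump analyzer would compute to find the switch_stack again. */
static uint64_t switch_stack_from_ksp(uint64_t ksp)
{
        return ksp + KSP_SWITCH_STACK_OFFSET;
}
```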
The para-virtualized ia64 kernel shuts down when panic() calls
atomic_notifier_call_chain(), which in turn goes through the
panic_notifier list and leads to the ia64 version of xen_panic_event():
static int
xen_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
{
        HYPERVISOR_shutdown(SHUTDOWN_crash);
        /* we're never actually going to get here... */
        return NOTIFY_DONE;
}
The ia64 version would need to "jump through the hoop" of a call to
unw_init_running() before it calls HYPERVISOR_shutdown().
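Something along these lines is what I have in mind. This is only an untested sketch, not a patch: the callback name xen_panic_shutdown() is made up, and everything in the first half is a user-space stand-in (including the SHUTDOWN_crash and NOTIFY_DONE values) so that the fragment compiles outside the kernel.

```c
#include <stddef.h>

/* ---- User-space stand-ins; the real definitions live in the kernel ---- */
struct switch_stack { unsigned long placeholder; };
struct unw_frame_info { struct switch_stack *sw; };
struct thread_struct { unsigned long ksp; };
struct task_struct { struct thread_struct thread; };
struct notifier_block;

#define NOTIFY_DONE    0        /* mock value */
#define SHUTDOWN_crash 3        /* mock value */

static struct task_struct mock_task;
static struct task_struct *current = &mock_task;
static int shutdown_reason = -1;

/* Mock: only records the reason.  The real hypercall does not return. */
static void HYPERVISOR_shutdown(int reason)
{
        shutdown_reason = reason;
}

/* Mock: the real unw_init_running() lays down a switch_stack on the
 * kernel stack; here a local variable stands in for it. */
static void unw_init_running(void (*callback)(struct unw_frame_info *, void *),
                             void *arg)
{
        struct switch_stack sw = { 0 };
        struct unw_frame_info info = { &sw };

        callback(&info, arg);
}

/* ---- The proposed change (hypothetical) ---- */

/* Record where the switch_stack is, the same way ia64_machine_kexec()
 * does, and only then shut down. */
static void xen_panic_shutdown(struct unw_frame_info *info, void *arg)
{
        (void)arg;
        current->thread.ksp = (unsigned long)info->sw - 16;
        HYPERVISOR_shutdown(SHUTDOWN_crash);
}

static int
xen_panic_event(struct notifier_block *this, unsigned long event, void *ptr)
{
        (void)this; (void)event; (void)ptr;
        unw_init_running(xen_panic_shutdown, NULL);
        /* never reached in the real kernel; reached here only because
         * the mocks above return */
        return NOTIFY_DONE;
}
```

In the mock, the saved ksp points at a local that is gone once unw_init_running() returns; in the real kernel that does not matter, because the hypervisor call never comes back and the switch_stack stays in place on the panic task's stack for the dump.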
Thanks,
Dave