[Crash-utility] [PATCH] bug on get_be_long() and improvement of bt
Hu Tao
hutao at cn.fujitsu.com
Tue Oct 19 00:36:05 UTC 2010
On Mon, Oct 18, 2010 at 09:56:41AM -0400, Dave Anderson wrote:
>
> ----- "Dave Anderson" <anderson at redhat.com> wrote:
>
> > ----- "Hu Tao" <hutao at cn.fujitsu.com> wrote:
> >
> > > Hi Dave,
> > >
> > > There is a bug on get_be_long() that causes high 32 bits truncated.
> > > As a result, we get wrong registers values from dump file. Patch 1
> > > fixes this.
> >
> > Good catch!
> >
> > > Once we can get right cpu registers values, it's better to use the
> > > sp/ip for backtracing the active task. This can show a more accurate
> > > backtrace, not including those invalid frames beyond sp. pathes 2 and
> > > 3 do this on kvmdump case(virsh dump).
> > >
> > > To verify: run that km_probe.c test module on a x86_64 system, then
> > > `echo q > /proc/sysrq-trigger' to trigger the kprobe which does
> > > looping in post_handler. Then vrish dump then crash.
> >
> > However, I'm wondering whether you actually tested this with a
> > *crashed* system's dumpfile, and not just with a *live* system dump with
> > a contrived set of circumstances. And if so, what differences, if any,
> > did you see with the backtraces of the task that either oops'd or called
> > panic(), as well as those of the other active tasks that were brought down
> > by IP interrupt?
> >
> > Anyway, I'll give this a thorough testing with a set of sample
> > dumpfiles that I have on hand.
>
> Actually, the patch fails miserably on SMP dumpfiles with segmentation
> during initialization.
>
> And now looking at your patch, I'm wondering whether you even tested this
> with an SMP system?
My fault. I will do more tests and improve the patch.
* test with a SMP system
* test with a live dump
* test with system panic/oops
* test with x86/x86_64
>
> The change to qemu_load() here:
>
> @@ -904,6 +906,9 @@ qemu_load (const struct qemu_device_loader *devices, uint32_t required_features,
> if (feof (fp) || ferror (fp))
> break;
>
> + if (STREQ(d->vtbl->name, "cpu"))
> + result->dx86 = d;
> +
> if (sec == QEMU_VM_SECTION_END || sec == QEMU_VM_SECTION_FULL)
> result->features |= features;
> }
>
> That function cycles through the "cpu" devices for *each* cpu
> in the system, so this patch will store the device of the last cpu
> device it encounters. So in an SMP machine, it will store the
> device for the highest cpu only, right?
>
> And then there's this change to get_kvmdump_regs():
>
> @@ -310,7 +311,11 @@ kvmdump_memory_dump(FILE *ofp)
> void
> get_kvmdump_regs(struct bt_info *bt, ulong *pc, ulong *sp)
> {
> - machdep->get_stack_frame(bt, pc, sp);
> + if (is_task_active(bt->task)) {
> + *sp = device_list->dx86->regs[R_ESP];
> + *pc = device_list->dx86->eip;
> + } else
> + machdep->get_stack_frame(bt, pc, sp);
> }
>
> is_task_active() returns TRUE for all active tasks in an SMP system,
> not just the panic task. So it would seem you're going to pass
> back the same registers for all active tasks?
>
> And what's the point of this change to kernel.c?
>
> diff --git a/kernel.c b/kernel.c
> index e399099..2627020 100755
> --- a/kernel.c
> +++ b/kernel.c
> @@ -16,6 +16,7 @@
> */
>
> #include "defs.h"
> +#include "qemu-load.h"
> #include "xen_hyper_defs.h"
> #include <elf.h>
>
> Also, the change to main.c is unnecessary -- there are dozens of malloc'd
> memory areas in the program -- so why go to the bother of free'ing
> just this one prior to exiting?
>
> Anyway, instead of saving the device list, I suggest you do something
> like storing the per-cpu IP/SP values in a separate data structure that
> can possibly be used as an alternative source for register values for
> "live dumps" -- and possibly for crashing systems if usable starting
> hooks cannot be determined in the traditional manner. I had thought
> of doing something like that in the past, but when I looked at the
> register values, I must have run into the get_be_long() issue?
Sounds reasonable. Thanks for suggestions.
--
Thanks,
Hu Tao
More information about the Crash-utility
mailing list