[Crash-utility] [PATCH 00/11] sadump: Incremental update patches
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Fri Oct 21 03:08:38 UTC 2011
From: Dave Anderson <anderson at redhat.com>
Subject: Re: [Crash-utility] [PATCH 00/11] sadump: Incremental update patches
Date: Thu, 20 Oct 2011 17:06:54 -0400 (EDT)
>
>
> ----- Original Message -----
>> Hello Dave,
>>
>> The following series fix minor bugs, clean up in sadump module, and
>> address the issue on kdump's first 640kB backup.
>>
>> The last patch is a preparation for makedumpfile's support on
>> sadump-related formats, still work in progress, producing dumpfile in
>> kdump-compressed format from sadump-related formats.
>>
>> This patch set is based on crash 5.1.9.
>
> Hello Daisuke,
>
> As I have stated in our previous sadump-related discussions, you have
> free rein to make whatever changes you like in sadump-specific
> files, or in functions that deal with sadump-specific issues. However,
> if your changes modify behavior when used with non-sadump dumpfiles
> then I may have a problem with them. So when you post a patch-set
> such as this last set, I would prefer that you post two separate
> patch-sets.
>
> This 1/11 patchset is a good example of what I mean. I have no
> problem with the sadump-specific patches. But I do have a big
> problem with the last one, which is not necessarily sadump-specific:
>
> use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
>
I see. I'll send them separately in the future.
> BTW, these are the names of the patches as they were attached, where
> the second one doesn't have "0002-" prepended to it, and there is
> no "0008-" patch?:
>
> 0001-sadump-bug-close-receives-unintened-value.patch.patch
> cleanup_is_sadump.patch.patch
> 0002-sadump-bug-specify-wrong-type.patch.patch
> 0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
> 0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
> 0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
> 0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
> 0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
> 0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
> 0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
> use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
>
Sorry for the inconvenience. I used stgit to organize and send the
patch set, and I didn't notice that stgit preserves the original file
names when attaching them.
> Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles,
> first without, and then with, your patchset. When your patches are applied, I see
> numerous examples where the backtraces are missing huge pieces of information.
>
> Here are typical examples:
>
> Here with un-patched crash-5.1.9, is a RHEL6 crashing process:
>
> PID: 14187 TASK: ffff88012b98e040 CPU: 0 COMMAND: "runtest.sh"
> #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
> #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
> #2 [ffff88012b273b10] oops_end at ffffffff814df320
> #3 [ffff88012b273b40] no_context at ffffffff81040cbb
> #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
> #5 [ffff88012b273be0] bad_area at ffffffff8104106e
> #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
> #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
> #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
> [exception RIP: sysrq_handle_crash+22]
> RIP: ffffffff8131b566 RSP: ffff88012b273e18 RFLAGS: 00010096
> RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f95
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
> RBP: ffff88012b273e18 R8: ffffffff81b9e5c0 R9: 0000000000000000
> R10: 00007fff7b178160 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff81a9a1a0 R14: 0000000000000286 R15: 0000000000000007
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
> #10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
> #11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
> #12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
> #13 [ffff88012b273f30] sys_write at ffffffff81173ad1
> #14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>
> With crash-5.1.9 plus your patch -- nothing is shown below the page fault
> exception frame:
>
> PID: 14187 TASK: ffff88012b98e040 CPU: 0 COMMAND: "runtest.sh"
> [exception RIP: sysrq_handle_crash+22]
> RIP: ffffffff8131b566 RSP: ffff88012b273e18 RFLAGS: 00010096
> RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f95
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
> RBP: ffff88012b273e18 R8: ffffffff81b9e5c0 R9: 0000000000000000
> R10: 00007fff7b178160 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff81a9a1a0 R14: 0000000000000286 R15: 0000000000000007
> CS: 0010 SS: 0018
> #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
> #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
> #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
> #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
> #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
> #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
> RIP: 00007fad3a2f45e0 RSP: 00007fff7b1783d8 RFLAGS: 00010206
> RAX: 0000000000000001 RBX: ffffffff8100b0b2 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 00007fad3abe6000 RDI: 0000000000000001
> RBP: 00007fad3abe6000 R8: 000000000000000a R9: 00007fad3abe2700
> R10: 00007fff7b178160 R11: 0000000000000246 R12: 0000000000000002
> R13: 00007fad3a5a6780 R14: 0000000000000002 R15: 0000000000000001
> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
>
> Again with un-patched crash-5.1.9, here are examples of two non-crashing cpus
> that received shutdown NMI interrupts from the crashing task:
>
> PID: 0 TASK: ffff88012cd2f580 CPU: 1 COMMAND: "swapper"
> #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
> #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
> #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
> #3 [ffff880028227ef0] notify_die at ffffffff810942fe
> #4 [ffff880028227f20] do_nmi at ffffffff814df033
> #5 [ffff880028227f50] nmi at ffffffff814de940
> [exception RIP: intel_idle+177]
> RIP: ffffffff812bc291 RSP: ffff88012cd31e68 RFLAGS: 00000046
> RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffff88012cd31fd8 RDI: ffffffff81a34040
> RBP: ffff88012cd31ed8 R8: 0000000000000000 R9: 00000000000000c8
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000020
> R13: 12257c81ed7a34e6 R14: 0000000000000003 R15: 0000000000000001
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> --- <NMI exception stack> ---
> #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
> #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
> #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>
> PID: 37 TASK: ffff88012ce360c0 CPU: 2 COMMAND: "events/2"
> #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
> #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
> #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
> #3 [ffff880028247ef0] notify_die at ffffffff810942fe
> #4 [ffff880028247f20] do_nmi at ffffffff814df033
> #5 [ffff880028247f50] nmi at ffffffff814de940
> [exception RIP: io_serial_in+22]
> RIP: ffffffff813324f6 RSP: ffff88012ce5fc70 RFLAGS: 00000006
> RAX: ffffffffab364400 RBX: ffffffff81f2cca0 RCX: 0000000000000000
> RDX: 000000000000d055 RSI: 0000000000000005 RDI: ffffffff81f2cca0
> RBP: ffff88012ce5fc70 R8: ffffffff81b9e5c0 R9: 0000000000000000
> R10: ffff880127498a60 R11: 0000000000000001 R12: 000000000000270c
> R13: 0000000000000020 R14: 0000000000000000 R15: ffffffff81332ba0
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> --- <NMI exception stack> ---
> #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
> #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
> #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
> #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
> #10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
> #11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
> #12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
> #13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
> #14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
> #15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
> #16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
> #17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>
> But when running crash-5.1.9 plus your patch -- the transitions to the NMI exception
> stack are not even shown at all:
>
> PID: 0 TASK: ffff88012cd2f580 CPU: 1 COMMAND: "swapper"
> [exception RIP: intel_idle+177]
> RIP: ffffffff812bc291 RSP: ffff88012cd31e68 RFLAGS: 00000046
> RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: ffff88012cd31fd8 RDI: ffffffff81a34040
> RBP: ffff88012cd31ed8 R8: 0000000000000000 R9: 00000000000000c8
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000020
> R13: 12257c81ed7a34e6 R14: 0000000000000003 R15: 0000000000000001
> CS: 0010 SS: 0018
> #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
> #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
> #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>
> PID: 37 TASK: ffff88012ce360c0 CPU: 2 COMMAND: "events/2"
> [exception RIP: io_serial_in+22]
> RIP: ffffffff813324f6 RSP: ffff88012ce5fc70 RFLAGS: 00000006
> RAX: ffffffffab364400 RBX: ffffffff81f2cca0 RCX: 0000000000000000
> RDX: 000000000000d055 RSI: 0000000000000005 RDI: ffffffff81f2cca0
> RBP: ffff88012ce5fc70 R8: ffffffff81b9e5c0 R9: 0000000000000000
> R10: ffff880127498a60 R11: 0000000000000001 R12: 000000000000270c
> R13: 0000000000000020 R14: 0000000000000000 R15: ffffffff81332ba0
> CS: 0010 SS: 0018
> #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
> #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
> #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
> #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
> #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
> #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
> #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
> #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
> #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
> #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
> #10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>
> If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch" patch
> the backtraces are correct. Now, it may be true that the changes you made make
> sense with respect to sadump dumpfiles, where the register set stored in the header
> is a reflection of the last location that each cpu ran (?).
>
> But those changes are totally unacceptable for compressed kdump dumpfiles.
I understand the situation.
I have attached a V2 patch. I confirmed that it doesn't break the
behavior shown above. Could you review it?
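The kind of before/after comparison described above (running "bt -a"
without and then with a patch applied) could be automated roughly as
follows. This is only a minimal sketch in Python and is not part of
crash or of the attached patch. It assumes the two "bt -a" outputs were
saved as plain text, and the names frames_by_task and lost_frames are
hypothetical helpers made up for this illustration.

```python
import re

# Task header line of "bt" output, for example:
#   PID: 14187 TASK: ffff88012b98e040 CPU: 0 COMMAND: "runtest.sh"
HEADER = re.compile(r'^PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)')

# Stack frame line, for example:
#   #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
FRAME = re.compile(r'^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def frames_by_task(bt_output):
    """Map (pid, cpu) -> list of function names in the order printed.

    Hypothetical helper: the key includes the cpu because PID 0
    ("swapper") appears once per cpu in "bt -a" output.
    """
    tasks = {}
    current = None
    for line in bt_output.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = (int(header.group(1)), int(header.group(3)))
            tasks[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            tasks[current].append(frame.group(1))
    return tasks


def lost_frames(before, after):
    """Return {task: [functions present in 'before' but absent in 'after']}."""
    old = frames_by_task(before)
    new = frames_by_task(after)
    report = {}
    for task, funcs in old.items():
        missing = [f for f in funcs if f not in new.get(task, [])]
        if missing:
            report[task] = missing
    return report
```

Feeding it the unpatched and patched outputs for the same dumpfile
would list, per task, the functions that disappeared from the
backtrace. An empty result only means that no frame names were lost;
it does not check register dumps or exception stack markers.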
Thanks.
HATAYAMA, Daisuke
-------------- next part --------------
A non-text attachment was scrubbed...
Name: use_regs_in_elf_notes_on_kdump_fmt_from_sadump_v2.patch
Type: text/x-patch
Size: 2990 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20111021/c2ea31e0/attachment.bin>