[Crash-utility] [PATCH 00/11] sadump: Incremental update patches

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Fri Oct 21 03:08:38 UTC 2011


From: Dave Anderson <anderson at redhat.com>
Subject: Re: [Crash-utility] [PATCH 00/11] sadump: Incremental update patches
Date: Thu, 20 Oct 2011 17:06:54 -0400 (EDT)

> 
> 
> ----- Original Message -----
>> Hello Dave,
>> 
>> The following series fixes minor bugs, cleans up the sadump module, and
>> addresses the issue with kdump's backup of the first 640kB.
>> 
>> The last patch is preparation for makedumpfile's support for
>> sadump-related formats (still a work in progress), which produces a
>> dumpfile in kdump-compressed format from sadump-related formats.
>> 
>> This patch set is based on crash 5.1.9.
> 
> Hello Daisuke,
> 
> As I have stated in our previous sadump-related discussions, you have
> free rein to make whatever changes you like in sadump-specific
> files, or in functions that deal with sadump-specific issues.  However, 
> if your changes modify behavior when used with non-sadump dumpfiles
> then I may have a problem with them.  So when you post a patch-set 
> such as this last set, I would prefer that you post two separate 
> patch-sets.
> 
> This 1/11 patchset is a good example of what I mean.  I have no
> problem with the sadump-specific patches.  But I do have a big
> problem with the last one, which is not necessarily sadump-specific:
> 
>   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
> 

I see. I'll send them separately in the future.

> BTW, these are the names of the patches as they were attached, where
> the second one doesn't have "0002-" prepended to it, and there is
> no "0008-" patch?:
>   
>   0001-sadump-bug-close-receives-unintened-value.patch.patch
>   cleanup_is_sadump.patch.patch
>   0002-sadump-bug-specify-wrong-type.patch.patch
>   0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
>   0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
>   0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
>   0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
>   0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
>   0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
>   0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
>   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
> 

Sorry, that was inconsiderate of me. I used stgit to organize and send
the patch set, and I didn't notice that stgit preserves the original
file names when attaching them.

> Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles, 
> first without, and then with, your patchset.  When your patches are applied, I see 
> numerous examples where the backtraces are missing huge pieces of information.
> 
> Here are typical examples:
> 
> Here with un-patched crash-5.1.9, is a RHEL6 crashing process:
>  
>  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>   #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
>   #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
>   #2 [ffff88012b273b10] oops_end at ffffffff814df320
>   #3 [ffff88012b273b40] no_context at ffffffff81040cbb
>   #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
>   #5 [ffff88012b273be0] bad_area at ffffffff8104106e
>   #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
>   #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
>   #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
>      [exception RIP: sysrq_handle_crash+22]
>      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
>  #10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
>  #11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
>  #12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
>  #13 [ffff88012b273f30] sys_write at ffffffff81173ad1
>  #14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>  
> With crash-5.1.9 plus your patch -- nothing is shown below the page fault
> exception frame:
>  
>  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
>      [exception RIP: sysrq_handle_crash+22]
>      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
>      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
>      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
>      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
>      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
>      CS: 0010  SS: 0018
>   #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
>   #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
>   #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
>   #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
>   #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
>   #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
>      RIP: 00007fad3a2f45e0  RSP: 00007fff7b1783d8  RFLAGS: 00010206
>      RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000000
>      RDX: 0000000000000002  RSI: 00007fad3abe6000  RDI: 0000000000000001
>      RBP: 00007fad3abe6000   R8: 000000000000000a   R9: 00007fad3abe2700
>      R10: 00007fff7b178160  R11: 0000000000000246  R12: 0000000000000002
>      R13: 00007fad3a5a6780  R14: 0000000000000002  R15: 0000000000000001
>      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>   
> Again with un-patched crash-5.1.9, here are examples of two non-crashing cpus
> that received shutdown NMI interrupts from the crashing task:
>  
>  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>   #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
>   #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
>   #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
>   #3 [ffff880028227ef0] notify_die at ffffffff810942fe
>   #4 [ffff880028227f20] do_nmi at ffffffff814df033
>   #5 [ffff880028227f50] nmi at ffffffff814de940
>      [exception RIP: intel_idle+177]
>      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>   #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
>   #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>   #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
> 
>  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>   #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
>   #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
>   #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
>   #3 [ffff880028247ef0] notify_die at ffffffff810942fe
>   #4 [ffff880028247f20] do_nmi at ffffffff814df033
>   #5 [ffff880028247f50] nmi at ffffffff814de940
>      [exception RIP: io_serial_in+22]
>      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  --- <NMI exception stack> ---
>   #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
>   #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>   #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>   #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
>  #10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
>  #11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
>  #12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
>  #13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
>  #14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
>  #15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
>  #16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
>  #17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>  
> But when running crash-5.1.9 plus your patch -- the transitions to the NMI exception
> stack are not even shown at all:
>     
>  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
>      [exception RIP: intel_idle+177]
>      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
>      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
>      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
>      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
>      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
>      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
>      CS: 0010  SS: 0018
>   #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
>   #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
>   #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
>  
>  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
>      [exception RIP: io_serial_in+22]
>      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
>      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
>      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
>      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
>      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
>      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
>      CS: 0010  SS: 0018
>   #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
>   #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
>   #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
>   #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
>   #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
>   #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
>   #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
>   #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
>   #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
>   #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
>  #10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
>  
> If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch" patch
> the backtraces are correct.  Now, it may be true that the changes you made make
> sense with respect to sadump dumpfiles, where the register set stored in the header
> is a reflection of the last location that each cpu ran (?).  
> 
> But those changes are totally unacceptable for compressed kdump dumpfiles.

I understand the situation.

I have attached a V2 patch. I confirmed that it doesn't break the
behavior explained above. Could you review it?
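
For reference, a check of this kind could be scripted roughly as follows.
This is only a minimal sketch, not the procedure actually used: it assumes
the "bt -a" output of the un-patched and the patched crash has already been
saved as text, and the helper names (frames_by_task, lost_frames) are
hypothetical.

```python
import re

# A task header in "bt" output looks like:
#   PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
TASK_RE = re.compile(r"PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)")
# A stack frame line looks like:
#   #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
FRAME_RE = re.compile(r"#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def frames_by_task(bt_output):
    """Map (pid, cpu) to the list of function names in that task's backtrace."""
    result = {}
    current = None
    for line in bt_output.splitlines():
        m = TASK_RE.search(line)
        if m:
            # PID 0 (swapper) appears once per cpu, so the cpu is part of the key.
            current = (int(m.group(1)), int(m.group(3)))
            result[current] = []
            continue
        m = FRAME_RE.search(line)
        if m and current is not None:
            result[current].append(m.group(1))
    return result


def lost_frames(before, after):
    """Return {task: [functions seen before but missing after]}."""
    old, new = frames_by_task(before), frames_by_task(after)
    report = {}
    for task, funcs in old.items():
        missing = [f for f in funcs if f not in new.get(task, [])]
        if missing:
            report[task] = missing
    return report
```

Calling lost_frames() on the two saved outputs returns an empty dict when
no task lost any frame. Applied to the CPU 1 "swapper" example quoted
above, it would report crash_nmi_callback, nmi, intel_idle and the other
frames that disappeared with the first version of the patch.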

Thanks.
HATAYAMA, Daisuke
-------------- next part --------------
A non-text attachment was scrubbed...
Name: use_regs_in_elf_notes_on_kdump_fmt_from_sadump_v2.patch
Type: text/x-patch
Size: 2990 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20111021/c2ea31e0/attachment.bin>

