[Crash-utility] [RFC][PATCH] use the value of register in the vmcore when we do not find panic task

Dave Anderson anderson at redhat.com
Thu Mar 17 20:59:18 UTC 2011


----- Original Message -----
> We have new hardware that takes the dump, and we use makedumpfile to
> generate the vmcore. Our hardware works even when the OS is out of
> control (for example, in a dead loop). When we use crash to analyze
> the vmcore, bt does not work, because there is no panic task.
> 
> We have provided the register values in the vmcore (in elf_prstatus
> format, the same as in a normal kdump vmcore), so we can use them
> when we do not find a panic task.

Let me state first that I get very nervous when patches are
made to commonly used functions to handle special cases such
as this.  When those kinds of changes are proposed, I much prefer
that the new code apply *only* to the special cases and not
affect anything else.

I do not have any x86 or x86_64 compressed kdumps that 
contain ELF note data, so I cannot really test this patch.
However, when I do run this patch against a sample set 
of ~150 x86_64 dumpfiles, I see errors on simple backtraces
such as the following examples.

Without the patch:

  crash> bt ffff81007e54c100
  PID: 0      TASK: ffff81007e54c100  CPU: 1   COMMAND: "swapper"
   #0 [ffff81007e565ef0] cpu_idle at ffffffff80047282
  crash>

With your patch:

  crash> bt ffff81007e54c100
  PID: 0      TASK: ffff81007e54c100  CPU: 1   COMMAND: "swapper"
  Segmentation fault

And it doesn't work with Xen kernels very well.

Without the patch:
  
  crash> bt ffff8800009c0040
  PID: 0      TASK: ffff8800009c0040  CPU: 1   COMMAND: "swapper"
   #0 [ffff88001d485ef8] safe_halt at ffffffff8011004c
   #1 [ffff88001d485f28] xen_idle at ffffffff801092f4
   #2 [ffff88001d485f38] cpu_idle at ffffffff801093b5
  crash>

With your patch:
  
  crash> bt ffff8800009c0040
  PID: 0      TASK: ffff8800009c0040  CPU: 1   COMMAND: "swapper"
  bt: cannot determine starting stack pointer
  crash>
  
Without the patch:
  
  crash> bt -a
  ... [ cut ] ...
  
  PID: 0      TASK: ffff880000017040  CPU: 1   COMMAND: "swapper"
   #0 [ffff880033983f38] xen_idle at ffffffff8026c502
   #1 [ffff880033983f48] cpu_idle at ffffffff8024a982
  
  PID: 0      TASK: ffff8800000037e0  CPU: 2   COMMAND: "swapper"
   #0 [ffff880033987f38] xen_idle at ffffffff8026c502
   #1 [ffff880033987f48] cpu_idle at ffffffff8024a982
  
  PID: 0      TASK: ffff880000003080  CPU: 3   COMMAND: "swapper"
   #0 [ffff880033989f38] xen_idle at ffffffff8026c502
   #1 [ffff880033989f48] cpu_idle at ffffffff8024a982
  
With your patch:
  
  crash> bt -a
  ... [ cut ] ...
  
  PID: 0      TASK: ffff880000017040  CPU: 1   COMMAND: "swapper"
  bt: cannot determine starting stack pointer
  
  PID: 0      TASK: ffff8800000037e0  CPU: 2   COMMAND: "swapper"
  bt: cannot determine starting stack pointer
  
  PID: 0      TASK: ffff880000003080  CPU: 3   COMMAND: "swapper"
  bt: cannot determine starting stack pointer
  crash> 
  
So my point is that, given that *none* of these kernels even
contain makedumpfile-generated ELF data, your code should
*not* affect them at all.

That being the case, I cannot accept the patch as it is currently
written.  Things work fine as things are now, and I'm not interested
in debugging things that break because your patch changes current 
behavior.

Here are my two major suggestions:

(1) If it is determined during the initial dumpfile scan that
    it contains ELF note data, then set a global flag, perhaps
    in the new pc->flags2, something like MAKEDUMPFILE_ELF_NOTES.

(2) Then, segregate your changes *completely* based upon that
    flag.  I would rather have essentially-redundant code put in
    place instead of breaking currently-existing code.

That way, for all dumpfiles where the MAKEDUMPFILE_ELF_NOTES flag 
is *not* set, then your code should *not* run.
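As a rough illustration of the two suggestions above, here is a minimal
sketch of the gating I have in mind.  Everything in it is a stand-in:
the struct is not crash's real program_context, the bit value chosen
for MAKEDUMPFILE_ELF_NOTES is hypothetical, and the two function names
are made up for the example.  The only point is that the legacy path
is left byte-for-byte alone unless the flag was set during the scan.

```c
/* Hypothetical bit value; a real patch would pick an unused pc->flags2 bit. */
#define MAKEDUMPFILE_ELF_NOTES (0x1UL)

/* Stand-in for crash's program context; only the field used here. */
struct program_context_sketch {
    unsigned long flags2;
};

static struct program_context_sketch pc_sketch = { 0 };

/* Where the starting registers for a backtrace would come from. */
enum regs_source {
    REGS_FROM_EXISTING_CODE = 0,  /* current behavior, untouched */
    REGS_FROM_ELF_NOTES     = 1   /* new, special-case path */
};

/* Hypothetical hook: called once during the initial dumpfile scan. */
void note_scan_result(int found_elf_notes)
{
    if (found_elf_notes)
        pc_sketch.flags2 |= MAKEDUMPFILE_ELF_NOTES;
}

/* Hypothetical decision point: new code runs only when the flag is set. */
enum regs_source choose_regs_source(void)
{
    if (pc_sketch.flags2 & MAKEDUMPFILE_ELF_NOTES)
        return REGS_FROM_ELF_NOTES;
    return REGS_FROM_EXISTING_CODE;
}
```

With this shape, a dumpfile without the notes never reaches the new
branch, so none of the ~150 sample dumpfiles would see any change.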

And here are a few specific comments:

 (1) The machdep->process_elf_notes handler was put in place for
     this kind of thing, so please continue to use it for x86 and
     x86_64 in the same way that the s390x architecture does.
     You don't need to pass the machine type as an extra argument,
     given that "machine_type()" can be used anywhere.  The two
     arches can share the same handler.

 (2) get_netdump_regs_x86_64(): segregate the code based upon the
     MAKEDUMPFILE_ELF_NOTES flag.

 (3) get_netdump_regs_x86(): segregate the code based upon the
     MAKEDUMPFILE_ELF_NOTES flag.

 (4) Make a new map_cpus_to_prstatus()-type function for this kind
     of dumpfile.  In task_init(), you can check the MAKEDUMPFILE_ELF_NOTES
     flag and call the new function, and so you won't need your new
     KDUMP_CMPRS_DUMPFILE() function.

 (5) x86_64_get_dumpfile_stack_frame(): use the MAKEDUMPFILE_ELF_NOTES
     flag instead of bt->machdep.
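
To illustrate comment (1), here is a small sketch of a single handler
shared by both architectures that takes no machine-type argument.  It
is not crash code: machine_type_sketch() only imitates the idea of
asking "is this build for arch X?" by name, the sketch_arch variable
and the note_layout enum are invented for the example, and the real
process_elf_notes handler has a different signature.

```c
#include <string.h>

/* Hypothetical: the architecture this sketch pretends to be built for. */
static const char *sketch_arch = "X86_64";

/* Stand-in helper: true if the pretend build matches the named arch. */
static int machine_type_sketch(const char *type)
{
    return strcmp(sketch_arch, type) == 0;
}

/* Which register-note layout the shared handler would parse. */
enum note_layout { LAYOUT_UNKNOWN, LAYOUT_X86, LAYOUT_X86_64 };

/*
 * One handler for both architectures.  It does not receive the machine
 * type from its caller; it asks the helper instead.
 */
enum note_layout shared_process_elf_notes(void)
{
    if (machine_type_sketch("X86_64"))
        return LAYOUT_X86_64;
    if (machine_type_sketch("X86"))
        return LAYOUT_X86;
    return LAYOUT_UNKNOWN;
}
```

The same handler can then be installed for both x86 and x86_64, and
any other architecture simply falls through to the unknown case.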

And two final requests:

 (1) Before posting the patch, please build your changes with "make warn".
     Your changes generate a few warnings.

 (2) Can you make your patches attachments to your email instead of inline?

Thanks,
  Dave



