[Crash-utility] [PATCH v5 0/4] arm64: more improvement of bt -f

Thu Jun 30 19:52:34 UTC 2016

----- Original Message -----

Takahiro,

Sorry for the previous empty response -- I sent it by mistake while
putting this 3-message collaborative response together.

> Dave,
> 
> If you don't like my patches, that is OK.
> Applying them or not is totally up to you.
> I will *never* submit my patches again.
 
Well, that's certainly unfortunate.

> Having said so, I think I would better explain my intentions on the code.
> 
> On Wed, Jun 29, 2016 at 04:09:01PM -0400, Dave Anderson wrote:
> > 
> >  
> > Hi Takahiro,
> > 
> > I applied patches 1/2 and 2/2 from the v5 patchset.  But I can't
> > believe the results are what you intended?
> > 
> > For example, taking the 4.6 vmcore that you gave to me, here is the
> > current crash utility's output of "bt -a", where the crashing task
> > entered crash_kexec() via the sysrq-c page fault exception, and the
> > tasks on the other cpus have all entered crash_save_cpu() on their
> > IRQ stack as a result of the shutdown IPI, one from user-space and
> > the others from the kernel:
> >   
> >   crash> bt -a
> >   PID: 0      TASK: ffff000008dcd900  CPU: 0   COMMAND: "swapper/0"
> >    #0 [ffff800022f42e50] crash_save_cpu at ffff00000812ae44
> >    #1 [ffff800022f43010] handle_IPI at ffff00000808e718
> >    #2 [ffff800022f43040] gic_handle_irq at ffff0000080815f8
> >    #3 [ffff800022f43080] el1_irq at ffff000008084720
> >   --- <IRQ stack> ---
> 
> This part of dump shown above is the result of "kdump" mechanisms,
> and is *not* useful for you to understand what was happening on this
> CPU when a crash occurred on the other CPU, #3 in this case.
> 
> >        PC: ffff0000080857c0  [arch_cpu_idle+16]
> >        LR: ffff0000080857bc  [arch_cpu_idle+12]
> >        SP: ffff000008dc3f10  PSTATE: 60400149
> >       X29: ffff000008dc3f10  X28: ffff000008dc0000  X27: 0000000000000000
> >       X26: ffff000008dc5000  X25: ffff000008dc5c1c  X24: ffff000008dc5000
> >       X23: ffff000008dc0000  X22: ffff000008bd0270  X21: ffff000008dc0000
> >       X20: ffff000008dc5b88  X19: 0000000000000000  X18: 00000000a632f641
> >       X17: 0000ffff7da57880  X16: ffff0000081d9838  X15: 00000000383a0a79
> >       X14: 00000000b2b0b162  X13: 000000005f2cbeec  X12: 0000000000045a9e
> >       X11: ffff8000213bd800  X10: 0000000000000850   X9: ffff000008dc0000
> >        X8: 000000010000aa07   X7: 000000000000003d   X6: 0015752a00000000
> >        X5: 0100000000000000   X4: 00000000000001c0   X3: 0000000000000000
> >        X2: 0000000000000001   X1: 0000000000000000   X0: 0000000000000000
> >       ORIG_X0: 0000000000000000  SYSCALLNO: 7fffffffffffffff
> >    #4 [ffff000008dc3f10] arch_cpu_idle at ffff0000080857c0
> >    #5 [ffff000008dc3f20] cpu_startup_entry at ffff0000080f26cc
> >    #6 [ffff000008dc3f80] rest_init at ffff0000087c7930
> >    #7 [ffff000008dc3fa0] start_kernel at ffff000008b10b70
> 
> All you need to know is that CPU#0 was idle at the crash point.
> Whatever else do you want to know?
 
Things like:

(1) I want to know whether the active cpu was properly shut down by
    the kdump mechanism.
(2) I would like to actually "test" that the crash utility's transition from 
    the IRQ stack back to the process stack works as expected. (especially given
    the relative rarity of kernel crashes that actually occur while operating
    on IRQ stacks)
(3) There are several other dumping mechanisms that can be utilized on ARM64
    systems, so again, I'd like it to be obvious that this is kdump.
(4) When doing a filtered "foreach bt" it would be helpful to simply make it
    obvious that a task was an active, non-crashing, task -- with more information
    than a "hanging" exception frame dump.
(5) All of the other architectures do it.

... [ cut ] ...

> > But why do you think that it is an improvement to leave out the transition
> > to the IRQ stack?
> 
> I explained the reason in my previous e-mail.
> I still believe that it is an improvement.
 
OK, we're just going to have to agree to disagree.

> > In any case, I haven't even started looking at the "bt -f" part of the patch
> > because quite frankly, this patchset is so complex that I haven't even
> > begun to try to understand it.  When you said it would be "easier to
> > maintain", well, perhaps for *you* maybe, but certainly not for me!
> 
> Do you think so?
> It is because you are the author.

Maybe so, but I absolutely do think so.

> The changes I made were big and look complex to you,
> but the resulting code is simple enough for others IMO.
> 
> I tried to make the main loop of arm64_back_trace_cmd() simple and
> quite resemble the counterpart of dump_backtrace() in the kernel.

>From my perspective, the current arm64_back_trace_cmd() is far simpler,
and is essentially a backport of the kernel's dump_backtrace() code.

> The differences come mainly from the facts:
> 1.you (crash util) need to display not only a one-line frame summary
>   (with a value of stack/frame pointer), but also a full dump of stack

Right, but the current loop remains basically the same as the kernel's.

> 2.your current code will miss a *very important* stack frame that is
>   the one exactly when an interrupt takes place.
> 
> I believe that the latter is a big improvement.

I'm not following when you say it "will miss" a frame.  It's just following
the links established by the architecture.  If you line up the backtrace output
of the current vs. your patch, there are no missing frames, but rather you
essentially move the "el1_da()" stack frame entry to a lower stack address,
and your "bt -f" implementation doesn't dump the raw exception frame data.
 
> Yeah, due to the changes, arm64_unwind_frame() may get a bit complicated,
> but it is very naive implementation based on the nature of the kernel's
> stack usage (or PCS for ARM), the complexity is not the result of my poor
> skill of coding.
> In other words, the complexity is now *encapsulated* in that function.

I understand what you're saying, and certainly appreciate the depth of 
knowledge that you have, but quite frankly I don't find the changes
significant enough to reinvent the wheel. 

> In addition, I wrote down that function step-by-step, from generic cases
> to exceptional cases, with a bunch of comments for better understandings.
> And you will be able to remove such exceptional cases *if* you don't like
> them.
> 
> Given those stuffs, I think that my code is easier to maintain.
> 
> Again, surely up to you.
> 
> > Without the kernel's backtrace code from which the current code is based, I don't
> > have anything to work from anymore.  I'm really not sure whether the coverage of
> > the "corner cases" you referred to make this effort worth it.  I would like to
> > see examples of how the current code fails.  But anyway, I will continue to test it to
> > see if there actually is any significant upgrade from what we already have in
> > place.  From a kernel debugging perspective, all we really need is a basic unwinder,
> > and optionally the full dump of the stack data in between those frames.  And the
> > current code does do that at a minimum, and there's much to be said for
> > simplicity.
> >

... [ cut ] ...

> > Here is another thing that I would prefer not to change/omit.
> > 
> > In the current code, the raw exception frame data is dumped as
> > part of the "bt -[fF]" output, just prior to it being translated
> > as an exception frame:
> >   
> >   crash> bt -F
> >   PID: 1223   TASK: ffff800020ef5780  CPU: 3   COMMAND: "sh"
> >    ... [ cut ] ...
> >    #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c
> >       ffff800020b0bb70: ffff800020b0bd40 el1_da+24
> >       ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
> >       ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
> >       ffff800020b0bba0: 0000000000000002 textbuf.34610
> >       ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
> >       ffff800020b0bbc0: 000000000000000f 0000000100000000
> >       ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
> >       ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
> >       ffff800020b0bbf0: sysrq_crash_op   0000000000000009
> >       ffff800020b0bc00: 0000000000000000 0000000000000015
> >       ffff800020b0bc10: 0000000000000120 0000000000000040
> >       ffff800020b0bc20: 0000000000000001 0000000000000000
> >       ffff800020b0bc30: log_wait+8       0000000000000000
> >       ffff800020b0bc40: 0000000000000000 00000000000047d4
> >       ffff800020b0bc50: ffff800022f337a4 0000000000000000
> >       ffff800020b0bc60: 0000000000000106 0000000000000001
> >       ffff800020b0bc70: 0000000000000002 0000000000000106
> >       ffff800020b0bc80: log_buf_len      cont
> >       ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13
> >       ffff800020b0bca0: sys_write        0000ffff83d266c0
> >       ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152
> >       ffff800020b0bcc0: 0000000000000063 sysrq_crash_op
> >       ffff800020b0bcd0: 0000000000000009 0000000000000000
> >       ffff800020b0bce0: 0000000000000015 0000000000000120
> >       ffff800020b0bcf0: 0000000000000040 sys_call_table
> >       ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40
> >       ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40
> >       ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149
> >       ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024]
> >    #6 [ffff800020b0bd40] el1_da at ffff000008084568
> >        PC: ffff000008457fc8  [sysrq_handle_crash+32]
> >        LR: ffff000008457fb4  [sysrq_handle_crash+12]
> >        SP: ffff800020b0bd40  PSTATE: 60400149
> >       X29: ffff800020b0bd40  X28: ffff800020b08000  X27: ffff0000087e2000
> >       X26: 0000000000000040  X25: 0000000000000120  X24: 0000000000000015
> >       X23: 0000000000000000  X22: 0000000000000009  X21: ffff000008e071b0
> >       X20: 0000000000000063  X19: ffff000008dda000  X18: 0000000000000006
> >       X17: 0000ffff83d266c0  X16: ffff0000081c68b8  X15: ffff000008e6cc95
> >       X14: 0000ffff83cc28f0  X13: ffff000008e6c758  X12: ffff000008dda7a0
> >       X11: 0000000000000106  X10: 0000000000000002   X9: 0000000000000001
> >        X8: 0000000000000106   X7: 0000000000000000   X6: ffff800022f337a4
> >        X5: 00000000000047d4   X4: 0000000000000000   X3: 0000000000000000
> >        X2: ffff000008dda7b8   X1: 0000000000000000   X0: 0000000000000001
> >       ORIG_X0: ffff000008dda000  SYSCALLNO: ffff80002104d418
> >   ...
> >   
> > whereas with the v5 patchset, the exception frame only gets translated,
> > but the actual raw memory never gets dumped:
> 
> I surely remember that you said that would not be an issue
> when I submitted older version, maybe v1 or v2.

Well, yes, that's true in the case of an exception frame generated
by the IPI interrupt (or any interrupt), where the eframe contents are
incidental to the interrupt's arrival, and not nearly as interesting
as an in-line kernel exception frame generated by a NULL pointer
reference, BUG(), etc., where the contents of the exception frame
are completely related to the problem/bug at hand.  And in that case,
the current code does dump the raw contents just before the register
translation.

> Do you think that those symbolic display are still useful
> though it is not quite easy to recognize which register has what value?

It's not a major item, but it certainly just dumps the raw memory
in between frame like anywhere else in the "bt -f" output.  Seems 
strange to just omit it there.

> 
> Even more, <ffff800020b0bb80-ffff800020b0bc10> is *not* a stack for
> do_mem_abort(). It is just wrong and will confuse people.
> So this is another example of improvement on my patches.

I'm not following -- your patch (as well as the current version) both 
show ffff800020b0bb80-ffff800020b0bc10 below do_mem_abort():

  ...
   #5 [ffff800020b0bb70] do_mem_abort at ffff000008081288
      ffff800020b0bb70: ffff800020b0bd40 el1_da+24
      ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
      ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
      ffff800020b0bba0: 0000000000000002 textbuf.34610
      ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
      ffff800020b0bbc0: 000000000000000f 0000000100000000
      ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
      ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
      ffff800020b0bbf0: sysrq_crash_op   0000000000000009
      ffff800020b0bc00: 0000000000000000 0000000000000015
      ffff800020b0bc10: 0000000000000120 0000000000000040
   #6 [ffff800020b0bc20] el1_da at ffff000008084564
  --- <Exception in kernel> ---
       PC: ffff000008457fc8  [sysrq_handle_crash+32]
       LR: ffff000008457fb4  [sysrq_handle_crash+12]
       SP: ffff800020b0bd40  PSTATE: 60400149
  ...

The current code does the above, but dumps the raw exception frame
before printing a stack entry for el1_da():

 ...
 #5 [ffff800020b0bb70] do_mem_abort at ffff00000808128c
    ffff800020b0bb70: ffff800020b0bd40 el1_da+24
    ffff800020b0bb80: cpu_cgrp_subsys+152 0000000000000063
    ffff800020b0bb90: ffff800020b0bd40 sysrq_handle_crash+32
    ffff800020b0bba0: 0000000000000002 textbuf.34610
    ffff800020b0bbb0: ffff800020b0bbd0 kallsyms_token_index+43128
    ffff800020b0bbc0: 000000000000000f 0000000100000000
    ffff800020b0bbd0: ffff800020b0bc70 vprintk_default+56
    ffff800020b0bbe0: cpu_cgrp_subsys+152 0000000000000063
    ffff800020b0bbf0: sysrq_crash_op   0000000000000009
    ffff800020b0bc00: 0000000000000000 0000000000000015
    ffff800020b0bc10: 0000000000000120 0000000000000040
    ffff800020b0bc20: 0000000000000001 0000000000000000
    ffff800020b0bc30: log_wait+8       0000000000000000
    ffff800020b0bc40: 0000000000000000 00000000000047d4
    ffff800020b0bc50: ffff800022f337a4 0000000000000000
    ffff800020b0bc60: 0000000000000106 0000000000000001
    ffff800020b0bc70: 0000000000000002 0000000000000106
    ffff800020b0bc80: log_buf_len      cont
    ffff800020b0bc90: 0000ffff83cc28f0 text.34829+13
    ffff800020b0bca0: sys_write        0000ffff83d266c0
    ffff800020b0bcb0: 0000000000000006 cpu_cgrp_subsys+152
    ffff800020b0bcc0: 0000000000000063 sysrq_crash_op
    ffff800020b0bcd0: 0000000000000009 0000000000000000
    ffff800020b0bce0: 0000000000000015 0000000000000120
    ffff800020b0bcf0: 0000000000000040 sys_call_table
    ffff800020b0bd00: ffff800020b08000 ffff800020b0bd40
    ffff800020b0bd10: sysrq_handle_crash+12 ffff800020b0bd40
    ffff800020b0bd20: sysrq_handle_crash+32 0000000060400149
    ffff800020b0bd30: cpu_cgrp_subsys+152 [kmalloc-1024]
 #6 [ffff800020b0bd40] el1_da at ffff000008084568 
     PC: ffff000008457fc8  [sysrq_handle_crash+32]
     LR: ffff000008457fb4  [sysrq_handle_crash+12]
     SP: ffff800020b0bd40  PSTATE: 60400149
    ...

Yes, again I understand that you feel that the el1_da() frame
should be at ffff800020b0bc20, but on the other hand, the 
link register/pc under do_mem_abort() points to ffff800020b0bd40,
and I kind of prefer to show that.  I *know* you disagree.

... [ cut ] ...

> IMO I think that you'd better improve the output of
> arm64_print_exception_frame() for bt -F.
> And this is an totally independent issue from other parts of my patches.

I don't see where "bt -F" gets involved in arm64_print_exception_frame()?
Are you referring to this, which I agree is OK:

@@ -2130,10 +2343,11 @@ arm64_print_exception_frame(struct bt_info *bt, ulong pt_regs, int mode, FILE *o
        }

        if (is_64_bit) {
-               fprintf(ofp, "ORIG_X0: %016lx  SYSCALLNO: %lx",
-                       (ulong)regs->orig_x0, (ulong)regs->syscallno);
-               if (mode == USER_MODE)
+               if (mode == USER_MODE) {
+                       fprintf(ofp, "ORIG_X0: %016lx  SYSCALLNO: %lx",
+                               (ulong)regs->orig_x0, (ulong)regs->syscallno);
                        fprintf(ofp, "  PSTATE: %08lx", (ulong)regs->pstate);
+               }
                fprintf(ofp, "\n");
        }

Anyway, I'm going to be away for the next few days, and will revisit 
this all again next week.  

But here's a thought:

There already are "bt -o" and "bt -O" for x86 and x86_64, which allow
for the existence of coexisting but separate back trace methods.  
I'm thinking that we can keep the current code, while also introducing 
your patchset as an "optional" back trace method.  There would be
arm64_back_trace_cmd_v1() and arm64_back_trace_cmd_v2(), arm64_unwind_frame_v1()
and arm64_unwind_frame_v2(), and any other functions that can't comfortably 
be utilized by both methods.  And then "bt -o" would use the optional 
version, or the default method selection could be toggled with "bt -O".
 
And again, I hope we don't lose you.  During the KASLR negotiations, you've
already seen what an obstinate ass I can be.  But it's really part of my 
job as a gatekeeper.

Thanks,
  Dave


"Simplicity, simplicity, simplicity."
- Henry David Thoreau