[Crash-utility] [PATCH] ppc64: still allow to move on if the emergency stacks info fails to initialize

lijiang lijiang at redhat.com
Sat Oct 8 03:26:36 UTC 2022


On Fri, Oct 7, 2022 at 9:57 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com>
wrote:

>
> On 2022/10/06 9:42, lijiang wrote:
> > Hi, Kazu
> > Thank you for the comment.
> > On Wed, Oct 5, 2022 at 5:11 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com
> >
> > wrote:
> >
> >> Hi Lianbo,
> >>
> >> On 2022/10/05 11:22, Lianbo Jiang wrote:
> >>> Currently crash will fail and then exit, if the initialization of
> >>> the emergency stacks information fails. In real customer environments,
> >>> sometimes, a vmcore may be partially damaged, although such vmcores
> >>> are rare. For example:
> >>>
> >>>     # ./crash ../3.10.0-1127.18.2.el7.ppc64le/vmcore
> >> ../3.10.0-1127.18.2.el7.ppc64le/vmlinux  -s
> >>>     crash: invalid kernel virtual address: 38  type:
> "paca->emergency_sp"
> >>>     #
> >>>
> >>> Lets try to keep loading vmcore if such issues happen, so call
> >>> the readmem() with the RETURN_ON_ERROR instead of FAULT_ON_ERROR,
> >>> which allows the crash move on.
> >>
> >> Just to confirm, can I have the error messages printed with the patch
> >> for the vmcore?
> >>
> >
> > Yes. Crash has the following error messages printed after applying this
> > patch:
> >
> > # ./crash ../vmcore ../vmlinux -s
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: fc4  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 38  type: "paca->emergency_sp"
> > crash: invalid kernel virtual address: 1000138  type:
> "paca->emergency_sp"
> > crash: invalid kernel virtual address: 100000040  type:
> "paca->emergency_sp"
> > ...
>
> Thanks, I see.  These errors can be printed for the number of CPUs,
> but better than nothing.
>
> Looking at the users of ms->*emergency_sp, when the errors occur,
>
> 1. ppc64_in_emergency_stack() proceed with the initialized value 0,
> and the following condition will happen to fail:
>
>
Good understanding, Kazu.


>    if (addr >= base && addr < top) {
>
> but I would like to add this to check it in advance:
>
> --- a/ppc64.c
> +++ b/ppc64.c
> @@ -1947,7 +1947,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
> verbose)
>          if (cpu < 0  || cpu >= kt->cpus)
>                  return NONE_STACK;
>
> -       if (ms->emergency_sp) {
> +       if (ms->emergency_sp && ms->emergency_sp[cpu]) {
>

Is it better to use the IS_KVADDR(ms->emergency_sp[cpu]) ? For example:

+       if (ms->emergency_sp && IS_KVADDR(ms->emergency_sp[cpu]) ) {

Thanks.
Lianbo

                 top = ms->emergency_sp[cpu];
>                  base =  top - STACKSIZE();
>                  if (addr >= base && addr < top) {
> @@ -1957,7 +1957,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
> verbose)
>                  }
>          }
>
> -       if (ms->nmi_emergency_sp) {
> +       if (ms->nmi_emergency_sp && ms->nmi_emergency_sp[cpu]) {
>                  top = ms->nmi_emergency_sp[cpu];
>                  base = top - STACKSIZE();
>                  if (addr >= base && addr < top) {
> @@ -1967,7 +1967,7 @@ ppc64_in_emergency_stack(int cpu, ulong addr, bool
> verbose)
>                  }
>          }
>
> -       if (ms->mc_emergency_sp) {
> +       if (ms->mc_emergency_sp && ms->mc_emergency_sp[cpu]) {
>                  top = ms->mc_emergency_sp[cpu];
>                  base =  top - STACKSIZE();
>                  if (addr >= base && addr < top) {
>
>
> 2. ppc64_set_bt_emergency_stack() is called after
> ppc64_in_emergency_stack()
> returning other than NONE_STACK, so it should have a valid value.  No need
> to check it again.
>
> So, with the additional change above,
>
> Acked-by: Kazuhito Hagio <k-hagio-ab at nec.com>
>
> Lianbo, if the above is ok for you, please apply with it.
>
> Thanks,
> Kazu
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20221008/1e8222be/attachment.htm>


More information about the Crash-utility mailing list