[Crash-utility] Running idle threads show wrong CPU numbers

Dave Anderson anderson at redhat.com
Wed Feb 10 19:56:04 UTC 2010


On Wed, 2010-02-10 at 14:01 -0500, Dave Anderson wrote:
> ----- "Michael Holzheu" <holzheu at linux.vnet.ibm.com> wrote:
> 
> > > > It shows all swapper tasks (online and offline), but I get errors for
> > > > the backtraces of the offline CPUs.
> > > 
> > > What kind of errors?
> > 
> > The problem is that s390x_get_stack_frame() is called for the offline
> > swapper tasks. In that function I use s390x_has_cpu() to check whether
> > the task is currently running on a CPU. Because the CPU online check is
> > missing, s390x_has_cpu() returns TRUE, so I try to read the CPU registers
> > from the lowcore of that CPU. Since the CPU is offline, its lowcore
> > pointer is zero, so the stack pointer (register 15) read from it is
> > wrong and the backtrace fails.
> > 
> > > > 
> > > > The attached patch would solve the problem (and eliminate most of the
> > > > probably redundant s390(x)_has_cpu() functions).
> > > 
> > > I don't see what's being solved by the patch (apart from the
> > > s390x_get_smp_cpus parts). Does the "old" s390x_has_cpu() fail?
> > 
> > The old s390x_has_cpu() returns TRUE for the offline swapper tasks.  And
> > I think that this is wrong.
> 
> Hmmm...  To me, it is TRUE, i.e., the existing-but-idle swapper task for 
> an offline cpu actually *does* own that cpu.  
> 
> And that's why I was wondering about what error message gets shown.
> 
> > 
> > The new implementation of s390x_has_cpu() should return TRUE if the task
> > is running on an online CPU and FALSE otherwise:
> > 
> > +       if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
> > +               return TRUE;
> > +       else
> > +               return FALSE;
> 
> This is probably OK, although I am slightly hesitant about throwing out all
> of the old backwards-compatibility code in the s390[x]_has_cpu() functions.

Why? The is_task_active() function must also work on all supported
kernel levels. Otherwise crash would probably fail in other,
s390-independent functions, wouldn't it? Of course, we could also keep my
old code and just add the online check to it.

> I thought maybe it would be safer to leave well enough alone, and not
> worry about any error messages from backtraces of offline cpus.
> Might it even be more useful to have error messages that alert
> the user that the cpu is not online?

The following shows the output of "bt -a" without the patch:

PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000  type: "async_stack"

PID: 0      TASK: 18d40440          CPU: 3   COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000  type: "async_stack"

We can't leave it like that. With my patch, we at least get a correct
stack backtrace:

PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
 #0 [18d3feb8] ret_from_fork at 117e12

What does the backtrace of an offline CPU look like on other architectures?

Michael