[Crash-utility] crash aborts with cannot determine idle task
Dave Anderson
anderson at redhat.com
Wed Apr 2 16:00:44 UTC 2008
Chandru wrote:
>
>> Look at the crash function get_idle_threads() in task.c, which is where
>> you're failing. It runs through the history of the symbols that Linux
>> has used over the years for the run queues. For the most recent kernels,
>> it looks for the "per_cpu__runqueues" symbol. At least as of 2.6.25-rc2,
>> the kernel still defines it in kernel/sched.c like this:
>>
>> static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
>>
>> So if you do an "nm -Bn vmlinux | grep runqueues", you should see:
>>
>> # nm -Bn vmlinux-2.6.25-rc1-ext4-1 | grep runqueues
>> ffffffff8082b700 d per_cpu__runqueues
>> #
>>
>> I'm guessing that's not the problem -- so presuming that the symbol
>> *does*
>> exist, find out why it's failing to increment "cnt" in this part of
>> get_idle_threads():
>>
>>     if (symbol_exists("per_cpu__runqueues") &&
>>         VALID_MEMBER(runqueue_idle)) {
>>             runqbuf = GETBUF(SIZE(runqueue));
>>             for (i = 0; i < nr_cpus; i++) {
>>                     if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
>>                             runq = symbol_value("per_cpu__runqueues") +
>>                                     kt->__per_cpu_offset[i];
>>                     } else
>>                             runq = symbol_value("per_cpu__runqueues");
>>
>>                     readmem(runq, KVADDR, runqbuf, SIZE(runqueue),
>>                             "runqueues entry (per_cpu)", FAULT_ON_ERROR);
>>                     tasklist[i] = ULONG(runqbuf +
>>                             OFFSET(runqueue_idle));
>>                     if (IS_KVADDR(tasklist[i]))
>>                             cnt++;
>>             }
>>     }
>>
>> Determine whether it even makes it to the inner for loop, whether
>> the pre-determined nr_cpus value makes sense, whether the SMP flag
>> reflects whether the kernel was compiled for SMP, whether the PER_CPU_OFF
>> flag was set, what address was calculated, etc...
>>
>> Dave
>>
> Thanks for the reply Dave. The code makes it to the inner for loop, and
> the condition if (IS_KVADDR(tasklist[i])) fails, which is why 'cnt'
> doesn't get incremented. tasklist[i] holds this value: 0x3d60657870722024.
>
> I ran gdb on the vmcore file and printed the memory contents.
>
> (gdb) print per_cpu__runqueues
> $1 = {lock = {raw_lock = {slock = 1431524419}},
>   nr_running = 5283422954284598606,
>   raw_weighted_load = 5064663116585906736,
>   cpu_load = {2316051155752670036, 5929356451801411872, 2613857225664584019},
>   nr_switches = 5644502509443686462,
>   nr_uninterruptible = 2316072106569976142,
>   expired_timestamp = 5142904381182533935,
>   timestamp_last_tick = 7235439831918129227,
>   curr = 0x5f66696c650a5243,
>   idle = 0x3d60657870722024, <<<-----
>   prev_mm = 0x5243202b20243f60, active = 0xa247b4155535443,
>   expired = 0x5352434449527d2f,
>
>
> Does this mean that the kernel data was corrupted when vmcore was
> collected?
I don't know.
You cannot expect gdb to be able to handle it at all, unless
the kernel was configured without CONFIG_SMP. In that case,
the per_cpu__runqueues symbol points to the singular instance
of an rq.
However, more likely your kernel is configured with CONFIG_SMP.
In that case, a per-cpu offset has to be applied to the symbol
value of per_cpu__runqueues to calculate where each cpu's instance
of its rq structure is located. I can guarantee you that gdb
cannot do that, and that's probably why you're seeing "garbage"
data above.
That's exactly what the get_idle_threads() function is doing when it
calculates the "runq" address each time through the loop. If the kernel
is configured with CONFIG_SMP, it adds the per-cpu offset value;
otherwise it uses the symbol value of "per_cpu__runqueues" as is.
As I suggested before, you're going to have to determine why
the tasklist[i] is bogus. The first things to determine are:
(1) what "nr_cpus" was calculated to be, and
(2) whether the SMP and PER_CPU_OFF flags are set in kt->flags.
If those variables/settings make sense, then presumably the
problem is in the determination of the per-cpu offset values.
That's done in a machine-specific way, so I can't help without knowing
what architecture you're dealing with, not to mention the kernel version,
whether it's configured with CONFIG_SMP, and whether you can run crash
on the live system that generated the dumpfile.
Dave