[Crash-utility] Re: [ANNOUNCE][RFC][PATCH] Crash-utility, tracing: enable crash to analyze tracing from core-file (make tracing can act as a flight recorder)
Dave Anderson
anderson at redhat.com
Thu Aug 6 15:36:02 UTC 2009
----- "Lai Jiangshan" <laijs at cn.fujitsu.com> wrote:
> Dave Anderson wrote:
> > Hello Lai,
> >
> > If ever there was a perfect candidate for a crash utility extension module,
> > this is it. This functionality is far too subsystem-specific to be included as
> > a generic command. There has not been a "new" base crash command in many years.
> >
> > Reviewing the patch, the "trace" command can easily be created as an extension
> > module. The only things that need to be done are:
>
> Your suggestion is very helpful. We accept it. We're doing it now.
> Thank you very much.
>
>
> >
> > (2) Put the "int nr_cpu_ids" variable into the ftrace.c extension
> > module, where you still will have access to the global "kt"
> > kernel_table pointer.
> >
>
> There is a bug on my box: crash cannot recognize the real number of cpus,
> so kt->cpus is wrong. I fixed it and put nr_cpu_ids in the kernel_table.
> I'll send a separate patch for it soon.
>
> In the current Linux kernel, nr_cpu_ids is recommended over the old
> NR_CPUS, because with CONFIG_NR_CPUS=4096 NR_CPUS is too big for a lot
> of systems.
>
> kmalloc(sizeof(struct foo) * NR_CPUS) ==> kmalloc(sizeof(struct foo) * nr_cpu_ids)
> for (i=0; i < NR_CPUS; i++) ==> for (i=0; i < nr_cpu_ids; i++)
>
> NR_CPUS is also 4096 in crash now, so I also suggest using nr_cpu_ids
> instead of NR_CPUS in crash's code when the symbol "nr_cpu_ids"
> exists.

I understand the problem with NR_CPUS usage in the kernel, but your
original patch did this:

+	if (symbol_exists("nr_cpu_ids"))
+		get_symbol_data("nr_cpu_ids", sizeof(int), &kt->nr_cpu_ids);
+	else
+		kt->nr_cpu_ids = 1;
+
+	if (kt->cpus < kt->nr_cpu_ids)
+		kt->cpus = kt->nr_cpu_ids;
+

As I understand it, the kernel's "nr_cpu_ids" is initialized to NR_CPUS,
and then later reduced to the number of "possible" cpus, neither of which
represents the number of online cpus.

The crash utility's "kt->cpus" is meant to reflect the number of actual
cpus that are online. It is almost always less than NR_CPUS and/or the
number of "possible" cpus -- only if the number of online cpus is actually
equal to the number of possible cpus would they ever be the same. So the
setting of "kt->cpus = kt->nr_cpu_ids" above cannot be the correct thing
to do.
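
To make the distinction concrete, here is a minimal sketch in plain C. It
is not crash utility code: count_mask_bits() is a hypothetical helper, and
it assumes the "possible" and "online" cpu bitmaps have already been copied
out of the dump into local arrays.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Count the set bits among the first nbits of a cpumask-style bitmap.
 * Hypothetical helper for illustration only; the caller is assumed to
 * have read the mask words from the dump into `mask` beforehand.
 */
static int count_mask_bits(const unsigned long *mask, size_t nbits)
{
	int count = 0;

	for (size_t i = 0; i < nbits; i++) {
		/* Test bit i: pick the word, then the bit within that word. */
		if (mask[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			count++;
	}
	return count;
}
```

With a "possible" mask of 0xFF and an "online" mask of 0x0F, the helper
returns 8 and 4 respectively. Raising the cpu count to the "possible"
value would then report 8 cpus on a system where only 4 are online.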

Now, there may be another bug with respect to your box such that the crash
utility cannot determine the number of cpus. That determination is done
differently by each of the supported processors -- I'd be interested in
exactly what the bug on your machine is.

Thanks,
Dave