[Crash-utility] Re: [ANNOUNCE][RFC][PATCH] Crash-utility, tracing: enable crash to analyze tracing from core-file (make tracing can act as a flight recorder)

Dave Anderson anderson at redhat.com
Thu Aug 6 15:36:02 UTC 2009


----- "Lai Jiangshan" <laijs at cn.fujitsu.com> wrote:

> Dave Anderson wrote:
> > Hello Lai,
> >
> > If ever there was a perfect candidate for a crash utility extension module,
> > this is it.  This functionality is far too subsystem-specific to be included as
> > a generic command.  There has not been a "new" base crash command in many years.
> >
> > Reviewing the patch, the "trace" command can easily be created as an extension
> > module.  The only things that need to be done are:
>
> Your suggestion is very helpful. We accept it. We're doing it now.
> Thank you very much.
>
>
> >  
> >   (2) Put the "int nr_cpu_ids" variable into the ftrace.c extension
> >       module, where you still will have access to the global "kt"
> >       kernel_table pointer.
> >
>
> There is a bug on my box: crash cannot recognize the real number of cpus,
> so kt->cpus is wrong. I fixed it and put nr_cpu_ids in the kernel_table.
> I'll send a separate patch for it soon.
>
> In the current Linux kernel, nr_cpu_ids is recommended instead of the
> old NR_CPUS, because with CONFIG_NR_CPUS=4096 NR_CPUS is too big for a
> lot of systems.
>
> kmalloc(sizeof(struct foo) * NR_CPUS) ==> kmalloc(sizeof(struct foo) * nr_cpu_ids)
> for (i=0; i < NR_CPUS; i++) ==> for (i=0; i < nr_cpu_ids; i++)
>
> NR_CPUS is also 4096 in crash now, so I also suggest using nr_cpu_ids
> instead of NR_CPUS in crash's code when the symbol "nr_cpu_ids"
> exists.

I understand the problem with NR_CPUS usage in the kernel, but your
original patch did this:

+       if (symbol_exists("nr_cpu_ids"))
+               get_symbol_data("nr_cpu_ids", sizeof(int), &kt->nr_cpu_ids);
+       else
+               kt->nr_cpu_ids = 1;
+
+       if (kt->cpus < kt->nr_cpu_ids)
+               kt->cpus = kt->nr_cpu_ids;
+

As I understand it, the kernel's "nr_cpu_ids" is initialized to NR_CPUS,
and then later reduced to the number of "possible" cpus, neither of which
represents the number of online cpus.

The crash utility's "kt->cpus" is meant to reflect the number of actual
cpus that are online.  It almost always is less than NR_CPUS and/or the
number of "possible" cpus -- only if the number of online cpus is actually
equal to the number of possible cpus would they ever be the same.  So the
setting of "kt->cpus = kt->nr_cpu_ids" above cannot be the correct thing
to do.
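
A minimal, standalone sketch of that distinction follows. It is not crash
utility code: count_cpus(), cpus_after_patch() and the single-word bitmap
are hypothetical names and a simplified layout chosen for illustration,
assuming only that online cpus are tracked as set bits in a cpumask-style
bitmap while nr_cpu_ids bounds the "possible" cpu numbers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: count the set bits among the first nbits bits of
 * a cpumask-style bitmap.  Each set bit stands for one online cpu.
 */
int count_cpus(const unsigned long *mask, size_t nwords, int nbits)
{
    int count = 0;

    /* The caller must not ask for more bits than the bitmap holds. */
    assert(nbits >= 0 && (size_t)nbits <= nwords * BITS_PER_LONG);

    for (int cpu = 0; cpu < nbits; cpu++)
        if (mask[cpu / BITS_PER_LONG] & (1UL << (cpu % BITS_PER_LONG)))
            count++;
    return count;
}

/*
 * Hypothetical helper mirroring the quoted hunk: raise the cpu count to
 * nr_cpu_ids whenever it is smaller.
 */
int cpus_after_patch(int cpus, int nr_cpu_ids)
{
    if (cpus < nr_cpu_ids)
        cpus = nr_cpu_ids;
    return cpus;
}
```

With a bitmap of 0x0b (cpus 0, 1 and 3 online) and 8 possible cpus,
count_cpus() returns 3, but cpus_after_patch(3, 8) returns 8, so the
online count is replaced by the possible count.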

Now, there may be another bug w/respect to your box such that the crash
utility cannot determine the number of cpus.  That determination is done
differently by the supported processors -- I'd be interested in exactly
what the bug in your machine is.

Thanks,
  Dave



