[Crash-utility] Re: Re: Re: crash and sles 9 dumps (Dave Anderson)

Dave Anderson anderson at redhat.com
Mon Sep 10 18:46:52 UTC 2007


Dave Anderson wrote:
> Daniel Li wrote:
> 
>> Dave Anderson wrote:
>>
>>> Daniel Li wrote:
>>>
>>>> It seems the problem is not one with guest dump, but the version of 
>>>> SLES.
>>>>
>>>> After upgrading my NATIVE SLES 9 system to SP 3, exactly the same 
>>>> problem happened while trying to use 'crash' on the live system, 
>>>> with a debug linux kernel ('vmlinux.dbg' below) built on the same 
>>>> system from matching 'kernel-source' package. (During this upgrade, 
>>>> the linux kernel changed from 2.6.5-7.97-smp to 2.6.5-7.244-smp, the 
>>>> same as that on the guest.)
>>>>
>>>> Has anyone else seen this?
>>>
>>>
>>>
>>> Did anything change in the task_struct between 2.6.5-7.97-smp and
>>> 2.6.5-7.244-smp?
>>>
>>> Or, more likely, anything associated with the pidhash/pid_hash-related
>>> code in the kernel?
>>>
>>> Is the output of the crash command "help -t | grep refresh_task_table"
>>> different when running against 2.6.5-7.97-smp vs. 2.6.5-7.244-smp?
>>>
>>> Dave
>>>
>> The definition of task_struct between 2.6.5-7.97-smp and 
>> 2.6.5-7.244-smp did change. There is one new 8-bytes field called 
>> 'last_ran' before the list_head for tasks. This is what I don't get: 
>> why should it matter as long as the dump and debug kernel are using 
>> the same definition?
>>
> 
> It shouldn't.
> 
> Does the output of "help -o task_struct" on the .97 vs the .244 kernels
> reflect the member offset differences as you would expect?  I.e., everything
> (that's not -1) coming after the new last_ran member is bumped up by 8?
> 
> And are you sure there's nothing different w/respect to the pid_hash
> declarations/usage?
> 
> Dave
>
>> The output of "help -t | grep refresh_task_table" didn't change.

I ask about pid_hash-related changes because, over the years,
the way the crash utility handles the task table has had to
change to keep up with kernel changes.  The crash-internal
tt->refresh_task_table function pointer that you see in the
"help -t" output gets set during task_init() to one of these
functions:

   static void refresh_fixed_task_table(void);
   static void refresh_unlimited_task_table(void);
   static void refresh_pidhash_task_table(void);
   static void refresh_pid_hash_task_table(void);
   static void refresh_hlist_task_table(void);
   static void refresh_hlist_task_table_v2(void);

with later kernels requiring functions further down the list above.

For a 2.6.5 vintage kernel, I'm guessing that when you did
the "help -t" it showed "refresh_pid_hash_task_table()"?

Anyway, in the two kernels that you are comparing, how is the
"pid_hash" variable declared in the kernel sources?  With
respect to the crash-internal setting of tt->refresh_task_table,
it should line up like so:

  kernel: static struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE];
   crash: refresh_pid_hash_task_table()

  kernel: static struct hlist_head *pid_hash[PIDTYPE_MAX];
   crash: refresh_hlist_task_table()

  kernel: static struct hlist_head *pid_hash;
   crash: refresh_hlist_task_table_v2()
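
To make that check mechanical, here is a minimal sketch in C of the
pairing above.  It is not code from the crash sources: the enum and
both helper functions are hypothetical names made up for illustration,
and only the three refresh function names come from the list earlier
in this mail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the three pid_hash declaration styles above. */
enum pid_hash_decl {
    PID_HASH_LIST_HEAD_2D,   /* struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE] */
    PID_HASH_HLIST_PER_TYPE, /* struct hlist_head *pid_hash[PIDTYPE_MAX] */
    PID_HASH_HLIST_SINGLE    /* struct hlist_head *pid_hash */
};

/* Name of the refresh function that should pair with a declaration style. */
const char *expected_refresh_function(enum pid_hash_decl decl)
{
    switch (decl) {
    case PID_HASH_LIST_HEAD_2D:   return "refresh_pid_hash_task_table";
    case PID_HASH_HLIST_PER_TYPE: return "refresh_hlist_task_table";
    case PID_HASH_HLIST_SINGLE:   return "refresh_hlist_task_table_v2";
    }
    return NULL; /* unknown style: no expectation */
}

/* Return 1 if the name reported by "help -t" matches the expectation, else 0. */
int refresh_function_matches(enum pid_hash_decl decl, const char *reported)
{
    const char *expected = expected_refresh_function(decl);

    return expected != NULL && reported != NULL &&
           strcmp(expected, reported) == 0;
}
```

Look up the declaration in each kernel's sources, pick the matching
enum value by hand, and pass in the function name (without the
parentheses) from "help -t".  A return of 0 would point at the
wrong-function theory.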

For whatever reason, it almost looks like the task-gathering is
using the wrong function.  Or maybe, given back-ports and such,
the SUSE kernel's task handling is now a "hybrid" that would need
its own task-gathering function in the crash utility.

With respect to the "last_ran" addition, you could always rebuild
a kernel with that field moved to the end of the task_struct,
run that kernel, and see what happens.  If the "ps" task output
is still screwed up, that should rule out the "last_ran" addition
as the problem at hand.
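
As a rough illustration of why moving the field is a useful test, the
sketch below uses toy structures, not the real task_struct.  The
member names other than last_ran and tasks, the struct names, and the
helper function are all assumptions for the example.  It shows that
adding an 8-byte member ahead of the list_head shifts the later
offsets, while appending it at the end leaves them where they were.

```c
#include <assert.h>
#include <stddef.h>

/* The example assumes last_ran is an 8-byte field, as reported. */
static_assert(sizeof(unsigned long long) == 8, "expected 8-byte last_ran");

/* Stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Toy layout without last_ran (think of the .97 kernel). */
struct task_old {
    long state;
    struct list_head tasks;
    int pid;
};

/* Toy layout with last_ran added before the list_head (the .244 kernel). */
struct task_new {
    long state;
    unsigned long long last_ran;
    struct list_head tasks;
    int pid;
};

/* Toy layout for the experiment: last_ran moved to the end. */
struct task_moved {
    long state;
    struct list_head tasks;
    int pid;
    unsigned long long last_ran;
};

/* How many bytes the tasks member moved when last_ran was put in front of it. */
long tasks_offset_shift(void)
{
    return (long)offsetof(struct task_new, tasks) -
           (long)offsetof(struct task_old, tasks);
}
```

On common ABIs tasks_offset_shift() is typically 8 (it can be larger
if alignment padding is needed), whereas in task_moved the tasks and
pid offsets are the same as in task_old.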

Dave









