[Crash-utility] timer: invalid list entry: 1

Sat Mar 2 10:02:30 UTC 2013

I will post the bt -a output in my next mail.

As far as the crash dump is concerned, this is how we take it.

Basically, we just dump the whole RAM (flat physical RAM), and I have modified the crash utility to convert the ramdump (just a plain ramdump) into ARM ELF32 format,
so that it can be recognized by any debugger, such as the crash utility.
It has been working great.
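
To illustrate the kind of conversion described above, here is a minimal sketch in Python that wraps a flat RAM image in an ELF32 little-endian ARM core header with a single PT_LOAD segment. It is not the modified crash code mentioned in this mail. The function name, the physical base address and the virtual base address are assumptions made for illustration, and a kdump-style vmcore would normally also carry a PT_NOTE segment, which this sketch omits.

```python
import struct

# ELF32 layouts (little-endian): file header is 52 bytes, program header is 32 bytes.
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
ELF32_PHDR = struct.Struct("<8I")

ET_CORE = 4    # e_type: core file
EM_ARM = 40    # e_machine: ARM
PT_LOAD = 1    # p_type: loadable segment


def wrap_ramdump_as_elf32(raw, phys_base, virt_base):
    """Return raw RAM bytes prefixed by an ELF32 header and one PT_LOAD entry.

    phys_base and virt_base are hypothetical parameters: the physical address
    where RAM starts and the kernel virtual address it is mapped at.
    """
    # e_ident: magic, ELFCLASS32 (1), ELFDATA2LSB (1), EV_CURRENT (1), padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    data_off = ELF32_EHDR.size + ELF32_PHDR.size

    ehdr = ELF32_EHDR.pack(
        e_ident, ET_CORE, EM_ARM,
        1,                  # e_version
        0,                  # e_entry (unused in a core file)
        ELF32_EHDR.size,    # e_phoff: program headers follow the file header
        0,                  # e_shoff: no section headers
        0,                  # e_flags
        ELF32_EHDR.size,    # e_ehsize
        ELF32_PHDR.size,    # e_phentsize
        1,                  # e_phnum
        0, 0, 0,            # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = ELF32_PHDR.pack(
        PT_LOAD,
        data_off,           # p_offset: RAM image starts right after the headers
        virt_base,          # p_vaddr
        phys_base,          # p_paddr
        len(raw),           # p_filesz
        len(raw),           # p_memsz
        7,                  # p_flags: read, write, execute
        0,                  # p_align: no alignment constraint
    )
    return ehdr + phdr + raw
```

A call such as `wrap_ramdump_as_elf32(data, phys_base=0x80000000, virt_base=0xC0000000)` would produce the wrapped image; both addresses here are placeholders and depend on the board's memory map.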

I have loaded many ramdumps, and timer and every other command work perfectly fine.
Only in this scenario has it produced such an error, which is why I suspected either timer list corruption or a crash utility problem.

Regards,
Oza.

________________________________
 From: Dave Anderson <anderson at redhat.com>
To: paawan oza <paawan1982 at yahoo.com> 
Cc: "Discussion list for crash utility usage, maintenance and development" <crash-utility at redhat.com> 
Sent: Friday, 1 March 2013 10:49 PM
Subject: Re: [Crash-utility]  timer: invalid list entry: 1

----- Original Message -----

> I would give some more info.
>
> It is a dual-core system (ARM).
> Both cores are stuck at wfi (wait for interrupt),
> and we observe that the timer counter is far ahead of the comparators,
> so we never get a local timer interrupt, and nothing is there to wake the CPU up.
>
> so we observe the freeze.
>
> Regards,
> Oza.

I don't know much about the ARM architecture, and the only sample
SMP ARM dumpfile I have on hand shows the non-panicking cpu blocked
in default_idle().  So I don't understand how "wfi" would come
into play. 

What does "bt -a" show?

> 
> some more info:
> I am debugging the crash utility with gdb, and getting the following stack trace.
> 
> crash> timer
> TVEC_BASES[0]: c0a419c0
> JIFFIES
> 4297762
> EXPIRES TIMER_LIST FUNCTION
> 128 c1621ea8 c007260c <idle_worker_timeout>
> 30208 c0b81f04 c04e4244 <inet_frag_secret_rebuild>
> 30720 c0b7f264 c0461440 <flow_cache_new_hashrnd>
> 30840 dba2be04 c0068ebc <process_timeout>
> 38228 dbae5e04 c0068ebc <process_timeout>
> 11796480 c097cb64 c0010aa4 <sched_clock_poll>
> 4294937694 c0a6f118 c026f820 <rx_timeout_handler>
> 4294945658 c16238fc c007412c <delayed_work_timer_fn>
> 4294945667 d811be14 c0068ebc <process_timeout>
> 4294945700 c16237cc c007412c <delayed_work_timer_fn>
> 4294945700 c16236e0 c007412c <delayed_work_timer_fn>
> 4294946020 c0a1dcbc c007412c <delayed_work_timer_fn>
> 4294946029 dca8f884 c007412c <delayed_work_timer_fn>
> 4294946504 c0b871c4 c007412c <delayed_work_timer_fn>
> 4294950720 c0b81d6c c007412c <delayed_work_timer_fn>
> 
> Breakpoint 2, do_list (ld=0xff961c78) at tools.c:3507
> 3507 error(INFO, "\ninvalid list entry: %lx\n", next);
> (gdb) bt
> #0 do_list (ld=0xff961c78) at tools.c:3507
> #1 0x0811de03 in do_timer_list (vec_kvaddr=3699761524, size=256,
> vec=0x85c9f40, option=0x0, highest=0x0, tv=0xff962ec4) at
> kernel.c:6983
> #2 0x0811c9d3 in dump_timer_data_tvec_bases_v2 () at kernel.c:6678
> #3 0x0811afac in dump_timer_data () at kernel.c:6370
> #4 0x0811af8a in cmd_timer () at kernel.c:6329
> #5 0x080910a1 in exec_command () at main.c:818
> #6 0x08090ec7 in main_loop () at main.c:766
> #7 0x081bf35a in current_interp_command_loop ()
> #8 0x081bfbcf in captured_command_loop ()
> #9 0x081beddc in catch_errors ()
> #10 0x081c0a9a in captured_main ()
> #11 0x081beddc in catch_errors ()
> #12 0x081c0adc in gdb_main ()
> #13 0x081c0b29 in gdb_main_entry ()
> #14 0x08121590 in gdb_main_loop (argc=2, argv=0xff964014) at gdb_interface.c:76
> #15 0x08090c01 in main (argc=3, argv=0xff964014) at main.c:671
> 
> This is exactly where I hit the invalid entry.

Right, I understand where the error message came from.

The crash utility's do_list() function is simply reporting what
it sees in the list_head-type linked list that it was following.
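
As a rough illustration of that kind of list walk, here is a small Python sketch. It is not the crash utility's actual do_list() code: the function name, the `read_ptr` callback and the `is_kernel_addr` check are assumptions made for this example. It follows `next` pointers from a list head and stops as soon as a pointer is not a plausible kernel address, which is the situation that produces a message like "invalid list entry: 1".

```python
def walk_list(read_ptr, head, is_kernel_addr):
    """Follow list_head.next pointers starting at head.

    read_ptr(addr) returns the pointer stored at addr in the dump
    (hypothetical callback); is_kernel_addr(addr) says whether addr
    could be a valid kernel virtual address (hypothetical check).

    Returns (entries, bad): the entries visited, and the first bad
    pointer seen, or None if the list closed back on head cleanly.
    """
    entries = []
    seen = {head}
    cur = head
    while True:
        nxt = read_ptr(cur)
        if nxt == head:
            # Circular list returned to its head: the list is intact.
            return entries, None
        if not is_kernel_addr(nxt):
            # The pointer cannot be followed; report it as an invalid entry.
            return entries, nxt
        if nxt in seen:
            # The list loops without returning to head: also corrupt.
            return entries, nxt
        seen.add(nxt)
        entries.append(nxt)
        cur = nxt
```

With a dump modeled as a dictionary mapping addresses to stored pointers, `walk_list(mem.__getitem__, head, lambda a: a >= 0xC0000000)` returns the entries walked so far plus the offending value; the 0xC0000000 threshold is only a placeholder for a 3G/1G ARM split.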

I have only seen these types of timer command errors in
vmcores that were generated with the "snap.so" extension
module, or when running the command on a live system.  
And both of those scenarios make perfect sense because the
underlying kernel was running/modifying the timer-related
data structures while the memory was being copied. 

Presuming that the crash was taken with kdump, you would
typically expect that the timer data structures would
be stable.

Dave