[Crash-utility] Scripting infrastructure in crash

Thu May 3 13:14:49 UTC 2007

On May 3, 2007 02:27:02 am Sachin P. Sant wrote:
> > The same - I am mainly working on 'xportshow.py' - printing nicely
> > various tables/structures, mostly IPv6 aware. Routing tables, TCP/UDP/IP
> > connections analysis, ARP-cache, Netfilter, devpack and so on.
> >  
>
> So pykdump will be used for printing TCP/IP related stuff or can be used
> to extract other information as well ?

Here are some thoughts/ideas about scripted dump-analysis in general, based on 
my practical experience. It is mostly useful in two cases:

1. While working on a specific problem, you need to repeat the same steps 
over and over again, or you need to print something that 'crash' cannot 
print easily.

2. Your company provides Linux support and has the related infrastructure 
(Response Centres). It makes sense to develop automated 'first-pass' 
dump-analysis scripts for use by 1st- and 2nd-level support - this saves 
a lot of time for 3rd-level support and the labs.

For HP-UX (I work at HP) we have a set of excellent automated programs for 
a generic 1st pass and a set of programs to analyze other subsystems (NFS, 
FC, LAN, Streams and so on). These programs are extremely useful both for 
dumps and for live kernels. It is easier and safer to run these tools than 
to write DLKMs - especially if we need to gather data on production systems.

Why Python and not a C-like interpreter (e.g. SIAL)? Most people agree that 
for rapid prototyping/design, high-level languages (Python, Ruby, Perl) 
typically make it possible to obtain results 5-10 times faster than C. 
Python has excellent support libraries, so if needed we can plug into expert 
systems or make remote calls. The Linux kernel is a moving target, and 
frequent updates of internal logic are necessary. It is much easier to 
maintain this in Python than in a C-like language, as we can use OO tricks 
and Python-specific tricks. For example, instead of using 
'struct task_struct' directly, we declare a 'Task' class and create a 
dynamic attribute 'last_ran' that maps to different 'struct task_struct' 
members on different kernel versions. Then I can use task.last_ran on any 
kernel, and the only place that needs to change is 'class Task'.
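
To make the 'Task' idea concrete, here is a minimal stand-alone sketch. It is 
not the actual pykdump code: the decoded structure is faked with a plain 
dict, and the candidate member names ('last_run', 'timestamp') are only 
illustrative assumptions, not a verified list for any particular kernel.

```python
class Task:
    """Version-independent view of a 'struct task_struct' (illustrative only)."""

    # Logical attribute -> candidate struct member names, tried in order.
    # The names below are assumptions made for this sketch.
    _ALIASES = {
        "last_ran": ("last_ran", "last_run", "timestamp"),
    }

    def __init__(self, raw):
        # 'raw' stands in for a decoded struct; here it is just a dict.
        self._raw = raw

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        for member in self._ALIASES.get(name, (name,)):
            if member in self._raw:
                return self._raw[member]
        raise AttributeError(name)


# Two fake "kernels" that store the same information under different names.
old_task = Task({"pid": 1, "last_run": 100})
new_task = Task({"pid": 2, "last_ran": 250})
print(old_task.last_ran, new_task.last_ran)
```

Scripts written against this class only ever refer to task.last_ran; when a 
new kernel renames the member, only the alias table changes.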

As a result, I think that SIAL should be very good for case (1). Using a 
C-like language makes it easier to map pieces of Linux kernel C code to a 
script.

But in case (2) - developing/maintaining complicated automated dump-analysis 
suites - Python should be the better choice, especially for maintenance. OO 
really makes maintenance much easier when internal kernel structures change 
frequently.

The framework itself ('pykdump') is not networking-specific. As my primary 
interests are in the Linux TCP/IP implementation, I mainly develop programs 
for networking stuff, but I have already started to work on a general 
1st-pass dump-analysis program to be used by Response-Centre engineers.

Regards,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs at hplinux.canada.hp.com
Global Solutions Engineering:   Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------