[Crash-utility] Threaded crash tool? Is it time?

Fri Apr 12 13:52:33 UTC 2013

----- Original Message -----
> Machines are getting ever bigger. I routinely look at crash dumps from
> systems with 2TB or more of memory. I'm finding I'm wasting too much time
> waiting on crash to complete a command. For example "kmem -s" took close to
> an hour on the dump I'm looking at now.
>
> Has anyone ever looked into multi-threading crash? Given the kmem -s example
> above, a thread could be created for each cache (up to some defined limit of
> threads).

Right, I've felt your pain...

But the problem with "kmem -s" on huge systems is that there are typically 
one or more absurdly large individual caches like the inode or buffer head
caches that consume most of the time.  And that's because the command has
to traverse linked lists containing hundreds of thousands of slab structures, 
verifying each one along the way.  In those extreme cases, it's almost worth
just dedicating one crash session window to that single command, and doing
any other work in a second session.
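
To put a rough number on that, here is a small Python sketch (a model only,
not crash code) of the thread-per-cache idea.  It assumes each cache is one
indivisible job whose cost is proportional to its slab count, and that jobs
are handed out longest-first to a fixed pool of workers.  The slab counts for
buffer_head and inode_cache are made up for illustration; the others are taken
from the sample output in this message.  makespan() is a hypothetical helper
name.

```python
import heapq

def makespan(slab_counts, workers):
    """Wall-clock cost when each cache is walked by exactly one worker.

    Greedy longest-first scheduling: the next-largest cache always goes
    to the worker that becomes free earliest.
    """
    finish = [0] * workers            # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in sorted(slab_counts.values(), reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)

caches = {
    "buffer_head": 900_000,       # hypothetical slab count
    "inode_cache": 600_000,       # hypothetical slab count
    "dentry": 15_018,
    "vm_area_struct": 921,
    "sysfs_dir_cache": 668,
    "nfs_inode_cache": 561,
}

serial = sum(caches.values())
for n in (1, 2, 8):
    wall = makespan(caches, n)
    print(f"{n} worker(s): cost {wall}, speedup {serial / wall:.2f}x")
```

Under those assumptions, going from 2 to 8 workers buys nothing: the largest
cache alone sets the floor, so the real gain would have to come from splitting
the walk of a single cache, which is a harder problem.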

I should also mention that there is an unadvertised workaround for "kmem -s" 
to skip/ignore one or more problematic caches.  For example:

  crash> kmem -s -I buffer_head,inode_cache
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  ffff88020d648000 nf_conntrack_ffff88020e258000 304     0         0      0     8k
  ffff880209a14100 nfs_direct_cache         208          0         0      0     8k
  ffff880209a14000 nfs_write_data           904        323       544     16    32k
  ffff88020d2f9c00 nfs_inode_cache         1008      17622     17952    561    32k
  ... [ cut ] ...  
  ffff88021583cc00 bdev_cache               768        156       156      4    32k
  ffff88021583cb00 sysfs_dir_cache          112      24027     24048    668     4k
  ffff88021583ca00 inode_cache        [IGNORED]
  ffff88021583c900 dentry                   192     311189    315378  15018     4k
  ffff88021583c800 buffer_head        [IGNORED]
  ffff88021583c700 vm_area_struct           184      18105     20262    921     4k
  ffff88021583c600 mm_struct                880        447       540     15    32k
  ...

Maybe it's worth making that a publicized option?
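
For anyone curious what the skip amounts to, here is a minimal Python sketch
of the idea.  It is not the actual crash implementation (crash is written in
C), and parse_ignore_list(), walk_caches() and the inspect callback are
hypothetical names used only for illustration.

```python
def parse_ignore_list(arg):
    """Split a comma-separated '-I' style argument into a set of cache names."""
    return {name.strip() for name in arg.split(",") if name.strip()}

def walk_caches(cache_names, ignore, inspect):
    """Inspect every cache except those named in the ignore set.

    Ignored caches still get a row, so the listing stays complete,
    but the expensive slab walk (inspect) is never called for them.
    """
    rows = []
    for name in cache_names:
        if name in ignore:
            rows.append((name, "[IGNORED]"))
        else:
            rows.append((name, inspect(name)))
    return rows

ignore = parse_ignore_list("buffer_head,inode_cache")
rows = walk_caches(
    ["inode_cache", "dentry", "buffer_head", "mm_struct"],
    ignore,
    inspect=lambda name: "walked",   # stand-in for the real slab traversal
)
for name, result in rows:
    print(f"{name:<16} {result}")
```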

> Things like "foreach" could spawn threads. I'm sure there are lots of other
> opportunities.

Maybe, but on systems with thousands of threads, the time it takes just to
print the data probably rivals the actual work done to produce it.

> Yes, I know, it's open source, I should just go do it myself. Still, I'd like
> to hear pros and cons on this idea.

You got that right...   ;-)

Thanks,
  Dave