[Crash-utility] Re: Question about fixing another crash annoyance...t
Dave Anderson
anderson at redhat.com
Tue Sep 29 15:19:49 UTC 2009
----- "Bob Montgomery" <bob.montgomery at hp.com> wrote:
> Dave,
>
> Please pardon the direct question, I'm attempting to cash in on my "dis
> -l" goodwill :-)
>
> The latest problem I'm working on:
>
> We occasionally get dumps that wake up in crash with:
>
> ...
> please wait... (gathering kmem slab cache data)
> crash-4.0.9-fix: page excluded: kernel virtual address:
> ffff88022457a000
> type: "kmem_cache_s buffer"
>
> crash-4.0.9-fix: unable to initialize kmem slab cache subsystem
> ...
>
> These are partial dumps with only kernel pages included.
>
> This problem comes about because readmem fails to read one
> of the kmem_cache structs in the list, for example:
>
> crash-4.0.9-fix> struct kmem_cache 0xffff880224579cc0
> struct kmem_cache struct: page excluded: kernel virtual address:
> ffff88022457a000 type: "gdb_readmem_callback"
> Cannot access memory at address 0xffff880224579cc0
>
> This struct starts toward the end of a page (0xffff880224579cc0)
> and extends into the next page (0xffff88022457a000) which has
> been excluded from the dump because it isn't a kernel page.
>
> That is pretty scary if I assume some bug in the kernel is
> giving pages back to user land that still hold parts of kernel
> structs. But that's not what's happening.
>
> crash-4.0.9-fix> struct -o kmem_cache
> struct kmem_cache {
> [0x0] struct array_cache *array[32];
> ...
> [0x158] struct list_head next;
> [0x168] struct kmem_list3 *nodelists[64];
> }
> SIZE: 0x368
>
> Crash thinks the struct is 0x368 in length, making the
> apparent end of the struct lie in the next page (...a000
> instead of ...9000)
>
> crash-4.0.9-fix> p/x 0xffff880224579cc0+0x368
> $3 = 0xffff88022457a028
>
> But the clever kernel folks did this in slab.c:
>
> /*
> * We put nodelists[] at the end of kmem_cache, because we want to size
> * this array to nr_node_ids slots instead of MAX_NUMNODES
> * (see kmem_cache_init())
> * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
> * is statically defined, so we reserve the max number of nodes.
> */
> struct kmem_list3 *nodelists[MAX_NUMNODES];
>
> So that means crash needs to curtail the read of kmem_cache
> to the actual size of the nodelists array, instead of the
> declared size.
>
> I still need to determine if the actual size is determined
> once for all instances, or per structure.
>
> This should affect partial dumps with kernels that use slab.c.
I never noticed that before -- the buffer_size of the global "cache_cache"
kmem_cache structure gets downsized here in kmem_cache_init() in 2.6.22
and later:

        /*
         * struct kmem_cache size depends on nr_node_ids, which
         * can be less than MAX_NUMNODES.
         */
        cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
                                  nr_node_ids * sizeof(struct kmem_list3 *);

So the fix would be to first determine the cache_cache.buffer_size value,
and use that to initialize the size_table.kmem_cache_s value used by the
"SIZE(kmem_cache_s)" macro. Secondly, "vt->kmem_cache_len_nodes", which
is also based upon the same MAX_NUMNODES array index value, needs to be
downsized as well. It looks like if the kernel "nr_node_ids" exists as a
symbol (instead of a #define), then it should be used.
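
As a rough illustration of the arithmetic only (not actual crash code), the
sketch below recomputes the trimmed structure size and checks whether a read
of a given length spills into the next page. The 0x168 offset and 0x368
declared size come from the "struct -o kmem_cache" output quoted above. The
4096-byte page size, the 8-byte pointer size, and the helper names
trimmed_kmem_cache_size() and crosses_page() are assumptions made for the
example.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages and 8-byte pointers (x86_64). */
#define ASSUMED_PAGE_SIZE   4096UL
#define ASSUMED_PTR_SIZE    8UL

/* From the quoted "struct -o kmem_cache" output. */
#define NODELISTS_OFFSET    0x168UL   /* offsetof(struct kmem_cache, nodelists) */
#define DECLARED_SIZE       0x368UL   /* size with nodelists[MAX_NUMNODES], MAX_NUMNODES == 64 */

/*
 * Hypothetical helper: mirrors the kernel's buffer_size computation,
 * offsetof(nodelists) + nr_node_ids * sizeof(pointer).
 */
static unsigned long trimmed_kmem_cache_size(unsigned long nr_node_ids)
{
    return NODELISTS_OFFSET + nr_node_ids * ASSUMED_PTR_SIZE;
}

/*
 * Hypothetical helper: returns 1 if reading len bytes starting at addr
 * touches more than one page, 0 otherwise. len must be non-zero.
 */
static int crosses_page(uint64_t addr, unsigned long len)
{
    uint64_t first_page = addr / ASSUMED_PAGE_SIZE;
    uint64_t last_page  = (addr + len - 1) / ASSUMED_PAGE_SIZE;

    return first_page != last_page;
}
```

With the address from the dump, crosses_page(0xffff880224579cc0, DECLARED_SIZE)
is true, which matches the failing read. If nr_node_ids were 1 on that system
(a guess, the dump does not say), the trimmed size would be 0x170 and the read
would stay inside the page that is present in the dump.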
> Any other structs in the kernel like this that crash already
> deals with?
None that I'm aware of...
Dave