[Crash-utility] [PATCH]: nr_node_ids

Fri Sep 7 19:41:15 UTC 2012

On Fri, Sep 7, 2012 at 8:57 PM, Dave Anderson <anderson at redhat.com> wrote:
>
>
> ----- Original Message -----
>> On Fri, Sep 7, 2012 at 4:28 PM, Dave Anderson <anderson at redhat.com>
>> wrote:
>> >
>> >
>> > ----- Original Message -----
>> >> Hi all,
>> >>
>> >> I'm wondering about the use of the kernel 'nr_node_ids' variable in
>> >> memory.c. In kmem_cache_downsize(), vt->kmem_cache_len_nodes defaults
>> >> to 1 when 'nr_node_ids' isn't present. But in vm_init() an error
>> >> message is printed in the same case. The reason I'm asking is that I'm
>> >> getting that error
>> >>
>> >>   "unable to initialize kmem slab cache subsystem"
>> >>
>> >> on a 3.4 kernel. Having vm_init() default to
>> >>
>> >>   vt->kmem_cache_len_nodes=1
>> >>
>> >> as well seems to bring up the slab subsystem, although I'm getting a
>> >> couple of
>> >>
>> >>   "kmem: vm_area_struct: full list: slab: <nn1>  bad next pointer: <nn2>"
>> >>
>> >> mixed into my kmem -S output. I have no idea if it's related.
>> >
>> > Hi Per,
>>
>> Hello =o)
>>
>> >
>> > I don't have any recent sample kernels that have the configuration that your
>> > kernel is running, so I can't confidently answer/test this.  I presume that
>> > your kernel does not configure CONFIG_NODES_SHIFT (or set it to 0), so
>> > that nr_node_ids becomes a #define instead of a variable.  And to get it
>>
>> Indeed, that's exactly what happened.
>>
>> > to work, I'm also presuming that you changed the "else" clause in vm_init()
>> > to something like this:
>> >
>> >                if (MEMBER_TYPE("kmem_cache", "nodelists") == TYPE_CODE_PTR) {
>> >                         int nr_node_ids;
>> >                         /*
>> >                          * nodelists now a pointer to an outside array
>> >                          */
>> >                         vt->flags |= NODELISTS_IS_PTR;
>> >                         if (kernel_symbol_exists("nr_node_ids")) {
>> >                                 get_symbol_data("nr_node_ids", sizeof(int),
>> >                                         &nr_node_ids);
>> >                                 vt->kmem_cache_len_nodes = nr_node_ids;
>> >                         } else {
>> > -                               error(INFO, "nr_node_ids: symbol does not exist\n");
>> > -                               error(INFO, "unable to initialize kmem slab cache subsystem\n\n");
>> > -                               vt->flags |= KMEM_CACHE_UNAVAIL;
>> > +                               vt->kmem_cache_len_nodes = 1;
>> >                         }
>>
>> Again, indeed, that's more or less to the character what I changed it to.
>>
>> >
>> > That looks reasonable to me.
>> >
>>
>> Ok, because that was the main purpose of my first mail, understanding
>> whether there was a reason why the 'nr_node_ids'-has-been-turned-into-a-macro-case
>> was treated as an error in this context. So, you agree we could change it?
>
> Yep -- it's queued for crash-6.1.0.
>
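
For reference, the fallback agreed on above can be sketched outside of
crash roughly as follows. This is only an illustration: struct sym,
sym_lookup() and kmem_cache_len_nodes() are hypothetical stand-ins for
crash's kernel_symbol_exists()/get_symbol_data() pair, not crash code.
It assumes that when nr_node_ids is a macro rather than a variable there
is only a single node, so a length of 1 is the sensible default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a dump's symbol table. */
struct sym {
    const char *name;
    int value;
};

/* Return the entry called `name`, or NULL if the symbol is absent. */
static const struct sym *sym_lookup(const struct sym *tab, size_t n,
                                    const char *name)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/*
 * Number of nodelists entries to expect: the value of nr_node_ids if
 * the kernel exports it as a variable, otherwise 1 (assumption: the
 * symbol is missing because it was compiled as a macro on a
 * single-node configuration).
 */
static int kmem_cache_len_nodes(const struct sym *tab, size_t n)
{
    const struct sym *s = sym_lookup(tab, n, "nr_node_ids");
    return s ? s->value : 1;
}
```

The point of the design is that a missing symbol is treated as
information (single node) rather than as a fatal condition.
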
>>
>> > As far as the "kmem -S" output, are you running it on a live system?
>> >
>>
>> Nope, dead as a doornail. Are these messages to be expected then?
>
> Not really.  You could follow the vm_area_struct's full-list in question
> and verify that something's out of whack, starting from the (single)
> kmem_cache->nodelists.slab_full linked list.  The list should either
> point back to itself (empty) or be a simple list_head linked list,
> that leads to a slab with a next value of "nn2".  Although, it would
> also be interesting to know what the "nn2" value was?  In other
> words, was it a bogus address entirely, or maybe an address in
> a page that wasn't captured in the dump?  (which shouldn't happen...)
>
> It's here in verify_slab_v2():
>
>         list_head = (struct kernel_list_head *)(slab_buf + OFFSET(slab_list));
>         if (!IS_KVADDR((ulong)list_head->next) ||
>             !accessible((ulong)list_head->next)) {
>                 error(INFO, "%s: %s list: slab: %lx  bad next pointer: %lx\n",
>                         si->curname, list, si->slab, (ulong)list_head->next);
>                 errcnt++;
>         }
>
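
To make the suggested manual check concrete, here is a small standalone
sketch of walking a list_head-style circular list and flagging the first
bad next pointer. It is not crash code: struct list_head is a simplified
copy of the kernel structure, and ptr_ok() is a hypothetical stand-in
for the IS_KVADDR()/accessible() tests, accepting only pointers that
land on a node inside a known pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical validity test: p must point at a node inside pool[0..n). */
static int ptr_ok(const struct list_head *p,
                  const struct list_head *pool, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool;
    uintptr_t hi = (uintptr_t)(pool + n);

    return a >= lo && a < hi && (a - lo) % sizeof(*pool) == 0;
}

/*
 * Follow next pointers from head until the list wraps back to head.
 * Returns 0 for an empty or well-formed list and 1 if a bad next
 * pointer is found (the walk stops there). The step limit guards
 * against loops that never return to head.
 */
static int check_list(struct list_head *head,
                      const struct list_head *pool, size_t n)
{
    struct list_head *cur = head;
    size_t steps = 0;
    int errcnt = 0;

    assert(head != NULL);
    do {
        if (!ptr_ok(cur->next, pool, n)) {
            fprintf(stderr, "slab: %p  bad next pointer: %p\n",
                    (void *)cur, (void *)cur->next);
            errcnt++;
            break;
        }
        cur = cur->next;
    } while (cur != head && ++steps < n);

    return errcnt;
}
```

An empty list (head pointing back to itself) and a properly linked list
both pass; a node whose next field holds a value outside the pool is
reported in the same style as the kmem -S message.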

It certainly seems completely unrelated to the nr_node_ids question.
I'm guessing it's to do with the state of my dump, which isn't
accessible to me until after the weekend. In the unlikely event that
the fault's in Crash (see what I did there?) I'm sure I'll be
back.

/Per

>> Oh, and sorry for putting "[PATCH]" in the title when there wasn't
>> one. It was by accident.
>>
>> /Per
>
> No problem...
>
> Thanks,
>   Dave