[Crash-utility] kmem -[sS] segfault on 2.6.25.17

Mike Snitzer snitzer at gmail.com
Thu Oct 16 19:21:20 UTC 2008


On Thu, Oct 16, 2008 at 2:22 PM, Dave Anderson <anderson at redhat.com> wrote:
>
> ----- "Mike Snitzer" <snitzer at gmail.com> wrote:
>
>> On Thu, Oct 16, 2008 at 1:16 PM, Dave Anderson <anderson at redhat.com>
>> wrote:
>> >
>> > ----- "Dave Anderson" <anderson at redhat.com> wrote:
>> >> Ok, then I can't see off-hand why it would segfault.  Prior to
>> this
>> >> routine running, si->cpudata[0...i] all get allocated buffers
>> equal
>> >> to the size that's being BZERO'd.
>> >>
>> >> Is si->cpudata[i] NULL or something?
>>
>> (gdb) p si->cpudata
>> $1 = {0xa56400, 0xa56800, 0xa56c00, 0xa57000, 0x0 <repeats 252
>> times>}
>> (gdb) p si->cpudata[0]
>> $4 = (ulong *) 0xa56400
>
> OK, so if "i" is 0 at the time, then I don't understand how the
> BZERO/memset can segfault while zero'ing out memory starting at
> address 0xa56400?
>
>    BZERO(si->cpudata[i], sizeof(ulong) * vt->kmem_max_limit);
>
> Even if it over-ran the 0x400 bytes that's been allocated to
> si->cpuinfo[0], it would still harmlessly run into the buffer
> that was allocated for si->cpuinfo[1].  What's the bad address
> it's faulting on?

Frame 0 of crash's core shows:
(gdb) bt
#0  0x0000003b708773e0 in memset () from /lib64/libc.so.6

I'm not sure how to get the faulting address, though.  Is it just
0x0000003b708773e0?
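
(A note on that: 0x0000003b708773e0 is the instruction address inside
libc's memset(), not the data address it faulted on.  gdb can usually
recover the real one from the core -- a sketch, assuming an x86_64
rep-stos style memset where the destination cursor lives in %rdi:

(gdb) frame 0
(gdb) x/i $pc                   # the faulting instruction, e.g. "rep stos ..."
(gdb) info registers rdi rcx    # %rdi = current destination, %rcx = count left

And if the kernel/gdb pair saved siginfo into the core,
"p $_siginfo._sifields._sigfault.si_addr" reports the faulting address
directly.)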

> And for sanity's sake, what is the crash utility's vm_table.kmem_max_limit
> equal to, and what architecture are you running on?

Architecture is x86_64.

kmem_max_limit=128 and sizeof(ulong)=8, so the memset() should in fact
be zeroing all 1024 (0x400) bytes that were allocated.
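
For reference, a minimal sketch of the allocate/BZERO pattern under
discussion (hypothetical stand-ins, not crash's actual code; crash's
BZERO is just a zeroing memset wrapper as far as I can tell):

#include <stdlib.h>
#include <string.h>

#define KMEM_MAX_LIMIT 128      /* vt->kmem_max_limit on this system */
#define NR_BUFS 4               /* the four non-NULL cpudata slots above */

int main(void)
{
	unsigned long *cpudata[NR_BUFS];
	size_t sz = sizeof(unsigned long) * KMEM_MAX_LIMIT;  /* 8 * 128 = 0x400 */
	int i;

	for (i = 0; i < NR_BUFS; i++)
		cpudata[i] = malloc(sz);        /* each slot gets 0x400 bytes */

	for (i = 0; i < NR_BUFS; i++)
		memset(cpudata[i], 0, sz);      /* BZERO(si->cpudata[i], sz) */

	return 0;
}

A memset that stays within those 0x400 bytes can only fault if the
pointer itself is bad, which is why the value of "i" at crash time
matters.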

>> > Also, can you confirm that you are always using the exact vmlinux
>> > that is associated with each vmcore/live-system?  I mean you're
>> > not using a System.map command line argument, right?
>>
>> Yes, I'm using the exact vmlinux.  Not using any arguments for live
>> crash; I am for the vmcore runs but that seems needed given crash's
>> [mapfile] [namelist] [dumpfile] argument parsing.
>>
>> I use a redhat-style kernel rpm build process (with a more advanced
>> kernel .spec file); so I have debuginfo packages to match all my
>> kernels.
>
> OK cool -- so you know what you're doing.  ;-)

So here's the thing: now when I run live crash on the 2.6.25.17 devel
kernel I no longer get a segfault!?  It still isn't happy, but at
least it's not segfaulting... very odd.

I've not rebooted the system at all either... now when I run 'kmem -s'
in live crash I see:

CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
...
kmem: nfs_direct_cache: full list: slab: ffff810073503000  bad inuse counter: 5
kmem: nfs_direct_cache: full list: slab: ffff810073503000  bad inuse counter: 5
kmem: nfs_direct_cache: partial list: bad slab pointer: 88
kmem: nfs_direct_cache: full list: bad slab pointer: 98
kmem: nfs_direct_cache: free list: bad slab pointer: a8
kmem: nfs_direct_cache: partial list: bad slab pointer: 9f911029d74e35b
kmem: nfs_direct_cache: full list: bad slab pointer: 6b6b6b6b6b6b6b6b
kmem: nfs_direct_cache: free list: bad slab pointer: 6b6b6b6b6b6b6b6b
kmem: nfs_direct_cache: partial list: bad slab pointer: 100000001
kmem: nfs_direct_cache: full list: bad slab pointer: 100000011
kmem: nfs_direct_cache: free list: bad slab pointer: 100000021
ffff810073501600 nfs_direct_cache         192          2        40      2     4k
...
kmem: nfs_write_data: partial list: bad slab pointer: 65676e61725f32
kmem: nfs_write_data: full list: bad slab pointer: 65676e61725f42
kmem: nfs_write_data: free list: bad slab pointer: 65676e61725f52
kmem: nfs_write_data: partial list: bad slab pointer: 74736f705f73666e
kmem: nfs_write_data: full list: bad slab pointer: 74736f705f73667e
kmem: nfs_write_data: free list: bad slab pointer: 74736f705f73668e
ffff81007350a5c0 nfs_write_data           760         36        40      8     4k
...
etc.
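
Those "bad slab pointer" values are telling, by the way: 0x6b is the
kernel's slab poison byte (POISON_FREE), and several of the others
decode as little-endian ASCII text rather than addresses.  A quick
throwaway decoder (hypothetical, just to illustrate):

#include <stdio.h>

/* Print a 64-bit "pointer" as the little-endian byte string it holds --
 * handy for spotting text data masquerading as slab pointers. */
static void decode(unsigned long long v)
{
	int i;

	printf("%016llx -> \"", v);
	for (i = 0; i < 8; i++) {
		unsigned char c = (v >> (8 * i)) & 0xff;

		if (c >= 0x20 && c < 0x7f)
			putchar(c);
		else if (c)
			printf("\\x%02x", c);
	}
	printf("\"\n");
}

int main(void)
{
	decode(0x74736f705f73666eULL);  /* -> "nfs_post" */
	decode(0x65676e61725f32ULL);    /* -> "2_range" */
	decode(0x6b6b6b6b6b6b6b6bULL);  /* -> "kkkkkkkk" (slab poison) */
	return 0;
}

In other words, the list walk seems to be wandering into freed
(poisoned) objects and plain string data.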

But if I run crash against the vmcore I do get the segfault...

> BTW, if need be, would you be able to make the vmlinux/vmcore pair
> available for download somewhere?  (You can contact me off-list with
> the particulars...)

I can work to make that happen if needed...

Mike



