[Crash-utility] kmem -[sS] segfault on 2.6.25.17
Mike Snitzer
snitzer at gmail.com
Thu Oct 16 20:37:44 UTC 2008
On Thu, Oct 16, 2008 at 3:54 PM, Dave Anderson <anderson at redhat.com> wrote:
>
> ----- "Mike Snitzer" <snitzer at gmail.com> wrote:
>
>> Frame 0 of crash's core shows:
>> (gdb) bt
>> #0 0x0000003b708773e0 in memset () from /lib64/libc.so.6
>>
>> I'm not sure how to get the faulting address, though. Is it just
>> 0x0000003b708773e0?
>
> No, that's the text address in memset(). If you "disass memset",
> I believe that you'll see that the address above is dereferencing
> the rcx register/pointer. So then, if you enter "info registers",
> you'll get a register dump, and rcx would be the failing address.
OK.
0x0000003b708773e0 <memset+192>: movnti %r8,(%rcx)
(gdb) info registers
...
rcx 0xa7b000 10989568
(gdb) x/x 0xa7b000
0xa7b000: Cannot access memory at address 0xa7b000
>> I've not rebooted the system at all either... now when I run
>> 'kmem -s' in live crash I see:
>>
>> CACHE            NAME                 OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE
>> ...
>> kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
>> kmem: nfs_direct_cache: full list: slab: ffff810073503000 bad inuse counter: 5
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 88
>> kmem: nfs_direct_cache: full list: bad slab pointer: 98
>> kmem: nfs_direct_cache: free list: bad slab pointer: a8
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 9f911029d74e35b
>> kmem: nfs_direct_cache: full list: bad slab pointer: 6b6b6b6b6b6b6b6b
>> kmem: nfs_direct_cache: free list: bad slab pointer: 6b6b6b6b6b6b6b6b
>> kmem: nfs_direct_cache: partial list: bad slab pointer: 100000001
>> kmem: nfs_direct_cache: full list: bad slab pointer: 100000011
>> kmem: nfs_direct_cache: free list: bad slab pointer: 100000021
>> ffff810073501600 nfs_direct_cache     192      2          40     2      4k
>> ...
> Are those warnings happening on *every* slab type? When you run on a
> live system, the "shifting sands" of the kernel underneath the crash
> utility can cause errors like the above. But at least some/most of
> the other slabs' infrastructure should remain stable while the command
> runs.
Ah, that makes sense; yes, many of them do remain stable:
kmem: request_sock_TCPv6: full list: bad slab pointer: 79730070756b6f7f
kmem: request_sock_TCPv6: free list: bad slab pointer: 79730070756b6f8f
ffff810079199240 request_sock_TCPv6 160 0 0 0 4k
ffff81007919a200 TCPv6 1896 3 4 2 4k
ffff81007dcb41c0 dm_mpath_io 64 0 0 0 4k
...
ffff81007d9ce580 sgpool-8 280 2 42 3 4k
ffff81007d9cf540 scsi_bidi_sdb 48 0 0 0 4k
ffff81007d98b500 scsi_io_context 136 0 0 0 4k
ffff81007d95e4c0 ext3_inode_cache 992 38553 38712 9678 4k
ffff81007d960480 ext3_xattr 112 68 102 3 4k
etc
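As an aside, several of those "bad slab pointer" values look like recognizable patterns rather than random corruption: 0x6b6b6b6b6b6b6b6b is the byte 0x6b (POISON_FREE from include/linux/poison.h) repeated, i.e. a list pointer read out of an object that had already been freed with slab poisoning enabled, while tiny values like 88/98/a8 look like structure offsets dereferenced through a stale or NULL base. A quick sketch (not part of crash itself; the function name and categories are mine) of how one might triage such values:

```python
# Triage suspicious "pointer" values reported by crash's kmem -s.
# Assumptions: 64-bit little-endian target; POISON_FREE is 0x6b as
# defined in the kernel's include/linux/poison.h.

POISON_FREE = 0x6b

def classify(val: int) -> str:
    """Return a rough guess at what a bogus slab pointer represents."""
    b = val.to_bytes(8, "little")
    if all(x == POISON_FREE for x in b):
        return "freed-object poison (use-after-free)"
    if val < 0x1000:
        return "small offset (likely dereference through a bad base pointer)"
    return "unrecognized"

for v in (0x6b6b6b6b6b6b6b6b, 0x88, 0x9f911029d74e35b):
    print(hex(v), "->", classify(v))
```

On a live system these can simply be transient, as Dave says, but the poison pattern in particular suggests the walker followed a pointer out of already-freed memory.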
>> But if I run crash against the vmcore I do get the segfault...
>>
>
> When you run it on the vmcore, do you get the segfault immediately?
> Or do some slabs display their stats OK, but then when it deals with
> one particular slab it generates the segfault?
>
> I mean that it's possible that the target slab was in transition
> at the time of the crash, in which case you might see some error
> messages like you see on the live system. But it is difficult to
> explain why it's dying specifically where it is, even if the slab
> was in transition.
>
> That all being said, even if the slab was in transition, obviously
> the crash utility should be able to handle it more gracefully...
None of the slabs display their stats OK, crash segfaults immediately.
>> > BTW, if need be, would you be able to make the vmlinux/vmcore pair
>> > available for download somewhere? (You can contact me off-list with
>> > the particulars...)
>>
>> I can work to make that happen if needed...
>
> FYI, I did try our RHEL5 "debug" kernel (2.6.18 + hellofalotofpatches),
> which has both CONFIG_DEBUG_SLAB and CONFIG_DEBUG_SLAB_LEAK turned on,
> but I don't see the problem. So unless something obvious can be
> determined, that may be the only way I can help.
Interesting. OK, I'll work to upload them somewhere and I'll send you
a pointer off-list.
Thanks!
Mike