[Crash-utility] WARNING: malloc/free mismatch

Thu Apr 18 20:30:54 UTC 2019

On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
> 
> ----- Original Message -----
> > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
> > > 
> > > ----- Original Message -----
> > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > > > Hello,
> > > > > 
> > > > > Today while I was looking into a vmcore, I got suddenly the
> > > > > message
> > > > > in $SUBJECT.
> > > > > It started after I did a search into the process stack pages
> > > > > (search
> > > > > -t)
> > > > > and for each command I run afterwards I kept getting that
> > > > > message.
> > > > > For example:
> > > > > 
> > > > > $ retrace-server-interact 603967269 crash
> > > > > ...
> > > > > crash> search -t ffff88040a0d5280
> > > > > 
> > > > > search: invalid list entry: 0
> > > > > 
> > > > > search: invalid list entry: 0
> > > > > 
> > > > > search: invalid list entry: 0
> > > > > PID: 606    TASK: ffff88082d226eb0  CPU: 5   COMMAND:
> > > > > "xfsaild/dm-0"
> > > > > ffff88083ff5b948: ffff88040a0d5280
> > > > > ffff88083ff5b990: ffff88040a0d5280
> > > > > ffff88083ff5baa8: ffff88040a0d5280
> > > > > ffff88083ff5baf0: ffff88040a0d5280
> > > > > ffff88083ff5bcf0: ffff88040a0d5280
> > > > > ffff88083ff5bd38: ffff88040a0d5280
> > > > > ffff88083ff5bd98: ffff88040a0d5280
> > > > > 
> > > > > 
> > > > > WARNING: malloc/free mismatch (29/32)
> > > > > 
> > > > > crash> ps -m | grep UN
> > > > > [ 0 00:00:00.146] [UN]  PID: 1811   TASK:
> > > > > ffff880c17bd1fa0  CPU:
> > > > > 1   COMMAND: "cp"
> > > > > WARNING: malloc/free mismatch (29/32)
> > > > > 
> > > > > I guess this comes from a possible corrupted vmcore (I just
> > > > > got
> > > > > it
> > > > > from this vmcore),
> > > > > but I wonder why every new command keeps returning the same
> > > > > message.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Pier
> > > > > 
> > > > > --
> > > > > Crash-utility mailing list
> > > > > Crash-utility at redhat.com
> > > > > https://www.redhat.com/mailman/listinfo/crash-utility
> > > > 
> > > > FWIW, I just pulled this up after plambri pinged me.  This is
> > > > the
> > > > backtrace that is being hit though I've not dug in more:
> > > > 
> > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > 3820                                    error(INFO, "\ninvalid
> > > > list
> > > > entry:
> > > > 0\n");
> > > > (gdb) list
> > > > 3815                            return -1;
> > > > 3816                    }
> > > > 3817
> > > > 3818                    if (next == 0) {
> > > > 3819                            if (ld->flags &
> > > > LIST_HEAD_FORMAT) {
> > > > 3820                                    error(INFO, "\ninvalid
> > > > list
> > > > entry:
> > > > 0\n");
> > > > 3821                                    if (close_hq_on_return)
> > > > 3822                                            hq_close();
> > > > 3823                                    return -1;
> > > > 3824                            }
> > > > (gdb) bt
> > > > #0  do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > #1  0x000000000047ec82 in dump_vmap_area (vi=0x7ffffffed0d0) at
> > > > memory.c:8724
> > > > #2  dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > > > #3  0x000000000047f3eb in last_vmalloc_address () at
> > > > memory.c:16792
> > > > #4  0x0000000000515e6b in x86_64_get_kvaddr_ranges
> > > > (vrp=0x7fffffffd340) at x86_64.c:8706
> > > > #5  0x000000000049c6ae in cmd_search () at memory.c:13988
> > > > #6  0x0000000000465f9c in exec_command () at main.c:879
> > > > #7  0x00000000004661ca in main_loop () at main.c:826
> > > > #8  0x00000000006b21a3 in captured_command_loop (data=<value
> > > > optimized out>) t main.c:258
> > > > #9  0x00000000006b0a8b in catch_errors (func=0x6b2190
> > > > <captured_command_loop>, func_args=0x0, errstring=0x90c106 "",
> > > > mask=6) at exceptions.c:557
> > > > #10 0x00000000006b3076 in captured_main (data=<value optimized
> > > > out>) at main.c:1064
> > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0
> > > > <captured_main>, func_args=0x7fffffffe2e0, errstring=0x90c106
> > > > "",
> > > > mask=6) at exceptions.c:557
> > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized out>)
> > > > at
> > > > main.c:1079
> > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value optimized
> > > > out>, argv=<value optimized out>) at main.c:1099
> > > > #14 0x0000000000467030 in main (argc=3, argv=0x7fffffffe458) at
> > > > main.c:707
> > > 
> > > Hmmm, the vmap_area list is a list_head type list, so there
> > > should
> > > never be
> > > a NULL "next" pointer.
> > > 
> > > I'm guessing that "kmem -v" also fails?  The last vmap_area entry
> > > should point back to
> > > the global "vmap_area_list" list header, for example:
> > > 
> > >   crash> kmem -v | tail
> > >   ffff96e7ecaaca80  ffff96e54c89c400  ffffffffc0e54000 -
> > > ffffffffc0e5a000    24576
> > >   ffff96e757ffe380  ffff96e4be98f3c0  ffffffffc0e5d000 -
> > > ffffffffc0e6d000    65536
> > >   ffff96e467b33400  ffff96e6a3ae1a00  ffffffffc0e6d000 -
> > > ffffffffc0e73000    24576
> > >   ffff96e85cf4e600  ffff96e752c52b40  ffffffffc0e77000 -
> > > ffffffffc0e7c000    20480
> > >   ffff96e85cf4e380  ffff96e5506c6c00  ffffffffc0e7c000 -
> > > ffffffffc0e81000    20480
> > >   ffff96e802baa500  ffff96e5506c69c0  ffffffffc0e81000 -
> > > ffffffffc0e86000    20480
> > >   ffff96e802baac00  ffff96e5506c6cc0  ffffffffc0e86000 -
> > > ffffffffc0e8c000    24576
> > >   ffff96e574196f80  ffff96e55ffd6c80  ffffffffc0e90000 -
> > > ffffffffc0e95000    20480
> > >   ffff96e574196680  ffff96e55ffd6880  ffffffffc0e95000 -
> > > ffffffffc0e9a000    20480
> > >   ffff96e87c222800  ffff96e5496ca680  ffffffffc0e9a000 -
> > > ffffffffc0ea4000    40960
> > >   crash> vmap_area ffff96e87c222800
> > >   struct vmap_area {
> > >     va_start = 18446744072651120640,
> > >     va_end = 18446744072651161600,
> > >     flags = 4,
> > >     rb_node = {
> > >       __rb_parent_color = 18446628510972342169,
> > >       rb_right = 0x0,
> > >       rb_left = 0xffff96e574196698
> > >     },
> > >     list = {
> > >       next = 0xffffffffae69af90,
> > >       prev = 0xffff96e5741966b0
> > >     },
> > >     purge_list = {
> > >       next = 0x0,
> > >       prev = 0xdead000000000200
> > >     },
> > >     vm = 0xffff96e5496ca680,
> > >     callback_head = {
> > >       next = 0x0,
> > >       func = 0xffff96e71d51aa00
> > >     }
> > >   }
> > >   crash> sym 0xffffffffae69af90
> > >   ffffffffae69af90 (D) vmap_area_list
> > >   crash>
> > >   
> > > Dave
> > > 
> > > 
> > > 
> > 
> > Yeah kmem -v fails as well:
> > crash> kmem -v
> > 
> > kmem: invalid list entry: 0
> > WARNING: malloc/free mismatch (29/30)
> > crash>
> > 
> > 
> > There's no indicating of an error when crash loads though - only
> > after
> > running these commands.  Do you think this a damaged vmcore that is
> > not
> > obvious?
> 
> I don't know it's damaged or if it's a symptom of the kernel
> crash.  Is the
> kernel crash happening while the vmlist is being modified?
> 

No there are no active processes modifying the vmap_area_list 

It is crash due to memory corruption and there are 3rd party modules. 
But the crash is inside xfs and does not appear in any way related to
this nor are the 3rd party modules in any backtraces.


> It's not obvious because it would only be seen when dump_vmlist() is
> called.
> When dump_vmlist() calls dump_vmap_area(), and do_list() returns -1
> back to
> dump_vmap_area(), it gets used as a loop-ending index, and then
> causes 
> presumably bogus values to get returned:
> 
>    do_vmap_area(struct meminfo *vi)
>    {
>    ...
>   
>           ld->end = symbol_value("vmap_area_list");
>           cnt = do_list(ld);
> 
>           for (i = 0; i < cnt; i++) {
>    ...   
>   
>           if (vi->flags & GET_HIGHEST)
>                   vi->retval = start+size;
> 
>           if (vi->flags & GET_VMLIST_COUNT)
>                   vi->retval = count;
> 
>           if (vi->flags & VMLIST_VERIFY)
>                   vi->retval = verified;
>   }
> 
> Mabye dump_vmap_area() should do a error(FATAL, ...) if cnt is
> -1?  Although, 
> that would kill all search command attempts.  It's hard to say.
> 
> Dave
> 
> 

I would maybe consider something like this which seems to fix the
persistent malloc/free mismatch errors and give some results?

crash> kmem -v

kmem: invalid list entry: 0
crash> 
crash> kmem -v

kmem: invalid list entry: 0
crash> search 0xdeadbeef

search: invalid list entry: 0

search: invalid list entry: 0

search: invalid list entry: 0
ffff88078d2cda80: deadbeef 
crash> 
crash> kmem -v

kmem: invalid list entry: 0
crash> quit


$ git diff memory.c

diff --git a/memory.c b/memory.c
index 8cdab06..7161d9d 100644
--- a/memory.c
+++ b/memory.c
@@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
        ld->list_head_offset = OFFSET(vmap_area_list);
        ld->end = symbol_value("vmap_area_list");
        cnt = do_list(ld);
+       if (cnt < 0) {
+               vi->retval = 0;
+               FREEBUF(vmap_area_buf);
+               return;
+       }
 
        for (i = 0; i < cnt; i++) {
                if (!(pc->curcmd_flags & HEADER_PRINTED) && (i == 0) &&