[Crash-utility] WARNING: malloc/free mismatch

Mon Apr 22 14:16:13 UTC 2019

On Thu, 2019-04-18 at 16:59 -0400, Dave Anderson wrote:
> 
> ----- Original Message -----
> > On Thu, 2019-04-18 at 14:05 -0400, Dave Anderson wrote:
> > > 
> > > ----- Original Message -----
> > > > On Thu, 2019-04-18 at 11:09 -0400, Dave Anderson wrote:
> > > > > 
> > > > > ----- Original Message -----
> > > > > > On Thu, 2019-04-18 at 15:02 +0100, Pierguido Lambri wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > Today while I was looking into a vmcore, I got suddenly
> > > > > > > the
> > > > > > > message
> > > > > > > in $SUBJECT.
> > > > > > > It started after I did a search into the process stack
> > > > > > > pages
> > > > > > > (search
> > > > > > > -t)
> > > > > > > and for each command I run afterwards I kept getting that
> > > > > > > message.
> > > > > > > For example:
> > > > > > > 
> > > > > > > $ retrace-server-interact 603967269 crash
> > > > > > > ...
> > > > > > > crash> search -t ffff88040a0d5280
> > > > > > > 
> > > > > > > search: invalid list entry: 0
> > > > > > > 
> > > > > > > search: invalid list entry: 0
> > > > > > > 
> > > > > > > search: invalid list entry: 0
> > > > > > > PID: 606    TASK: ffff88082d226eb0  CPU: 5   COMMAND:
> > > > > > > "xfsaild/dm-0"
> > > > > > > ffff88083ff5b948: ffff88040a0d5280
> > > > > > > ffff88083ff5b990: ffff88040a0d5280
> > > > > > > ffff88083ff5baa8: ffff88040a0d5280
> > > > > > > ffff88083ff5baf0: ffff88040a0d5280
> > > > > > > ffff88083ff5bcf0: ffff88040a0d5280
> > > > > > > ffff88083ff5bd38: ffff88040a0d5280
> > > > > > > ffff88083ff5bd98: ffff88040a0d5280
> > > > > > > 
> > > > > > > 
> > > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > > > 
> > > > > > > crash> ps -m | grep UN
> > > > > > > [ 0 00:00:00.146] [UN]  PID: 1811   TASK:
> > > > > > > ffff880c17bd1fa0  CPU:
> > > > > > > 1   COMMAND: "cp"
> > > > > > > WARNING: malloc/free mismatch (29/32)
> > > > > > > 
> > > > > > > I guess this comes from a possible corrupted vmcore (I
> > > > > > > just
> > > > > > > got
> > > > > > > it
> > > > > > > from this vmcore),
> > > > > > > but I wonder why every new command keeps returning the
> > > > > > > same
> > > > > > > message.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > 
> > > > > > > Pier
> > > > > > > 
> > > > > > > --
> > > > > > > Crash-utility mailing list
> > > > > > > Crash-utility at redhat.com
> > > > > > > https://www.redhat.com/mailman/listinfo/crash-utility
> > > > > > 
> > > > > > FWIW, I just pulled this up after plambri pinged me.  This
> > > > > > is
> > > > > > the
> > > > > > backtrace that is being hit though I've not dug in more:
> > > > > > 
> > > > > > Breakpoint 3, do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > > 3820                                    error(INFO,
> > > > > > "\ninvalid
> > > > > > list
> > > > > > entry:
> > > > > > 0\n");
> > > > > > (gdb) list
> > > > > > 3815                            return -1;
> > > > > > 3816                    }
> > > > > > 3817
> > > > > > 3818                    if (next == 0) {
> > > > > > 3819                            if (ld->flags &
> > > > > > LIST_HEAD_FORMAT) {
> > > > > > 3820                                    error(INFO,
> > > > > > "\ninvalid
> > > > > > list
> > > > > > entry:
> > > > > > 0\n");
> > > > > > 3821                                    if
> > > > > > (close_hq_on_return)
> > > > > > 3822                                            hq_close();
> > > > > > 3823                                    return -1;
> > > > > > 3824                            }
> > > > > > (gdb) bt
> > > > > > #0  do_list (ld=0x7ffffffea6c0) at tools.c:3820
> > > > > > #1  0x000000000047ec82 in dump_vmap_area
> > > > > > (vi=0x7ffffffed0d0) at
> > > > > > memory.c:8724
> > > > > > #2  dump_vmlist (vi=0x7ffffffed0d0) at memory.c:8590
> > > > > > #3  0x000000000047f3eb in last_vmalloc_address () at
> > > > > > memory.c:16792
> > > > > > #4  0x0000000000515e6b in x86_64_get_kvaddr_ranges
> > > > > > (vrp=0x7fffffffd340) at x86_64.c:8706
> > > > > > #5  0x000000000049c6ae in cmd_search () at memory.c:13988
> > > > > > #6  0x0000000000465f9c in exec_command () at main.c:879
> > > > > > #7  0x00000000004661ca in main_loop () at main.c:826
> > > > > > #8  0x00000000006b21a3 in captured_command_loop
> > > > > > (data=<value
> > > > > > optimized out>) t main.c:258
> > > > > > #9  0x00000000006b0a8b in catch_errors (func=0x6b2190
> > > > > > <captured_command_loop>, func_args=0x0, errstring=0x90c106
> > > > > > "",
> > > > > > mask=6) at exceptions.c:557
> > > > > > #10 0x00000000006b3076 in captured_main (data=<value
> > > > > > optimized
> > > > > > out>) at main.c:1064
> > > > > > #11 0x00000000006b0a8b in catch_errors (func=0x6b22b0
> > > > > > <captured_main>, func_args=0x7fffffffe2e0,
> > > > > > errstring=0x90c106
> > > > > > "",
> > > > > > mask=6) at exceptions.c:557
> > > > > > #12 0x00000000006b1fa4 in gdb_main (args=<value optimized
> > > > > > out>)
> > > > > > at
> > > > > > main.c:1079
> > > > > > #13 0x00000000006b1fde in gdb_main_entry (argc=<value
> > > > > > optimized
> > > > > > out>, argv=<value optimized out>) at main.c:1099
> > > > > > #14 0x0000000000467030 in main (argc=3,
> > > > > > argv=0x7fffffffe458) at
> > > > > > main.c:707
> > > > > 
> > > > > Hmmm, the vmap_area list is a list_head type list, so there
> > > > > should
> > > > > never be
> > > > > a NULL "next" pointer.
> > > > > 
> > > > > I'm guessing that "kmem -v" also fails?  The last vmap_area
> > > > > entry
> > > > > should point back to
> > > > > the global "vmap_area_list" list header, for example:
> > > > > 
> > > > >   crash> kmem -v | tail
> > > > >   ffff96e7ecaaca80  ffff96e54c89c400  ffffffffc0e54000 -
> > > > > ffffffffc0e5a000    24576
> > > > >   ffff96e757ffe380  ffff96e4be98f3c0  ffffffffc0e5d000 -
> > > > > ffffffffc0e6d000    65536
> > > > >   ffff96e467b33400  ffff96e6a3ae1a00  ffffffffc0e6d000 -
> > > > > ffffffffc0e73000    24576
> > > > >   ffff96e85cf4e600  ffff96e752c52b40  ffffffffc0e77000 -
> > > > > ffffffffc0e7c000    20480
> > > > >   ffff96e85cf4e380  ffff96e5506c6c00  ffffffffc0e7c000 -
> > > > > ffffffffc0e81000    20480
> > > > >   ffff96e802baa500  ffff96e5506c69c0  ffffffffc0e81000 -
> > > > > ffffffffc0e86000    20480
> > > > >   ffff96e802baac00  ffff96e5506c6cc0  ffffffffc0e86000 -
> > > > > ffffffffc0e8c000    24576
> > > > >   ffff96e574196f80  ffff96e55ffd6c80  ffffffffc0e90000 -
> > > > > ffffffffc0e95000    20480
> > > > >   ffff96e574196680  ffff96e55ffd6880  ffffffffc0e95000 -
> > > > > ffffffffc0e9a000    20480
> > > > >   ffff96e87c222800  ffff96e5496ca680  ffffffffc0e9a000 -
> > > > > ffffffffc0ea4000    40960
> > > > >   crash> vmap_area ffff96e87c222800
> > > > >   struct vmap_area {
> > > > >     va_start = 18446744072651120640,
> > > > >     va_end = 18446744072651161600,
> > > > >     flags = 4,
> > > > >     rb_node = {
> > > > >       __rb_parent_color = 18446628510972342169,
> > > > >       rb_right = 0x0,
> > > > >       rb_left = 0xffff96e574196698
> > > > >     },
> > > > >     list = {
> > > > >       next = 0xffffffffae69af90,
> > > > >       prev = 0xffff96e5741966b0
> > > > >     },
> > > > >     purge_list = {
> > > > >       next = 0x0,
> > > > >       prev = 0xdead000000000200
> > > > >     },
> > > > >     vm = 0xffff96e5496ca680,
> > > > >     callback_head = {
> > > > >       next = 0x0,
> > > > >       func = 0xffff96e71d51aa00
> > > > >     }
> > > > >   }
> > > > >   crash> sym 0xffffffffae69af90
> > > > >   ffffffffae69af90 (D) vmap_area_list
> > > > >   crash>
> > > > >   
> > > > > Dave
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > Yeah kmem -v fails as well:
> > > > crash> kmem -v
> > > > 
> > > > kmem: invalid list entry: 0
> > > > WARNING: malloc/free mismatch (29/30)
> > > > crash>
> > > > 
> > > > 
> > > > There's no indicating of an error when crash loads though -
> > > > only
> > > > after
> > > > running these commands.  Do you think this a damaged vmcore
> > > > that is
> > > > not
> > > > obvious?
> > > 
> > > I don't know it's damaged or if it's a symptom of the kernel
> > > crash.  Is the
> > > kernel crash happening while the vmlist is being modified?
> > > 
> > 
> > No there are no active processes modifying the vmap_area_list
> > 
> > It is crash due to memory corruption and there are 3rd party
> > modules.
> > But the crash is inside xfs and does not appear in any way related
> > to
> > this nor are the 3rd party modules in any backtraces.
> > 
> > 
> > > It's not obvious because it would only be seen when dump_vmlist()
> > > is
> > > called.
> > > When dump_vmlist() calls dump_vmap_area(), and do_list() returns
> > > -1
> > > back to
> > > dump_vmap_area(), it gets used as a loop-ending index, and then
> > > causes
> > > presumably bogus values to get returned:
> > > 
> > >    do_vmap_area(struct meminfo *vi)
> > >    {
> > >    ...
> > >   
> > >           ld->end = symbol_value("vmap_area_list");
> > >           cnt = do_list(ld);
> > > 
> > >           for (i = 0; i < cnt; i++) {
> > >    ...
> > >   
> > >           if (vi->flags & GET_HIGHEST)
> > >                   vi->retval = start+size;
> > > 
> > >           if (vi->flags & GET_VMLIST_COUNT)
> > >                   vi->retval = count;
> > > 
> > >           if (vi->flags & VMLIST_VERIFY)
> > >                   vi->retval = verified;
> > >   }
> > > 
> > > Mabye dump_vmap_area() should do a error(FATAL, ...) if cnt is
> > > -1?  Although,
> > > that would kill all search command attempts.  It's hard to say.
> > > 
> > > Dave
> > > 
> > > 
> > 
> > I would maybe consider something like this which seems to fix the
> > persistent malloc/free mismatch errors and give some results?
> > 
> > crash> kmem -v
> > 
> > kmem: invalid list entry: 0
> > crash>
> > crash> kmem -v
> > 
> > kmem: invalid list entry: 0
> > crash> search 0xdeadbeef
> > 
> > search: invalid list entry: 0
> > 
> > search: invalid list entry: 0
> > 
> > search: invalid list entry: 0
> > ffff88078d2cda80: deadbeef
> > crash>
> > crash> kmem -v
> > 
> > kmem: invalid list entry: 0
> > crash> quit
> > 
> > 
> > $ git diff memory.c
> > diff --git a/memory.c b/memory.c
> > index 8cdab06..7161d9d 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -8722,6 +8722,11 @@ dump_vmap_area(struct meminfo *vi)
> >         ld->list_head_offset = OFFSET(vmap_area_list);
> >         ld->end = symbol_value("vmap_area_list");
> >         cnt = do_list(ld);
> > +       if (cnt < 0) {
> > +               vi->retval = 0;
> > +               FREEBUF(vmap_area_buf);
> > +               return;
> > +       }
> >  
> >         for (i = 0; i < cnt; i++) {
> >                 if (!(pc->curcmd_flags & HEADER_PRINTED) && (i ==
> > 0) &&
> > 
> > --
> 
> I was wondering how the search command would handle its call to
> machdep->get_kvaddr_ranges()
> with the patch above -- which would return 0 as the vmalloc address
> range's "end" address.
> But given your output above, apparently it seems to work around it.
> 
> Thanks,
>   Dave
> 
> 

As far as I could tell, the code properly checks for a non-zero
meminfo.retval before proceeding in all instances.