[Crash-utility] [RFC PATCH v2 0/2] Show memory overcommit data in dump_kmeminfo()

Dave Anderson anderson at redhat.com
Tue Dec 2 17:11:17 UTC 2014



----- Original Message -----
> The first patch changes dump_kmeminfo() to report overcommit information
> similar to that shown in /proc/meminfo. This can help identify memory
> overcommitment abuse, for example in vmcores forced from systems that
> hung due to a shortage of memory. The intended output is as follows:
> 
>   crash> kmem -i
>                    PAGES        TOTAL      PERCENTAGE
>       TOTAL MEM  1965332       7.5 GB         ----
>            FREE    78080       305 MB    3% of TOTAL MEM
>            USED  1887252       7.2 GB   96% of TOTAL MEM
>          SHARED   789954         3 GB   40% of TOTAL MEM
>         BUFFERS   110606     432.1 MB    5% of TOTAL MEM
>          CACHED  1212645       4.6 GB   61% of TOTAL MEM
>            SLAB   146563     572.5 MB    7% of TOTAL MEM
> 
>      TOTAL SWAP  1970175       7.5 GB         ----
>       SWAP USED        5        20 KB    0% of TOTAL SWAP
>       SWAP FREE  1970170       7.5 GB   99% of TOTAL SWAP
> 
>    COMMIT LIMIT  2952841      11.3 GB         ----
>       COMMITTED  1150595       4.4 GB   38% of TOTAL LIMIT
> 
> The second patch simply removes the mention of dump_zone_page_usage()
> availability from kmem's help page.
> 
> Tested under 3.16.4-200.fc20.x86_64 only, though this should work under
> RHEL5 (2.6.18) and above.
> 
> Aaron Tomlin (2):
>   kmem: Show memory commitment data in kmem output
>   help: Remove mention of dump_zone_page_usage()
> 
>  help.c   |  40 ++++++++---------
>  memory.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 155 insertions(+), 38 deletions(-)
> 

Hi Aaron,

I've got a RHEL4 kernel dump for which kmem -i shows a COMMIT LIMIT of
zero, and then crash dies with a SIGFPE:

$ gdb crash

...[ cut ] ...

crash 7.1.0rc5
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
please wait... (uncompressing vmlinux-2.6.9-73chaos.gz)Detaching after fork from child process 26349.
Detaching after fork from child process 26351.                                       
Detaching after fork from child process 26352.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

Detaching after fork from child process 26353.
please wait... (gathering task table data)      
WARNING: active task 102efb5adc0 on cpu 5: corrupt cpu value: 2152520160

      KERNEL: vmlinux-2.6.9-73chaos.gz
    DUMPFILE: vmcore.2007-11-23.0
        CPUS: 8
        DATE: Fri Nov 23 16:03:46 2007
      UPTIME: 4 days, 04:37:31
LOAD AVERAGE: 0.00, 0.46, 3.19
       TASKS: 385
    NODENAME: zeus205
     RELEASE: 2.6.9-73chaos
     VERSION: #1 SMP Thu Sep 27 14:00:05 PDT 2007
     MACHINE: x86_64  (2412 Mhz)
      MEMORY: 15.2 GB
       PANIC: "Oops: 0000 [1] <ffffffff802f063e>{schedule+96} SMP " (check log for details)
         PID: 5539
     COMMAND: "kiblnd_sd_03"
        TASK: 102f9654a00  [THREAD_INFO: 100bd8fa000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> kmem -i
Detaching after fork from child process 26354.
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  3913204      14.9 GB         ----
         FREE  3763462      14.4 GB   96% of TOTAL MEM
         USED   149742     584.9 MB    3% of TOTAL MEM
       SHARED     2459       9.6 MB    0% of TOTAL MEM
      BUFFERS        0            0    0% of TOTAL MEM
       CACHED    34170     133.5 MB    0% of TOTAL MEM
         SLAB    55775     217.9 MB    1% of TOTAL MEM

   TOTAL HIGH        0            0    0% of TOTAL MEM
    FREE HIGH        0            0    0% of TOTAL HIGH
    TOTAL LOW  3913204      14.9 GB  100% of TOTAL MEM
     FREE LOW  3763462      14.4 GB   96% of TOTAL LOW

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0  100% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT        0            0         ----

Program received signal SIGFPE, Arithmetic exception.
dump_kmeminfo () at memory.c:7970
7970						/ allowed) : 0;
Missing separate debuginfos, use: debuginfo-install glibc-2.15-59.fc17.x86_64 libgcc-4.7.2-2.fc17.x86_64 libstdc++-4.7.2-2.fc17.x86_64 lzo-2.06-2.fc17.x86_64 ncurses-libs-5.9-11.20130511.fc17.x86_64 snappy-1.0.5-1.fc17.x86_64 xz-libs-5.1.2-1alpha.fc17.x86_64 zlib-1.2.5-7.fc17.x86_64
(gdb) bt
#0  dump_kmeminfo () at memory.c:7970
#1  0x00000000004a5a55 in cmd_kmem () at memory.c:4632
#2  0x0000000000467ca9 in exec_command () at main.c:832
#3  0x0000000000467ed2 in main_loop () at main.c:779
#4  0x000000000068afb3 in captured_command_loop (data=data@entry=0x0) at main.c:258
#5  0x000000000068964e in catch_errors (func=func@entry=0x68afa0 <captured_command_loop>, func_args=func_args@entry=0x0, 
    errstring=errstring@entry=0x8cc251 "", mask=mask@entry=6) at exceptions.c:557
#6  0x000000000068be26 in captured_main (data=data@entry=0x7fffffffddc0) at main.c:1064
#7  0x000000000068964e in catch_errors (func=func@entry=0x68b180 <captured_main>, 
    func_args=func_args@entry=0x7fffffffddc0, errstring=errstring@entry=0x8cc251 "", mask=mask@entry=6) at exceptions.c:557
#8  0x000000000068c174 in gdb_main (args=args@entry=0x7fffffffddc0) at main.c:1079
#9  0x000000000068c1ae in gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffdf18) at main.c:1099
#10 0x00000000004e6194 in gdb_main_loop (argc=<optimized out>, argc@entry=3, argv=argv@entry=0x7fffffffdf18)
    at gdb_interface.c:76
#11 0x0000000000466325 in main (argc=3, argv=0x7fffffffdf18) at main.c:677
(gdb) p allowed
$1 = 0
(gdb) 

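So a COMMIT LIMIT of zero needs to be handled, and there also needs to be
an allowance for this, where sysctl_overcommit_kbytes does not exist and
sysctl_overcommit_ratio is zero: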

  crash> sym sysctl_overcommit_kbytes
  symbol not found: sysctl_overcommit_kbytes
  possible alternatives:
    (none found)
  crash> p sysctl_overcommit_ratio
  sysctl_overcommit_ratio = $2 = 0
  crash>
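
The SIGFPE above comes from dividing by "allowed" when it is 0. A minimal
sketch of one way to guard that computation follows. The names pct_of()
and show_commit() are hypothetical and are not taken from the patch or
from crash itself; the sketch only illustrates skipping the percentage
when the limit is zero.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of crash): integer percentage of
 * part relative to whole, or 0 when whole is 0, so the caller never
 * divides by zero.
 */
static unsigned long
pct_of(unsigned long part, unsigned long whole)
{
	if (whole == 0)
		return 0;
	return (part * 100) / whole;
}

/*
 * Hypothetical display routine: always prints the COMMIT LIMIT line,
 * but prints the COMMITTED percentage only when the limit is non-zero.
 * Returns 1 if the COMMITTED line was printed, 0 otherwise.
 */
static int
show_commit(unsigned long committed, unsigned long allowed)
{
	printf("%13s  %7lu\n", "COMMIT LIMIT", allowed);
	if (allowed == 0)
		return 0;	/* no meaningful percentage to report */
	printf("%13s  %7lu  %3lu%% of TOTAL LIMIT\n", "COMMITTED",
	    committed, pct_of(committed, allowed));
	return 1;
}
```

With the page counts from the intended output quoted above (1150595
committed against a limit of 2952841), pct_of() gives 38, and with a limit
of 0 it gives 0 instead of faulting.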

Since kmem -i is such a commonly used command, any required offsets should
be stored in the offset table once, and not reinitialized every time it's
called. Can you add atomic_t.counter and percpu_counter.count to the bottom
of the offset_table structure so that OFFSET() can be used?  And for that
matter, maybe have vm_init() initialize all of the hstate-related offsets.
That would simplify dump_hstates() and get_hugetlb_total_pages(), which
would then only have to check the validity of the sizes/offsets they need
instead of re-initializing them every time.
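
The initialize-once, validate-per-call pattern described above can be
sketched in standalone C as follows. This is an illustration only: the
struct, the lookup function and its return values are hypothetical
stand-ins, not crash's real offset_table, OFFSET() or debuginfo lookups.

```c
#include <string.h>

#define INVALID_OFFSET (-1L)

/* Hypothetical stand-in for a table of cached member offsets. */
struct offset_cache {
	long atomic_t_counter;
	long percpu_counter_count;
};

static struct offset_cache cache = { INVALID_OFFSET, INVALID_OFFSET };
static int lookups;	/* counts the expensive lookups, for illustration */

/*
 * Hypothetical stand-in for a debuginfo query. The values are made up:
 * only atomic_t.counter is "found"; everything else is reported missing.
 */
static long
lookup_member_offset(const char *type, const char *member)
{
	lookups++;
	if (strcmp(type, "atomic_t") == 0 && strcmp(member, "counter") == 0)
		return 0;
	return INVALID_OFFSET;
}

/* Called once at startup; every later command reuses the cached values. */
static void
init_offsets_once(void)
{
	cache.atomic_t_counter =
	    lookup_member_offset("atomic_t", "counter");
	cache.percpu_counter_count =
	    lookup_member_offset("percpu_counter", "count");
}

static int
offset_is_valid(long off)
{
	return off != INVALID_OFFSET;
}

/* Per-command path: validity checks only, no further lookups. */
static int
can_show_commit_data(void)
{
	return offset_is_valid(cache.atomic_t_counter) &&
	    offset_is_valid(cache.percpu_counter_count);
}
```

However many times can_show_commit_data() is called, the lookup count stays
at the two performed by init_offsets_once().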

Also, is there a reason you made this change?:

@@ -4627,7 +4628,7 @@ cmd_kmem(void)

        }

-       if (iflag == 1)
+       if (iflag)
                dump_kmeminfo();

        if (pflag == 1)
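
For reference, the two forms of the test behave differently only if iflag
can hold a value other than 0 or 1. The sketch below assumes, purely for
illustration, a flag that might be set to 2; this thread does not show how
iflag is actually assigned, and the function names are hypothetical.

```c
/* Old test: true only when the flag is exactly 1. */
static int
runs_with_old_test(int iflag)
{
	return iflag == 1;
}

/* New test: true for any non-zero flag value. */
static int
runs_with_new_test(int iflag)
{
	return iflag != 0;
}
```

Both functions agree for 0 and 1; for a value of 2 only the new test is
true.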

Thanks,
  Dave


  






