[Crash-utility] HEADS-UP: Linux 3.13-rc1 failures with CONFIG_MAXSMP and CONFIG_SLAB

Dave Anderson anderson at redhat.com
Wed Nov 27 17:30:12 UTC 2013


Just a heads-up if you are attempting to use crash with the
recently-released 3.13-rc1 kernel that is configured with either:

  CONFIG_MAXSMP=y (x86_64 kernels)
  CONFIG_SLAB=y

If CONFIG_MAXSMP is configured, the CONFIG_NR_CPUS value is
overridden and set to 8192.  This is from the 3.13-rc1
arch/x86/Kconfig:

  config NR_CPUS
        int "Maximum number of CPUs" if SMP && !MAXSMP
        range 2 8 if SMP && X86_32 && !X86_BIGSMP
        range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK
        range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64
        default "1" if !SMP
        default "8192" if MAXSMP
        default "32" if SMP && (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000)
        default "8" if SMP
        ---help---
          This allows you to specify the maximum number of CPUs which this
          kernel will support.  If CPUMASK_OFFSTACK is enabled, the maximum
          supported value is 4096, otherwise the maximum value is 512.  The
          minimum value which makes sense is 2.

          This is purely to save memory - each supported CPU adds
          approximately eight kilobytes to the kernel image.
  
That exceeds the current compiled-in crash limit of 5120, and as a result
the crash session will fail like so:
 
  # crash vmlinux vmcore
  
  crash 7.0.3-1.fc20
  Copyright (C) 2002-2013  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
   
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...
  
  WARNING: kernel-configured NR_CPUS (8192) greater than compiled-in NR_CPUS (5120)
  
  crash: recompile crash with larger NR_CPUS
  
  #
 
As the message indicates, the X86_64 NR_CPUS value in "defs.h"
must be bumped from 5120 up to 8192, and the crash binary rebuilt.
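
For illustration only, here is a minimal C sketch of the kind of
compile-time limit check that produces the warning above.  It is not the
actual crash source: the function name check_nr_cpus() is hypothetical, and
it assumes the limit is a plain NR_CPUS macro, as the message suggests.

```c
#include <stdio.h>

/* Stand-in for the X86_64 NR_CPUS definition in defs.h, after the bump. */
#define NR_CPUS 8192

/*
 * Hypothetical helper: return 0 if the kernel-configured CPU count fits
 * within the compiled-in limit, or -1 (after printing a warning in the
 * style of the session above) if it does not.
 */
int check_nr_cpus(int kernel_nr_cpus, int compiled_nr_cpus)
{
    if (kernel_nr_cpus > compiled_nr_cpus) {
        fprintf(stderr,
                "WARNING: kernel-configured NR_CPUS (%d) greater than "
                "compiled-in NR_CPUS (%d)\n",
                kernel_nr_cpus, compiled_nr_cpus);
        return -1;
    }
    return 0;
}
```

With the old limit, check_nr_cpus(8192, 5120) reports the failure; with the
bumped macro, check_nr_cpus(8192, NR_CPUS) passes.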

Secondly, if the 3.13-rc1 kernel is configured with CONFIG_SLAB, the crash
session will fail with a segmentation violation.  Here is an example using a
crash version with a bumped-up NR_CPUS against a kernel configured with
CONFIG_SLAB:
  
  # crash vmlinux vmcore
  
  crash 7.0.4rc17
  Copyright (C) 2002-2013  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
   
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...
  
  please wait... (gathering kmem slab cache data)Segmentation fault
  

Although I haven't tested recent CONFIG_SLAB kernels (Fedora and RHEL use CONFIG_SLUB
by default), I believe this is related to this patchset that was pulled into Linux 3.13:
  
  https://lkml.org/lkml/2013/10/16/155
  
  From: Joonsoo Kim <>
  Subject: [PATCH v2 00/15] slab: overload struct slab over struct page to reduce memory usage
  Date: Wed, 16 Oct 2013 17:43:57 +0900
  
  There is two main topics in this patchset. One is to reduce memory usage
  and the other is to change a management method of free objects of a slab.
  
  The SLAB allocate a struct slab for each slab. The size of this structure
  except bufctl array is 40 bytes on 64 bits machine. We can reduce memory
  waste and cache footprint if we overload struct slab over struct page.
  
  And this patchset change a management method of free objects of a slab.
  Current free objects management method of the slab is weird, because
  it touch random position of the array of kmem_bufctl_t when we try to
  get free object. See following example.
  
  struct slab's free = 6 
  kmem_bufctl_t array: 1 END 5 7 0 4 3 2 
  
  To get free objects, we access this array with following index pattern.
  6 -> 3 -> 7 -> 2 -> 5 -> 4 -> 0 -> 1 -> END
  
  If we have many objects, this array would be larger and be not in the same
  cache line. It is not good for performance.
  
  We can do same thing through more easy way, like as the stack.
  This patchset implement it and remove complex code for above algorithm.
  This makes slab code much cleaner.
  
  ...


And even if it's not the direct cause of the segmentation violation, that patchset
completely changes the slab object bookkeeping, and will require a significant
change to the crash utility to support CONFIG_SLAB kernels.
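
To make the old bookkeeping from the quoted cover letter concrete, here is a
minimal C sketch of walking an index-linked free list of the kind it
describes.  This is a simplified illustration, not kernel or crash code: the
function name walk_free_list(), the plain int array, and the BUFCTL_END value
used here are all assumptions made for the example.

```c
#include <stdio.h>

/* Stand-in end-of-list sentinel; the value is chosen for illustration. */
#define BUFCTL_END (-1)

/*
 * Walk a free list in which each array entry holds the index of the next
 * free object.  Visited indices are stored in out[], which must have room
 * for nobjs entries.  Returns the number of free objects found.
 */
int walk_free_list(const int *bufctl, int nobjs, int first_free, int *out)
{
    int n = 0;
    int i = first_free;

    /*
     * Stop at the sentinel, on an out-of-range index, or after nobjs
     * steps, which guards against a corrupted (cyclic) list.
     */
    while (i != BUFCTL_END && i >= 0 && i < nobjs && n < nobjs) {
        out[n++] = i;
        i = bufctl[i];
    }
    return n;
}
```

Given the array from the cover letter, { 1, END, 5, 7, 0, 4, 3, 2 }, and a
first free index of 6, the walk visits 6, 3, 7, 2, 5, 4, 0, 1 and then stops
at END, matching the access pattern quoted above.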

I am starting to work on the CONFIG_SLAB issue, but in the meantime, you can use
the --no_kmem_cache command-line workaround:
  
  # crash --no_kmem_cache vmlinux vmcore
  
  crash 7.0.4rc17
  Copyright (C) 2002-2013  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
   
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...
  
        KERNEL: vmlinux                        
      DUMPFILE: vmcore
          CPUS: 2
          DATE: Wed Nov 27 11:46:03 2013
        UPTIME: 00:11:55
  LOAD AVERAGE: 0.50, 0.35, 0.20
         TASKS: 103
      NODENAME: hp-xw4550-02.ml3.eng.bos.redhat.com
       RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
       VERSION: #1 SMP Tue Nov 26 14:42:45 EST 2013
       MACHINE: x86_64  (2194 Mhz)
        MEMORY: 1.9 GB
         PANIC: "Oops: 0002 [#1] SMP " (check log for details)
           PID: 663
       COMMAND: "bash"
          TASK: ffff88006ba70650  [THREAD_INFO: ffff88006ba34000]
           CPU: 0
         STATE: TASK_RUNNING (PANIC)
  
  crash>

If that option is used, any command that tries to access the slab cache data 
will not execute:
  
  crash> kmem -s
  kmem: kmem cache slab subsystem not available
  crash>
    
When I get CONFIG_SLAB support functional again, I'll release crash-7.0.4.

Thanks,
  Dave



