[Crash-utility] [ANNOUNCE] crash version 7.0.9 is available

Dave Anderson anderson at redhat.com
Thu Nov 13 21:41:05 UTC 2014


Download from: http://people.redhat.com/anderson
                 or
               https://github.com/crash-utility/crash/releases

The master branch serves as a development branch that will contain all 
patches that are queued for the next release:

  $ git clone git://github.com/crash-utility/crash.git


Changelog:
 
 - Fix the CPU timer and clock comparator output for the "bt -a" command
   on S390X machines.  The output of CPU timer and clock comparator has
   always been incorrect because:
     - We added S390X_WORD_SIZE (8) instead of 4 to get the second word
     - We did not left shift the clock comparator by 8
   The fix reads the complete 64-bit values and shifts the clock 
   comparator correctly.
   (holzheu at linux.vnet.ibm.com)
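
   The two corrections can be illustrated with a minimal sketch.  It is
   not the crash source code: the helper names are hypothetical, and the
   buffer stands in for a big-endian S390X register save area.

```python
import struct

def read_u32(buf, offset):
    """Read one big-endian 32-bit word (hypothetical helper)."""
    return struct.unpack_from(">I", buf, offset)[0]

def read_u64(buf, offset):
    """Read a complete big-endian 64-bit value in one step."""
    return struct.unpack_from(">Q", buf, offset)[0]

def u64_from_words(buf, offset):
    """Join two 32-bit words; the second word is 4 bytes on, not 8."""
    return (read_u32(buf, offset) << 32) | read_u32(buf, offset + 4)

def clock_comparator(raw):
    """Left shift the stored clock comparator by 8, kept to 64 bits."""
    return (raw << 8) & 0xFFFFFFFFFFFFFFFF
```

   Reading the second word at offset + 8 instead of offset + 4 would
   pick up bytes that belong to the next field, which matches the first
   error described above.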

 - Add "/lib/modules/<version>/build" to the list of directories that
   are searched for the currently-running kernel on live systems.  This
   will automatically locate the vmlinux namelist for kernels that were
   locally installed with "make modules_install install".
   (lrintel at redhat.com)
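
   As a rough sketch of the new search location, the candidate path
   could be built as follows.  The function name is hypothetical, and
   the assumption that the namelist is a file named "vmlinux" at the
   top of the build directory is an illustration, not a statement of
   how crash performs the search.

```python
import os

def build_dir_candidate(release=None, root="/lib/modules"):
    """Return the assumed vmlinux path under /lib/modules/<version>/build.

    release defaults to the currently-running kernel release.
    """
    release = release or os.uname().release
    return os.path.join(root, release, "build", "vmlinux")
```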

 - Addressed 3 Coverity Scan issues:
     (1) task.c: initialize the "curr" and "curr_my_q" variables in the
         dump_tasks_in_task_group_cfs_rq() function.
     (2) ramdump.c: make the "rd" and "len" return values from read()
         and write() calls in write_elf() to be ssize_t types.
     (3) cmdline.c: make the parsed PATH string buffer equal to the size
         of the PATH string + 1 to prevent a possible buffer overflow 
         when a command line starts with a "!".
   (anderson at redhat.com)

 - Fix for the one-time (dumpfile), or as-required (live system),
   gathering of tasks from the kernel pid_hash[] in 2.6.24 and later 
   kernels.  Without the patch, if an entry in a pid_hash[] chain is
   not related to the "init_pid_ns" pid_namespace structure, any 
   remaining entries in the hlist chain are skipped. 
   (vvs at parallels.com)
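
   The difference between the old and new behavior is essentially
   "stop at the first foreign entry" versus "skip it and keep walking".
   A minimal sketch, using a hypothetical Entry type in place of the
   kernel's hlist entries:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical stand-in for one entry on a pid_hash[] chain."""
    pid: int
    ns: str

def gather(chain, wanted_ns="init_pid_ns"):
    """Collect pids from a chain, skipping only unrelated entries."""
    pids = []
    for entry in chain:
        if entry.ns != wanted_ns:
            # Skip this entry only; abandoning the chain here would
            # lose every entry that follows it.
            continue
        pids.append(entry.pid)
    return pids
```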

 - Update the "extensions/snap.mk" file to allow the "snap.so" extension
   module to be built outside of a crash source tree on a ppc64le PPC64
   little-endian host.  Without the patch, "make -f snap.mk" would fail
   to compile, indicating "gcc: error: macro name missing after '-D'"
   (anderson at redhat.com)

 - Improve the method for determining whether a 32-bit ARM vmlinux is
   an LPAE enabled kernel by first checking whether CONFIG_ARM_LPAE 
   exists in the vmcoreinfo data, and if it does not, by then checking
   whether the next higher symbol above "swapper_pg_dir" is 0x5000 bytes
   higher in value.
   (sdu.liu at huawei.com)
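
   The two-step check might be sketched like this.  The function name
   and the dictionary inputs are hypothetical; they stand in for the
   parsed vmcoreinfo data and the kernel symbol table.

```python
def is_lpae(vmcoreinfo, symbols):
    """Guess whether a 32-bit ARM kernel is LPAE enabled.

    vmcoreinfo: dict of vmcoreinfo keys (assumed representation).
    symbols:    dict mapping symbol name to address.
    """
    # First choice: CONFIG_ARM_LPAE is present in the vmcoreinfo data.
    if "CONFIG_ARM_LPAE" in vmcoreinfo:
        return True
    # Fallback: the next higher symbol above swapper_pg_dir must be
    # exactly 0x5000 bytes higher in value.
    base = symbols["swapper_pg_dir"]
    higher = [addr for addr in symbols.values() if addr > base]
    return bool(higher) and min(higher) - base == 0x5000
```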

 - Fix "defs.h" for building extension modules outside of the crash 
   utility source tree on PPC and PPC64 machines.  Without the patch,
   both PPC and PPC64 will get #define'd if the extension module build
   procedure does not #define one or the other, which in turn causes 
   multiple conflicting declarations. 
   (anderson at redhat.com)
 
 - Fix for the "ps" command performance degradation patch that was
   introduced in crash-7.0.8.  Without this patch, it is possible that
   the "ps" command may fail prematurely with the error message 
   "ps: bsearch for tgid failed: task: <address> tgid: <number>"
   when running on a live system or against a "live" dumpfile.
   (panfy.fnst at cn.fujitsu.com)

 - Set the 32-bit ARM HZ value to a default value of 100 if the kernel
   was not configured with CONFIG_IKCONFIG.  Without the patch, the 
   initial system banner and the "sys" command show "UPTIME: (cannot 
   calculate: unknown HZ value)", the "ps -t" option shows "RUN TIME:
   (cannot calculate: unknown HZ value)", and the "timer -r" option 
   kills the crash session with a floating point exception.
   (hukeping at huawei.com)
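
   A minimal sketch of the fallback follows.  It assumes, for
   illustration only, that the HZ value would otherwise come from a
   CONFIG_HZ entry in the IKCONFIG data; the function names are
   hypothetical, and the uptime arithmetic is simplified.

```python
DEFAULT_ARM_HZ = 100  # default named in the changelog entry

def resolve_hz(ikconfig):
    """Return HZ from IKCONFIG data if available, else the default.

    ikconfig: dict of config options, or None when the kernel was not
    configured with CONFIG_IKCONFIG (assumed representation).
    """
    if ikconfig and "CONFIG_HZ" in ikconfig:
        return int(ikconfig["CONFIG_HZ"])
    return DEFAULT_ARM_HZ

def uptime_seconds(jiffies, hz):
    """Simplified uptime: elapsed jiffies divided by HZ."""
    return jiffies // hz
```

   With an unknown (zero) HZ value, a division like the one in
   uptime_seconds() is the kind of operation that can end in a floating
   point exception, which is consistent with the "timer -r" failure
   described above.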

 - Fix the error message displayed if the vmlinux or vmcore file is
   not the same endian as the crash utility binary.  Without the patch
   the filename is shown with the incorrect/opposite endian type.
   (hukeping at huawei.com)

 - Update the "ps" command's "ST" task state display to recognize the
   TASK_PARKED state in Linux 3.9 and later kernels.  Without the patch,
   the command's "ST" column entry for parked tasks shows "??".  The 
   state column will now show "PA", and the foreach command will accept 
   "PA" as a "state" argument.
   (anderson at redhat.com)

 - Fortify the protection against the use of an invalid/corrupted 
   CONFIG_SLAB kmem_cache per-cpu array_cache.limit value during 
   session initialization.  In a recently seen vmcore, several of the
   array_cache.limit values were corrupted such that they were stored
   as negative values, which in turn caused the "kmem -[sS]" options
   to fail immediately with a dump of the internal memory buffer 
   allocation statistics and the error message "kmem: cannot allocate
   any more memory!".
   (anderson at redhat.com)
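
   The kind of sanity check involved can be sketched as below.  The
   function name and the upper bound are hypothetical; the actual
   limits applied by crash are not stated in this announcement.

```python
def max_valid_limit(limits, upper_bound):
    """Return the largest plausible array_cache.limit value.

    Negative (corrupted) or oversized values are ignored so they can
    never be used to size an internal buffer.
    """
    valid = [limit for limit in limits if 0 < limit <= upper_bound]
    return max(valid, default=0)
```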

 - Implement a new "offline" internal crash variable that can be set to
   either "show" (the default) or "hide".  When set to "hide", certain
   command output associated with offline cpus will be hidden from view,
   and the output will indicate that the cpu is "[OFFLINE]".  The new
   variable can be set during invocation on the crash command line via 
   the option "--offline [show|hide]".  During runtime, or in a .crashrc
   or other crash input file, the variable can be set by entering 
   "set offline [show|hide]".  The commands or options that are affected
   when the variable is set to "hide" are as follows:

     o  On X86_64 machines, the "bt -E" option will not search exception
        stacks associated with offline cpus.
     o  On X86_64 machines, the "mach" command will append "[OFFLINE]"
        to the addresses of IRQ and exception stacks associated with 
        offline cpus.
     o  On X86_64 machines, the "mach -c" command will not display the
        cpuinfo_x86 data structure associated with offline cpus.
     o  The "help -r" option has been fixed so as to not attempt to 
        display register sets of offline cpus from ELF kdump vmcores, 
        compressed kdump vmcores, and ELF kdump clones created by 
        "virsh dump --memory-only".
     o  The "bt -c" option will not accept an offline cpu number.
     o  The "set -c" option will not accept an offline cpu number.
     o  The "irq -s" option will not display statistics associated with
        offline cpus.
     o  The "timer" command will not display hrtimer data associated
        with offline cpus.
     o  The "timer -r" option will not display hrtimer data associated
        with offline cpus.
     o  The "ptov" command will append "[OFFLINE]" when translating a 
        per-cpu address offset to a virtual address of an offline cpu.
     o  The "kmem -o" option will append "[OFFLINE]" to the base per-cpu
        virtual address of an offline cpu.
     o  The "kmem -S" option in CONFIG_SLUB kernels will not display
        per-cpu data associated with offline cpus.
     o  When a per-cpu address reference is passed to the "struct" 
        command, the data structure will not be displayed for offline
        cpus.
     o  When a per-cpu symbol and cpu reference is passed to the "p"
        command, the data will not be displayed for offline cpus.
     o  When the "ps -[l|m]" option is passed the optional "-C [cpus]" 
        option, the tasks queued on offline cpus are not shown.
     o  The "runq" command and the "runq [-t/-m/-g/-d]" options will not
        display runqueue data for offline cpus.
     o  The "ps" command will replace the ">" active task indicator with
        a "-" for offline cpus.

   The initial system information banner and the "sys" command will 
   display the total number of cpus as before, but will append the count
   of offline cpus.  Lastly, a fix has been made for the initialization 
   time determination of the maximum number of per-cpu objects queued 
   in a CONFIG_SLAB kmem_cache so as to continue checking all cpus 
   higher than the first offline cpu.  These changes in behavior are not 
   dependent upon the setting of the crash "offline" variable.
   (qiaonuohan at cn.fujitsu.com)
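
   The general effect of the "hide" setting on per-cpu output can be
   sketched as follows.  The function and the line format are purely
   illustrative and do not reproduce the output of any crash command.

```python
def per_cpu_lines(addresses, online, offline_mode="show"):
    """Format one line per cpu, marking offline cpus when hiding.

    addresses:    per-cpu addresses, indexed by cpu number.
    online:       set of online cpu numbers.
    offline_mode: "show" (the default) or "hide".
    """
    lines = []
    for cpu, addr in enumerate(addresses):
        if offline_mode == "hide" and cpu not in online:
            lines.append(f"CPU {cpu}: [OFFLINE]")
        else:
            lines.append(f"CPU {cpu}: {addr:x}")
    return lines
```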

 - Adjustment to the "offline" patch-set so that the initial system
   banner, the "sys" command, and the X86_64 "mach" command only 
   show the "OFFLINE" cpu count if there are actually offline cpus.
   (anderson at redhat.com)

 - Make the "bt -E" option conform to a "-c cpu(s)" specification when 
   the two options are used together.  Without the patch, "bt -E" 
   ignores a cpu specifier.
   (anderson at redhat.com)

 - Fix for the determination of the cpu count on 32-bit ARM machines.
   Without the patch, if certain patterns of cpus are offline, the count
   may be too small, causing cpu-dependent commands to not recognize 
   online cpus. 
   (Jan.Karlsson at sonymobile.com, anderson at redhat.com)

 - Fix for a missing exception frame dump by the X86_64 "bt" command
   when an IRQ is received while a task is running on its per-cpu
   interrupt stack with interrupts enabled.
   (anderson at redhat.com)

 - Fix for the determination of the cpu count on ARM64 machines.  
   Without the patch, if certain patterns of cpus are offline, the count
   may be too small, causing cpu-dependent commands to not recognize 
   online cpus. 
   (Jan.Karlsson at sonymobile.com, anderson at redhat.com)

 - Fix for a possible SIGSEGV generated during session initialization 
   while "please wait... (determining panic task)" is being displayed.
   This was caused by a patch introduced in crash-7.0.8, and can only 
   happen when analyzing dumpfiles whose header does not contain the
   requisite information to determine the panic task and the active
   tasks do not have any crash-related traces in their kernel stacks.
   It should be noted that the SIGSEGV can be avoided by entering 
   "--no_panic" on the crash command line.
   (anderson at redhat.com)

 - Fix for a SIGSEGV generated by the "bt -a" or "help -r" commands
   if the NT_PRSTATUS notes in a compressed kdump are invalid/corrupt.
   If all cpus are online but the dumpfile initialization that cycles
   through the NT_PRSTATUS notes does not find exactly one note per 
   cpu, then the register contents in those notes should not be used.
   (anderson at redhat.com)
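
   For the all-cpus-online case described above, the rule reduces to a
   simple count comparison.  A minimal sketch with a hypothetical
   function name:

```python
def notes_usable(notes, cpu_count):
    """With all cpus online, trust the NT_PRSTATUS notes only if
    exactly one note per cpu was found."""
    return len(notes) == cpu_count
```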

 - Fix for data access from "split" compressed kdump dumpfiles.  Without
   the patch, if a dumpfile read targets physical memory in the first 
   memory page stored in the second or later sequential split dumpfile,
   incorrect data will be returned.
   (qiaonuohan at cn.fujitsu.com)

 - Correction of the copyright and authorship of ramdump.c.
   (oza at broadcom.com)

 - Added recognition of the new DUMP_DH_COMPRESSED_INCOMPLETE flag in
   the header of compressed kdumps, and the new DUMP_ELF_INCOMPLETE flag
   in the header of ELF kdumps.  If the makedumpfile(8) facility fails
   to complete the creation of compressed or ELF kdump vmcore files
   due to ENOSPC or other error, it will mark the vmcore as incomplete.
   If either flag is set, the crash utility will issue a warning that 
   the dumpfile is known to be incomplete during initialization, just 
   prior to the system banner display.  When reads are attempted on 
   missing data, a read error will be returned.  As an alternative,
   zero-filled data will be returned if the "--zero_excluded" command
   line flag is used, or the "zero_excluded" runtime variable is set 
   to "on".  In either case, the read errors or zero-filled memory 
   may cause the crash session to fail entirely, cause commands to
   fail, or may result in other unpredictable runtime behavior.
   (anderson at redhat.com, zhouwj-fnst at cn.fujitsu.com)
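
   The two read behaviors can be sketched as below.  The page store,
   page size, and names are hypothetical stand-ins and are not taken
   from the crash sources.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

class ReadError(Exception):
    """Raised when the requested data is missing from the dumpfile."""

def read_page(pages, pfn, zero_excluded=False):
    """Return the stored page, or handle a page missing from the dump.

    pages: dict mapping page frame number to page contents.
    """
    data = pages.get(pfn)
    if data is not None:
        return data
    if zero_excluded:
        # Equivalent in spirit to "--zero_excluded": zero-filled data.
        return bytes(PAGE_SIZE)
    raise ReadError(f"page {pfn} is missing from the dumpfile")
```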

 - If a kernel has been configured with CONFIG_DEBUG_INFO_REDUCED, then
   the crash utility will fail to initialize, typically with a message
   indicating "no debugging data available".  However, it has been 
   reported (on a 32-bit ARM system) that the initialization sequence 
   continued on beyond that message point, and the session failed later
   on with the message "neither runqueue nor rq structures exist".  As
   an aid to understanding why the session failed, if the target kernel
   is configured with CONFIG_IKCONFIG, and CONFIG_DEBUG_INFO_REDUCED has
   been set to "y", a relevant warning message will be displayed.
   (anderson at redhat.com)

 - Implemented support for this Linux 3.18 commit for kernels that are 
   configured with CONFIG_SLAB:

     commit bf0dea23a9c094ae869a88bb694fbe966671bf6d
     mm/slab: use percpu allocator for cpu cache

   The commit above redesigned the kmem_cache.array_cache[] from a
   hardwired array to a per-cpu pointer referencing external array_cache
   structures.  Without the patch, the crash session would fail during 
   initialization with the message "crash: cannot resolve cache_cache".
   Note that it could be worked around by using the "--no_kmem_cache" 
   command line option, with a resulting loss of functionality for
   commands requiring slab-related data.
   (anderson at redhat.com)

 - Implemented a new "sys -t" option that displays kernel taint 
   information.  If the "tainted_mask" symbol exists, the option will
   show its hexadecimal value and translate each bit set to the symbolic
   letter of the taint type.  On kernels prior to 2.6.28 which had the
   "tainted" symbol, only its hexadecimal value is shown.  The relevant
   kernel sources should be consulted for the meaning of the letter(s)
   or hexadecimal bit value(s).
   (anderson at redhat.com)
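
   Translating a taint mask into letters is a per-bit lookup.  The
   sketch below is not how crash obtains the letters; the table is an
   assumption based on the taint flags of kernels from this period,
   and the kernel sources remain the authority.

```python
# Assumed letter for each taint bit, starting at bit 0.
TAINT_LETTERS = "PFSRMBUDAWCIOE"

def taint_letters(mask):
    """Return the letters for every bit set in a taint mask."""
    return "".join(
        letter
        for bit, letter in enumerate(TAINT_LETTERS)
        if mask & (1 << bit)
    )
```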

 - Cosmetic fix for the "help -[n|D]" translation of the bitmap contents
   of the kdump_sub_header.dump_level flag in compressed kdump dumpfiles.
   (anderson at redhat.com)

 - Fix for the support of compressed kdump clones created with the KVM
   "virsh dump --memory-only --format <compression-type>" command,
   where the compression-type is either "kdump-zlib", "kdump-lzo" or
   "kdump-snappy".  Without the patch, if an x86_64 guest kernel was loaded
   with a non-zero "phys_base", the "--machdep phys_base=<offset>" command
   line option was required as a workaround or the crash session would fail
   with the warning message "WARNING: cannot read linux_banner string"
   followed by the fatal error message "crash: vmlinux and <dumpfile name>
   do not match!".
   (anderson at redhat.com)



