[Crash-utility] patch to add vmss memory regions support

Dyno (Hongjun) Fu hfu at vmware.com
Tue Mar 31 16:28:47 UTC 2015


thanks for your patient guidance!
rgds,
Dyno

On 3/31/15 8:07 AM, Dave Anderson wrote:
>
> ----- Original Message -----
>> Dave,
>>   patch updated. please review, thanks.
>>   * converted all the FATAL to INFO.
>>   * the reason to "break" rather than "return FALSE" is that we might still
>>     be able to read the "memory" group which is the only relevant checkpoint
>>     group we care about.
> Hi Dyno,
>
> This patch looks OK, except for one minor thing.
>
> I see that you removed the two CRASHDEBUG(1) messages in is_vmware_vmss()
> as well.  In the case of the failure to fopen() or fread() the passed-in
> filename, it is OK to call error(INFO, ...), but they are essentially both
> no-ops.  Given that main() will reject any non-readable file, and several
> earlier dumpfile-checking functions will call error(FATAL, ...) if the 
> dumpfile cannot be opened, those messages will never be displayed.
>
> However, in this case:
>
> @@ -47,8 +53,7 @@ is_vmware_vmss(char *filename)
>             hdr.id != CPTDUMP_PARTIAL_MAGIC_NUMBER &&
>             hdr.id != CPTDUMP_RESTORED_MAGIC_NUMBER &&
>             hdr.id != CPTDUMP_NORESTORE_MAGIC_NUMBER) {
> -               if (CRASHDEBUG(1))
> -                       error(INFO, LOGPRX"Unrecognized .vmss file (magic %x).\n", hdr.id);
> +               error(INFO, LOGPRX"Unrecognized .vmss file (magic %x).\n", hdr.id);
>                 return FALSE;
>         }
>
> The CRASHDEBUG(1) absolutely should be there.  The is_vmware_vmss() function
> is simply there to verify whether the passed-in filename is a .vmss file.
> If it is not -- based upon the header contents -- then it should quietly return
> FALSE.  There will be other file formats in the future, and most likely will be placed
> following the is_vmware_vmss() call in main(), and we won't want that irrelevant
> message.  So I've restored that particular CRASHDEBUG(1) qualifier.
>
> Aside from that, the patch is queued for crash-7.1.1:
>
>   https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_crash-2Dutility_crash_commit_1ed90b28af6dbb74fd785e36502f16a8f156e3b3&d=AwICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=g2Vka_25x09RSowRkQw8pA&m=OAtu5GVMTW4RSkslwDnYb94wwRRNEwxSMomEbEnil1M&s=9nX5ubhXrdRFAOTUEhlINNAPP1tYgN3O6d_vhH1MYi4&e= 
>
> Thanks,
>   Dave
>
>   
>> rgds,
>> Dyno
>>
>> On 3/30/15 12:14 PM, Dave Anderson wrote:
>>> ----- Original Message -----
>>>> Dave,
>>>>    I've removed the assert now. please find the updated patch in
>>>>    attachment.
>>>>    thanks.
>>>> rgds,
>>>> Dyno
>>> Hi Dyno,
>>>
>>> I should have combined this actual code review with my last
>>> response re: the assert() calls, but there are just a few minor
>>> things that still need adjustment.
>>>
>>> As I mentioned in an earlier response, during session initialization,
>>> which is before RUNTIME gets set in pc->flags in main_loop(), any
>>> error(FATAL, ...) call will print the error message and the crash
>>> session immediately exits.  And that's OK if that's what you want
>>> to happen.
>>>
>>> But in vmware_vmss_init() you changed many of the previous error messages
>>> that used fprintf(vmss.ofp, ...) to use a construct using FATAL error
>>> messages like this:
>>>
>>> +               error(FATAL, LOGPRX"%s: %s\n", filename, strerror(errno));
>>> +               result = FALSE;
>>> +               goto exit;
>>>
>>> But that doesn't make sense since the error message will get printed and
>>> crash will exit immediately, and therefore will not return back here in
>>> memory_sources_init():
>>>
>>> +               } else if (pc->flags & VMWARE_VMSS) {
>>> +                       if (!vmware_vmss_init(pc->dumpfile, fp))
>>> +                               error(FATAL, "%s: initialization failed\n",
>>> +                                       pc->dumpfile);
>>> +               }
>>>
>>> So preferably the calls should be changed to error(INFO, ...) so that the
>>> user
>>> will get the specific message followed by the general "initialization
>>> failed"
>>> message.
>>>
>>> Also, it would make sense to add an error(INFO, ...) message here as you
>>> have done for the other error scenarios:
>>>
>>> +       if (fread(&hdr, sizeof(cptdumpheader), 1, fp) != 1) {
>>> +               result = FALSE;
>>> +               goto exit;
>>> +       }
>>>
>>> If that initial fread() of the header fails, then the session will exit
>>> with
>>> just the single somewhat-confusing "initialization failed" message.
>>>
>>> I also have a general question about vmware_vmss_init() which is not
>>> specific to this patch.  In the inner-most "for(;;)" loop, it appears
>>> that it only legitimately breaks if it runs into a TAG_NULL:
>>>
>>>                        if (tag == NULL_TAG)
>>>                                 break;
>>>
>>> But there are a number of fread() failures in the loop that do an
>>> fprintf(ofp, ...) and simply "break".  If any of those failures occur,
>>> it looks like vmware_vmss_init() will return success.  But should it?
>>>
>>> Anyway, if you can clean up the error message handling, we can get this
>>> patch
>>> checked in.
>>>
>>> Thanks,
>>>   Dave
>>>
>>>
>>>> On 3/27/15 11:05 AM, Dave Anderson wrote:
>>>>> Dyno,
>>>>>
>>>>> I've got the sample multi-region dumpfile -- thanks for that.
>>>>>
>>>>> This latest patch tests OK on all three dumpfiles.  But can you please
>>>>> remove the remaining assert() calls as I requested in my previous email?
>>>>>
>>>>> Thanks,
>>>>>   Dave
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> Dave,
>>>>>>
>>>>>>   An assert i put caused the crash, workstation VM memory has more block
>>>>>>   item
>>>>>>   than i thought.
>>>>>> fixed it in the patch and tested against all the dump i have. i will try
>>>>>> to
>>>>>> provide a 4GB memory dump later.
>>>>>>
>>>>>> ESX VM 3G Memory
>>>>>> ================
>>>>>> - Group: memory pos=0x1f6f6 size=0xc000090c
>>>>>> ------------------------------------
>>>>>> align_mask[0, 0]                         => 0x00ffff
>>>>>> regionsCount                             => 0x000000
>>>>>> Memory[0, 0]                             => BLOCK, pos=0x20000,
>>>>>> size=0xc0000000
>>>>>>
>>>>>> ESX VM 4G Memory
>>>>>> ================
>>>>>> - Group: memory pos=0x1f6f6 size=0x10000090c
>>>>>> -----------------------------------
>>>>>> align_mask[0, 0]                         => 0x00ffff
>>>>>> regionsCount                             => 0x000002
>>>>>> regionPageNum[0]                         => 0x000000
>>>>>> regionPPN[0]                             => 0x000000
>>>>>> regionSize[0]                            => 0x0c0000
>>>>>> regionPageNum[1]                         => 0x0c0000
>>>>>> regionPPN[1]                             => 0x100000
>>>>>> regionSize[1]                            => 0x040000
>>>>>> Memory[0, 0]                             => BLOCK, pos=0x20000,
>>>>>> size=0x100000000
>>>>>>
>>>>>> WS VM Memory
>>>>>> ============
>>>>>> - Group: memory pos=0x93b1 size=0x10098
>>>>>> ----------------------------------------
>>>>>> align_mask[0, 0]                         => 0x00ffff
>>>>>> regionsCount                             => 0x000000
>>>>>> hotSetSize                               => 0x040000
>>>>>> hotSet                                   => BLOCK, pos=0x9405,
>>>>>> size=0x8000
>>>>>> MainMemPageZeroStateSize                 => 0x040000
>>>>>> MainMemKnownZero                         => BLOCK, pos=0x11447,
>>>>>> size=0x8000
>>>>>>
>>>>>> rgds,
>>>>>> Dyno
>>>>>>
>>>>>> On 3/26/15 1:12 PM, Dave Anderson wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> Dave,
>>>>>>>>   updated the patch and please review. thanks.
>>>>>>>>   - the page_size/page_shift problem.
>>>>>>>>   - change type cast to union.
>>>>>>>>   - the read_vmware_vmss() regression.
>>>>>>>>
>>>>>>>> rgds,
>>>>>>>> Dyno
>>>>>>> Dyno,
>>>>>>>
>>>>>>> Since you cannot make any additional sample vmss2core-generated
>>>>>>> dumpfiles available to me, I can only test this latest patch
>>>>>>> on the two dumpfile/kernel pairs that you gave me in February:
>>>>>>>   
>>>>>>>   vmlinux-2.6.32-431.el6 CentOS6.5-11bd56db.vmss
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>   vmlinux-3.13.0-39-generic Ubuntu1404_64bit-65993542.vmss
>>>>>>>   (with its companion Ubuntu1404_64bit-65993542.vmem file)
>>>>>>>
>>>>>>> The CentOS kernel works OK with your patch:
>>>>>>>   
>>>>>>>   $ crash vmlinux-2.6.32-431.el6 CentOS6.5-11bd56db.vmss
>>>>>>>   
>>>>>>>   crash 7.1.1rc13
>>>>>>>   Copyright (C) 2002-2014  Red Hat, Inc.
>>>>>>>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>>>>>>>   Copyright (C) 1999-2006  Hewlett-Packard Co
>>>>>>>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>>>>>>>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>>>>>>>   Copyright (C) 2005, 2011  NEC Corporation
>>>>>>>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>>>>>>>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>>>>>>>   This program is free software, covered by the GNU General Public
>>>>>>>   License,
>>>>>>>   and you are welcome to change it and/or distribute copies of it under
>>>>>>>   certain conditions.  Enter "help copying" to see the conditions.
>>>>>>>   This program has absolutely no warranty.  Enter "help warranty" for
>>>>>>>   details.
>>>>>>>    
>>>>>>>   GNU gdb (GDB) 7.6
>>>>>>>   Copyright (C) 2013 Free Software Foundation, Inc.
>>>>>>>   License GPLv3+: GNU GPL version 3 or later
>>>>>>>   <https://urldefense.proofpoint.com/v2/url?u=http-3A__gnu.org_licenses_gpl.html&d=AwIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=g2Vka_25x09RSowRkQw8pA&m=iQzEiRx9c9mhk2AN2ZgUquvryNUKAR8a3H2BxGelV-8&s=UTmIfMl3-60oCUaY2i-0rZlb-o78kBVx6f9F90s9r4Q&e=
>>>>>>>   >
>>>>>>>   This is free software: you are free to change and redistribute it.
>>>>>>>   There is NO WARRANTY, to the extent permitted by law.  Type "show
>>>>>>>   copying"
>>>>>>>   and "show warranty" for details.
>>>>>>>   This GDB was configured as "x86_64-unknown-linux-gnu"...
>>>>>>>   
>>>>>>>         KERNEL: vmlinux-2.6.32-431.el6
>>>>>>>       DUMPFILE: CentOS6.5-11bd56db.vmss
>>>>>>>           CPUS: 4
>>>>>>>           DATE: Tue Feb  3 18:22:03 2015
>>>>>>>         UPTIME: 00:01:06
>>>>>>>   LOAD AVERAGE: 0.71, 0.22, 0.07
>>>>>>>          TASKS: 302
>>>>>>>       NODENAME: promd-1s-dhcp37.eng.vmware.com
>>>>>>>        RELEASE: 2.6.32-431.el6.x86_64
>>>>>>>        VERSION: #1 SMP Fri Nov 22 03:15:09 UTC 2013
>>>>>>>        MACHINE: x86_64  (2394 Mhz)
>>>>>>>         MEMORY: 511.5 MB
>>>>>>>          PANIC: ""
>>>>>>>            PID: 0
>>>>>>>        COMMAND: "swapper"
>>>>>>>           TASK: ffffffff81a8d020  (1 of 4)  [THREAD_INFO:
>>>>>>>           ffffffff81a00000]
>>>>>>>            CPU: 0
>>>>>>>          STATE: TASK_RUNNING (ACTIVE)
>>>>>>>        WARNING: panic task not found
>>>>>>>   
>>>>>>>   crash>
>>>>>>>   
>>>>>>> But the Ubuntu1404_64bit-65993542.vmss fails miserably:
>>>>>>>
>>>>>>>   $ crash Ubuntu1404_64bit-65993542.vmss vmlinux-3.13.0-39-generic
>>>>>>>   
>>>>>>>   crash 7.1.1rc13
>>>>>>>   Copyright (C) 2002-2014  Red Hat, Inc.
>>>>>>>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>>>>>>>   Copyright (C) 1999-2006  Hewlett-Packard Co
>>>>>>>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>>>>>>>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>>>>>>>   Copyright (C) 2005, 2011  NEC Corporation
>>>>>>>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>>>>>>>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>>>>>>>   This program is free software, covered by the GNU General Public
>>>>>>>   License,
>>>>>>>   and you are welcome to change it and/or distribute copies of it under
>>>>>>>   certain conditions.  Enter "help copying" to see the conditions.
>>>>>>>   This program has absolutely no warranty.  Enter "help warranty" for
>>>>>>>   details.
>>>>>>>    
>>>>>>>   crash: vmware_vmss.c:169: vmware_vmss_init: Assertion `__extension__
>>>>>>>   ({
>>>>>>>   size_t __s1_len, __s2_len; (__builtin_constant_p (name) &&
>>>>>>>   __builtin_constant_p ("Memory") && (__s1_len = strlen (name),
>>>>>>>   __s2_len
>>>>>>>   =
>>>>>>>   strlen ("Memory"), (!((size_t)(const void *)((name) + 1) -
>>>>>>>   (size_t)(const void *)(name) == 1) || __s1_len >= 4) &&
>>>>>>>   (!((size_t)(const void *)(("Memory") + 1) - (size_t)(const void
>>>>>>>   *)("Memory") == 1) || __s2_len >= 4)) ? __builtin_strcmp (name,
>>>>>>>   "Memory") : (__builtin_constant_p (name) && ((size_t)(const void
>>>>>>>   *)((name) + 1) - (size_t)(const void *)(name) == 1) && (__s1_len =
>>>>>>>   strlen (name), __s1_len < 4) ? (__builtin_constant_p ("Memory") &&
>>>>>>>   ((size_t)(const void *)(("Memory") + 1) - (size_t)(const void
>>>>>>>   *)("Memory") == 1) ? __builtin_strcmp (name, "Memory") :
>>>>>>>   (__extension__
>>>>>>>   ({ __const unsigned char *__s2 = (__const unsigned char *) (__const
>>>>>>>   char
>>>>>>>   *) ("Memory"); register int __result = (((__const unsigned char *)
>>>>>>>   (__const char *) (name))[0] - __s2[0]); if (__s1_len > 0 && __result
>>>>>>>   ==
>>>>>>>   0)
>>>>>> { __result = (((__const unsigned char *) (__const char *) (name))[1] -
>>>>>> __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((__const
>>>>>> unsigned char *) (__const char *) (name))[2] - __s2[2]); if (__s1_len >
>>>>>> 2
>>>>>> &&
>>>>>> __result == 0) __result = (((__const unsigned char *) (__const char *)
>>>>>> (name))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p
>>>>>> ("Memory")
>>>>>> && ((size_t)(const void *)(("Memory") + 1) - (size_t)(const void
>>>>>> *)("Memory") == 1) && (__s2_len = strlen ("Memory"), __s2_len < 4) ?
>>>>>> (__builtin_constant_p (name) && ((size_t)(const void *)((name) + 1) -
>>>>>> (size_t)(const void *)(name) == 1) ? __builtin_strcmp (name, "Memory") :
>>>>>> (__extension__ ({ __const unsigned char *__s1 = (__const unsigned char
>>>>>> *)
>>>>>> (__const char *) (name); register int __result = __s1[0] - ((__const
>>>>>> unsigned char *) (__const char *) ("Memory"))[0]; if (__s2_len > 0 &&
>>>>>> __result == 0) { __result = (__s1[1] - ((__const unsigned char *)
>>>>>> (__const
>>>>>> char *) ("Memory"))[1]); if (__s2_len > 1 && __result == 0) { __resul
>>>>>> t = (__s1[2] - ((__const unsigned char *) (__const char *)
>>>>>> ("Memory"))[2]);
>>>>>> if (__s2_len > 2 && __result == 0) __result = (__s1[3] - ((__const
>>>>>> unsigned
>>>>>> char *) (__const char *) ("Memory"))[3]); } } __result; }))) :
>>>>>> __builtin_strcmp (name, "Memory")))); }) == 0' failed.
>>>>>>>   Aborted (core dumped)
>>>>>>>   $
>>>>>>>   
>>>>>>> Note that crash-7.1.0 works OK:
>>>>>>>   
>>>>>>>   $ /usr/bin/crash Ubuntu1404_64bit-65993542.vmss
>>>>>>>   vmlinux-3.13.0-39-generic
>>>>>>>   
>>>>>>>   crash 7.1.0
>>>>>>>   Copyright (C) 2002-2014  Red Hat, Inc.
>>>>>>>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>>>>>>>   Copyright (C) 1999-2006  Hewlett-Packard Co
>>>>>>>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>>>>>>>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>>>>>>>   Copyright (C) 2005, 2011  NEC Corporation
>>>>>>>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>>>>>>>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>>>>>>>   This program is free software, covered by the GNU General Public
>>>>>>>   License,
>>>>>>>   and you are welcome to change it and/or distribute copies of it under
>>>>>>>   certain conditions.  Enter "help copying" to see the conditions.
>>>>>>>   This program has absolutely no warranty.  Enter "help warranty" for
>>>>>>>   details.
>>>>>>>    
>>>>>>>   vmw: Memory dump is not part of this vmss file.
>>>>>>>   vmw: Try to locate the companion vmem file ...
>>>>>>>   vmw: vmem file: Ubuntu1404_64bit-65993542.vmem
>>>>>>>   
>>>>>>>   GNU gdb (GDB) 7.6
>>>>>>>   Copyright (C) 2013 Free Software Foundation, Inc.
>>>>>>>   License GPLv3+: GNU GPL version 3 or later
>>>>>>>   <https://urldefense.proofpoint.com/v2/url?u=http-3A__gnu.org_licenses_gpl.html&d=AwIFaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=g2Vka_25x09RSowRkQw8pA&m=iQzEiRx9c9mhk2AN2ZgUquvryNUKAR8a3H2BxGelV-8&s=UTmIfMl3-60oCUaY2i-0rZlb-o78kBVx6f9F90s9r4Q&e=
>>>>>>>   >
>>>>>>>   This is free software: you are free to change and redistribute it.
>>>>>>>   There is NO WARRANTY, to the extent permitted by law.  Type "show
>>>>>>>   copying"
>>>>>>>   and "show warranty" for details.
>>>>>>>   This GDB was configured as "x86_64-unknown-linux-gnu"...
>>>>>>>   
>>>>>>>         KERNEL: vmlinux-3.13.0-39-generic
>>>>>>>       DUMPFILE: Ubuntu1404_64bit-65993542.vmss
>>>>>>>           CPUS: 1
>>>>>>>           DATE: Thu Nov 13 14:10:53 2014
>>>>>>>         UPTIME: 2 days, 03:40:33
>>>>>>>   LOAD AVERAGE: 0.00, 0.01, 0.05
>>>>>>>          TASKS: 669
>>>>>>>       NODENAME: ubuntu
>>>>>>>        RELEASE: 3.13.0-39-generic
>>>>>>>        VERSION: #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014
>>>>>>>        MACHINE: x86_64  (2693 Mhz)
>>>>>>>         MEMORY: 1 GB
>>>>>>>          PANIC: ""
>>>>>>>            PID: 0
>>>>>>>        COMMAND: "swapper/0"
>>>>>>>           TASK: ffffffff81c15480  [THREAD_INFO: ffffffff81c00000]
>>>>>>>            CPU: 0
>>>>>>>          STATE: TASK_RUNNING
>>>>>>>        WARNING: panic task not found
>>>>>>>   
>>>>>>>   crash>
>>>>>>>   
>>>>>>> Anyway, besides fixing whatever the problem is, please remove the
>>>>>>> assert()
>>>>>>> calls entirely.  They did not exist in the original vmware_vmss.c file,
>>>>>>> and shouldn't be added now.
>>>>>>>
>>>>>>> assert() is not used by the top-level crash sources (except for qemu.c
>>>>>>> and qemu-load.c, which came from a 3rd party, and I didn't feel like
>>>>>>> changing
>>>>>>> all of them).  Instead, please use the crash convention by testing for
>>>>>>> the
>>>>>>> anomoly, and then send an appropriate error message to error(FATAL,
>>>>>>> ...),
>>>>>>> which will kill the crash session if it's during session
>>>>>>> initialization,
>>>>>>> or kill the current command if it's during crash runtime.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>   Dave
>>>>>>>
>>




More information about the Crash-utility mailing list