[Crash-utility] Reply: Re: [patch]Crash can't process xen dump core files larger than 4GB.

Thu Feb 4 16:44:31 UTC 2010

----- "xiaowei hu" <xiaowei.hu at oracle.com> wrote:

> Hi all,
> 
> There is a bug when using crash to process a xen domU core dump that is
> larger than 4GB (it was found while processing a 10GB guest core dump file).
> crash reports these errors:
> crash: cannot find mfn 8392757 (0x801035) in page index               
>  
> 
> crash: cannot read/find cr3 page
> 
> This is caused by a variable overflow in the following structure:
> typedef struct xc_core_header { 
>      unsigned int xch_magic; 
>      unsigned int xch_nr_vcpus; 
>      unsigned int xch_nr_pages; 
>      unsigned int xch_ctxt_offset; 
>      unsigned int xch_index_offset; 
>      unsigned int xch_pages_offset; 
> } xc_core_header_t;
> 
> The xch_ctxt_offset, xch_index_offset and xch_pages_offset fields hold
> offsets into the core dump file. Because they are defined as unsigned
> int, they cannot address anything beyond 4GB, so processing core dump
> files larger than 4GB can fail (I hit the overflow with that 10GB
> file). Changing those offset fields to unsigned long makes sure crash
> can seek to the right position.
> BTW, please reply directly to me; I am not on the mailing list.
> 
> 
> Signed-off-by: Xiaowei Hu <xiaowei.hu at oracle.com>
> 
> 
> diff -up crash-5.0.0/xendump.h.org crash-5.0.0/xendump.h
> --- crash-5.0.0/xendump.h.org	2010-02-04 03:48:04.000000000 +0800
> +++ crash-5.0.0/xendump.h	2010-02-04 05:41:27.000000000 +0800
> @@ -28,9 +28,9 @@ typedef struct xc_core_header {
>      unsigned int xch_magic;
>      unsigned int xch_nr_vcpus;
>      unsigned int xch_nr_pages;
> -    unsigned int xch_ctxt_offset;
> -    unsigned int xch_index_offset;
> -    unsigned int xch_pages_offset;
> +    unsigned long xch_ctxt_offset;
> +    unsigned long xch_index_offset;
> +    unsigned long xch_pages_offset;
>  } xc_core_header_t;
>  
>  struct pfn_offset_cache {

>First question -- are you saying that the change above works for you?

Yes, this change works for me on a 10GB core dump file whose .xen_p2m segment is at offset
0x280005000; this offset can't be stored in an unsigned int variable.

>And second -- in your dumpfile, even with 10GB of memory, wouldn't
>the base offset value of all three indexes still fit well below
>the 4GB mark?

Actually, according to the xen-dump-core document the .xen_p2m segment should be located before
the .xen_pages segment, and in that order there should be no problem.
But according to the segment table read by readelf, the core dump file has the .xen_p2m
segment located at offset 0x2800025000, after the .xen_pages segment and beyond the 4GB mark.

>The xc_core_header in crash is a copy of that found in "tools/libxc/xenctrl.h",
>and is presumptively the beginning/header of the dumpfile.  And so making the
>wholesale change above breaks all earlier (?) versions.  

>But what is confusing is that the latest/final version of "xenctrl.h" used in RHEL5
>(3.0.3 vintage), as well as the current version in Fedora (3.4.0-2.fc12) still use
>unsigned int offsets, and I just checked with one of our xen masters, and the Xensource
>git tree also still has unsigned int values in the header data structure: 

>typedef struct xc_core_header {
>    unsigned int xch_magic;
>    unsigned int xch_nr_vcpus;
>    unsigned int xch_nr_pages;
>    unsigned int xch_ctxt_offset;
>    unsigned int xch_index_offset;
>    unsigned int xch_pages_offset;
>} xc_core_header_t;

>#define XC_CORE_MAGIC     0xF00FEBED
>#define XC_CORE_MAGIC_HVM 0xF00FEBEE

>Are your xen userspace tools an Oracle hybrid?

Yes, the core dump file was generated on an Oracle virtualization server. But I did not check the OVM
source code for changes to this header data structure. I will check it and reply again tomorrow.

>Dave

thanks 
xiaowei