[Crash-utility] [PATCH] crash: ARM: support LPAE

Thu Jun 12 13:10:29 UTC 2014

On 2014/6/10 21:57, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>>>
>>> Given that the vmcore identifies itself as a kdump ELF vmcore that contains
>>> a VMCOREINFO note, it would seem that the kdump facility "just works" with
>>> an ARM PAE kernel as long as the physical memory can be contained within
>>> 4GB?
>>
>> On our platform I have tested physical memory beyond 4GB, and it worked fine.
>> Maybe my testing was not thorough. I will try to do that on qemu, which I failed
>> to do last time.
> 
> Yeah, it looks like as long as the beginning of the highest physical memory
> PT_LOAD segment *begins* before the 4GB mark, then it would work OK.  However,
> the 32-bit ELF Elf32_Phdr.p_paddr field is a 32-bit value that cannot contain
> a physical address that is 4GB or larger. 
> 
>>> But if the system contained physical memory beyond 4GB, then it would
>>> require a 64-bit ELF header, and therefore your recent changes to kexec-tools,
>>> correct?  In addition, it would require an update to the crash utility's netdump.c
>>> is_netdump() function to accept 64-bit ELF headers for EM_ARM vmcores.
>>
>>
>> Yes, kexec is ready for LPAE now.
>>
>> Maybe I can try to do that. But perhaps I can't test it because
>> I lack a suitable environment.
> 
> I believe the 32-bit vs. 64-bit ELF header is configurable, correct?

Currently, if the maximum physical address of the ARM platform exceeds 4GB,
kexec creates a 64-bit ELF header. Otherwise, it creates a 32-bit ELF header.
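
To make the rule concrete, here is a minimal sketch of that decision. It is
not the kexec-tools code; pick_elf_class() and the SKETCH_* names are
hypothetical, and the enum values simply mirror ELFCLASS32/ELFCLASS64 from
<elf.h>. The real logic in kexec-tools may differ in detail.

```c
#include <stdint.h>

/* Values mirror ELFCLASS32 (1) and ELFCLASS64 (2) from <elf.h>. */
enum sketch_elf_class {
    SKETCH_ELFCLASS32 = 1,
    SKETCH_ELFCLASS64 = 2
};

/*
 * Hypothetical helper: choose the ELF class for the vmcore header from the
 * highest physical address on the platform.
 *
 * Elf32_Phdr.p_paddr is only 32 bits wide, so any address above UINT32_MAX
 * cannot be described by a 32-bit ELF header.
 */
static enum sketch_elf_class pick_elf_class(uint64_t max_phys_addr)
{
    if (max_phys_addr > UINT32_MAX)
        return SKETCH_ELFCLASS64;
    return SKETCH_ELFCLASS32;
}
```

With this rule, a platform whose memory ends below the 4GB mark keeps the
32-bit header that unpatched kernels can parse.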

> On RHEL, by default we configure 64-bit ELF headers for 32-bit x86 
> machines regardless of their memory size.  So you should be able to
> create a vmcore with a 64-bit ELF header on a system that has less
> than 4GB of physical memory.

For now, the ARM kernel cannot parse a 64-bit ELF header without my patch
at "https://lkml.org/lkml/2014/5/3/63". So perhaps we cannot always create
a vmcore with a 64-bit ELF header for ARM; otherwise, old ARM kernels
without that patch could not generate vmcores correctly.

> 
> But as I mentioned above, there will need to be at least one fix for
> the crash utility, because it will fail at line 258 of netdump.c.
> To accept 64-bit ARM headers, there would need to be an additional
> case statement like this:
> 
>                  case EM_ARM:
>                         if (machine_type_mismatch(file, "ARM", NULL,
>                             source_query))
>                                 goto bailout;
>                         break;
> 
> I'm not sure whether any other fixes would be required?

On our platform, I usually use makedumpfile to reduce the vmcore,
which converts it to the diskdump format.

But when I dealt with the original vmcores, this error occurred:

.....
WARNING: machine type mismatch:
         crash utility: ARM
         ../github/1381_4/vmcore: (unknown)

crash: ../github/1381_4/vmcore: not a supported file format

Once I have finished checking this fully, I will send the related patches.
Sorry for this mistake!
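
For reference, the header combination in question (an ELF64 header whose
e_machine is EM_ARM) can be recognised from the first 20 bytes of the file.
The sketch below is only an illustration and is not taken from netdump.c;
classify_arm_header() and the SK_*/HDR_* names are hypothetical, and the
constants are local copies of the values in <elf.h>. It also honours the
byte order given in e_ident, so the same check works for a big-endian
header read on a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the <elf.h> values, so the sketch stands alone. */
enum {
    SK_EI_CLASS    = 4,   /* index of the class byte in e_ident      */
    SK_EI_DATA     = 5,   /* index of the byte-order byte in e_ident */
    SK_ELFCLASS32  = 1,
    SK_ELFCLASS64  = 2,
    SK_ELFDATA2LSB = 1,   /* little endian */
    SK_ELFDATA2MSB = 2,   /* big endian    */
    SK_EM_ARM      = 40
};

/* Hypothetical result codes for this sketch. */
enum arm_hdr_kind { HDR_NOT_ARM = 0, HDR_ARM_ELF32 = 1, HDR_ARM_ELF64 = 2 };

static enum arm_hdr_kind classify_arm_header(const unsigned char *buf,
                                             size_t len)
{
    uint16_t machine;

    /* Need e_ident (16 bytes), e_type (2 bytes) and e_machine (2 bytes). */
    if (buf == NULL || len < 20)
        return HDR_NOT_ARM;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return HDR_NOT_ARM;

    /* e_machine sits at byte offset 18 in both ELF32 and ELF64 headers. */
    if (buf[SK_EI_DATA] == SK_ELFDATA2MSB)
        machine = (uint16_t)((buf[18] << 8) | buf[19]);
    else if (buf[SK_EI_DATA] == SK_ELFDATA2LSB)
        machine = (uint16_t)(buf[18] | (buf[19] << 8));
    else
        return HDR_NOT_ARM;

    if (machine != SK_EM_ARM)
        return HDR_NOT_ARM;

    switch (buf[SK_EI_CLASS]) {
    case SK_ELFCLASS32:
        return HDR_ARM_ELF32;
    case SK_ELFCLASS64:
        return HDR_ARM_ELF64;
    default:
        return HDR_NOT_ARM;
    }
}
```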

> 
>>
>>>
>>> Also, w/respect to this commit:
>>>       
>>>   commit 56b700fd6f1e49149880fb1b6ffee0dca5be45fb
>>>   Author: Liu Hua <sdu.liu at huawei.com>
>>>   Date:   Fri Apr 18 07:45:36 2014 +0100
>>>   
>>>       ARM: 8030/1: ARM : kdump : add arch_crash_save_vmcoreinfo
>>>       
>>>       For vmcore generated by LPAE enabled kernel, user space
>>>       utility such as crash needs additional infomation to
>>>       parse.
>>>       
>>>       So this patch add arch_crash_save_vmcoreinfo as what PAE enabled
>>>       i386 linux does.
>>>       
>>>       Cc: <stable at vger.kernel.org>
>>>       Reviewed-by: Will Deacon <will.deacon at arm.com>
>>>       Signed-off-by: Liu Hua <sdu.liu at huawei.com>
>>>       Signed-off-by: Russell King <rmk+kernel at arm.linux.org.uk>
>>>   
>>>   diff --git a/arch/arm/kernel/machine_kexec.c
>>>   b/arch/arm/kernel/machine_kexec.c
>>>   index f0d180d..8cf0996 100644
>>>   --- a/arch/arm/kernel/machine_kexec.c
>>>   +++ b/arch/arm/kernel/machine_kexec.c
>>>   @@ -184,3 +184,10 @@ void machine_kexec(struct kimage *image)
>>>    
>>>           soft_restart(reboot_entry_phys);
>>>    }
>>>   +
>>>   +void arch_crash_save_vmcoreinfo(void)
>>>   +{
>>>   +#ifdef CONFIG_ARM_LPAE
>>>   +       VMCOREINFO_CONFIG(ARM_LPAE);
>>>   +#endif
>>>   +}
>>>   
>>> I note that the sample vmcore you sent me does not have the ARM_LPAE vmcoreinfo
>>> item, and that your patch doesn't require/check it.  Was it your intention
>>> to use the above as determining factor for setting the "PAE" bit?
>>
>> The kernel version I used is 3.13, so it does not contain this information.
>> At the beginning I used this vmcoreinfo, but I found a better way to identify
>> the LPAE-enabled kernel. PG_DIR_SIZE of an LPAE-enabled kernel is larger than
>> that of a normal one (0x5000 vs. 0x4000). What do you think about it?
> 
> That was my only concern regarding the patchset, because it presumes that
> the difference will be either 0x4000 or 0x5000.  But that's not necessarily
> true, at least on older kernels.  For example, here are the values seen
> in my small sample set of ARM dumpfiles, showing the kernel release along with
> the values of the "swapper_pg_dir" and "_text" symbols, and the difference
> between the two:
>   
>          RELEASE: 2.6.35-rc3-00272-gd189df4
>   swapper_pg_dir: c0004000
>            _text: c002c000
>                   (28000)
>   
>          RELEASE: 2.6.38-rc2-00274-g1f0324c-dirty
>   swapper_pg_dir: c0004000
>            _text: c0050000
>                   (4c000)
>   
>          RELEASE: 2.6.36-rc6-next-20101005-00033-g5d269a5-dirty
>   swapper_pg_dir: c0004000
>            _text: c01d3000
>                   (1cf000)
>   
>          RELEASE: 3.1.1
>   swapper_pg_dir: c0004000
>            _text: c0008000
>                   (4000)
>   
>          RELEASE: 3.1.1
>   swapper_pg_dir: c0004000
>            _text: c0008000
>                   (4000)
>   
>          RELEASE: 3.0.8+
>   swapper_pg_dir: c0004000
>            _text: c0108000
>                   (104000)
>   
>          RELEASE: 3.13.5     <-- your LPAE kernel
>   swapper_pg_dir: 80003000
>            _text: 80008000
>                   (5000)
>   
>          RELEASE: 3.0.8+
>   swapper_pg_dir: c0004000
>            _text: c0108000
>                   (104000)
>   
>          RELEASE: 3.1.1
>   swapper_pg_dir: c0004000
>            _text: c0008000
>                   (4000)
>   
>          RELEASE: 3.1.1
>   swapper_pg_dir: c0004000
>            _text: c0008000
>                   (4000)
>   
> Note that in some earlier kernels, the "_text" symbol is often much
> higher.  But I presume that it would be highly unlikely that the difference
> would ever be 0x5000 in an older kernel -- so until somebody reports a
> problem, it seems OK to do it that way.
> 
> However, just in case the layout changes in the future, there should be
> a fail-safe check for the VMCOREINFO_CONFIG(ARM_LPAE) in arm_init(),
> that does something like this:
> 
>     if ((string = pc->read_vmcoreinfo("CONFIG_ARM_LPAE"))) {
>             machdep->flags |= PAE;
>             free(string);
>     } else 
>             [check for 0x5000 difference]
> 
> There's really no need to check for the "y" contents of the string, because
> if the entry exists, then CONFIG_ARM_LPAE is configured.

Yes, your advice is much better. We should add this. And perhaps we should also find
another way to recognise LPAE-enabled vmcores for old kernels, rather than pg_dir_size.
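
As a rough sketch of the combined check (VMCOREINFO entry first, the 0x5000
difference only as a fallback), assuming the caller has already looked up
the CONFIG_ARM_LPAE entry and the two symbol values. This is not the actual
arm_init() code; arm_kernel_is_lpae() and LPAE_PG_DIR_SIZE are hypothetical
names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* _text - swapper_pg_dir as seen on the 3.13.5 LPAE sample vmcore. */
#define LPAE_PG_DIR_SIZE 0x5000u

/*
 * Hypothetical helper.
 *
 * vmcoreinfo_value: the CONFIG_ARM_LPAE entry from VMCOREINFO, or NULL if
 *                   the vmcore has no such entry (e.g. older kernels).
 * swapper_pg_dir:   value of the "swapper_pg_dir" symbol.
 * text:             value of the "_text" symbol.
 */
static bool arm_kernel_is_lpae(const char *vmcoreinfo_value,
                               uint32_t swapper_pg_dir, uint32_t text)
{
    /* If the entry exists at all, CONFIG_ARM_LPAE is configured;
     * its contents do not need to be inspected. */
    if (vmcoreinfo_value != NULL)
        return true;

    /* Fallback heuristic for kernels that lack the VMCOREINFO entry. */
    return (uint32_t)(text - swapper_pg_dir) == LPAE_PG_DIR_SIZE;
}
```

Against the sample values listed above, only the 3.13.5 dump
(80008000 - 80003000 = 5000) would be flagged by the fallback.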

BTW, what do you think about parsing big-endian vmcores (such as ARMEB vmcores) on
an x86-64 host?

Thanks,
Liu Hua

>  
>>> In any case, thanks for the vmlinux/vmcore pair, which moves us part of the way
>>> towards supporting LPAE -- with support for 64-bit ELF headers to be addressed in
>>> the future.
>>
>> Thanks for your agreement. I will continue to work on this issue.
> 
> Great -- again, I really appreciate your help.
> 

> Thanks,
>   Dave
> 
> .
>