[Crash-utility] Can't read stack contents from qemu dump

Nikolay Borisov nborisov at suse.com
Wed Apr 4 15:50:42 UTC 2018



On  4.04.2018 18:48, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>>
>>
>> On  4.04.2018 17:48, Dave Anderson wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> Hello,
>>>>
>>>> I tried running crash-head (HEAD: 5d172b230cf4) against today's Linus'
>>>> master on a dump obtained via dump-guest-memory in QEMU, and got the
>>>> following when the image was loaded:
>>>>
>>>> please wait... (determining panic task)
>>>> bt: read error: kernel virtual address: fffffe0000007000  type: "stack contents"
>>>>
>>>>   KERNEL: vmlinux
>>>>     DUMPFILE: memory-verbatim.img
>>>>         CPUS: 1
>>>>         DATE: Wed Apr  4 16:36:47 2018
>>>>       UPTIME: 00:27:48
>>>> LOAD AVERAGE: 31.11, 17.80, 10.43
>>>>        TASKS: 145
>>>>     NODENAME: ubuntu-virtual
>>>>      RELEASE: 4.16.0-rc7-nbor
>>>>      VERSION: #570 SMP Wed Apr 4 16:03:44 EEST 2018
>>>>      MACHINE: x86_64  (3392 Mhz)
>>>>       MEMORY: 4 GB
>>>>        PANIC: ""
>>>>          PID: 0
>>>>      COMMAND: "swapper/0"
>>>>         TASK: ffffffff82016500  [THREAD_INFO: ffffffff82016500]
>>>>          CPU: 0
>>>>        STATE: TASK_RUNNING
>>>>      WARNING: panic task not found
>>>>
>>>> crash> bt
>>>> PID: 0      TASK: ffffffff82016500  CPU: 0   COMMAND: "swapper/0"
>>>>  #0 [ffffffff82003dc8] __schedule at ffffffff817ea059
>>>> bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0
>>>>
>>>>
>>>> The kernel was compiled with gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0
>>>> 20160609, which supports retpolines.
>>>>
>>>> KASLR is disabled (# CONFIG_RANDOMIZE_BASE is not set), and the kernel
>>>> is built with CONFIG_FRAME_POINTER=y.
>>>>
>>>> This scenario used to work around the 4.10 timeframe. Am I doing
>>>> something wrong, or does crash still need some work to support the
>>>> latest upstream kernel code?
>>>
>>> Presumably the latter.
>>>
>>> If you do a "task -R stack ffffffff82016500", I'm presuming that it
>>> shows the stack base address is ffffffff82000000.  And looking at
>>> the stackbase/stacktop values, the crash utility is presuming an 8K stack:
>>>
>>>  bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0
>>>
>>> But the RSP is ffffffff82003dc8, which puts it beyond the 8K stack size,
>>> so I'm presuming that the kernel is actually using 16K stacks.  The most
>>> recent kernel I have is 4.16.0-0.rc6.git3.1.fc29.x86_64, which uses 16K
>>> stacks.
>>
>> This is correct; the stack size should indeed be 16K. However...
>>
>>>
>>> Here is how the crash utility determines the stack size.  The x86_64
>>> stack size starts out with a default size of 2 pages, as set here in
>>> x86_64_init(PRE_SYMTAB):
>>>
>>>        case PRE_SYMTAB:
>>>                ... [ cut ] ...
>>>                machdep->stacksize = machdep->pagesize * 2;
>>>                ...
>>>
>>> Then later on in task_init(), it gets resized as shown here, where
>>> the STACKSIZE() macro is machdep->stacksize:
>>>
>>>         if (VALID_SIZE(task_union) && (SIZE(task_union) != STACKSIZE())) {
>>>                 error(WARNING, "\nnon-standard stack size: %ld\n",
>>>                         len = SIZE(task_union));
>>>                 machdep->stacksize = len;
>>>         } else if (VALID_SIZE(thread_union) &&
>>>                 ((len = SIZE(thread_union)) != STACKSIZE()))
>>>                 machdep->stacksize = len;
>>
>> It is not resized at all; instead, VALID_SIZE(thread_union) actually
>> fails. I've added the following else branch to the if statement there:
>>
>> +       } else {
>> +               if (VALID_SIZE(thread_union)) {
>> +               error(WARNING, "WE ARE IN THE ELSE BRANCH: len: %llu thread_union size: %llu STACKSIZE(): %llu\n",
>> +                     len, SIZE(thread_union), STACKSIZE());
>> +               } else {
>> +               error(WARNING, "thread_union is invalid\n");
>> +               }
>> +       }
>>
>> Also doing:
>>
>> crash> struct thread_union
>> struct: invalid data structure reference: thread_union
> 
> BTW, that command should fail -- it should be "union thread_union".
> But as you've shown below, it's not finding it in the debuginfo.
>  
>> So for some reason the thread_union cannot be found by gdb:
>>
>> help -o | grep thread_union
>>                   thread_union: -1
> 
> I can't explain why.  It's still declared in "include/linux/sched.h"
> in today's linux-git tree:
> 
>   union thread_union {
>   #ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK
>           struct task_struct task;
>   #endif
>   #ifndef CONFIG_THREAD_INFO_IN_TASK
>           struct thread_info thread_info;
>   #endif
>           unsigned long stack[THREAD_SIZE/sizeof(long)];
>   };
> 
> If you run "gdb vmlinux", does it find it?  For example:
> 
>   (gdb) ptype union thread_union
>   Python Exception <type 'exceptions.ImportError'> No module named gdb.types: 
>   type = union thread_union {
>       struct task_struct task;
>       unsigned long stack[2048];
>   }
>   (gdb)

(gdb) ptype union thread_union
No union type named thread_union.


The next thing to try is a different compiler.

> 
> Dave