[Crash-utility] [PATCH] x86_64_exception_frame only performs EFRAME_VERIFY if it is the only flag

David Mair dmair at suse.com
Wed Sep 2 19:35:50 UTC 2020


On 9/1/20 8:35 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -----Original Message-----
>> Calls to x86_64_exception_frame() with combined items set in the flags
>> argument that include EFRAME_VERIFY do not have the EFRAME_VERIFY
>> operation performed. I have some cores where multiple cases of
>> attempting to read a not-present pt_regs end a single PID backtrace with
>> a failure. One instance has the pt_regs read overrunning stacktop
>> because the pt_regs is not present and the level's stack position is
>> closer to stacktop than the size of a pt_regs. That results in a
>> backtrace failing before complete with a "seek error" at the start of
>> page after stacktop:
>>
>> crash> bt 7456
>> PID: 7456   TASK: ffff933fdb960000  CPU: 0   COMMAND: "sh"
>>  #0 [fffffe0000009e58] crash_nmi_callback at ffffffff93260e93
>> ...
>>  #9 [ffffaea5c0003f80] hrtimer_interrupt at ffffffff933313d5
>> bt: seek error: kernel virtual address: ffffaea5c0004000  type: "pt_regs"
>> crash>
>>
>> The correct backtrace would reach level #12 with no seek error.
>>
>> The condition to perform the EFRAME_VERIFY operation tests if the flags
>> value equals EFRAME_VERIFY, not if the value includes EFRAME_VERIFY. The
>> call to x86_64_exception_frame() in x86_64_print_stack_entry() performed
>> when eframe_check >= 0 supplies a flags value of EFRAME_PRINT |
>> EFRAME_VERIFY.
>>
>> In the bt example above backtrace reaches level #9, 128 bytes from the
>> top of the current stack's pages in an IRQ stack and with a function
>> name ending in "_interrupt". This leads to x86_64_print_stack_entry()
>> setting eframe_check to zero and x86_64_exception_frame() being called
>> with flags EFRAME_PRINT | EFRAME_VERIFY. x86_64_exception_frame()
>> doesn't perform the verify because flags is not just EFRAME_VERIFY. An
>> attempt is made to read 168 bytes (SIZE(pt_regs) - 8 bytes) from a
>> position 128 bytes from the top of the stack. The stack in question is
>> followed by a not-present page and the read fails attempting to read
>> from the page following stacktop.
>>
>> Signed-off-by: David Mair <dmair at suse.com>
> 
> Thanks for the patch.
> but it seems that it stops "bt -e" and "bt -E" options working, can we
> keep them?

Yes, they seem to be broken anyway, I'll re-post an
x86_64_exception_frame() EFRAME_VERIFY fix later when I can combine it
with a bt -e/-E solution.

I see the bt -e case fail because x86_64_eframe_search() walks up the
userspace stack in ulongs sized steps with the loop count based on the
kernel stack size in ulongs. x86_64_eframe_search() does this for each
loop iteration:

if (x86_64_exception_frame(EFRAME_SEARCH|EFRAME_PRINT|EFRAME_VERIFY, 0,
(char *)up, bt, fp))

The multiple flags mean crash master doesn't do the requested
EFRAME_VERIFY (per my previous post). If the EFRAME_VERIFY is performed
then the second argument kvaddr being the zero constant would fail at
the beginning of x86_64_exception_frame() because the supplied kvaddr is
constant zero:

	if (flags & EFRAME_VERIFY) {
		if (!accessible(kvaddr) ||
		    !accessible(kvaddr + SIZE(pt_regs) - sizeof(long)))
			return FALSE;

basically, the supplied kvaddr argument ensures EFRAME_VERIFY will fail
before even trying an EFRAME_SEARCH and an EFRAME_PRINT will never be
reached. For every step of the for loop through the userspace stack.

The EFRAME_VERIFY block in x86_64_exception_frame() does not verify the
local argument. In current form, if EFRAME_VERIFY is actually performed,
you must supply a kvaddr to find/print an exception on the userspace
stack. Plus, the size assumed for the userspace stack by
x86_64_eframe_search() is the computed size of the kernel stack.

> Also, I could not apply the patch as is because of a few format issues.
> I think your MTA or something replaced tabs with spaces and inserted a
> new line between the "ulong kvaddr," and "char *local,".  We can fix
> this when applying but if you update the patch for the above comment,
> please fix it as well (if possible).  It would be helpful.

I'm sorry, a clipboard copy from git diff output in a console and paste
into an e-mail, my error. I'll re-post it correctly when I have a bt
-e/-E fix.

Thanks,
David.





More information about the Crash-utility mailing list