[dm-devel] [PATCH v2] hex2bin: make the function hex_to_bin constant-time

Stafford Horne shorne at gmail.com
Wed May 11 12:17:38 UTC 2022


On Sun, May 08, 2022 at 09:37:26AM +0900, Stafford Horne wrote:
> On Thu, May 05, 2022 at 05:38:58AM +0900, Stafford Horne wrote:
> > On Wed, May 04, 2022 at 01:10:03PM -0700, Linus Torvalds wrote:
> > > On Wed, May 4, 2022 at 12:58 PM Stafford Horne <shorne at gmail.com> wrote:
> > > >
> > > > I have uploaded a diff I created here:
> > > >   https://gist.github.com/54334556f2907104cd12374872a0597c
> > > >
> > > > It shows the same output.
> > > 
> > > In hex_to_bin itself it seems to only be a difference due to some
> > > register allocation (r19 and r3 switched around).
> > > 
> > > But then it gets inlined into hex2bin and there changes there seem to
> > > be about instruction and basic block scheduling, so it's a lot harder
> > > to see what's going on.
> > > 
> > > And a lot of constant changes, which honestly look just like code code
> > > moved around by 16 bytes and offsets changed due to that.
> > > 
> > > So I doubt it's hex_to_bin() that is causing problems, I think it's
> > > purely code movement. Which explains why adding a nop or a fake printk
> > > fixes things.
> > > 
> > > Some alignment assumption that got broken?
> > 
> > This is what it looks like to me too.  I will have to do a deep dive on what is
> > going on with this particular build combination as I can't figure out what it is
> > off the top of my head.
> > 
> > This test is using a gcc 11 compiler, I tried with my gcc 12 toolchain and the
> > issue cannot be reproduced.
> > 
> >   - musl gcc 11 - https://musl.cc/or1k-linux-musl-cross.tgz
> >   - openrisc gcc 12 - https://github.com/openrisc/or1k-gcc/releases/tag/or1k-12.0.1-20220210-20220304
> > 
> > But again the difference between the two compiler outputs is a lot of register
> > allocation and offsets changes.  Its not easy to see anything that stands out.
> > I checked the change log for the openrisc specific changes from gcc 11 to gcc
> > 12.  Nothing seems to stand out, mcount profiler fix for PIC, a new large binary
> > link flag.
> 
> Hello,
> 
> Just an update on this.  The issue so far has been traced to the alignment of
> the crypto multiplication function fe_mul_impl in lib/crypto/curve25519-fiat32.c.
> 
> This patch e5be15767e7e ("hex2bin: make the function hex_to_bin constant-time")
> allowed for the alignment to be just right to cause the failure.  Without
> this patch and forcing the alignment we can reproduce the issue.  So though the
> bisect is correct, this patch is not the root cause of the issue.
> 
> Using some l.nop sliding techniques and some strategically placed .align
> statements I have been able to reproduce the issue on:
> 
>   - gcc 11 and gcc 12
>   - preempt and non-preempt kernels
> 
> I have not been able to reproduce this on my FPGA, so far only QEMU.  My
> hunch now is that since the fe_mul_impl function contains some rather long basic
> blocks it appears that timer interrupts that interrupt qemu mid basic block
> execution somehow is causing this.  The hypothesis is it may be basic block
> resuming behavior in qemu that causes incosistent behavior.
> 
> It will take a bit more time to trace this.  Since I maintain OpenRISC QEMU the
> issue is on me.
> 
> Again, It's safe to say that patch e5be15767e7e ("hex2bin: make the function
> hex_to_bin constant-time") is not an issue.

This issue has been fixed.  I sent a patch to QEMU for it:

 - https://lore.kernel.org/qemu-devel/20220511120541.2242797-1-shorne@gmail.com/T/#u

The issue was a bug in the OpenRISC emulation in QEMU which was triggered when
receiving a TICK TIMER interrupt, in a delay slot, on a page boundary.

The fix was simple enough, but investigation took quite some work.

Thanks,

-Stafford



More information about the dm-devel mailing list