[Libguestfs] libnbd golang failure on RISC-V

Daniel P. Berrangé berrange at redhat.com
Thu Jun 9 16:00:46 UTC 2022


On Thu, Jun 09, 2022 at 04:48:34PM +0100, Richard W.M. Jones wrote:
> On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote:
> > On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > > > + go test -count=1 -v
> > > > === RUN   Test010Load
> > > > --- PASS: Test010Load (0.00s)
> > > > === RUN   TestAioBuffer
> > > > --- PASS: TestAioBuffer (0.00s)
> > > > === RUN   TestAioBufferFree
> > > > --- PASS: TestAioBufferFree (0.00s)
> > > > === RUN   TestAioBufferBytesAfterFree
> > > > SIGABRT: abort
> > > > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
> > > 
> > > So suggesting TestAioBufferBytesAfterFree is as fault, but quite
> > > odd as that test case is trivial and whle it allocates some
> > > native memory it doesn't seem to write to it. Unless the problem
> > > happened in an earlier test case and we have delayed detection ?
> > > 
> > > I guess I'd try throwing darts at the wall by chopping out bits
> > > of test code to see what makes it disappear.
> > > 
> > > Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> > > in case pre-existing data into the C.malloc()d block is confusing
> > > Go ? 
> > 
> > Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
> > and disable GODEBUG, then the tests pass.  (Reproducer commands at end
> > of email).  So I guess at least one of the problems is confined to
> > this test and/or functions it calls in the main library.
> > Unfortunately this test is huge.
> > 
> > At your suggestion, replacing every MakeAioBuffer with
> > MakeAioBufferZero in that test, but it didn't help.  Also tried
> > replacing malloc -> calloc in the aio_buffer.go implementation which
> > didn't help.
> > 
> > I'll try some more random things ...
> 
> Adding a few printf's shows something interesting:
> 
> === RUN   TestAioBufferBytesAfterFree
> calling Free on 0x3fbc1882b0
> calling C.GoBytes on 0x3fbc1882b0
> SIGABRT: abort
> PC=0x3fe6aaebac m=0 sigcode=18446744073709551610
> 
> goroutine 21 [running]:
> gsignal
>         :0
> abort
>         :0
> runtime.throwException
>         ../../../libgo/runtime/go-unwind.c:128
> runtime.unwindStack
>         ../../../libgo/go/runtime/panic.go:535
> panic
>         ../../../libgo/go/runtime/panic.go:750
> runtime.panicmem
>         ../../../libgo/go/runtime/panic.go:210
> runtime.sigpanic
>         ../../../libgo/go/runtime/signal_unix.go:634
> _wordcopy_fwd_aligned
>         :0
> __GI_memmove
>         :0
> runtime.stringtoslicebyte
>         ../../../libgo/go/runtime/string.go:155
> __go_string_to_byte_array
>         ../../../libgo/go/runtime/string.go:509
> _cgo_23192bdcbd72_Cfunc_GoBytes
>         ./cgo-c-prolog-gccgo:46
> 
> This is a simple use after free because the Free function in
> aio_buffer.go frees the array and then the Bytes function attempts to
> copy b.Size bytes from the NULL pointer.

Well it isn't use-after-free, because we've cleared the
pointer we freed.

Rather we're simply trying to reference the NULL pointer,

> I didn't write this test so I'm not quite sure what it's trying to
> achieve.  It seems to be deliberately trying to cause a panic, but
> causes a segfault instead?  (And why only on RISC-V?)

IIUC, the kernel will map a page without read/write perms
at address 0x0 in userspace. Thus a NULL pointer reference
causes a SEGV to userspace. Golang tries to catch this
SEGV and turn it into a panic I assume.

Assuming the kernel isn't doing something wierd on Risc-V
with the userspace mapping at 0x0, then points to the
Golang SEGV handler on RISCV.

> 
>   func TestAioBufferBytesAfterFree(t *testing.T) {
>         buf := MakeAioBuffer(uint(32))
>         buf.Free()
> 
>         defer func() {
>                 if r := recover(); r == nil {
>                         t.Fatal("Did not recover from panic calling Bytes() after Free()")
>                 }
>         }()
> 
>         buf.Bytes()
>   }


> 
> Since this only happens on RISC-V I guess it might be something to do
> with the golang implementation on this architecture being unable to
> turn segfaults into panics.
> 
> Removing all three *AfterFree tests fixes the tests.
> 
> It seems a bit of an odd function however.  Wouldn't it be better to
> changes the Bytes function so that it tests if the pointer is NULL and
> panics?

In theory I guess both should be equivalent in terms of
semantics for the caller.


Also I feel like 'Free' ought to set 'b.Size = 0' after
it set 'b.P = nul'.  That should solve the problem for
the Bytes & Slice method tests at least, but probably not
the Get method test.

> NB: this _does not_ address the other problem where GODEBUG=cgocheck=2
> complains about "fatal error: Go pointer stored into non-Go memory".

Maybe that message across comes from the Go signal handler that's
trying to cope with the SEGV from the NULL reference, causing it
to trip over itself & thus not turn  the problem into a pnaic.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


More information about the Libguestfs mailing list