[Libguestfs] Memory corruption when testing nbdkit python plugin with nbd-tester-client?

Carl-Daniel Hailfinger c-d.hailfinger.devel.2006 at gmx.net
Mon Sep 26 18:40:24 UTC 2016


On 26.09.2016 19:20, Carl-Daniel Hailfinger wrote:
> On 26.09.2016 14:29, Richard W.M. Jones wrote:
>> On Mon, Sep 26, 2016 at 02:18:02PM +0200, Carl-Daniel Hailfinger wrote:
>>> Hi,
>>>
>>> has anyone ever run "make check" from nbd against nbdkit with a python
>>> plugin? I usually get segfaults during such a run, and sometimes various
>>> other errors happen before the segfault, suggesting that some memory
>>> corruption is underway.
>>> AFAICS a pure python plugin should not be able to cause memory corruption.
>> Correct, a python plugin should not cause memory corruption,
>> and nbdkit shouldn't segfault ever.
>>
>> Did you get a stack trace from C (not from Python)?
> The core files were useless, but I ran nbdkit in gdb and got something...
>
> (gdb) run -n -f /software/nbdkit/plugins/python/.libs/nbdkit-python-plugin.so script=/software/nbdkit/plugins/python/ssd-ftl-wearleveling.py disksize=52428800
> Starting program: /software/nbdkit/src/nbdkit -n -f /software/nbdkit/plugins/python/.libs/nbdkit-python-plugin.so script=/software/nbdkit/plugins/python/ssd-ftl-wearleveling.py disksize=52428800
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>
> Breakpoint 1, py_config (key=0x7fffffffe2e0 "script", value=0x7fffffffe2e7 "/software/nbdkit/plugins/python/ssd-ftl-wearleveling.py") at python.c:107
> 107     {
> (gdb) info shared
> >From                To                  Syms Read   Shared Object Library
> 0x00007ffff7ddaae0  0x00007ffff7df5490  Yes         /lib64/ld-linux-x86-64.so.2
> 0x00007ffff7bd6ed0  0x00007ffff7bd79ce  Yes         /lib/x86_64-linux-gnu/libdl.so.2
> 0x00007ffff79bd9f0  0x00007ffff79ca471  Yes         /lib/x86_64-linux-gnu/libpthread.so.0
> 0x00007ffff7612520  0x00007ffff7757183  Yes         /lib/x86_64-linux-gnu/libc.so.6
> 0x00007ffff73f0000  0x00007ffff73f13e7  Yes         /software/nbdkit/plugins/python/.libs/nbdkit-python-plugin.so
> 0x00007ffff6ed70c0  0x00007ffff7065782  Yes (*)     /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> 0x00007ffff6c73e00  0x00007ffff6c83bf8  Yes         /lib/x86_64-linux-gnu/libz.so.1
> 0x00007ffff6a6ff10  0x00007ffff6a70804  Yes         /lib/x86_64-linux-gnu/libutil.so.1
> 0x00007ffff676e610  0x00007ffff67dd056  Yes         /lib/x86_64-linux-gnu/libm.so.6
> (*): Shared library is missing debugging information.
> (gdb) c
> Continuing.
> [New Thread 0x7ffff62eb700 (LWP 24498)]
> [Thread 0x7ffff62eb700 (LWP 24498) exited]
> [New Thread 0x7ffff62eb700 (LWP 24516)]
> [Thread 0x7ffff62eb700 (LWP 24516) exited]
> [New Thread 0x7ffff62eb700 (LWP 24519)]
> [Thread 0x7ffff62eb700 (LWP 24519) exited]
> [New Thread 0x7ffff62eb700 (LWP 24536)]
> [Thread 0x7ffff62eb700 (LWP 24536) exited]
> [New Thread 0x7ffff62eb700 (LWP 24549)]
> [Thread 0x7ffff62eb700 (LWP 24549) exited]
> [New Thread 0x7ffff62eb700 (LWP 24561)]
> [New Thread 0x7ffff4c26700 (LWP 24580)]
> nbdkit: python[6]: error: ssd-ftl-wearleveling.py: callback failed: close
> [Thread 0x7ffff62eb700 (LWP 24561) exited]
> nbdkit: python[7]: error: invalid flush request: expecting offset and length == 0
> nbdkit: python[7]: error: write reply: Connection reset by peer
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4c26700 (LWP 24580)]
> 0x00007ffff7015306 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> (gdb) bt
> #0  0x00007ffff7015306 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #1  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #2  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #3  0x00007ffff701954d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #4  0x00007ffff7017dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #5  0x00007ffff701954d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #6  0x00007ffff704e6d0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #7  0x00007ffff6fbad43 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #8  0x00007ffff7033577 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
> #9  0x00007ffff73f0af0 in py_close (handle=0x625338) at python.c:234
> #10 0x0000000000405803 in plugin_close (conn=conn at entry=0x7fffe8000910) at plugins.c:377
> #11 0x00000000004037ec in free_connection (conn=0x7fffe8000910) at connections.c:147
> #12 0x0000000000404476 in _handle_single_connection (sockout=<optimized out>, sockin=<optimized out>) at connections.c:99
> #13 handle_single_connection (sockin=<optimized out>, sockout=<optimized out>) at connections.c:109
> #14 0x0000000000405e73 in start_thread (datav=0x7fffffffdd40) at sockets.c:220
> #15 0x00007ffff79c0184 in start_thread (arg=0x7ffff4c26700) at pthread_create.c:312
> #16 0x00007ffff76ed37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

And another one... my close function pickles all python objects which
were accessed in the plugin. Maybe that's what exposes the memory
corruption on close.

(gdb) run  -n -f /software/nbdkit/plugins/python/.libs/nbdkit-python-plugin.so script=/software/nbdkit/plugins/python/ssd-ftl-wearleveling.py disksize=52428800
Starting program: /software/nbdkit/src/nbdkit -n -f /software/nbdkit/plugins/python/.libs/nbdkit-python-plugin.so script=/software/nbdkit/plugins/python/ssd-ftl-wearleveling.py disksize=52428800
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff62eb700 (LWP 24790)]
[Thread 0x7ffff62eb700 (LWP 24790) exited]
[New Thread 0x7ffff62eb700 (LWP 24802)]
[Thread 0x7ffff62eb700 (LWP 24802) exited]
[New Thread 0x7ffff62eb700 (LWP 24804)]
[Thread 0x7ffff62eb700 (LWP 24804) exited]
[New Thread 0x7ffff62eb700 (LWP 24822)]
[Thread 0x7ffff62eb700 (LWP 24822) exited]
[New Thread 0x7ffff62eb700 (LWP 24836)]
[Thread 0x7ffff62eb700 (LWP 24836) exited]
[New Thread 0x7ffff62eb700 (LWP 24848)]
[New Thread 0x7ffff4bea700 (LWP 24984)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff62eb700 (LWP 24848)]
0x00007ffff6fd19cc in PyObject_GetAttr () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
(gdb) bt
#0  0x00007ffff6fd19cc in PyObject_GetAttr () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#1  0x00007ffff7015b11 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#2  0x00007ffff701954d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#3  0x00007ffff7017dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#4  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#5  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#6  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#7  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#8  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#9  0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#10 0x00007ffff7018059 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#11 0x00007ffff701954d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#12 0x00007ffff7017dd8 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#13 0x00007ffff701954d in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#14 0x00007ffff704e6d0 in ?? () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#15 0x00007ffff6fbad43 in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#16 0x00007ffff7033577 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0
#17 0x00007ffff73f0af0 in py_close (handle=0x625338) at python.c:234
#18 0x0000000000405803 in plugin_close (conn=conn at entry=0x7ffff0000910) at plugins.c:377
#19 0x00000000004037ec in free_connection (conn=0x7ffff0000910) at connections.c:147
#20 0x00000000004040f6 in _handle_single_connection (sockout=<optimized out>, sockin=<optimized out>) at connections.c:95
#21 handle_single_connection (sockin=<optimized out>, sockout=<optimized out>) at connections.c:109
#22 0x0000000000405e73 in start_thread (datav=0x7fffffffdd40) at sockets.c:220
#23 0x00007ffff79c0184 in start_thread (arg=0x7ffff62eb700) at pthread_create.c:312
#24 0x00007ffff76ed37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Will try instrumenting the close function next.

Regards,
Carl-Daniel

 
>>> Examples of nbdkit logs for running "make check" or subsets of it from
>>> nbd against nbdkit:
>>>
>>> nbdkit -n -f python script=ssd-ftl-wearleveling.py disksize=52428800
>>> nbdkit: python[6]: error: ssd-ftl-wearleveling.py: callback failed: close
>>> nbdkit: python[7]: error: invalid flush request: expecting offset and
>>> length == 0
>>> nbdkit: python[7]: error: write reply: Connection reset by peer
>>> Segmentation fault
>>>
>>> nbdkit -n -f python script=ssd-ftl-wearleveling.py disksize=52428800
>>> Fatal Python error: GC object already tracked
>>> Segmentation fault
>>>
>>> nbdkit -n -f python script=ssd-ftl-wearleveling.py disksize=52428800
>>> Segmentation fault
>>>
>>> nbdkit -n -f python script=ssd-ftl-wearleveling.py disksize=52428800
>>> nbdkit: python[1]: error: invalid flush request: expecting offset and
>>> length == 0
>>> nbdkit: python[1]: error: write reply: Broken pipe
>>> nbdkit: python[3]: error: client sent NBD_OPT_ABORT to abort the connection
>>> nbdkit: python[6]: error: ssd-ftl-wearleveling.py: callback failed: close
>>> Traceback (most recent call last):
>>>   File "ssd-ftl-wearleveling.py", line 221, in close
>>>     pickle.dump(lbahist, f)
>>>   File "/usr/lib/python2.7/pickle.py", line 1370, in dump
>>> Segmentation fault (core dumped)
>>>
>>> My close function just dumps the disk image and write access history
>>> with python pickle. It does not have any return statement, and because
>>> of that I seriously doubt that said close function can fail.
>>>
>>> I'm now trying to replicate this with example.py. Will report back.




More information about the Libguestfs mailing list