[Libguestfs] Memory corruption when testing nbdkit python plugin with nbd-tester-client?

Richard W.M. Jones rjones at redhat.com
Mon Oct 31 15:24:19 UTC 2016


On Tue, Sep 27, 2016 at 10:26:18AM +0200, Carl-Daniel Hailfinger wrote:
> On 26.09.2016 20:40, Carl-Daniel Hailfinger wrote:
> > On 26.09.2016 19:20, Carl-Daniel Hailfinger wrote:
> >> On 26.09.2016 14:29, Richard W.M. Jones wrote:
> >>> On Mon, Sep 26, 2016 at 02:18:02PM +0200, Carl-Daniel Hailfinger wrote:
> >>>> Hi,
> >>>>
> >>>> has anyone ever run "make check" from nbd against nbdkit with a python
> >>>> plugin? I usually get segfaults during such a run, and sometimes various
> >>>> other errors happen before the segfault, suggesting that some memory
> >>>> corruption is underway.
> >>>> AFAICS a pure python plugin should not be able to cause memory corruption.
> >>> Correct, a python plugin should not cause memory corruption,
> >>> and nbdkit shouldn't segfault ever.
> >>>
> >>> Did you get a stack trace from C (not from Python)?
> >> The core files were useless, but I ran nbdkit in gdb and got something...
> > And another one... my close function pickles all python objects which
> > were accessed in the plugin. Maybe that's what exposes the memory
> > corruption on close.
> > [...]
> > Will try instrumenting the close function next.
> 
> The results are in.
> 
> I dumped the reference counters for various python objects inside
> py_close and often the reference counters are off-by-one or off-by-two
> in the crashing case.
> 
> AFAICS the python API says that once you have threads, you have to do
> the GIL dance:
> https://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock
> . All the crashes I saw can be explained by unsynchronized accesses to
> python objects.
> 
> I found the bug.
> plugin_lock_request() is used to lock client requests only, but it does
> _not_ lock any other calls into the plugin. Due to that, the following
> calls are neither locked against each other nor are they locked against
> client requests:
> plugin_register (const char *_filename,
> plugin_cleanup (void)
> plugin_config (const char *key, const char *value)
> plugin_config_complete (void)
> plugin_open (struct connection *conn, int readonly)
> plugin_close (struct connection *conn)
> plugin_get_size (struct connection *conn)
> plugin_can_write (struct connection *conn)
> plugin_can_flush (struct connection *conn)
> plugin_is_rotational (struct connection *conn)
> plugin_can_trim (struct connection *conn)
> 
> That means dumping the plugin-internal data on connection close is not
> locked against the requests within that connection, and any garbage
> collection triggered by the last request may still be running while
> plugin_close runs. Boom.
> 
> In general, when working with python any call into python should be
> wrapped like this:
> 
> PyGILState_STATE gstate;
> gstate = PyGILState_Ensure();
> 
> /* Perform Python actions here. */
> result = CallSomeFunction();
> /* evaluate result or handle exception */
> 
> /* Release the thread. No Python API allowed beyond this point. */
> PyGILState_Release(gstate);
> 
> 
> That would even allow us to run everything in parallel.
> 
> I have now added a call to plugin_lock_request() within plugin_close()
> and will check if it still explodes. That's obviously less than what the
> python developers recommend, but it may at least reduce the likelihood
> of the crashes.

Just following up on some old email.  Did this issue get fixed in the
end?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




More information about the Libguestfs mailing list