[Libguestfs] [libnbd PATCH 2/2] python: Optimize away copy during pread_structured

Tue May 31 20:44:59 UTC 2022

On Tue, May 31, 2022 at 11:12:50PM +0300, Nir Soffer wrote:
> On Tue, May 31, 2022 at 5:15 PM Eric Blake <eblake at redhat.com> wrote:
> >
> > The Py_BuildValue "y#" format copies data; this is because Python
> > wants to allow our C memory to go out of scope without worrying about
> > whether the user's python callback has stashed off a longer-living
> > reference to its incoming parameter.  But it is inefficient; we can do
> > better by utilizing Python's memoryview for a zero-copy exposure to
> > the callback's C buffer, as well as a .release method that we can
> > utilize just before our C memory goes out of scope.  Now, if the user
> > stashes away a reference, they will get a clean Python error if they
> > try to access the memory after the fact. This IS an API change (code
> > previously expecting a stashed copy to be long-lived will break), but
> > we never promised Python API stability, and anyone writing a callback
> > that saves off data was already risky (neither libnbd nor nbdkit's
> > testsuite had such a case).  For a demonstration of the new error,
> > where the old code succeeded:
> >
> > $ ./run nbdsh
> > nbd> h.connect_command(["nbdkit", "-s", "memory", "10"])
> > nbd> copy=None
> > nbd> def f(b,o,s,e):
> > ...   global copy
> > ...   copy = b
> > ...   print(b[0])
> > ...
> > nbd> print(copy)
> > None
> > nbd> h.pread_structured(10,0,f)
> > 0
> > bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
> > nbd> copy
> > <released memory at 0x7ff97410de40>
> > nbd> copy[0]
> > Traceback (most recent call last):
> >   File "/usr/lib64/python3.10/code.py", line 90, in runcode
> >     exec(code, self.locals)
> >   File "<console>", line 1, in <module>
> > ValueError: operation forbidden on released memoryview object
> >
> > To demonstrate the speedup, I tested:
> >
> > $ export script='
> > def f(b,o,s,e):
> >  pass
> > m=1024*1024
> > size=h.get_size()
> > for i in range(size // m):
> >  buf = h.pread_structured(m, m*i, f)
> > '
> > $ time ./run nbdkit -U - memory 10G --run 'nbdsh -u "$uri" -c "$script"'
> >
> > On my machine, this took 9.1s pre-patch, and 3.0s post-patch, for an
> > approximate 65% speedup.
> 
> Looks like ~300% speedup to me. I guess 3 times faster is the most clear way
> to describe the change.

I guess it depends whether you compare the delta in execution speed to
the old or the new speed.  But either way, it is a nice speedup! (I'll
probably re-write to use the ~3x wording, as that sounds more
impressive)

> 
> > The corresponding diff to the generated code is:
> >
> > | --- python/methods.c.bak        2022-05-31 07:57:25.256293091 -0500
> > | +++ python/methods.c    2022-05-31 08:14:09.570567858 -0500
> > | @@ -73,8 +73,12 @@ chunk_wrapper (void *user_data, const vo
> > |
> > |    PyGILState_STATE py_save = PyGILState_UNLOCKED;
> > |    PyObject *py_args, *py_ret;
> > | +  PyObject *py_subbuf = NULL;
> > |    PyObject *py_error = NULL;
> > |
> > | +  /* Cast away const */
> 
> I think it will be more clear and helpful as:
> 
>     /* Casting subbuf to char* is safe since we use PyBUF_READ. */
> 
> > | +  py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, PyBUF_READ);

Yes, good idea.

> 
> Maybe py_view?
> 
> And also change the doscring to mention "view" instead of "subbuf".

That's determined by the public API parameter name.  The name is not
an API change, per-se, but changing it (in API.ml) would have knock-on
effects to other language bindings.  And 'view' is potentially
misleading, since while it has always been a view in C, and this patch
is getting us closer to it being a view in Python (you need patch 3/2
to make it an actual view of the final buffer for h.pread_structured,
and an as-yet unfinished 4/2 before it is an actual view of the buffer
passed to h.aio_pread_structured), we do not guarantee that all other
language bindings are able to avoid the memcpy.

> > |   out:
> > | +  if (py_subbuf) {
> > | +    PyObject *tmp = PyObject_CallMethod(py_subbuf, "release", NULL);
> 
> What it the user does:
> 
>     def f(b,o,s,e):
>         use b...
>         b.release()
> 
> Or:
> 
>     def f(b,o,s,e):
>         with b:
>             use b...
> 
> We would release a released memoryview here.
> 
> I don't think this is likely to happen though.

Even if it does happen, it's harmless.
https://docs.python.org/3/library/stdtypes.html#memoryview.release
says "After this method has been called, any further operation on the
view raises a ValueError (except release() itself which can be called
multiple times):".

> 
> I'm not sure about this change - the performance improvement is great,
> but the API change is surprising.
> 
> Since pread_structured() returns a buffer with the read data, can
> we arrange that the memoryview argument to the chunk callback
> is pointing into the returned buffer, so the view does not have to be
> released by the C extension?

Yes, in patch 3/2 for h.pread_structured; and I'm still working on it
for h.aio_pread_structured.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org