[Libguestfs] [libnbd PATCH 2/2] python: Optimize away copy during pread_structured
Eric Blake
eblake at redhat.com
Tue May 31 20:44:59 UTC 2022
On Tue, May 31, 2022 at 11:12:50PM +0300, Nir Soffer wrote:
> On Tue, May 31, 2022 at 5:15 PM Eric Blake <eblake at redhat.com> wrote:
> >
> > The Py_BuildValue "y#" format copies data; this is because Python
> > wants to allow our C memory to go out of scope without worrying about
> > whether the user's python callback has stashed off a longer-living
> > reference to its incoming parameter. But it is inefficient; we can do
> > better by utilizing Python's memoryview for a zero-copy exposure to
> > the callback's C buffer, as well as a .release method that we can
> > utilize just before our C memory goes out of scope. Now, if the user
> > stashes away a reference, they will get a clean Python error if they
> > try to access the memory after the fact. This IS an API change (code
> > previously expecting a stashed copy to be long-lived will break), but
> > we never promised Python API stability, and anyone writing a callback
> > that saves off data was already risky (neither libnbd nor nbdkit's
> > testsuite had such a case). For a demonstration of the new error,
> > where the old code succeeded:
> >
> > $ ./run nbdsh
> > nbd> h.connect_command(["nbdkit", "-s", "memory", "10"])
> > nbd> copy=None
> > nbd> def f(b,o,s,e):
> > ... global copy
> > ... copy = b
> > ... print(b[0])
> > ...
> > nbd> print(copy)
> > None
> > nbd> h.pread_structured(10,0,f)
> > 0
> > bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
> > nbd> copy
> > <released memory at 0x7ff97410de40>
> > nbd> copy[0]
> > Traceback (most recent call last):
> > File "/usr/lib64/python3.10/code.py", line 90, in runcode
> > exec(code, self.locals)
> > File "<console>", line 1, in <module>
> > ValueError: operation forbidden on released memoryview object
> >
> > To demonstrate the speedup, I tested:
> >
> > $ export script='
> > def f(b,o,s,e):
> > pass
> > m=1024*1024
> > size=h.get_size()
> > for i in range(size // m):
> > buf = h.pread_structured(m, m*i, f)
> > '
> > $ time ./run nbdkit -U - memory 10G --run 'nbdsh -u "$uri" -c "$script"'
> >
> > On my machine, this took 9.1s pre-patch, and 3.0s post-patch, for an
> > approximate 65% speedup.
>
> Looks like ~300% speedup to me. I guess 3 times faster is the most clear way
> to describe the change.
I guess it depends whether you compare the delta in execution speed to
the old or the new speed. But either way, it is a nice speedup! (I'll
probably re-write to use the ~3x wording, as that sounds more
impressive)
>
> > The corresponding diff to the generated code is:
> >
> > | --- python/methods.c.bak 2022-05-31 07:57:25.256293091 -0500
> > | +++ python/methods.c 2022-05-31 08:14:09.570567858 -0500
> > | @@ -73,8 +73,12 @@ chunk_wrapper (void *user_data, const vo
> > |
> > | PyGILState_STATE py_save = PyGILState_UNLOCKED;
> > | PyObject *py_args, *py_ret;
> > | + PyObject *py_subbuf = NULL;
> > | PyObject *py_error = NULL;
> > |
> > | + /* Cast away const */
>
> I think it will be more clear and helpful as:
>
> /* Casting subbuf to char* is safe since we use PyBUF_READ. */
>
> > | + py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, PyBUF_READ);
Yes, good idea.
>
> Maybe py_view?
>
> And also change the doscring to mention "view" instead of "subbuf".
That's determined by the public API parameter name. The name is not
an API change, per-se, but changing it (in API.ml) would have knock-on
effects to other language bindings. And 'view' is potentially
misleading, since while it has always been a view in C, and this patch
is getting us closer to it being a view in Python (you need patch 3/2
to make it an actual view of the final buffer for h.pread_structured,
and an as-yet unfinished 4/2 before it is an actual view of the buffer
passed to h.aio_pread_structured), we do not guarantee that all other
language bindings are able to avoid the memcpy.
> > | out:
> > | + if (py_subbuf) {
> > | + PyObject *tmp = PyObject_CallMethod(py_subbuf, "release", NULL);
>
> What it the user does:
>
> def f(b,o,s,e):
> use b...
> b.release()
>
> Or:
>
> def f(b,o,s,e):
> with b:
> use b...
>
> We would release a released memoryview here.
>
> I don't think this is likely to happen though.
Even if it does happen, it's harmless.
https://docs.python.org/3/library/stdtypes.html#memoryview.release
says "After this method has been called, any further operation on the
view raises a ValueError (except release() itself which can be called
multiple times):".
>
> I'm not sure about this change - the performance improvement is great,
> but the API change is surprising.
>
> Since pread_structured() returns a buffer with the read data, can
> we arrange that the memoryview argument to the chunk callback
> is pointing into the returned buffer, so the view does not have to be
> released by the C extension?
Yes, in patch 3/2 for h.pread_structured; and I'm still working on it
for h.aio_pread_structured.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
More information about the Libguestfs
mailing list