[Libguestfs] [PATCH libnbd 1/7] api: Add semi-private function for freeing persistent data.

Eric Blake eblake at redhat.com
Mon Aug 12 18:53:56 UTC 2019


On 8/12/19 1:13 PM, Richard W.M. Jones wrote:
> On Mon, Aug 12, 2019 at 11:29:10AM -0500, Eric Blake wrote:
>> On 8/12/19 11:08 AM, Richard W.M. Jones wrote:
>>> + typedef void (*nbd_free_callback) (void *ptr, void *user_data);
>>> + int nbd_add_free_callback (struct nbd_handle *h,
>>> +                            void *ptr,
>>> +                            nbd_free_callback cb,
>>> +                            void *user_data);
>>
>> Do we want to insist on a user_data argument?  Libvirt, for example,
>> states that free callbacks are passed only the pointer to be freed (as
>> you can already pack whatever you need into that pointer alone, rather
>> than needing yet another void *user_data); if we do not take a user_data
>> here, then...
> 
> Indeed, and that was my original implementation, and it would
> be usable in the simple case (free).
> 
> However it doesn't work for OCaml buffers (nor for Python).  In OCaml
> we need to free the GC root, but that is separate from the buffer
> pointer.
> 
> The generated code for an OCaml persistent buffers eventually looks
> like this:
> 
>   /* The function may save a reference to the Buffer, so we
>    * must treat it as a possible GC root.
>    */
>   value *buf_user_data;
>   buf_user_data = malloc (sizeof (value));
>   if (buf_user_data == NULL) caml_raise_out_of_memory ();
>   *buf_user_data = bufv;
>   caml_register_generational_global_root (buf_user_data);
>   struct nbd_buffer *buf_buf = NBD_buffer_val (bufv);
>   const void *buf = buf_buf->data;
>   size_t count = buf_buf->len;
>   if (nbd_add_free_callback (h, (void *)buf,
>                              free_root, buf_user_data) == -1)
>     caml_raise_out_of_memory ();
> 
> Notice that the free_root function needs the pointer to the GC root,
> but you can't get to that from the buffer pointer.

But still, can't we create a single C struct that tracks both the buffer
and the GC root, and then pass the address of that C struct as the one
pointer to the free function?

In other words, I'm trying to get to the point of something like:

struct closure_wrap {
  /* track anything for buffer GC root and refcounting */
  value *buf;
  /* track anything for the OCaml callback, if one was present */
  value *callback;
};

struct closure_wrap *wrap = malloc (sizeof *wrap);
wrap->buf = malloc (sizeof (*wrap->buf));
caml_register_generational_global_root (wrap->buf);
struct nbd_buffer *buf_buf = NBD_buffer_val (bufv);
const void *buf = buf_buf->data;
size_t count = buf_buf->len;
if (in nbd_aio_pread) {
  wrap->callback = NULL;
  res = nbd_aio_pread_callback(h, buf, count, offset,
                               NULL, free_cb, wrap,
                               flags);
} else /* in nbd_aio_pread_callback */ {
  wrap->callback = ...;
  res = nbd_aio_pread_callback(h, buf, count, offset,
                               completion_cb, free_cb, wrap,
                               flags);
}

where completion_cb(wrap) calls the OCaml callback (and doesn't get used
if there is no OCaml callback), and where free_cb(wrap) gets called at
the end of the lifetime for buf, wrap->callback (if any), AND wrap
itself, so that we have:

void free_cb(void *ptr) {
  struct closure_wrap *wrap = ptr;
  caml_remove_generational_global_root (wrap->buf);
  if (wrap->callback) {
    unregister the OCaml callback...
  }
  free (wrap);
}

> 
> A similar problem will exist in Python, but I've not got around to
> writing that code yet.
> 
>>> +C<ptr> is a pointer to the object (ie. the buffer or
>>> +callback C<user_data>).  C<cb (ptr)> is called when libnbd
>>
>> As written, your patch uses C<cb (ptr, user_data)>, so either this text
>> is wrong, or you really don't need user_data.
> 
> Yes it should be cb (ptr, user_data).  I'll tighten it up because
> there are now two user_datas.

But again, we may be able to get by with just one pointer.  (I'm still
hoping the idea of a three-tuple in C: main callback, free callback, and
user data - is sufficient to give language bindings the flexibility they
need to pack up a C struct containing everything needing cleanup at the
end of the lifetime, whether or not there is also a language callback in
play)

> 
>>> +    new_callbacks = realloc (h->free_callbacks,
>>> +                             sizeof (struct free_callback) * new_alloc);
>>
>> Should we start relying on reallocarray() to guarantee no multiplication
>> overflow? (Not yet in POSIX, but it has been proposed:
>> http://austingroupbugs.net/view.php?id=1218)
> 
> I guess - didn't know about it.  Seems like it's not available in
> glibc yet?

'man realloc' on Fedora 29 lists reallocarray under _GNU_SOURCE, but
fails to say which glibc introduced it, and 'man reallocarray' is
missing (bug in the man pages).


>>> +
>>> +  /* Need to keep the list sorted by pointer so we can use bsearch.
>>> +   * Insert the new entry before the i'th entry.
>>> +   *
>>> +   * Note the same pointer can be added multiple times (eg. if a
>>> +   * buffer is shared between commands), and that is not a bug.  Which
>>> +   * free function is called is undefined, but they should all be
>>> +   * called eventually.
>>
>> or in other words, if the free function is actually a reference counter
>> that only does something interesting when the count reaches 0, then
>> things still work.
> 
> Yes this is another reason why this API is trickier to use than it
> seems.  But this is really necessary if you share a buffer between
> several AIO calls as I discovered during implementation.

Normally, a user doesn't share the same buffer on the same handle - but
we do test that for our speed tests (where we don't care about buffer
contents so much as how many API calls can we fire off, so having
cross-contamination between API calls into the same buffer in those
tests is okay).  But you are also right that it is going to be common to
read into a buffer from one handle, then write that buffer out to
another handle - so our actions on capturing the buffer during
aio_pread, and then releasing it when retiring that command, really are
centered on ref-counting whatever the language handed us, and only
free()ing the actual C struct we use to tie everything together under a
single pointer.

> 
>>> +   */
>>> +  for (i = 0; i < h->nr_free_callbacks; ++i) {
>>> +    if (ptr <= h->free_callbacks[i].ptr)
>>
>> Comparing 2 pointers that are not allocated as part of the same array is
>> undefined in C.  To be safe, you're better off writing this as:
>>
>> if ((intptr_t)ptr <= (intptr_t)h->free_callbacks[i].ptr)
>>
>> or even storing intptr_t rather than void* in your list of pointers
>> needing callbacks.
> 
> OK.

And if I'm right about my idea of having each Closure become a 3-tuple
in C, where we can use a common free_cb(ptr) at the time the command is
retired for both the language nbd.aio_pread and for
nbd.aio_pread_callback, then pointer comparisons is moot because we
don't need to track a list of pointers, but rather just a free_cb
alongside each user_data.

> 
>>> +      break;
>>> +  }
>>
>> This is a linear search O(n) for where to insert.  Why not bsearch() for
>> O(log n), particularly since you document your intent to use bsearch()
>> for later lookups?
> 
> Yes indeed, good idea!
> 
>>> +  memmove (&h->free_callbacks[i+1], &h->free_callbacks[i],
>>> +           (h->nr_free_callbacks - i) * sizeof (struct free_callback));
>>
>> Hmm - the use of memmove() is O(n); we'd need a different data structure
>> (for example red-black tree or hash table) if we wanted list
>> manipulations to not be the chokepoint.  So, as long as you are already
>> stuck with an O(n) list manipulation, using O(n) lookup is not making
>> the problem any worse.
> 
> h->nr_free_callbacks is actually reasonably small in real code
> (usually <= 20).  It's somewhat related to the number of commands in
> flight multiplied by a small constant.

And my argument is that we are not in control of that number - a user
can write a Python script that deliberately calls nbd.aio_pread 1000
times before ever calling nbd.poll to finally advance the state machine.


>>> +static int
>>> +compare_free_callbacks (const void *v1, const void *v2)
>>> +{
>>> +  const void *ptr = v1;
>>> +  const struct free_callback *cb = v2;
>>> +
>>> +  if (ptr < cb->ptr) return -1;
>>> +  else if (ptr > cb->ptr) return 1;
>>
>> Again, undefined in C; comparing intptr_t is defined, but direct
>> comparison of unrelated pointers could cause a compiler to mis-optimize
>> (even if it happens to currently work in practice).
> 
> OK.  We could store the pointers in the struct as intptr_t ?

Yes, if we have to compare them at all. But I'm hoping we don't have to.


>>> +  free_cb = bsearch (ptr, h->free_callbacks, h->nr_free_callbacks,
>>> +                     sizeof (struct free_callback),
>>> +                     compare_free_callbacks);
>>
>> Here, you've got O(log n) lookup.
>>
>>> +  if (free_cb) {
>>> +    assert (ptr == free_cb->ptr);
>>> +
>>> +    free_cb->cb (ptr, free_cb->user_data);
>>> +
>>> +    /* Remove it from the free list. */
>>> +    memmove (free_cb, free_cb+1,
>>> +             sizeof (struct free_callback) *
>>> +             (&h->free_callbacks[h->nr_free_callbacks] - (free_cb+1)));
>>
>> but then slow it down with O(n) list manipulation.
> 
> Right, but only when we actually call the callback.
> 
> For C callers we will basically never use this.  h->nr_free_callbacks
> will be 0 and we will do a frequent bsearch on a zero-sized array.
> It's only expected to be used from other programming languages.

Okay, so the effect to C is minimal, but the effect to other languages
may be noticeable (since there we will be registering a callback for
every aio call).

> 
>> I'm still not sold on whether this is any better than our existing
>> completion callbacks coupled with VALID|FREE (where at least we didn't
>> suffer from O(n) lookups); or whether we want to instead explore taking
>> an explicit free function pointer as part of a C closure (that is, each
>> Closure maps to a three-tuple of C arguments for main function, free
>> function, and user_data).  But as you said in the cover letter, having
>> the whole series to review will let us choose between our options, so
>> I'll see what the rest of the series is able to accomplish with this.
> 
> I think we shouldn't worry about whether the implementation is
> efficient since we can always improve that based on feedback from
> perf.  It's really about whether this is a better API for freeing or not.

Getting rid of the VALID|FREE seems worthwhile, and _since_ that is
already an API/ABI change, we can _also_ make the API/ABI change from
Closure in C being a 2-tuple (fn, ptr) to being a 3-tuple (fn, free_fn,
ptr).  The tricky part that remains is how to convince the generator
that the code for Python nbd.aio_pread AND for nbd.aio_pread_callback
both need to use the C nbd_aio_pread_callback, for the sake of the
completion's free_cb() to do the necessary refcount cleanup on the buf,
at which point we no longer need a separate registration of free callbacks.

We then have the design question of whether to make an OClosure type,
where C has two functions nbd_aio_pread and nbd_aio_pread_callback for
convenience, but where other languages have only a single nbd.aio_pread
where the callback parameter is optional (the Closure type for
pread_structured chunk and for block_status extent will still be
mandatory; it is only the completion callback that is currently causing
us twice the API because we are treating it as pseudo-optional).  Or
maybe we just require C clients of nbd_aio_pread to always provide
parameters for callbacks, but document that the completion callback and
free callback pointers may be NULL.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190812/f2d2c22c/attachment.sig>


More information about the Libguestfs mailing list