[Libguestfs] Few libnbd questions/concerns

Thu Jun 27 10:11:48 UTC 2019

On Wed, Jun 26, 2019 at 09:33:04PM -0500, Eric Blake wrote:
>On 6/25/19 9:25 AM, Eric Blake wrote:
>
>>>> When running a program with `nbdkit -U - --run ...`, the $nbd parameter gets
>>>> expanded to nbd:unix:/path/to/socket.  When this string is passed to
>>>> nbd_connect_uri(), it does not return an error (even though it is not a valid
>>>> URL), but what's more it treats it as "nbd://localhost", which might be a completely different server (that actually happened to me and it was kind of confusing).
>>>
>>> There's a bit of magic in nbdkit about $nbd.  It also predates our
>>> standardization effort for NBD URLs which is going on upstream.
>>>
>>> That needs to be fixed, but in the meantime just use $unixsocket
>>> instead (see nbdkit-captive(1) man page for the full details).
>>
>> In the meantime, I don't mind working on the quick fix for rejecting an
>> invalid URI.
>
>Done.
>
>>
>>>
>>>> When nbd_block_status() callback returns -1, the nbd_block_status() function
>>>> reports:
>>>>
>>>>  command failed: Protocol error (errno = 71)
>>>>
>>>> which is a bit confusing to me.  I could be nicer to have that report that it
>>>> really was the callback that caused this.
>>>
>>> Not sure about this one - Eric?
>>
>> Right now, a failed callback will cause the overall nbd_block_status()
>> to fail with whatever errno the callback left behind, with EPROTO as the
>> default. I'm guessing you didn't set an errno when you returned -1?  Is
>> there a better default we should use than EPROTO?  Is there really a way
>> to diagnose without an errno value that the callback failed but didn't
>> set errno?
>>
>
>My pending patch to take advantage of Mutable(Int) in the callback,
>makes this a bit nicer, as you can then control the error status by
>assigning into the parameter rather than by what got left in errno.
>
>> There's also the related question - my patches to add
>> nbd_pread_structured are set up to call the callback for as many chunks
>> as the server sends, even if an earlier invocation of the callback
>> fails, but to still preserve the errno of the first failed callback
>> (again defaulting to EPROTO). Should nbd_block_status also call the
>> callback for all chunks from the server, rather than stopping the use of
>> further callbacks once the first callback fails, if only for consistency
>> between the two?
>
>So now I'm leaning towards making that change, particularly since I'm
>also working on adding a nbd_aio_FOO_notify() for each command that
>takes a callback that also takes an error pointer.
>
>>
>>>> One last thing is that I could not find any definition for the flags
>>>> of "base:allocation" metadata (NBD_STATE_HOLE and NBD_STATE_ZERO).
>>>> It might just be things that are still missing due to an early stage
>>>> of libnbd, or it might not belong there or it might be me misusing
>>>> something.
>>>
>>> Have a look at the NBD protocol doc:
>>>
>>> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
>>
>> We could add convenience #defines into libnbd.h if needed, which would
>> work for the base:allocation context, but it would not help any other
>> context (such as qemu:dirty-bitmap:FOO) which has other definitions for
>> the returned bits.  But the fact that you have to call
>> nbd_add_meta_context(h, "base:allocation") is also awkward; if we add
>> convenience macros for the results, we probably also want to support
>> nbd_add_meta_context(h, LIBNBD_CONTEXT_BASE_ALLOCATION).
>
>Still needs to be done.
>

Oh, it it's just this, then I can add that, at least this is not too deep inside
the machinery that I'd have to go through everything to understand it properly.
I sent a simple patch that adds it, we can discuss that there.

>Another thing I just noticed: nbd_poll returns 0 both for timeout and
>for when it notified the state machine about POLLIN or POLLOUT making
>progress. We probably want to switch it to return 0 on timeout and 1
>when it made progress, so that a user can detect timeout (all our
>internal callers pass -1 for the timeout parameter, and thus can't time
>out, but external callers may care).
>

I haven't gotten to using async APIs yet.  But resembling poll(2) seems
reasonable.

>-- 
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190627/eb5a3dfb/attachment.sig>