[Libguestfs] [PATCH libnbd] ublk: Add new nbdublk program

Tue Aug 30 09:07:40 UTC 2022

On Tue, Aug 30, 2022 at 04:30:40PM +0800, Ming Lei wrote:
> On Tue, Aug 30, 2022 at 09:04:07AM +0100, Richard W.M. Jones wrote:
> > On Tue, Aug 30, 2022 at 10:32:02AM +0800, Ming Lei wrote:
> > > Hi Jones,
> > > 
> > > On Thu, Aug 25, 2022 at 01:10:55PM +0100, Richard W.M. Jones wrote:
> > > > This patch adds simple support for a ublk-based NBD client.
> > > > It is also available here:
> > > > https://gitlab.com/rwmjones/libnbd/-/tree/nbdublk/ublk
> > > > 
> > > > ublk is a way to write Linux block device drivers in userspace:
> > > 
> > > Just looked at your nbdublk implementation a bit, basically it is good,
> > > and one really nice work.
> > > 
> > > Also follows two suggestions:
> > > 
> > > 1) the io_uring context is multilexed with ublk io command handling, so
> > > we should avoid to block in both ->handle_io_async() and
> > > ->handle_event(), otherwise performance may be bad
> > 
> > The nbd_aio_* calls don't block.
> > 
> > However I noticed that I made a mistake with the trim and zero paths
> > because I am using synchronous (blocking) nbd_flush / nbd_trim /
> > nbd_zero instead of nbd_aio_flush / nbd_aio_trim / nbd_aio_zero.  I
> > will fix this soon.
> > 
> > Nothing in handle_event should block except for the call to
> > pthread_mutex_lock.  This lock is necessary because new commands can
> > be retired on the nbd_work_thread while handle_event is being called
> > from the io_uring thread.
> 
> Yes, I meant the mutex.
> 
> Also given nbd_work_thread() is required, it is reasonable to move
> nbd_aio_* into nbd_work_thread(), so avoid to wait on the mutex in
> io_uring context, then io command may not be forwarded to userspace
> in time.

nbd_aio_* calls shouldn't block.  But how can we move those calls to
nbd_work_thread()?  If we queue up commands in handle_io_async we
would still have to use a lock on that queue.  I don't see how we can
avoid some kind of locking in handle_io_async.

> BTW what is the context for running callback of nbd_aio_*()?

It depends on whether nbd_aio_* can complete the command entirely
without blocking.

The most common case is that nbd_aio_* calls a network function
(eg. recv(2)) which returns EWOULDBLOCK.  Later, on nbd_work_thread,
recv(2) is completed, the command is processed, and
command_completed() is called.

It's unlikely, but possible, that the command_completed() function
could be called directly from handle_io_async -> nbd_aio_* ->
command_completed, eg. if recv(2) never blocks for some reason like
it's talking over a Unix domain socket to a server which is very quick
to reply.

> > > 2) in the implementation of nbd worker thread, there are two sleep
> > > points(wait for incoming io command, and network FD), I'd suggest to use
> > > poll to wait on any of them
> > >
> > > Recently I are working to add ublksrv io offloading or aio
> > > interfaces on this sort of case in which io_uring can't be used,
> > > which may simplified this area, please see the attached patch which
> > > applies the above two points against your patch. And obvious
> > > improvement can be observed on my simple fio test( randread, io, 4k
> > > bs, libaio) against backend of 'nbdkit file'.
> > >
> > > But these interfaces aren't merged to ublksrv github tree yet, you can find
> > > them in the aio branch, and demo_event.c is one example wrt. how to use
> > > them:
> > > 
> > >   https://github.com/ming1/ubdsrv/tree/aio
> > >
> > > Actually this interface can be improved further for nbdublk case,
> > > and the request allocation isn't needed actually for this direct
> > > offloading. But they are added for covering some IOs not from ublk
> > > driver, such as meta data, so 'struct ublksrv_aio' is allocated.
> > > I will try best to finalize them and merge to master branch.
> > 
> > I didn't really understand what these patches to ubdsrv do when I
> > looked at them before.  Maybe add some diagrams?
> 
> The patches are added recently. It is just for simplifying to offload
> IO handling on another worker context, such as nbd_work_thread.
> 
> It uses one eventfd to notify the target worker thread when new requests
> are added to a list. Once the worker thread is wakeup, it fetches
> requests from the list, after handle these requests, the worker thread
> waits on both the eventfd and target IO(network FD) via poll().

In the patch, it seems there is no place which creates a pthread for
nbd_work_thread.  Also ubdsrv aio branch does not call pthread_create.
So I don't understand in what context nbd_work_thread is called.

I've never used eventfd before and the man page for it is very opaque.

> In your previous implementation, nbd work thread may wait on one pthread
> mutex and aio_poll(), this way may not be efficient, given when waiting
> on one event, another events can't be handled.

I'm not sure what "event" means in this context.  Does it mean
"NBD command"?  Or poll(2) event?

There are multiple (usually 4) nbd_work threads, one for each NBD
network connection.  Each NBD network connection can handle many
commands in flight at once.

However if your patch is faster then something must be going on.

> > > BTW, IOPS on nbdublk(backend: nbdkit file) still has big gap compared
> > > with ublk-loop, so I guess in future maybe io_uring should be tried and
> > > see if big improvement can be observed.
> > 
> > It's always going to be a bit slower because we're converting the
> > requests into a network protocol and passing them to another process.
> 
> In my simple fio randread test, it can be 5+ ~ 10+ slower compared with loop.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org