[libvirt] Unbounded client streams

Daniel P. Berrange berrange at redhat.com
Tue Apr 25 14:06:47 UTC 2017


On Tue, Apr 25, 2017 at 09:22:51AM +0200, Michal Privoznik wrote:
> Dear list,
> 
> as you might have seen, I've been playing with our streams lately. One of
> the things I wanted to try was how my sparse streams deal with vol-download
> and a slow writer (e.g. a slow disk). Note to whoever wants to try that out:
> blkio.throttle.write_* is no good for this. So I went the old way:

> And found out something ugly: the memory footprint of virsh just keeps
> growing as vol-download progresses. I've tracked down the problem to:
> 
> 1) The client event loop (virNetClientIOEventLoop) sees some incoming data,
> so it reads it (virNetClientIOHandleInput -> virNetClientCallDispatch) and
> queues the packet (virNetClientCallDispatchStream) regardless of how much
> data is already queued
> 
> 2) Since the server sees that the client has read all the data, it sends
> even more data.

Yep, from the server's POV the current protocol design relies on the client
not reading data off the socket if it is not ready to handle it.

There were two reasons for this. First, for efficiency: we wanted to avoid
a round trip for every single block of data we wish to upload / download.
Second, and more importantly, when using the streams for a bi-directional
interactive console we have no idea when there will be data to download,
so we just wanted the server to send it as soon as it was received from
the guest.

> And the queue in step 1) just grows and grows. Ideally, the packets are
> taken off the queue by virStreamRecv (which boils down to
> virNetClientStreamRecvPacket), but with a slow writer that may not be
> called again for quite a while.

Yep.
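
To make the failure mode concrete, here is a simplified model of the two
halves involved. This is not the libvirt code - the names are made up for
illustration - it just shows an event loop that enqueues without any limit
while the application drains at its own pace:

/* Simplified illustration, not libvirt code: the "event loop" side queues
 * every incoming stream packet with no upper bound, and only the
 * application's own recv call (the virStreamRecv path) ever drains it.
 * If the consumer is slower than the socket, queued_bytes grows forever. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct packet {
    struct packet *next;
    size_t len;
    char *data;
};

struct stream_queue {
    struct packet *head, *tail;
    size_t queued_bytes;              /* never checked against any limit */
};

/* Modelled on what the event loop does for each stream packet it reads. */
static void queue_packet(struct stream_queue *q, const char *buf, size_t len)
{
    struct packet *pkt = calloc(1, sizeof(*pkt));
    pkt->data = malloc(len);
    memcpy(pkt->data, buf, len);
    pkt->len = len;

    if (q->tail)
        q->tail->next = pkt;
    else
        q->head = pkt;
    q->tail = pkt;
    q->queued_bytes += len;           /* grows as long as packets arrive */
}

/* Modelled on the virStreamRecv path: runs only when the app gets to it. */
static void consume_packet(struct stream_queue *q)
{
    struct packet *pkt = q->head;

    if (!pkt)
        return;
    q->head = pkt->next;
    if (!q->head)
        q->tail = NULL;
    q->queued_bytes -= pkt->len;
    free(pkt->data);
    free(pkt);
}

int main(void)
{
    struct stream_queue q = { 0 };
    char buf[65536] = { 0 };
    int i;

    /* The socket delivers ten packets for every one the slow writer drains */
    for (i = 0; i < 160; i++) {
        queue_packet(&q, buf, sizeof(buf));
        if (i % 10 == 0)
            consume_packet(&q);
    }
    printf("still queued: %zu MB\n", q.queued_bytes / (1024 * 1024));
    return 0;
}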

> Frankly, I don't have any idea how to fix this. We can stop reading the data
> (i.e. not set POLLIN on the client <-> server socket) if the queue reaches a
> certain length. But this would mean no other call is processed either - not
> even keepalive - and the connection would break.

My original intention was certainly that we would stop reading data off the
socket if we had too much queued, but this predated the existence of the
keepalive code.
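
For the record, here is roughly what that back-pressure would look like.
It is only a sketch, not the actual client event loop code, and the 10 MB
cap is an assumed number; the comment marks exactly where it collides with
keepalive:

/* Sketch only: decide which poll(2) events to watch based on how much
 * stream data is already queued.  The 10 MB cap is an assumed number. */
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define QUEUE_LIMIT (10 * 1024 * 1024)

static short client_poll_events(size_t queued_bytes, bool have_data_to_send)
{
    short events = 0;

    if (have_data_to_send)
        events |= POLLOUT;

    /* Apply back-pressure by refusing to read from the socket ... */
    if (queued_bytes < QUEUE_LIMIT)
        events |= POLLIN;
    /* ... but with POLLIN cleared the keepalive PINGs and every other
     * reply multiplexed on this same socket go unread too, so the
     * connection would eventually be torn down as dead. */

    return events;
}

int main(void)
{
    printf("below limit, POLLIN watched: %d\n",
           !!(client_poll_events(4096, false) & POLLIN));
    printf("above limit, POLLIN watched: %d\n",
           !!(client_poll_events(QUEUE_LIMIT + 1, false) & POLLIN));
    return 0;
}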

I see three possible options (besides ignoring the problem):

 - Turn off the keepalive somehow when we want to pause reading from
   the stream
   
 - Somehow introduce stream "chunking". E.g. assume a chunk size of 10 MB
   is somehow agreed on. The server would send 10 MB and then not send any
   more data until the client issued a "continue" message of some kind,
   whereupon a further 10 MB is permitted to be sent.

 - Stop multiplexing streams on the socket. Have some way to open up
   additional sockets to the server, one per stream. The main connection
   would somehow provide a one-time auth token, so we don't need to force
   apps to go through an authentication step again.

Option 2 is probably the least scary to retrofit; a rough sketch of the
flow control it implies is below.
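
The structure, the names and the 10 MB figure here are all hypothetical -
nothing in this sketch is part of the existing wire protocol - but it shows
the idea on the sending side: the server stops once a full chunk is
outstanding, and a "continue" message from the client reopens the window:

/* Sketch of option 2's flow control, server side.  Names, types and the
 * 10 MB window are made up for illustration; this is not the libvirt
 * RPC protocol. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define STREAM_WINDOW (10 * 1024 * 1024)   /* assumed 10 MB chunk size */

struct stream_tx {
    size_t sent;    /* bytes transmitted to the client so far */
    size_t acked;   /* bytes the client has acknowledged consuming */
};

/* May the server transmit another packet of 'len' bytes right now? */
static bool stream_may_send(const struct stream_tx *tx, size_t len)
{
    return tx->sent + len - tx->acked <= STREAM_WINDOW;
}

/* Client's "continue" message arrived, acknowledging 'len' more bytes. */
static void stream_continue(struct stream_tx *tx, size_t len)
{
    tx->acked += len;
}

int main(void)
{
    struct stream_tx tx = { 0, 0 };
    size_t pkt = 256 * 1024;

    while (stream_may_send(&tx, pkt))
        tx.sent += pkt;                 /* send until the window is full */
    printf("sent %zu MB, now blocked waiting for a continue\n",
           tx.sent / (1024 * 1024));

    stream_continue(&tx, tx.sent);      /* slow client finally caught up */
    printf("may send again: %d\n", stream_may_send(&tx, pkt));
    return 0;
}

With that in place the client's receive queue can never hold more than one
window's worth of data, which bounds the memory growth described above
without needing a round trip for every packet.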

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



