[Libguestfs] Progress bars

Tue Jul 6 15:37:47 UTC 2010

On Fri, Jul 02, 2010 at 11:27:26PM +0100, Richard W.M. Jones wrote:
> Background
> ----------
> 
> A complaint I'm hearing is that some tools which take a long time to
> run (virt-resize in particular) need to have progress bars to indicate
> how long they are expected to run.  This is also a basic usability
> principle, see for example this paper:
> http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf
> 
> If you look at how virt-resize is implemented, the bulk of the time is
> spent copying partitions in a single long-running function, copy_size:
> 
>   http://libguestfs.org/guestfs.3.html#guestfs_copy_size
> 
> The current protocol is entirely synchronous so that while the daemon
> is doing the copy_size, the library is blocked waiting for a reply
> message and does nothing else in that thread.  To implement a progress
> bar for this call we'd have to have a way to query into the call while
> it was running to find out how far it has got, or to have it send
> regular status messages out.  The plan below outlines a way to do
> this.
> 
> Protocol changes
> ----------------
> 
> This plan does require changing the protocol.  We're allowed to change
> the library/daemon protocol (it's not ABI) but we tend to avoid doing
> it because it means people can't repackage our appliance and use it in
> other distributions (see http://libguestfs.org/FAQ.html#distros).  So
> if we do change it now, we should:
> 
> (a) Only do it at the start of the 1.5 development cycle, indicating
> that now would be a good time to release 1.4.  See:
> https://www.redhat.com/archives/libguestfs/2010-June/msg00069.html
> 
> (b) We should make the other protocol changes we've been wanting to
> do, see: http://libguestfs.org/guestfs.3.html#libguestfs_gotchas
> 
> (c) We need to make sure everyone understands the change to the
> appliance and protocol.

> 
> The current protocol (ignoring file transfers) implements a very
> simple RPC mechanism which is entirely synchronous:
> 
>   library                           daemon
>     request message --------------->
>                              daemon processes message
>     <----------------- reply message
> 
> I have discarded the idea that we should change to using an
> asynchronous system, eg. allowing the library to issue more than one
> request message simultaneously, because at this stage it adds great
> complexity to both ends, and is I believe not necessary in order to
> implement progress messages.  This means that we can't make additional
> "get status" requests during a call.

Allowing overlapping parallel requests isn't really an asynchronous
system, at least from the protocol POV. It could be from the daemon
implementation POV, if the daemon didn't want to use threads for RPC
dispatch.

In libvirt we didn't make the individual API calls asynchronous,
but we did allow multiple threads to use a single connection
concurrently, so one thread doesn't block another at the protocol
level. It certainly added a lot of complexity to the code, but
becoming fully multithread-capable was a big win.

> Instead I'd like to change the protocol like this:
> 
>   library                           daemon
>     request message --------------->
>                              daemon processes message
>     <---------------- status message
>                              daemon processes message
>     <---------------- status message
>                              daemon processes message
>     <----------------- reply message
> 
> In a long running call such as copy_size, the daemon would send
> periodic status messages to the library.
> 
> Points to note:
> 
> (i) The daemon should self-limit these messages, eg. to once per
> second, starting at least one second after the request.
> 
> (ii) An initial implementation of the library would simply discard
> these status messages.  This would allow us to test the daemon side,
> only making trivial library changes, and be reasonably sure that the
> daemon side is correct.
> 
> (iii) The status message contains just two (64 bit) numbers:
> 
>   - Total size of the current operation
> 
>   - Current progress (0 <= current <= total size)
> 
>   The meaning of these two numbers is defined by the context of the
>   call, but would usually indicate, eg. total size in bytes and number
>   of bytes processed/copied so far.  Callers are only interested in
>   the ratio of these two numbers when displaying a progress meter.

This need not be incompatible. The async status message packets can
use a new value for 'guest_message_direction'. If existing clients
don't already ignore packets with an unknown direction field, then you
could include a post-connect RPC call to turn on the status messages
for the connection. An old client talking to a new daemon would then
never see the async messages, because it never enables the feature.
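
To illustrate the opt-in idea, here is a minimal sketch. All of the
names in it (the enum values, 'struct connection',
'enable_status_messages', 'may_send_status') are made up for this
example and are not taken from the existing protocol or daemon code.
It assumes the daemon keeps a per-connection flag which the new
post-connect RPC sets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction values.  CALL and REPLY stand for the existing
 * request/reply pair; STATUS is the proposed new value. */
enum msg_direction {
  DIR_CALL = 0,
  DIR_REPLY = 1,
  DIR_STATUS = 2
};

/* Hypothetical per-connection state kept by the daemon. */
struct connection {
  bool status_enabled;          /* false until the client opts in */
};

/* Handler for the hypothetical post-connect RPC which turns on status
 * messages for this connection. */
void
enable_status_messages (struct connection *conn)
{
  conn->status_enabled = true;
}

/* Daemon side: may a message with direction DIR_STATUS be sent on this
 * connection?  Old clients never call the enabling RPC, so for them
 * this always returns false. */
bool
may_send_status (const struct connection *conn)
{
  return conn->status_enabled;
}
```

With something like this, the daemon would check may_send_status()
before emitting each status packet, so the wire traffic seen by an old
client is unchanged.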

> Library changes
> ---------------
> 
> On receiving a status message, the library can ignore it, or can:
> 
> (1) update the total and current fields in the guestfs_h handle, and/or
> 
> (2) call a prearranged callback function.
> 
> We would add a way to query these numbers for an existing handle:
> 
>   void guestfs_get_progress (guestfs_h *g, int64_t *total, int64_t *current);
> 
> Callers can poll this function on any open handle from any thread to
> retrieve the progress of the currently running call.
> 
> [Side note: In general you cannot call libguestfs APIs from multiple
> threads:
> http://libguestfs.org/guestfs.3.html#multiple_handles_and_multiple_threads]

In the wire protocol, when sending back the status messages, it is
probably wise to make sure the 'serial' field of the status message
always matches the 'serial' field of the initial RPC request. This
means if you do make the library multi-threaded in the future, you
have the ability to correlate async status messages to the correct
thread's RPC call.
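
A rough sketch of how the library could use the serial for this
correlation is below. The table, its fixed size and the function names
are all hypothetical, and locking is left out; it only shows the lookup
by serial.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8           /* arbitrary limit for this sketch */

/* Hypothetical record of one in-flight RPC call. */
struct pending_call {
  int in_use;
  uint64_t serial;              /* serial sent in the request */
  int64_t total;
  int64_t current;
};

struct pending_table {
  struct pending_call calls[MAX_PENDING];
};

/* Record a new in-flight call.  Returns NULL if the table is full. */
struct pending_call *
pending_register (struct pending_table *t, uint64_t serial)
{
  for (size_t i = 0; i < MAX_PENDING; i++) {
    if (!t->calls[i].in_use) {
      t->calls[i].in_use = 1;
      t->calls[i].serial = serial;
      t->calls[i].total = 0;
      t->calls[i].current = 0;
      return &t->calls[i];
    }
  }
  return NULL;
}

/* Route a status message to the call with the same serial.
 * Returns 0 on success, -1 if no in-flight call has that serial. */
int
pending_handle_status (struct pending_table *t, uint64_t serial,
                       int64_t total, int64_t current)
{
  for (size_t i = 0; i < MAX_PENDING; i++) {
    struct pending_call *c = &t->calls[i];
    if (c->in_use && c->serial == serial) {
      c->total = total;
      c->current = current;
      return 0;
    }
  }
  return -1;
}
```

A status message whose serial matches nothing can simply be dropped,
which is also what a single-threaded library would do with a stale
message.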

> Daemon changes
> --------------
> 
> Long running calls tend to be of two forms:
> 
>   while (n < size) {
>     copy_a_buffer ();
>     n += size_copied;
>   }
> 
> or:
> 
>   command ("long_running_external_command");
> 
> The first form can be changed easily:
> 
>   while (n < size) {
>     copy_a_buffer ();
>     n += size_copied;
>     notify_progress (n, size);
>   }
> 
> The 'notify_progress' function can be called as often as needed, and
> it would have to contain its own rate limiting functionality so that
> it doesn't actually send progress messages back more often than
> desired (eg. once per second).  This allows for a very simple change
> to all the potentially long-running daemon functions of the first
> form.
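
The rate limiting inside 'notify_progress' could look roughly like the
sketch below. The struct and function names are invented for
illustration. The caller passes the current time in milliseconds (a
real daemon would presumably read a monotonic clock), which keeps the
logic easy to test. Starting the timer at the request time gives the
"at least one second after the request" behaviour from point (i).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate limiter state, one per request. */
struct progress_limiter {
  int64_t interval_ms;          /* minimum gap between status messages */
  int64_t last_ms;              /* request start, or time of last message */
};

/* Call when the request starts. */
void
progress_limiter_init (struct progress_limiter *pl,
                       int64_t request_start_ms, int64_t interval_ms)
{
  pl->interval_ms = interval_ms;
  pl->last_ms = request_start_ms;
}

/* Returns true if a status message should be sent now, and records the
 * time so that the next message is delayed by at least interval_ms. */
bool
progress_should_send (struct progress_limiter *pl, int64_t now_ms)
{
  if (now_ms - pl->last_ms < pl->interval_ms)
    return false;
  pl->last_ms = now_ms;
  return true;
}
```

'notify_progress (n, size)' would then only build and send a status
message when progress_should_send() returns true, so the copy loop can
call it on every buffer.
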
> 
> The second form is more difficult to change.  With each one we would
> have to consider the nature of the external command, if it provides
> some sort of progress feature or if we need to poll it from the daemon
> (eg. polling the size of the input and output files).
> 
> Some functions of the second form could be changed.  eg. The
> implementation of guestfs_dd runs the external "dd" command, but could
> be modified to an internal copy function of the first form, and this
> might have other benefits too.
> 
> Other functions of the second form would be changed to poll status
> over time.  There is nothing in the library API which requires any
> function to provide status messages.
> 
> Tools
> -----
> 
> Tools, such as guestfish and virt-resize, must also be changed to poll
> for status or set a callback for some long-running operations.
> Obviously setting a callback would be preferred.
> 
> We would need to change the Perl and OCaml bindings at least to
> support the progress callback.
> 
> Generator
> ---------
> 
> After making these changes we'd have a few API calls which generate
> status messages, and many calls which don't.  You can call
> guestfs_get_progress on any call, but it wouldn't return any useful
> information for the majority of calls.

If you're not very careful, adding a 'guestfs_get_progress' API
could make it very hard to introduce multithread support. Since
the main thread is blocked on the original API, I assume this
get_progress will be called from a separate thread, so you can't
just use thread locals to associate this info.

I can see two options to make this extensible, leaving room for
future concurrent usage:

 - Associate a 'status' object with a thread. 

     guestfs_status_h s = guestfs_status_new();
     guestfs_set_status_object(g, s);
     guestfs_do_some_api_call(g);

   This would store 's' in a thread-local. Any API calls
   made in this thread would use that object.

   Another thread can query that same object. Thus many
   threads can set up their own status objects and each
   can be monitored independently.

 - Add variants of each API which take a status object

     guestfs_status_h s = guestfs_status_new();
     guestfs_do_some_api_call_with_status(g, s);

   The status object is associated explicitly with an API call, so again
   concurrent API calls from different threads can be monitored
   independently.
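
Either way, the status object itself could be quite small. Here is a
sketch of one possible shape, using C11 atomics so that one thread can
update it while another reads it. 'struct example_status' and the
functions below are purely illustrative and are not existing libguestfs
APIs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical status object shared between the thread making the API
 * call and a monitoring thread. */
struct example_status {
  _Atomic int64_t total;
  _Atomic int64_t current;
};

void
example_status_init (struct example_status *s)
{
  atomic_init (&s->total, 0);
  atomic_init (&s->current, 0);
}

/* Called in the thread running the API call when a status message
 * arrives. */
void
example_status_update (struct example_status *s,
                       int64_t total, int64_t current)
{
  atomic_store (&s->total, total);
  atomic_store (&s->current, current);
}

/* Called from any monitoring thread.  Returns a percentage in 0..100,
 * or -1 if no progress has been reported yet. */
int
example_status_percent (struct example_status *s)
{
  int64_t total = atomic_load (&s->total);
  int64_t current = atomic_load (&s->current);

  if (total <= 0)
    return -1;
  /* The two loads are not one atomic snapshot, so clamp the result. */
  if (current < 0)
    current = 0;
  if (current > total)
    current = total;
  return (int) (100.0 * (double) current / (double) total);
}
```

Since callers only care about the ratio, the slightly inconsistent
pair of reads is harmless for a progress meter. If an exact snapshot
were needed, a mutex around both fields would be the obvious
alternative.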

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|