[libvirt] RFC / Braindump: public APIs needing data streams

Tue May 26 12:02:25 UTC 2009

The patches for secure migration raise an interesting question wrt
the handling of data streams and their effects on the internal driver
API and the public API. Although the migration helper APIs are not
technically public, they do map onto the remote wire protocol and as
such we have the same long term compatability issues to worry about.

The way the migration APIs fit together are obscuring the picture
a little, so for the sake of clarity, the remainder of this mail is
going to talk about a ficticious public API 'virDomainRestoreStream'
which allows a guest domain to be restored from a generic data
stream, rather than a named file. If we can solve this API problem,
then the design will trivially apply to secure migration. 

I'll now outline some possible approaches at public API level:

1. Pass a file handle to the public API

  Application usage:

     fd = open(filename);

     ret = virDomainRestoreStream(dom, fd);

  Driver internal usage:

     int virDomainRestoreStreamImpl(virDomainPtr dom, int fd)
        char buf[4096];
        int ret;
        int qemuFD;

        qemuFD = runQEMUWithIncomingFD(dom);

        do {
           ret = read(fd, buf, sizeof buf);
           if (ret > 0)
             write(qemuFD, buf, ret);
        } while (ret > 0);
     }

   Good: Restore functionality all in one driver method
   Good: Public API is very simple
   Good: Internal driver can poll() on the FD to avoid blocking
    Bad: Application API is blocked
    Bad: Data read from FD might need transformation
         eg, uncompress, or decrypt TLS/SASL.

2. Provide public APIs for starting restore, feeding data,
   and completing. This matches proposal for secure migration
   patchset.

   Application usage:

     fd = open(filename);
     ret = virDomainRestorePrepare(dom);

     do {
        char buf[4096];
        ret = read(fd, buf, sizeof buf);
        virDomainRestoreData(dom, buf, ret);
     } while (ret > 0);

     virDomainRestoreFinish(dom, ret == 0 ? 0 : 1);

  Driver internal usage:

     int virDomainRestorePrepareImpl(virDomainPtr dom) {
        qemuFD = runQEMUWithIncomingFD(dom);
     }

     int virDomainRestoreDataImpl(virDomainPtr dom, const char *buf, int buflen) {
        qemudFD = ...find previously opened qemuFD ...
        write(qemuFD, buf, buflen)
     }

     int virDomainRestoreFinishImpl(virDomainPtr dom, int error) {
        if (error)
          ...kill QEMU ...
        else
          qemuFD = ...find previously opened qemuFD ...
          close(qemuFD);
     }

   Good: Application can easily decrypt input data
   Good: Application can use an event loop to feed in data
         as it becomes available. eg poll in socket()
   Good: Application API never blocks execution for long time
    Bad: Driver has to maintain state across calls indefinitely
    Bad: Cannot guarentee that same client calls prepare/data.
         ie Different clients can get mixed up feeding data.
    Bad: Public API is fairly complex
    Bad: Lots of public API entry points for each method needing
         streams.

3. Provide a stream object for feeding data to driver from
   application. Similar to option 2, but provides easier state
   mgmt for driver. The driver will set callbacks on the data
   stream to receive data from client.

   Application usage:

      virDataStream stream = virDataStreamNew();
      ret = virDomainRestore(dom, stream);

     do {
        char buf[4096];
        ret = read(fd, buf, sizeof buf);
        virDataStreamWrite(stream, buf, ret);
     } while (ret > 0);

     virDataStreamFinish(dom, ret == 0 ? 0 : 1);

  Driver internal usage:

     int virDomainRestoreStreamImpl(virDomainPtr dom, virDataStream stream) {
        qemuFD = runQEMUWithIncomingFD(dom);
        virDataStreamSetCallbacks(stream,
                                  virDomainRestoreDataImpl,
                                  virDomainRestoreFinishImpl,
                                  (void *)qemudFD);
     }

     int virDomainRestoreDataImpl(virDomainPtr dom, const char *buf, int buflen, void *opaque) {
        int qemudFD = (int)opaque;
        write(qemuFD, buf, buflen)
     }

     int virDomainRestoreFinishImpl(virDomainPtr dom, int error, void *opaque) {
        if (error)
          ...kill QEMU ...
        else
          int qemudFD = (int)opaque;
          close(qemuFD);
     }

   Good: Application can easily decrypt input data
   Good: Application can use an event loop to feed in data
         as it becomes available. eg poll in socket()
   Good: Application API never blocks execution for long time
   Good: New APIs reusable for any public API with data stream
    Bad: Public API is fairly complex
    Bad: Driver has to maintain state across calls indefinitely

4. Provide a callback for driver to fetch data from the client app.
   Similar to option 1, but avoids need to expose concept of a 'fd'
   in public API directly.

   Application usage:

     int appreader(virDomainPtr dom, char *buf, int buflen, void *opaque) {
         int fd = (int)opaque;
         return read(fd, buf, buflen)
     }

     fd = open(filename);

     ret = virDomainRestoreStream(dom, appreader, (void *)fd);

   Driver internal usage

     int virDomainRestoreStreamImpl(virDomainPtr dom, int (*reader)(virDomainPtr, char *, int, voi d*), void *opaque)
        char buf[4096];
        int ret;
        int qemuFD;

        qemuFD = runQEMUWithIncomingFD(dom);

        do {
           ret = (*reader)(dom, buf, sizeof buf, opaque);
           if (ret > 0)
             write(qemuFD, buf, ret);
        } while (ret > 0);
     }

   Good: Restore functionality all in one driver method
   Good: Public API is very simple
   Good: Client app callback can decrypt data
    Bad: Application API is blocked
    Bad: Internal driver code will block on executing callback if
         not data is available. Can't integrate with event loop.

5. Provide a generic public stream API to fetch data from the
   client app. Similar to option 4, but adding an stream object
   to manage the callback - allows more functionality to be added
   to public API later without changing restore API contract.

   Application usage:

     int appreader(virDomainPtr dom, char *buf, int buflen, void *opaque) {
         int fd = (int)opaque;
         return read(fd, buf, buflen)
     }

     fd = open(filename);
     stream = virDataStreamNewReader(appreader, (void*)fd);
     ret = virDomainRestoreStream(dom, stream);

   Driver internal usage

     int virDomainRestoreStreamImpl(virDomainPtr dom, virDataStream stream) {
        char buf[4096];
        int ret;
        int qemuFD;

        qemuFD = runQEMUWithIncomingFD(dom);

        do {
           ret = virDataStreamRead(dom, buf, sizeof buf);
           if (ret > 0)
             write(qemuFD, buf, ret);
        } while (ret > 0);
     }

   Good: Restore functionality all in one driver method
   Good: Public API is fairly simple
   Good: Client app callback can decrypt data
    Bad: Application API is blocked
    Bad: Internal driver code will block on executing callback if
         not data is available. Can't integrate with event loop.

All the APIs have good and bad points to them, in particular there is
a difficult tradeoff between simplicity of the public API application
code, vs the internal API implementation code. Some important goals
though:

 - There must be a way to invoke the public API without blocking
   the application code
 - The driver must be able to receive data from an encrypted
   channel, because in libvirtd the FD might be the SASL/TLS socket
 - Internal driver API should not block on callbacks to app code,
   since it might need to be polling on another FD concurrently
   with reading data.
 - The number of new APIs to support streaming should not increase
   for each new method needing stream support.

Ultimately I think options 3 or 5 are the most promising, because
the addition of a generic 'virDataStream' public object makes it
easier to manage the processing of the data stream without adding
huge numbers of new APIs. Option 3 is a little more cumbersome to
use from application code, but it avoids blocking either the client
app, or the internal driver. The downside is the driver impl code
is split across several methods. With option 5 it is harder to
avoid blocking the client and internal driver, since it'd require
the driver to integrate with an event loop, but there is no direct
FD for the driver to poll() on.

The choices made also have possible implications on the design of
the remote wire protocol to support these methods. Ignoring the
design of the public API, there are a handful of ways to stream
data between client and server

1. Invoke primary method eg "restore domain", then feed the data
   in a sequence of following RPC calls.

    C --------------> S     Restore domain call
    C <-------------- S     Restore domain reply

    C --------------> S     Restore data call 1
    C <-------------- S     Restore data reply 1
    C --------------> S     Restore data call 2
    C <-------------- S     Restore data reply 2
      ...............
    C --------------> S     Restore data call n
    C <-------------- S     Restore data reply n

    C --------------> S     Restore data complete
    C <-------------- S     Restore data reply

  If server wants to abort a restore operation, it'll
  send an error on one of the replies.

2. Invoke primary method eg "restore domain", then feed the data
   in a sequence of following async messages.

    C --------------> S     Restore domain call
    C <-------------- S     Restore domain reply

    C --------------> S     Restore data msg 1
    C --------------> S     Restore data msg 2
      ...............
    C --------------> S     Restore data msg n

    C --------------> S     Restore data complete
    C <-------------- S     Restore data reply

  If server wants to stop the client without closing the socket, it
  needs an async 'stop' message from server to client.  This is 
   prety much same as option 2, but is killing off the explicit
  replies for each data packet.

3. Invoke primary method eg "restore domain", but require the data
   to be stream to server, before the reply is sent back

    C --------------> S     Restore domain call

    C --------------> S     Stream data msg 1
    C --------------> S     Stream data msg 2
      ...............
    C --------------> S     Stream data msg n
    C --------------> S     Stream finish msg

    C <-------------- S     Restore domain reply

  If server wants to stop the client without closing the socket, it
  simply sends back the 'restore domain reply' message as an error
  before client finishes sending data, and ignore any further data
  messages.

Options 2 or 3 have potential benefits on links with noticable
latency, since they're not blocking the client on synchronous
replies from the server. That said, the remote protocol does
allow for interleaving of calls & replies, so with Option 1 the
client could send multiple data packets without waiting for their
replies, and deal with possible delayed error replies. If doing
that though, the benefit of having a 1-to-1  call-to-reply ratio
is minimal, miht as well go for a n-to-1, call-to-reply approach.
It is hard to match public API option 3, with wire protocol 
option 3, because of the delayed 'restore domain reply' message.

The options I'm really thinking are most viable are 

 - Public API 3 + RPC 2
 - Public API 5 + RPC 3

Both of these are a little more complex to implement in the libvirtd
daemon that Chris' current secure migration patches, but then then also
have functional & design benefits

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|