[libvirt] [PATCH RFC 0/8] Sparse streams

Lee Schermerhorn lee.schermerhorn at hpe.com
Mon Feb 8 21:52:55 UTC 2016

On Mon, 2016-02-01 at 14:50 +0000, Daniel P. Berrange wrote:
> On Fri, Jan 29, 2016 at 02:26:51PM +0100, Michal Privoznik wrote:
> > 
> > This is merely an RFC.
> > 
> > What's the problem?
> > We have APIs for transferring disk images from/to host. Problem is,
> > disk images
> > can be sparse files. Our code is, however, unaware of that fact so
> > if for
> > instance the disk is one big hole of 8GB all those bytes have to:
> > a) be read b)
> > come through our event loop. This is obviously very inefficient
> > way.
> > 
> > How to deal with it?
> > The way I propose (and this is actually what I like you to comment
> > on) is to
> > invent set of new API. We need to make read from and write to a
> > stream
> > sparseness aware. The new APIs are as follows:
> > 
> >   int virStreamSendOffset(virStreamPtr stream,
> >                           unsigned long long offset,
> >                           const char *data,
> >                           size_t nbytes);
> > 
> >   int virStreamRecvOffset(virStreamPtr stream,
> >                           unsigned long long *offset,
> >                           char *data,
> >                           size_t nbytes);
> > 
> > The SendOffset() is fairly simple. It is given an offset to write
> > @data from so
> > it basically lseek() to @offset and write() data.
> > The RecvOffset() has slightly complicated design - it has to be
> > aware of the
> > fact that @offset it is required to read from fall into a hole. If
> > that's the
> > case it sets @offset to new location where data starts.
> > 
> > Are there other ways possible?
> > Sure! If you have any specific in mind I am happy to discuss it.
> > For instance
> > one idea I've heard (from Martin) was instead of SendOffset() and
> > RecvOffset()
> > we may just invent our variant of lseek().
> > 
> > What's left to be done?
> > Basically, I still don't have RPC implementation. But before I dive
> > into that I
> > thought of sharing my approach with you - because it may turn out
> > that a
> > different approach is going to be needed and thus my work would
> > render useless.
> It would be intesting to see the virsh vol-upload/download  client
> code updated
> to use the new APIs to deal with holes, so we can see how this API
> design looks
> from the POV of apps using libvirt.
> Regards,
> Daniel

Still an RFC.

Below I've included example of updated virsh vol-download client code
to handle sparse files/streams as mentioned in earlier mail.  This
example assumes a different take on the stream APIs than discussed
previously. For this exercise, let's assume that we don't change the
function profiles of the existing virStream APIs; rather their behavior
will be slightly different for a "sparse stream".  A client will need a
way to tell libvirt that it wants/expects the sparse behavior.  To that
purpose let's define a couple of new API's:

int virStreamSetSparse(virStreamPtr stream) requests that sparse
behavior be enabled for this stream.  Returns 0 on success; -1 on
error.  Possible errors:  maybe don't support enabling sparseness after
data already sent/received on stream?  

Note: a client could request sparseness via new flag to virStreamNew()
 as well or instead.

Will also want a:

int virStreamIsSparse(virStreamPtr stream) for use inside stream
driver, unless open coding a flags test is preferred?

Existing Streams [and FDStreams] APIs that behave differently for
sparse streams:

virStreamSend(virStreamPtr stream, const char *data, size_t nbytes)
will recognize a NULL @data as indicating that @nbytes contains the
size of a hole in the data source rather than an error.  This will be
true also for stream driver streamSend() handlers: remoteStreamSend()
and virFDStreamWrite() which take the same arguments as

Internally when called from remoteStreamSend() with status=VIR_NET_SKIP
virNetClientStreamSendPacket() will encode the hole length nbytes as
the payload of a VIR_STREAM_SKIP packet as Daniel suggested earlier.

virStreamRecv(virStreamPtr stream, char *data, size_t nbytes), upon
encountering a hole in the incoming data stream will return the
negative value of the hole size.  Similarly for the stream driver
streamRecv handlers remoteStreamRecv() and virFDStreamRead().  All of
these functions current return bytes received [>0], EOF [== 0], or -1
or -2 to indicate error or no data on a non-blocking stream.  The
minimum hole size will be defined [much?] greater than 2, so a negative
return value < (0 - VIR_STREAM_MIN_HOLE_SIZE) can indicate a hole.  A
macro to test the return value for a hole would be convenient.

Prior mail discussed how a VIR_STREAM_SKIP could be propagated up the
stack as an iovec with NULL iov_base to represent a hole.
 virNetClientStreamRecvPacket() would the return the hole as the large
negative value in this model.

One limitation with this approach is that the return values are
declared as 'int' which will limit the size of the hole we can receive.
 This will require that multi-GB holes be broken up into ~2GB chunks.
 If this [or the use of large negative returns to represent holes]
sounds too onerous, one can adopt an approach with new sparse stream
APIs with an explicit @holesize out parameter for receive.  I don't
think this will affect the lower level implementation nor virsh's
vol{Up,Down}load commands all that much.

Next, virsh's cmdVol{Upload,Download)() functions use
virStream{Send,Recv}All(), respectively.  These functions take a
stream, a data source or sink handler function and an opaque object
representing the actual source or sink.  virsh need only request sparse
stream behavior and pass a "sparse stream aware" source or sink handler
and object to use the existing APIs.

As a convenience, libvirt can provide support for sparse aware
source/sink handlers layered on FDStreams as follows:

typedef void *virStreamSparseFilePtr;

virStreamSparseFileOpen(const char *path,
                        unsigned long long offset,
                        unsigned long long length,
                        int oflags)

This is a public wrapper around the libvirt
private virFDStreamOpenFile() et al that creates an FDStream associated
with the specified file/path for use with sparse aware source/sink
handlers.  Open with O_RDONLY for sources, O_WRONLY+O_TRUNC... for
sinks.  I don't know that offset and length are needed for source and
sink, but I kept them above.

We'll also need a:

virStreamSparseFileClose(virStreamSparseFilePtr sparseFile)

to clean things up.

Libvirt can also provide [needs to while FDStreams are libvirt private]
sparse aware source and sink handlers:

int virStreamSparseSource(virStreamPtr st [ATTRIBUTE_UNUSED ? TBD],
                          char *bytes, size_t nbytes, void *opaque)

This is a public wrapper around virFDStreamRead() that knows that the
@opaque parameter is a sparse FDStream created by
virStreamSparseFileOpen().  It will pass along @bytes and @nbytes to
the sparse FDStream which may return holes as large negative values.

int virStreamSparseSink(virStreamPtr st [ATTRIBUTE_UNUSED ? TBD],
                        const char *bytes, size_t nbytes, void *opaque)

A public wrapper around virFDStreamWrite() that passes along its
arguments to the FDStream indicated by @opaque.

One more thing before looking at the virsh cmdVol{Up,Down}load() code:
we may want to maintain the current, non-sparse behavior as default and
require a '--sparse' option to enable sparse behavior.  The following
code segments represent the sparse path after parameter parsing.

cmdVolDownload() sparse file/stream path:

!   virStreamSparseFilePtr sparseFile;

!   if ((sparseFile = virStreamSparseFileOpen(file,
!                                             O_WRONLY|O_CREAT|O_EXCL,
!                                             0666)) < 0) {
        if (errno != EEXIST ||
!           (sparseFile = virStreamSparseFileOpen(file,
!                                                 O_WRONLY|O_TRUNC,
!                                                 0666)) < 0) {
            vshError(ctl, _("cannot create %s"), file);
            goto cleanup;
    } else {
        created = true;

    if (!(st = virStreamNew(priv->conn, 0))) {
        vshError(ctl, _("cannot create a new stream"));
        goto cleanup;

    if (virStorageVolDownload(vol, st, offset, length, 0) < 0) {
        vshError(ctl, _("cannot download from volume %s"), name);
        goto cleanup;

!   if (virStreamRecvAll(st, virStreamSparseSink, sparseFile) < 0) {
        vshError(ctl, _("cannot receive data from volume %s"), name);
        goto cleanup;

!   if (virStreamSparseFileClose(sparseFile) < 0) {
        vshError(ctl, _("cannot close file %s"), file);
        goto cleanup;

    if (virStreamFinish(st) < 0) {
        vshError(ctl, _("cannot close volume %s"), name);
        goto cleanup;

    ret = true;

!   // cleanup will be different for sparseFile upload, too.

Changes to cmdVolUpload() will mirror those above with a data source
opened O_RDONLY.

I hope this is sufficient detail to explain this model -- reusing the
existing with a couple of additions for sparse streams.  I've started
generating patches, but would like to settle on the APIs before going
to far at that level.

Changes to the stream driver functions will be fairly straightforward
if we stick to sending and receiving holes explicitly, whatever the
API.  Handling hole detection and expansion for sources and sinks that
don't support seeking() and when the remote end doesn't support the new
protocol may contain some interesting issues.

I'll be out of the office for a couple of weeks and so won't be able to
work on this much during that time.  Perhaps sparse stream support will
be a done deal by the time I return.


More information about the libvir-list mailing list