[libvirt] [PATCH v2 08/38] util: Introduce virFileInData

John Ferlan jferlan at redhat.com
Sat Apr 29 16:46:57 UTC 2017



On 04/20/2017 06:01 AM, Michal Privoznik wrote:
> This function takes a FD and determines whether the current
> position is in data section or in a hole. In addition to that,
> it also determines how much bytes are there remaining till the
> current section ends.
> 
> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> ---
>  src/libvirt_private.syms |   1 +
>  src/util/virfile.c       |  81 +++++++++++++++++++
>  src/util/virfile.h       |   3 +
>  tests/virfiletest.c      | 203 +++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 288 insertions(+)
> 

FWIW: This feels like an "inflection point" from which the remainder of
the algorithms are built around. I've forgotten anything from the
discussion for the original set of patches, but since I had a few cycles
to think... Apologies in advance for the length.

I guess I was under the impression that the purpose for sparse streams
is to find hole's in a file|stream and ensure those don't get copied
instead generating the hole on the target. If so, a bunch of different
small API's come to mind after reading this patch and trying to
internalize what's taking place:

    vir{File|Stream}SeekHoleSupported
      -> If SEEK_{DATA|HOLE} returns -1 and EINVAL, then either don't
use sparse streams and force usage of the non sparse option or just fail.

    vir{File|Stream}HasHole
      -> From beginning of file, find first HOLE < EOF
         -> If none, then sparse streams won't matter, right?

    vir{File|Stream}IsInHole
      -> From @cur if SEEK_DATA > SEEK_HOLE

    vir{File|Stream}IsInData
      -> From @cur if SEEK_HOLE > SEEK_DATA

    vir{File|Stream}GetDataEnd
      -> From @cur of DATA return @length

    vir{File|Stream}GetHoleEnd
      -> From @cur of HOLE return @length

We should know where we are and be able to keep track, right? Having to
determine @cur each time just seems to be overhead. For the purpose of
what's being done - I would think we start at 0 and know we have a
findable @end - then it would be "just" (hah!) a matter of ensuring we
have holes and then a sequence of finding hole/data from whence we start
(since theoretically we could start in a hole, right?).

Maybe the following is too simplistic...

   if (supportsSparse)
       if ((cur = lseek(fd, 0, SEEK_SET)) != 0)
           goto error;

       if (!hasHoles)
           return fullCopy? (although I think the rest would still work)

       if ((end = lseek(fd, 0, SEEK_END)) < 0)
           goto error;

       inHole = virFile|StreamIsInHole(fd, cur)
       while (cur < end)
           if (inHole)
                get size of hole
                if ret < 0, then size would be end-cur, right?
                update remote to write hole
                update cur to include size of hole
           else /* assume? in data */
                get size of data
                if ret < 0, then size would be end-cur, right?
                update remote to include data
                update cur to include size of data

I guess I just became concerned about the various decision/exit points
in virFileInData below and started wondering if they were really needed.


> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> index 181e178..7132b3a 100644
> --- a/src/libvirt_private.syms
> +++ b/src/libvirt_private.syms
> @@ -1620,6 +1620,7 @@ virFileGetHugepageSize;
>  virFileGetMountReverseSubtree;
>  virFileGetMountSubtree;
>  virFileHasSuffix;
> +virFileInData;
>  virFileIsAbsPath;
>  virFileIsDir;
>  virFileIsExecutable;
> diff --git a/src/util/virfile.c b/src/util/virfile.c
> index cbfa384..093125c 100644
> --- a/src/util/virfile.c
> +++ b/src/util/virfile.c
> @@ -3793,6 +3793,87 @@ virFileComparePaths(const char *p1, const char *p2)
>   cleanup:
>      VIR_FREE(res1);
>      VIR_FREE(res2);
> +
> +    return ret;
> +}
> +
> +

s/b
int
virFileIndata(...)

Lacking any sort of description of input/returns. Lot's of variables
here too - whether fd valid/invalid could cause error... Setting *inData
based on where we are... Setting *length based on no returning a -1...
and of course returning -1 based on some lseek call failing for various
reasons.

> +int virFileInData(int fd,
> +                  int *inData,

Usage seems to be only boolean...

> +                  unsigned long long *length)
> +{
> +    int ret = -1;
> +    off_t cur, data, hole, end;
> +
> +    /* Get current position */
> +    cur = lseek(fd, 0, SEEK_CUR);
> +    if (cur == (off_t) -1) {
> +        virReportSystemError(errno, "%s",
> +                             _("Unable to get current position in file"));
> +        goto cleanup;
> +    }

One would hope this doesn't fail, still could @cur be input? I guess I
wonder why we're looking for current position if the purpose is to copy
from start to end ensuring that holes aren't copied, but rather just
created on the target.  I am assuming this function is for the "source"
side.

> +
> +    /* Now try to get data and hole offsets */
> +    data = lseek(fd, cur, SEEK_DATA);
> +
> +    /* There are four options:
> +     * 1) data == cur;  @cur is in data
> +     * 2) data > cur; @cur is in a hole, next data at @data
> +     * 3) data < 0, errno = ENXIO; either @cur is in trailing hole, or @cur is beyond EOF.
> +     * 4) data < 0, errno != ENXIO; we learned nothing

if data < 0 and errno = EINVAL, seeking data/holes isn't supported,
true?  If so then it might be a possible function to check if data/hole
seeking is supported.

> +     */
> +
> +    if (data == (off_t) -1) {
> +        /* cases 3 and 4 */
> +        if (errno != ENXIO) {
> +            virReportSystemError(errno, "%s",
> +                                 _("Unable to seek to data"));
> +            goto cleanup;
> +        }
> +
> +        *inData = 0;
> +        /* There are two situations now. There is always an
> +         * implicit hole at EOF. However, there might be a
> +         * trailing hole just before EOF too. If that's the case
> +         * report it. */
> +        if ((end = lseek(fd, 0, SEEK_END)) == (off_t) -1) {
> +            virReportSystemError(errno, "%s",
> +                                 _("Unable to seek to EOF"));
> +            goto cleanup;

So for the purpose of what you're trying to do should we care? I suppose
if you're in a hole and cannot find the end that might be odd. Still if
we know upfront what @cur and what @end are, then does some crazy
trailing hole/end/eof need to cause an error?

> +        }
> +        *length = end - cur;
> +    } else if (data > cur) {
> +        /* case 2 */
> +        *inData = 0;
> +        *length = data - cur;
> +    } else {
> +        /* case 1 */
> +        *inData = 1;
> +
> +        /* We don't know where does the next hole start. Let's
> +         * find out. Here we get the same 4 possibilities as
> +         * described above.*/
> +        hole = lseek(fd, data, SEEK_HOLE);
> +        if (hole == (off_t) -1 || hole == data) {

Now I'm wondering about the implicit hole @ EOF described above? and the
error checks from that that aren't done here.

Still does this matter in the "case" you want to use this for?  That is
if there's no HOLE from @data to @end, then for the purpose of stream
copying would you'd just be copying everything from data -> end (e.g.
@length = (end - data)? IOW, does it really matter?

> +            /* cases 1, 3 and 4 */
> +            /* Wait a second. The reason why we are here is
> +             * because we are in data. But at the same time we
> +             * are in a trailing hole? Wut!? Do the best what we
> +             * can do here. */
> +            virReportSystemError(errno, "%s",
> +                                 _("unable to seek to hole"));

Would the "best" we can do is just copy everything from this point to
the @end regardless of whether that's copying a hole?  Again it's just
another error that perhaps we could avoid.

> +            goto cleanup;
> +        } else {
> +            /* case 2 */
> +            *length = (hole - data);
> +        }
> +    }
> +
> +    ret = 0;
> + cleanup:
> +    /* At any rate, reposition back to where we started. */
> +    if (cur != (off_t) -1)
> +        ignore_value(lseek(fd, cur, SEEK_SET));
>      return ret;
>  }
>  
> diff --git a/src/util/virfile.h b/src/util/virfile.h
> index 41c25a2..32a1d3c 100644
> --- a/src/util/virfile.h
> +++ b/src/util/virfile.h
> @@ -340,4 +340,7 @@ int virFileReadValueInt(const char *path, int *value);
>  int virFileReadValueUint(const char *path, unsigned int *value);
>  int virFileReadValueBitmap(const char *path, int maxlen, virBitmapPtr *value);
>  
> +int virFileInData(int fd,
> +                  int *inData,
> +                  unsigned long long *length);

You could also go with Pavel's preferred format to follow the .c file
format.

>  #endif /* __VIR_FILE_H */
> diff --git a/tests/virfiletest.c b/tests/virfiletest.c
> index 702a76a..3e298dc 100644
> --- a/tests/virfiletest.c
> +++ b/tests/virfiletest.c
> @@ -21,6 +21,7 @@
>  #include <config.h>
>  
>  #include <stdlib.h>
> +#include <fcntl.h>
>  
>  #include "testutils.h"
>  #include "virfile.h"
> @@ -119,6 +120,190 @@ testFileSanitizePath(const void *opaque)
>  
>  
>  static int
> +makeSparseFile(const off_t offsets[],
> +               const bool startData);
> +
> +#ifdef __linux__
> +/* Create a sparse file. @offsets in KiB. */
> +static int
> +makeSparseFile(const off_t offsets[],
> +               const bool startData)
> +{
> +    int fd = -1;
> +    char path[] = abs_builddir "fileInData.XXXXXX";
> +    off_t len = 0;
> +    size_t i;
> +
> +    if ((fd = mkostemp(path,  O_CLOEXEC|O_RDWR)) < 0)
> +        goto error;
> +
> +    if (unlink(path) < 0)
> +        goto error;
> +
> +    for (i = 0; offsets[i] != (off_t) -1; i++)
> +        len += offsets[i] * 1024;
> +
> +    while (len) {
> +        const char buf[] = "abcdefghijklmnopqrstuvwxyz";
> +        off_t toWrite = sizeof(buf);
> +
> +        if (toWrite > len)
> +            toWrite = len;
> +
> +        if (safewrite(fd, buf, toWrite) < 0) {
> +            fprintf(stderr, "unable to write to %s (errno=%d)\n", path, errno);
> +            goto error;
> +        }
> +
> +        len -= toWrite;
> +    }
> +
> +    len = 0;
> +    for (i = 0; offsets[i] != (off_t) -1; i++) {
> +        bool inData = startData;
> +
> +        if (i % 2)
> +            inData = !inData;
> +
> +        if (!inData &&
> +            fallocate(fd,
> +                      FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                      len, offsets[i] * 1024) < 0) {
> +            fprintf(stderr, "unable to punch a hole at offset %lld length %lld\n",
> +                    (long long) len, (long long) offsets[i]);
> +            goto error;
> +        }
> +
> +        len += offsets[i] * 1024;
> +    }
> +
> +    if (lseek(fd, 0, SEEK_SET) == (off_t) -1) {
> +        fprintf(stderr, "unable to lseek (errno=%d)\n", errno);
> +        goto error;
> +    }
> +
> +    return fd;
> + error:
> +    VIR_FORCE_CLOSE(fd);
> +    return -1;
> +}
> +

I'm too lazy to think... Are making sure the following cases:

start-data-end
start-hole-data-end
start-data-hole-end
start-hole-data-hole-data-end
start-data-hole-data-hole-end
start-hole-data-hole-data-hole-end
start-data-hole-data-hole-data-end

Trying to think if the following would be valid:

start-hole-end

but my brain keeps telling me to stop thinking.

> +#else /* !__linux__ */
> +
> +static int
> +makeSparseFile(const off_t offsets[] ATTRIBUTE_UNUSED,
> +               const bool startData ATTRIBUTE_UNUSED)
> +{
> +    return -1;
> +}
> +
> +#endif /* !__linux__ */
> +
> +
> +#define EXTENT 4
> +static bool
> +holesSupported(void)
> +{
> +    off_t offsets[] = {EXTENT, EXTENT, EXTENT, -1};
> +    off_t tmp;
> +    int fd;
> +    bool ret = false;
> +
> +    if ((fd = makeSparseFile(offsets, true)) < 0)
> +        goto cleanup;
> +
> +    /* The way this works is: there are 4K of data followed by 4K hole followed
> +     * by 4K hole again. Check if the filesystem we are running the test suite
> +     * on supports holes. */
> +    if ((tmp = lseek(fd, 0, SEEK_DATA)) == (off_t) -1)
> +        goto cleanup;
> +
> +    if (tmp != 0)
> +        goto cleanup;
> +
> +    if ((tmp = lseek(fd, tmp, SEEK_HOLE)) == (off_t) -1)
> +        goto cleanup;
> +
> +    if (tmp != EXTENT * 1024)
> +        goto cleanup;
> +
> +    if ((tmp = lseek(fd, tmp, SEEK_DATA)) == (off_t) -1)
> +        goto cleanup;
> +
> +    if (tmp != 2 * EXTENT * 1024)
> +        goto cleanup;
> +
> +    if ((tmp = lseek(fd, tmp, SEEK_HOLE)) == (off_t) -1)
> +        goto cleanup;
> +
> +    if (tmp != 3 * EXTENT * 1024)
> +        goto cleanup;
> +
> +    ret = true;
> + cleanup:
> +    VIR_FORCE_CLOSE(fd);
> +    return ret;
> +}
> +
> +
> +struct testFileInData {
> +    bool startData;     /* whether the list of offsets starts with data section */
> +    off_t *offsets;
> +};
> +
> +
> +static int
> +testFileInData(const void *opaque)
> +{
> +    const struct testFileInData *data = opaque;
> +    int fd = -1;
> +    int ret = -1;
> +    size_t i;
> +
> +    if ((fd = makeSparseFile(data->offsets, data->startData)) < 0)
> +        goto cleanup;
> +
> +    for (i = 0; data->offsets[i] != (off_t) -1; i++) {
> +        bool shouldInData = data->startData;
> +        int realInData;
> +        unsigned long long shouldLen;
> +        unsigned long long realLen;
> +
> +        if (i % 2)
> +            shouldInData = !shouldInData;
> +
> +        if (virFileInData(fd, &realInData, &realLen) < 0)
> +            goto cleanup;
> +
> +        if (realInData != shouldInData) {
> +            fprintf(stderr, "Unexpected data/hole. Expected %s got %s\n",
> +                    shouldInData ? "data" : "hole",
> +                    realInData ? "data" : "hole");
> +            goto cleanup;
> +        }
> +
> +        shouldLen = data->offsets[i] * 1024;
> +        if (realLen != shouldLen) {
> +            fprintf(stderr, "Unexpected section length. Expected %lld got %lld\n",

lld on an ull?


John

I'll be OOO on Mon/Tue - just to "set" expectations of any chance at me
reading a response!


> +                    shouldLen, realLen);
> +            goto cleanup;
> +        }
> +
> +        if (lseek(fd, shouldLen, SEEK_CUR) < 0) {
> +            fprintf(stderr, "Unable to seek\n");
> +            goto cleanup;
> +        }
> +    }
> +
> +    ret = 0;
> +
> + cleanup:
> +    VIR_FORCE_CLOSE(fd);
> +    return ret;
> +}
> +
> +
> +static int
>  mymain(void)
>  {
>      int ret = 0;
> @@ -186,6 +371,24 @@ mymain(void)
>      DO_TEST_SANITIZE_PATH_SAME("gluster://bar.baz/fooo//hoo");
>      DO_TEST_SANITIZE_PATH_SAME("gluster://bar.baz/fooo///////hoo");
>  
> +#define DO_TEST_IN_DATA(inData, ...)                                        \
> +    do {                                                                    \
> +        off_t offsets[] = {__VA_ARGS__, -1};                                \
> +        struct testFileInData data = {                                      \
> +            .startData = inData, .offsets = offsets,                        \
> +        };                                                                  \
> +        if (virTestRun(virTestCounterNext(), testFileInData, &data) < 0)    \
> +            ret = -1;                                                       \
> +    } while (0)
> +
> +    if (holesSupported()) {
> +        DO_TEST_IN_DATA(true, 4, 4, 4);
> +        DO_TEST_IN_DATA(false, 4, 4, 4);
> +        DO_TEST_IN_DATA(true, 8, 8, 8);
> +        DO_TEST_IN_DATA(false, 8, 8, 8);
> +        DO_TEST_IN_DATA(true, 8, 16, 32, 64, 128, 256, 512);
> +        DO_TEST_IN_DATA(false, 8, 16, 32, 64, 128, 256, 512);
> +    }
>      return ret != 0 ? EXIT_FAILURE : EXIT_SUCCESS;
>  }
>  
> 




More information about the libvir-list mailing list