[Libguestfs] [PATCH nbdinfo v2 1/3] common/utils: Add function to convert sizes to human-readable

Mon Sep 20 16:39:59 UTC 2021

On Mon, Sep 20, 2021 at 06:25:34PM +0200, Laszlo Ersek wrote:
> On 09/20/21 13:04, Richard W.M. Jones wrote:
> > For example 1024 is returned as "1K".
> > 
> > This does not attempt to handle decimals or SI units.  If the number
> > isn't some multiple of a power of 1024 then it is returned as bytes (a
> > flag is available to indicate this).
> > 
> > I looked at both the gnulib and qemu versions of this function.  The
> > gnulib version is not under a license which is compatible with libnbd
> > and is also really complicated, although it does handle fractions and
> > SI units.  The qemu version is essentially just frexp + sprintf and
> > doesn't attempt to convert to the human-readable version reversibly.
> > ---
> >  .gitignore                     |  1 +
> >  common/utils/Makefile.am       | 10 +++-
> >  common/utils/human-size.c      | 54 +++++++++++++++++++++
> >  common/utils/human-size.h      | 49 +++++++++++++++++++
> >  common/utils/test-human-size.c | 89 ++++++++++++++++++++++++++++++++++
> >  5 files changed, 201 insertions(+), 2 deletions(-)
> > 
> > diff --git a/.gitignore b/.gitignore
> > index 2aa1fd99..5fc59677 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -31,6 +31,7 @@ Makefile.in
> >  /bash/nbdcopy
> >  /bash/nbdfuse
> >  /bash/nbdinfo
> > +/common/utils/test-human-size
> >  /common/utils/test-vector
> >  /compile
> >  /config.cache
> > diff --git a/common/utils/Makefile.am b/common/utils/Makefile.am
> > index 1ca4a370..b273ada1 100644
> > --- a/common/utils/Makefile.am
> > +++ b/common/utils/Makefile.am
> > @@ -34,6 +34,8 @@ include $(top_srcdir)/common-rules.mk
> >  noinst_LTLIBRARIES = libutils.la
> >  
> >  libutils_la_SOURCES = \
> > +	human-size.c \
> > +	human-size.h \
> >  	vector.c \
> >  	vector.h \
> >  	version.c \
> > @@ -50,8 +52,12 @@ libutils_la_LIBADD = \
> >  
> >  # Unit tests.
> >  
> > -TESTS = test-vector
> > -check_PROGRAMS = test-vector
> > +TESTS = test-human-size test-vector
> > +check_PROGRAMS = test-human-size test-vector
> > +
> > +test_human_size_SOURCES = test-human-size.c human-size.c human-size.h
> > +test_human_size_CPPFLAGS = -I$(srcdir)
> > +test_human_size_CFLAGS = $(WARNINGS_CFLAGS)
> >  
> >  test_vector_SOURCES = test-vector.c vector.c vector.h
> >  test_vector_CPPFLAGS = -I$(srcdir)
> > diff --git a/common/utils/human-size.c b/common/utils/human-size.c
> > new file mode 100644
> > index 00000000..772f2489
> > --- /dev/null
> > +++ b/common/utils/human-size.c
> > @@ -0,0 +1,54 @@
> > +/* nbd client library in userspace
> > + * Copyright (C) 2020-2021 Red Hat Inc.
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> 
> (1) General question: should we adopt "SPDX-License-Identifier"s? They
> are more succinct.

I'd usually follow what qemu or libvirt are doing, and as far as I can
see they are not using these.

> > +
> > +#include <config.h>
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <inttypes.h>
> > +
> > +#include "human-size.h"
> > +
> > +char *
> > +human_size (char *buf, uint64_t bytes, bool *human)
> > +{
> > +  static const char *ext[] = { "E", "P", "T", "G", "M", "K", "" };
> 
> (2) The following would be more frugal:
> 
>   static const char ext[][2] = { "E", "P", "T", "G", "M", "K", "" };
> 
> (each pointer in the current array is 4 bytes or 8 bytes).

Good idea - I'll try this.

> > +  size_t i;
> > +
> > +  if (buf == NULL) {
> > +    buf = malloc (HUMAN_SIZE_LONGEST);
> > +    if (buf == NULL)
> > +      return NULL;
> > +  }
> > +
> > +  /* Work out which extension to use, if any. */
> > +  for (i = 6; i >= 0; --i) {
> > +    if (bytes == 0 || (bytes & 1023) != 0)
> > +      break;
> > +    bytes /= 1024;
> > +  }
> 
> (3) "size_t" is an unsigned type. When the loop is entered with i==0,
> and the "break" is not taken, "i" will flip back to ((size_t)-1), a very
> large positive integer. In other words, the controlling expression of
> this loop is unfalsifiable.

Damn!  I thought GCC would warn me about that ...

> Functionally, that's not a problem; when we enter the loop with i==0, we
> will have shifted out 6*10 = 60 bits on the least significant end, we
> only have 4 bits left, and so (bytes == 0 || (bytes & 1023) != 0) will
> definitely fire. It's just that the controlling expression of the loop
> is misleading (it suggests it has some role, when it doesn't).

But it's still potentially dangerous code.  I'll see if there's a way
to fix this.  I see you've got a better suggestion below.

> (4) Style: I think bit-and (&) should be coupled with bit-shift (>>), or
> else division (/) should be coupled with remainder (%).

OK

> (5) It is not necessary to keep the zero check in the loop body. Once we
> determine (before the loop) that the input is nonzero, we know it must
> have at least one bit that is set. So we can just continue shifting out
> runs of ten zero bits, until we hit a run of ten bits that is not
> all-bits-zero.
> 
> Thus, how about:
> 
>   i = 6;
>   if (bytes != 0) {
>     while ((bytes & 1023) == 0) {
>       bytes >>= 10;
>       i--;
>     }
>   }
> 
> > +
> > +  /* Set to flag to true if we're going to add a human-readable extension. */
> 
> (4) "Set [the] flag to true"?

Oops, yes.

> > +  if (human)
> > +    *human = ext[i][0] != '\0';
> > +
> > +  snprintf (buf, HUMAN_SIZE_LONGEST, "%" PRIu64 "%s", bytes, ext[i]);
> > +  return buf;
> > +}
> > diff --git a/common/utils/human-size.h b/common/utils/human-size.h
> > new file mode 100644
> > index 00000000..9ee78803
> > --- /dev/null
> > +++ b/common/utils/human-size.h
> > @@ -0,0 +1,49 @@
> > +/* nbd client library in userspace
> > + * Copyright (C) 2020-2021 Red Hat Inc.
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#ifndef LIBNBD_HUMAN_SIZE_H
> > +#define LIBNBD_HUMAN_SIZE_H
> > +
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +
> > +/* If you allocate a buffer of at least this length in bytes and pass
> > + * it as the first parameter to human_size, then it will not overrun.
> > + */
> > +#define HUMAN_SIZE_LONGEST 64
> 
> (5) The integer constant expression
> 
>   ((sizeof (uint64_t) * 8 + 2) / 3 + 1)
> 
> would be more frugal (but we might not care).
>
> If we take the number of three-bit groups in the word, and divide that
> by three -- rounded up --, we get the number of necessary octal digits.
> The number of decimal digits needed never exceeds the number of octal
> digits needed, so this is safe. Add one character for the NUL terminator.)

Yeah I was just guessing here.  FWIW gnulib really does the hard work:

https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/human.h;h=61f110fc0215205039ec9729ab34e68686c2d102;hb=HEAD#l30

> > +
> > +/* Convert bytes to a human-readable string.
> > + *
> > + * This is roughly the opposite of nbdkit_parse_size.  It will convert
> > + * multiples of powers of 1024 to the appropriate human size with the
> > + * right extension like 'M' or 'G'.  Anything that cannot be converted
> > + * is returned as bytes.  The *human flag is set to true if the output
> > + * was abbreviated to a human-readable size, or false if it is just
> > + * bytes.
> > + *
> > + * If buf == NULL, a buffer is allocated and returned.  In this case
> > + * the returned buffer must be freed.
> > + *
> > + * buf may also be allocated by the caller, in which case it must be
> > + * at least HUMAN_SIZE_LONGEST bytes.
> > + *
> > + * On error the function returns an error and sets errno.
> > + */
> > +extern char *human_size (char *buf, uint64_t bytes, bool *human);
> > +
> > +#endif /* LIBNBD_HUMAN_SIZE_H */
> > diff --git a/common/utils/test-human-size.c b/common/utils/test-human-size.c
> > new file mode 100644
> > index 00000000..d35a21bf
> > --- /dev/null
> > +++ b/common/utils/test-human-size.c
> > @@ -0,0 +1,89 @@
> > +/* nbd client library in userspace
> > + * Copyright (C) 2020-2021 Red Hat Inc.
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <config.h>
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +#include <inttypes.h>
> > +#include <string.h>
> > +
> > +#include "human-size.h"
> > +
> > +static unsigned errors = 0;
> > +
> > +static void
> > +test (uint64_t bytes, const char *expected, bool expected_human_flag)
> > +{
> > +  char actual[HUMAN_SIZE_LONGEST];
> > +  bool actual_human_flag;
> > +
> > +  human_size (actual, bytes, &actual_human_flag);
> > +
> > +  if (strcmp (actual, expected) == 0 ||
> > +      actual_human_flag != expected_human_flag) {
> 
> (6) I don't understand this condition. If I understand the branch *body*
> correctly, this is the "test succeeded" case. For that, we need:
> 
>   strcmp (actual, expected) == 0 &&
>   actual_human_flag == expected_human_flag
> 
> The negation of this condition (for the "test failed" case) would be:
> 
>   strcmp (actual, expected) != 0 ||
>   actual_human_flag != expected_human_flag

Hmm this is a mess, let me fix this too ...

> > +    printf ("test-human-size: %" PRIu64 " -> \"%s\" (%s) OK\n",
> > +            bytes, actual, actual_human_flag ? "true" : "false");
> > +    fflush (stdout);
> > +  }
> > +  else {
> > +    fprintf (stderr,
> > +             "test-human-size: error: test case %" PRIu64
> > +             "expected \"%s\" (%s) "
> 
> (7) Missing space character in the format string between PRIu64 and
> "expected".

Ok.

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top