[Avocado-devel] NRunner: decide on a "wire-format" for time/dates

Tue Nov 12 17:48:03 UTC 2019

On Thu, Oct 31, 2019 at 12:23:58PM -0400, Cleber Rosa wrote:
> On Tue, Oct 29, 2019 at 01:14:30PM -0300, Beraldo Leal wrote:
> > Hi all,
> > 
> > So, we have a Trello card [1] to discuss what date/time format we are
> > going to adopt when saving date/time on a file.
> >
> 
> Hi Beraldo,
> 
> I don't think I meant for the date/time format under discussion to be
> primarily one saved to a file.  The mention of it being used as a
> "wire-format" was my attempt to signal the primary use.  But let me
> start with a clearer description of the current state of the
> "nrunner" code.
> 
> The "avocado nrun" command is, right now, an ad-hoc implementation of
> something similar to an Avocado job.  A loose definition of an Avocado
> job's role is that it runs and collects results for one or more tests.
> The closest thing there is in nrunner right now to collecting results
> from many jobs is the status server[1], which waits for status
> messages on a TCP socket.  Those messages are currently encoded as
> JSON, so the date/time would have to be encoded as either a JSON
> string or a JSON number.
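
A minimal sketch of those two options follows.  The message layout and
the "status" and "time" keys are assumptions made for illustration, not
necessarily what nrunner sends.  The same instant is carried once as a
JSON number and once as a JSON string:

```python
import datetime
import json

# Hypothetical status message; the key names are illustrative only.
instant = 1572348152.25  # Unix time: seconds since 1970-01-01T00:00:00Z

# Option 1: Unix time as a JSON number.
as_number = json.dumps({"status": "started", "time": instant})

# Option 2: ISO 8601 (extended format, explicit UTC offset) as a JSON string.
iso = datetime.datetime.fromtimestamp(instant, tz=datetime.timezone.utc).isoformat()
as_string = json.dumps({"status": "started", "time": iso})

print(as_number)
print(as_string)
```

Either way the receiver gets a plain JSON value; only the type of the
"time" field differs.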
> 
> Note: I'm already working on an alternative implementation that
> integrates the nrunner execution into the existing Avocado Job code, by
> writing a "nrunner based" test runner implementation, whose interface
> has now been defined[2] and is used even by the regular runner[3].
> 
> The "nrunner" based runners, then, have the responsibility of publishing
> relevant events, including test start and test end time.  It's this date/time
> format that I'm most concerned with, because, once those are collected
> by the results server (or job, depending on the implementation), they can
> certainly be stored or presented in an alternative format if it makes
> sense to do so.
> 
> > I'm moving the discussion here because it seems better to discuss here
> > than on Trello.
> >
> 
> For sure!
> 
> > When it comes to date/time storage format, I can think of two very
> > well-used standards: 1. Unix Time and 2. ISO 8601.
> > 
> > I’m in favor of the “disambiguation” feature. Being able to read a
> > date/time without having to guess the timezone is a plus.
> > 
> > I think that a few questions should be answered before we decide this:
> > 
> >   1. Is storage a problem?
> 
> I would certainly like to save a few bytes on each message that
> contains a date/time, provided everything else is equal.
> 
> But, to be honest, I don't think reading a JSON number as a date (say
> for Unix time) or a string (say for ISO 8601) would have a significant
> impact on the transmission/processing/storage costs.  I think if we
> come to the point of needing to optimize the communication, a more
> comprehensive change, such as replacing the protocol/encoding
> altogether, would probably yield the best results.
> 
> >   2. Is parsing this date/time a CPU-bound problem?
> 
> Like I said before, I doubt that the "status server" would have its
> CPU pressured just for parsing the date/time, no matter the format.  I
> think it's more important that the test runner is given as little work
> as possible, though, so that it causes as little disturbance as
> possible on the test and on the tested system.  Think of low powered
> embedded systems running a test, for instance.  Being able to use a
> native data type and cheap encoding would be favorable IMO.
> 
> >   3. Who is going to read this information? Machine or human?
> >
> 
> Initially the "raw" info is machine readable, even though most people
> would agree that JSON is quite human readable.  When it comes to the
> date/time format itself, a Unix time has poor human readability.
> 
> > I believe that by answering these questions, we can go smoothly with
> > one format or another, as all languages have libraries to handle it.
> >
> 
> Agreed.  I hope I was able to give my general impression on the
> requirements above and answered those points.
> 
> > I have listed below the advantages and disadvantages that I have been
> > able to collect so far. Feel free to add to them or comment.
> > 
> > # Unix Time / Posix Time / Epoch Time
> > ## Advantages:
> >   * Better for machine readability;
> >   * Optimized for storage;
> >   * Very well-known with builtin libraries in many languages;
> > 
> > ## Disadvantages:
> >   * No timezone support (assumes UTC);
> >   * Leap seconds are ignored;
> 
> That was news to me.  After reading an article[4] I think it doesn't
> impact our use case.
> 
> >   * Cannot store values before “1970-01-01 00:00:00 UTC”;
> 
> Shouldn't be a problem, as we're not supposed to store tests that
> started or ended before that. :)
> 
> >   * On 32-bit systems there is the “Year 2038 problem”;
> 
> This is trickier... and I hate to feel cornered by it, even if, to
> the best of my knowledge and assumptions, we won't be dealing with
> 32-bit systems by then, or the problem will have been solved /
> worked around at another layer.
> 
> <joke>TBH, you shouldn't have mentioned this!</joke>
> 
> > 
> > ## Examples using Unix Time:
> >   * 915148800.25
> >   * 1095379201.00
> >
> 
> The presentation aspect is really what bothers me, which is in direct
> conflict with the fact that the primary consumers of the nrunner
> messages are not humans.  But, given that one can easily see that output
> by running, say, "avocado runnable-run ...", I was bothered by it.
> 
> Anyway, I'm going to dismiss those feelings on the basis of the
> primary use cases.
> 
> > # ISO 8601
> > ## Advantages:
> >   * Better for human readability;
> 
> For sure.
> 
> >   * Very well-known international standard with builtin libraries in
> > many languages;
> >      (first edition in 1988, and still being updated);
> >   * UTC time zone can be represented by only one “Z” char;
> 
> Interesting.
> 
> >   * The lexicographical order of the representation thus corresponds
> > to chronological order;
> 
> Also interesting.
> 
> >     (except for date representations involving negative years or time offset);
> >   * A fraction may be added to the lowest order time element in the
> > representation.
> >     (A decimal mark, either a comma or a dot can be used);
> >   * There is no limit on the number of decimal places for the decimal fraction;
> 
> Does this mean that a very high time resolution can be used?  This was
> one of the questions/concerns I had in the back of my mind...
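
On the resolution question, here is a rough sketch of what the producing
side gets in Python.  It assumes CPython, where time.time() returns a
64-bit float; the float_spacing() helper is made up for this example.
The format itself may allow any number of decimal places, but what can
actually be emitted is bounded by the runtime's data types:

```python
import datetime
import math


def float_spacing(value):
    """Gap between `value` and the next larger IEEE 754 double."""
    # frexp() returns (m, e) with value == m * 2**e and 0.5 <= m < 1;
    # a double has 53 significant bits, so the spacing is 2**(e - 53).
    _, exponent = math.frexp(value)
    return 2.0 ** (exponent - 53)


now = 1573580883.0  # 2019-11-12T17:48:03Z as Unix time

# Smallest step a float Unix time can express around this date.
print(float_spacing(now))

# Smallest step a datetime (and so isoformat()) can express.
print(datetime.datetime.resolution)
```

So a float Unix time is good to roughly a quarter of a microsecond
today, and datetime to exactly one microsecond.  If more were ever
needed, time.time_ns() (Python 3.7+) returns integer nanoseconds.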
> 
> >   * Has support for a basic format (without - or : ) and an extended
> > format with separators added to enhance human readability
> >   (The standard notes that: "The basic format should be avoided in
> > plain text.");
> > 
> > ## Disadvantages:
> >   * Needs more time to parse (not so optimal for machine parsing);
> 
> True, but as I've said before, I think the cost of producing it is
> more important than the cost of parsing it (as the results server
> should have much more resources than the test runner).
> 
> >   * Needs more space to store;
> >
> 
> True... for instance, Python's time.time() gives me:
> 
>    >>> len(json.dumps(time.time()))
>    18
> 
> While for ISO 8601 with an explicit UTC offset I get:
> 
>    >>> len(json.dumps(datetime.datetime.now(datetime.timezone.utc).isoformat()))
>    34
> 
> > ## Examples using ISO 8601:
> >   * 2019-10-29T11:22:32+00:00
> >   * 2019-10-29T11:22:32Z
> >   * 20191029T112232Z
> >
> 
> I like the last example a lot, but that is the basic format, the one
> the standard says should be avoided in plain text, right?
> 
> > If the answers to questions 1 and 2 are "no", I think that I would go
> > with ISO 8601 using 'Z' as UTC timezone, always.
> > 
> > And you? Any thoughts? Do you have a third option?
> 
> I think those two are the real contenders indeed.  I'm wondering if
> both formats shouldn't be supported by the status server when reading
> the messages, so that the writing of native runners would be
> facilitated and the load on them would be minimized.
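
A minimal sketch of what accepting both formats on the reading side
could look like.  The to_unix_time() function is a hypothetical helper,
not existing Avocado code, and it only handles the ISO 8601 extended
format (not the basic one):

```python
import datetime


def to_unix_time(value):
    """Normalize a decoded JSON time value to Unix time (float seconds).

    Accepts a JSON number (taken as Unix time) or a JSON string in
    ISO 8601 extended format with an explicit offset or a trailing "Z".
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("boolean is not a valid time value")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            raise ValueError("time without a UTC offset is ambiguous")
        return parsed.timestamp()
    raise TypeError("unsupported time value: %r" % (value,))


print(to_unix_time(1572348152))
print(to_unix_time("2019-10-29T11:22:32Z"))
print(to_unix_time("2019-10-29T08:22:32-03:00"))
```

All three calls name the same instant, so the server could keep a single
internal representation no matter what each runner chose to send.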
> 
> For the runners producing UNIX times, we could even have something like:
> 
>  $ avocado runnable-run ... | ./contrib/scripts/avocado-beautify-status-messages
> 
> In the best UNIX tradition.
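
Such a filter could be as small as the sketch below.  The script named
above does not exist; the "time" key and the one-JSON-message-per-line
framing are assumptions made for illustration:

```python
import datetime
import json


def beautify(line, key="time"):
    """Return one JSON message with a Unix time under `key` as ISO 8601."""
    message = json.loads(line)
    value = message.get(key) if isinstance(message, dict) else None
    # Only rewrite real numbers; leave strings (and anything else) alone.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        moment = datetime.datetime.fromtimestamp(value, tz=datetime.timezone.utc)
        message[key] = moment.isoformat().replace("+00:00", "Z")
    return json.dumps(message)


# The two Unix time examples from earlier in this thread.
for raw in ('{"status": "started", "time": 915148800.25}',
            '{"status": "finished", "time": 1095379201.00}'):
    print(beautify(raw))
```

Hooked up to standard input (for line in sys.stdin: print(beautify(line))),
this would sit at the end of the pipeline shown above, leaving the
runners free to emit the cheaper Unix time.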
> 
> Thanks for the thorough analysis!
> - Cleber.
>

To bring closure to this topic: it's my understanding that, given this
is a "wire-format", we can keep using Unix time.

- Cleber.

> > 
> > [1] - https://trello.com/c/w4iFhDfM
> > 
> > Regards,
> > -- 
> > Beraldo Leal
> > Senior Software Engineer, Virtualization Team
> > Red Hat
> > 
> 
> [1] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/nrunner.py#L522
> [2] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/avocado/core/plugin_interfaces.py#L290
> [3] https://github.com/avocado-framework/avocado/blob/f1cdf81284e01ae2c20b2392b1e3718aefbeec2c/setup.py#L128
> [4] https://derickrethans.nl/leap-seconds-and-what-to-do-with-them.html
>