[Avocado-devel] Implement fetch-assets command line

Wed Sep 25 14:42:44 UTC 2019

On Wed, Sep 25, 2019 at 03:16:55PM +0200, Amador Pahim wrote:
> On Tue, Sep 24, 2019 at 9:42 PM Willian Rampazzo <wrampazz at redhat.com> wrote:
> >
> > Hello everyone,
> >
> > I started working on Trello card
> > https://trello.com/c/T3SC1sZs/1521-implement-fetch-assets-command-line
> > as part of a broader card,
> > https://trello.com/c/CKP7YS6G/1481-on-cache-check-for-asset-fetcher
> > and I would like to bring my findings to a discussion.
> >
> > One way to implement that would be to parse the test source looking
> > for the fetch_asset call and execute it. In theory, it is a straight
> > forward implementation.
> >
> > I have started working in the parser. After some simple tests, some
> > complex situations started to show up. Let me bring an existing
> > example, examples/tests/assets.py:
> >
> >     def setUp(self):
> >         mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
> >                    'https://mirrors.kernel.org/gnu/hello/',
> >                    'http://gnu.c3sl.ufpr.br/ftp/',
> >                    'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
> >         hello = 'hello-2.9.tar.gz'
> >         hello_locations = ["%s/%s" % (loc, hello) for loc in mirrors]
> >         hello_sig = 'hello-2.9.tar.gz.sig'
> >         hello_sig_locations = ["%s/%s" % (loc, hello_sig) for loc in mirrors]
> >         self.hello = self.fetch_asset(
> >             name=hello,
> >             locations=hello_locations)
> >         self.hello_sig = self.fetch_asset(
> >             name=hello_sig,
> >             asset_hash='f3b9fae20c35740004ae7b8de1301836dab4ac30',
> >             locations=hello_sig_locations)
> >
> > When the parser finds the fetch_asset call, it needs to inspect its
> > arguments, looking for variables. If any variable is found, it needs
> > to walk back the code looking for its assignment, so it is able to
> > execute it prior to the fetch_asset execution. If the variables
> > consist of other variables, the parser, again, needs to walk back in
> > the code looking for those assignments, and so on.
> >
> > There are countless ways of creating the right side of a variable
> > assignment, from a single string assignment to a function that builds
> > the content, or conditional assignments.
> >
> > My initial idea is to cover variables consisting of other variables
> > and list comprehension, just like in the example. For now, other
> > complex constructions would be out of this implementation.
> >
> > Any comment, concern, suggestion, is appreciated here as it would help
> > to build a more robust code.
> 
> The problem with not covering all the cases is that, well, you're not
> covering all the cases :)
> And the parsing of code to execute snippets out of it is always a
> tricky thing to do. What if your variable is being filled by a
> function call that acquires data from an external source that requires
> a specific environment... how to reproduce that environment in the
> command liner? There are way too many cases were things can go wrong.
> 
> Have you considered adding support for args-from-config-file in the
> asset fetcher? Something like:
> 
> asset.Asset(args_from_config='/foo/bar/my_asset_01.yaml')
>

That's an interesting approach, in which the upside is the ease of
parsing the data, and the downside is that requiring developers
writing tests to use an external file.  Requiring the "test" to be
split in multiple files may be a useful feature to scale a very large
test, but requiring it for all cases is not a good thing.  We may find
out that, out of the viable options, that this turns out to be the
best one, but we should keep the cons in mind.

Anyway, this actually reminds me of a older idea, on the topic of
parameters, in which we'd use the test's data directories to fetch
files containing parameters that would be applied by default to tests.
But, if we decide on the data format of those files, say YAML or JSON
(wether they contain parameters or asset definitions), we'd be limited
by the static nature of it.

I remember thinking about that problem, and realizing that a
compromise may be to also allow for executable files that would
dynamic content.  So, for test `my.py:My:test_asset`, we would look
for in its data sources directories:

  * my.py/My/test_asset.data/
  * my.py/My.data/
  * my.py.data/

Within any of the data directories we could have plugins that would
either read the asset files and produce parameters for asset.Asset(),
or plugins that would execute files and produce the same thing.  For
instance:

  * my.py/My/test_asset.data/assets.json:

   {"name": "hello-2.9.tar.gz",
    "locations": ["https://mirrors.peers.community/mirrors/gnu/hello/hello-2.9.tar.gz",
                  "https://mirrors.kernel.org/gnu/hello/hello-2.9.tar.gz",
                  "http://gnu.c3sl.ufpr.br/ftp/hello-2.9.tar.gz",
                  "ftp://ftp.funet.fi/pub/gnu/prep/hello/hello-2.9.tar.gz"],
    "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}

  * my.py/My/test_asset.data/assets.py:

   #!/usr/bin/env python
   import json

   name = "hello-2.9.tar.gz"
   mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
              'https://mirrors.kernel.org/gnu/hello/',
              'http://gnu.c3sl.ufpr.br/ftp/',
              'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
   locations = ["%s/%s" % (loc, name) for loc in mirrors]

   asset = {"name": name,
            "locations": locations,
            "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}
   print(json.dumps(asset))

The executable option (my.py/My/test_asset.data/assets.py), while it
somehow violates the "don't execute *test* code before running the
test" principle, it's an opt-in feature, and still wouldn't be
executed while listing tests.  When checking and downloading assets,
though, it would be executed.

> Or, instead of a separate config file, I'd also consider some reserved
> keys in the parametrization interface (we have some already:
> https://avocado-framework.readthedocs.io/en/latest/WritingTests.html?highlight=timeout#setting-a-test-timeout).
>

This is similar to:

  https://trello.com/c/WPd4FrIy/1479-add-support-to-specify-assets-in-test-docstring

But your suggestion is better in the sense that users could add
dynamic content in such a variable, and at the same time would limit
or make the the parsing of such content dificult before executing the
test IIUC.

- Cleber.

> That way you would enable the asset fetcher caching tool to read from
> the same configuration input and cache things in advance. Anyone
> willing to use that feature would have to either use the
> args_from_config also in their tests or specify the args accordingly.
>
> >
> > Best regards,
> >
> > Willian Rampazzo
> > Software Engineer
> > Red Hat Brazil
> > 2717 337F 7E4A 5FDF
> >
> 
> 
> -- 
> Pahim
>