[Avocado-devel] Implement fetch-assets command line

Cleber Rosa crosa at redhat.com
Wed Sep 25 21:39:30 UTC 2019


On Wed, Sep 25, 2019 at 06:01:22PM -0300, Willian Rampazzo wrote:
> After going through the suggestions, and thinking about it, I came up
> with the following starting point:
> 
> 1 - Support limited test source code parsing. This feature brings a
> simple solution for tests that already use plain strings as their
> fetch_asset parameters, as stated by Cleber.
>

Keep in mind that I'm assuming this is a viable option without
attempting to do it myself.  I hope it's viable, because it makes
"simple things easy" from the test writer's side.

> 2 - Add an option to use assets.json, as stated by Amador. This file
> would sit in the test data folder and list all assets that are
> necessary for the test. Since it would be a list of assets, I thought
> about a new method, fetch_assets, as a wrapper that loads all assets
> from the file and calls fetch_asset for each one.
>

Sounds good.  Also consider the need for dynamic asset definition (the
idea about executable scripts that produce JSON).
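
Just to make the wrapper idea concrete, here is a rough sketch of how it
could look on the Test class.  It assumes assets.json holds a list of
entries with "name", "asset_hash" and "locations" keys, and that the file
can be resolved with the test's get_data(); none of that is an agreed
interface yet:

    import json

    def fetch_assets(self, assets_file='assets.json'):
        """Fetch every asset listed in a JSON file shipped in the test
        data directory; returns an {asset name: local path} mapping."""
        path = self.get_data(assets_file)  # resolved against the .data dirs
        fetched = {}
        with open(path) as definitions:
            for entry in json.load(definitions):
                fetched[entry['name']] = self.fetch_asset(
                    name=entry['name'],
                    asset_hash=entry.get('asset_hash'),
                    locations=entry.get('locations'))
        return fetched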

> Using a command line like 'avocado fetch-assets my_test.py' would
> first look for the configuration file assets.json in the data
> folder; if none is found, it would parse the source file looking for
> simple fetch_asset calls whose parameters are purely strings.
>

Sounds good to me.
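
To illustrate the lookup order, a rough outline of what the command could
do.  All names here are hypothetical: load_assets_from_json is just a
placeholder, and find_static_fetch_asset_calls refers to the parsing
sketch above:

    import os

    def fetch_assets_for_test(test_path):
        """Resolve asset definitions for a test: prefer an assets.json
        in the test data directory, fall back to limited source parsing."""
        assets_json = os.path.join(test_path + '.data', 'assets.json')
        if os.path.isfile(assets_json):
            return load_assets_from_json(assets_json)
        return find_static_fetch_asset_calls(test_path)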

- Cleber.

> On Wed, Sep 25, 2019 at 12:35 PM Amador Pahim <amador at pahim.org> wrote:
> >
> > On Wed, Sep 25, 2019 at 4:42 PM Cleber Rosa <crosa at redhat.com> wrote:
> > >
> > > On Wed, Sep 25, 2019 at 03:16:55PM +0200, Amador Pahim wrote:
> > > > On Tue, Sep 24, 2019 at 9:42 PM Willian Rampazzo <wrampazz at redhat.com> wrote:
> > > > >
> > > > > Hello everyone,
> > > > >
> > > > > I started working on Trello card
> > > > > https://trello.com/c/T3SC1sZs/1521-implement-fetch-assets-command-line
> > > > > as part of a broader card,
> > > > > https://trello.com/c/CKP7YS6G/1481-on-cache-check-for-asset-fetcher
> > > > > and I would like to bring my findings to a discussion.
> > > > >
> > > > > One way to implement that would be to parse the test source looking
> > > > > for the fetch_asset call and execute it. In theory, it is a
> > > > > straightforward implementation.
> > > > >
> > > > > I have started working on the parser. After some simple tests, some
> > > > > complex situations started to show up. Let me bring an existing
> > > > > example, examples/tests/assets.py:
> > > > >
> > > > >     def setUp(self):
> > > > >         mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
> > > > >                    'https://mirrors.kernel.org/gnu/hello/',
> > > > >                    'http://gnu.c3sl.ufpr.br/ftp/',
> > > > >                    'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
> > > > >         hello = 'hello-2.9.tar.gz'
> > > > >         hello_locations = ["%s/%s" % (loc, hello) for loc in mirrors]
> > > > >         hello_sig = 'hello-2.9.tar.gz.sig'
> > > > >         hello_sig_locations = ["%s/%s" % (loc, hello_sig) for loc in mirrors]
> > > > >         self.hello = self.fetch_asset(
> > > > >             name=hello,
> > > > >             locations=hello_locations)
> > > > >         self.hello_sig = self.fetch_asset(
> > > > >             name=hello_sig,
> > > > >             asset_hash='f3b9fae20c35740004ae7b8de1301836dab4ac30',
> > > > >             locations=hello_sig_locations)
> > > > >
> > > > > When the parser finds the fetch_asset call, it needs to inspect its
> > > > > arguments, looking for variables. If any variable is found, it needs
> > > > > to walk back through the code looking for its assignment, so it is able
> > > > > execute it prior to the fetch_asset execution. If the variables
> > > > > consist of other variables, the parser, again, needs to walk back in
> > > > > the code looking for those assignments, and so on.
> > > > >
> > > > > There are countless ways of creating the right side of a variable
> > > > > assignment, from a single string assignment to a function that builds
> > > > > the content, or conditional assignments.
> > > > >
> > > > > My initial idea is to cover variables consisting of other variables
> > > > > and list comprehension, just like in the example. For now, other
> > > > > complex constructions would be out of this implementation.
> > > > >
> > > > > Any comment, concern, or suggestion is appreciated here, as it
> > > > > would help to build more robust code.
> > > >
> > > > The problem with not covering all the cases is that, well, you're not
> > > > covering all the cases :)
> > > > And the parsing of code to execute snippets out of it is always a
> > > > tricky thing to do. What if your variable is being filled by a
> > > > function call that acquires data from an external source that requires
> > > > a specific environment... how would you reproduce that environment
> > > > in the command line? There are way too many cases where things can
> > > > go wrong.
> > > >
> > > > Have you considered adding support for args-from-config-file in the
> > > > asset fetcher? Something like:
> > > >
> > > > asset.Asset(args_from_config='/foo/bar/my_asset_01.yaml')
> > > >
> > >
> > > That's an interesting approach, in which the upside is the ease of
> > > parsing the data, and the downside is that it requires developers
> > > writing tests to use an external file.  Requiring the "test" to be
> > > split in multiple files may be a useful feature to scale a very large
> > > test, but requiring it for all cases is not a good thing.  We may find
> > > out that, out of the viable options, this turns out to be the
> > > best one, but we should keep the cons in mind.
> > >
> > > Anyway, this actually reminds me of an older idea, on the topic of
> > > parameters, in which we'd use the test's data directories to fetch
> > > files containing parameters that would be applied by default to tests.
> > > But, if we decide on the data format of those files, say YAML or JSON
> > > (whether they contain parameters or asset definitions), we'd be limited
> > > by the static nature of it.
> > >
> > > I remember thinking about that problem, and realizing that a
> > > compromise may be to also allow for executable files that would
> > > produce dynamic content.  So, for test `my.py:My:test_asset`, we
> > > would look in its data source directories for:
> > >
> > >   * my.py/My/test_asset.data/
> > >   * my.py/My.data/
> > >   * my.py.data/
> > >
> > > Within any of the data directories we could have plugins that would
> > > either read the asset files and produce parameters for asset.Asset(),
> > > or plugins that would execute files and produce the same thing.  For
> > > instance:
> > >
> > >   * my.py/My/test_asset.data/assets.json:
> > >
> > >    {"name": "hello-2.9.tar.gz",
> > >     "locations": ["https://mirrors.peers.community/mirrors/gnu/hello/hello-2.9.tar.gz",
> > >                   "https://mirrors.kernel.org/gnu/hello/hello-2.9.tar.gz",
> > >                   "http://gnu.c3sl.ufpr.br/ftp/hello-2.9.tar.gz",
> > >                   "ftp://ftp.funet.fi/pub/gnu/prep/hello/hello-2.9.tar.gz"],
> > >     "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}
> > >
> > >   * my.py/My/test_asset.data/assets.py:
> > >
> > >    #!/usr/bin/env python
> > >    import json
> > >
> > >    name = "hello-2.9.tar.gz"
> > >    mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
> > >               'https://mirrors.kernel.org/gnu/hello/',
> > >               'http://gnu.c3sl.ufpr.br/ftp/',
> > >               'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
> > >    locations = ["%s/%s" % (loc, name) for loc in mirrors]
> > >
> > >    asset = {"name": name,
> > >             "locations": locations,
> > >             "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}
> > >    print(json.dumps(asset))
> >
> > I really like this. I'd specify some interface though, say importing
> > my.py/My/test_asset.data/assets.py and calling its main(), expecting
> > the json as the returned data, but I got your point and it looks like
> > a very good way to go. A basic plugin that gets the file.json content
> > from the data sources would be a nice first step.
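
If we go with the main() interface described above, the loader could be as
small as this sketch (the helper name and the main() contract are only the
suggestion spelled out, nothing decided):

    import importlib.util

    def load_assets_from_module(path):
        """Import a data-dir assets.py and call its main(), which would
        be expected to return the asset definition (dict or list)."""
        spec = importlib.util.spec_from_file_location('assets', path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.main()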
> >
> > >
> > > The executable option (my.py/My/test_asset.data/assets.py), while it
> > > somewhat violates the "don't execute *test* code before running the
> > > test" principle, is an opt-in feature, and still wouldn't be
> > > executed while listing tests.  When checking and downloading assets,
> > > though, it would be executed.
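
With the print-JSON convention from the assets.py example above, consuming
it when checking and downloading assets could look roughly like this (the
helper name is made up; just a sketch of the run-and-parse step):

    import json
    import subprocess

    def assets_from_executable(path):
        """Run an executable assets file and parse the JSON it prints."""
        result = subprocess.run([path], capture_output=True,
                                check=True, text=True)
        return json.loads(result.stdout)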
> > >
> > > > Or, instead of a separate config file, I'd also consider some reserved
> > > > keys in the parametrization interface (we have some already:
> > > > https://avocado-framework.readthedocs.io/en/latest/WritingTests.html?highlight=timeout#setting-a-test-timeout).
> > > >
> > >
> > > This is similar to:
> > >
> > >   https://trello.com/c/WPd4FrIy/1479-add-support-to-specify-assets-in-test-docstring
> > >
> > > But your suggestion is better in the sense that users could add
> > > dynamic content in such a variable, and at the same time would limit
> > > or make the parsing of such content difficult before executing the
> > > test, IIUC.
> > >
> > > - Cleber.
> > >
> > > > That way you would enable the asset fetcher caching tool to read from
> > > > the same configuration input and cache things in advance. Anyone
> > > > willing to use that feature would have to either also use
> > > > args_from_config in their tests or specify the args accordingly.
> > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Willian Rampazzo
> > > > > Software Engineer
> > > > > Red Hat Brazil
> > > > > 2717 337F 7E4A 5FDF
> > > > >
> > > >
> > > >
> > > > --
> > > > Pahim
> > > >
> >
> >
> >
> > --
> > Pahim
