[Avocado-devel] Implement fetch-assets command line

Wed Sep 25 15:35:10 UTC 2019

On Wed, Sep 25, 2019 at 4:42 PM Cleber Rosa <crosa at redhat.com> wrote:
>
> On Wed, Sep 25, 2019 at 03:16:55PM +0200, Amador Pahim wrote:
> > On Tue, Sep 24, 2019 at 9:42 PM Willian Rampazzo <wrampazz at redhat.com> wrote:
> > >
> > > Hello everyone,
> > >
> > > I started working on Trello card
> > > https://trello.com/c/T3SC1sZs/1521-implement-fetch-assets-command-line
> > > as part of a broader card,
> > > https://trello.com/c/CKP7YS6G/1481-on-cache-check-for-asset-fetcher
> > > and I would like to bring my findings to a discussion.
> > >
> > > One way to implement that would be to parse the test source looking
> > > for the fetch_asset call and execute it. In theory, it is a straight
> > > forward implementation.
> > >
> > > I have started working in the parser. After some simple tests, some
> > > complex situations started to show up. Let me bring an existing
> > > example, examples/tests/assets.py:
> > >
> > >     def setUp(self):
> > >         mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
> > >                    'https://mirrors.kernel.org/gnu/hello/',
> > >                    'http://gnu.c3sl.ufpr.br/ftp/',
> > >                    'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
> > >         hello = 'hello-2.9.tar.gz'
> > >         hello_locations = ["%s/%s" % (loc, hello) for loc in mirrors]
> > >         hello_sig = 'hello-2.9.tar.gz.sig'
> > >         hello_sig_locations = ["%s/%s" % (loc, hello_sig) for loc in mirrors]
> > >         self.hello = self.fetch_asset(
> > >             name=hello,
> > >             locations=hello_locations)
> > >         self.hello_sig = self.fetch_asset(
> > >             name=hello_sig,
> > >             asset_hash='f3b9fae20c35740004ae7b8de1301836dab4ac30',
> > >             locations=hello_sig_locations)
> > >
> > > When the parser finds the fetch_asset call, it needs to inspect its
> > > arguments, looking for variables. If any variable is found, it needs
> > > to walk back the code looking for its assignment, so it is able to
> > > execute it prior to the fetch_asset execution. If the variables
> > > consist of other variables, the parser, again, needs to walk back in
> > > the code looking for those assignments, and so on.
> > >
> > > There are countless ways of creating the right side of a variable
> > > assignment, from a single string assignment to a function that builds
> > > the content, or conditional assignments.
> > >
> > > My initial idea is to cover variables consisting of other variables
> > > and list comprehension, just like in the example. For now, other
> > > complex constructions would be out of this implementation.
> > >
> > > Any comment, concern, suggestion, is appreciated here as it would help
> > > to build a more robust code.
> >
> > The problem with not covering all the cases is that, well, you're not
> > covering all the cases :)
> > And the parsing of code to execute snippets out of it is always a
> > tricky thing to do. What if your variable is being filled by a
> > function call that acquires data from an external source that requires
> > a specific environment... how to reproduce that environment in the
> > command liner? There are way too many cases were things can go wrong.
> >
> > Have you considered adding support for args-from-config-file in the
> > asset fetcher? Something like:
> >
> > asset.Asset(args_from_config='/foo/bar/my_asset_01.yaml')
> >
>
> That's an interesting approach, in which the upside is the ease of
> parsing the data, and the downside is that requiring developers
> writing tests to use an external file.  Requiring the "test" to be
> split in multiple files may be a useful feature to scale a very large
> test, but requiring it for all cases is not a good thing.  We may find
> out that, out of the viable options, that this turns out to be the
> best one, but we should keep the cons in mind.
>
> Anyway, this actually reminds me of a older idea, on the topic of
> parameters, in which we'd use the test's data directories to fetch
> files containing parameters that would be applied by default to tests.
> But, if we decide on the data format of those files, say YAML or JSON
> (wether they contain parameters or asset definitions), we'd be limited
> by the static nature of it.
>
> I remember thinking about that problem, and realizing that a
> compromise may be to also allow for executable files that would
> dynamic content.  So, for test `my.py:My:test_asset`, we would look
> for in its data sources directories:
>
>   * my.py/My/test_asset.data/
>   * my.py/My.data/
>   * my.py.data/
>
> Within any of the data directories we could have plugins that would
> either read the asset files and produce parameters for asset.Asset(),
> or plugins that would execute files and produce the same thing.  For
> instance:
>
>   * my.py/My/test_asset.data/assets.json:
>
>    {"name": "hello-2.9.tar.gz",
>     "locations": ["https://mirrors.peers.community/mirrors/gnu/hello/hello-2.9.tar.gz",
>                   "https://mirrors.kernel.org/gnu/hello/hello-2.9.tar.gz",
>                   "http://gnu.c3sl.ufpr.br/ftp/hello-2.9.tar.gz",
>                   "ftp://ftp.funet.fi/pub/gnu/prep/hello/hello-2.9.tar.gz"],
>     "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}
>
>   * my.py/My/test_asset.data/assets.py:
>
>    #!/usr/bin/env python
>    import json
>
>    name = "hello-2.9.tar.gz"
>    mirrors = ['https://mirrors.peers.community/mirrors/gnu/hello/',
>               'https://mirrors.kernel.org/gnu/hello/',
>               'http://gnu.c3sl.ufpr.br/ftp/',
>               'ftp://ftp.funet.fi/pub/gnu/prep/hello/']
>    locations = ["%s/%s" % (loc, name) for loc in mirrors]
>
>    asset = {"name": name,
>             "locations": locations,
>             "asset_hash": "cb0470b0e8f4f7768338f5c5cfe1688c90fbbc74"}
>    print(json.dumps(asset))

I really like this. I'd specify some interface though, say importing
my.py/My/test_asset.data/assets.py and calling its main(), expecting
the json as the returned data, but I got your point and it looks like
a very good way to go. A basic plugin that gets the file.json content
from the data sources would be a nice first step.

>
> The executable option (my.py/My/test_asset.data/assets.py), while it
> somehow violates the "don't execute *test* code before running the
> test" principle, it's an opt-in feature, and still wouldn't be
> executed while listing tests.  When checking and downloading assets,
> though, it would be executed.
>
> > Or, instead of a separate config file, I'd also consider some reserved
> > keys in the parametrization interface (we have some already:
> > https://avocado-framework.readthedocs.io/en/latest/WritingTests.html?highlight=timeout#setting-a-test-timeout).
> >
>
> This is similar to:
>
>   https://trello.com/c/WPd4FrIy/1479-add-support-to-specify-assets-in-test-docstring
>
> But your suggestion is better in the sense that users could add
> dynamic content in such a variable, and at the same time would limit
> or make the the parsing of such content dificult before executing the
> test IIUC.
>
> - Cleber.
>
> > That way you would enable the asset fetcher caching tool to read from
> > the same configuration input and cache things in advance. Anyone
> > willing to use that feature would have to either use the
> > args_from_config also in their tests or specify the args accordingly.
> >
> > >
> > > Best regards,
> > >
> > > Willian Rampazzo
> > > Software Engineer
> > > Red Hat Brazil
> > > 2717 337F 7E4A 5FDF
> > >
> >
> >
> > --
> > Pahim
> >

-- 
Pahim