[Avocado-devel] RFC: Avocado Job API
Lukáš Doktor
ldoktor at redhat.com
Tue Apr 12 09:43:16 UTC 2016
On 12.4.2016 at 10:06, Lukáš Doktor wrote:
> Hello Cleber,
>
> in general I welcome this RFC. This is my 3rd attempt to make my
> response understandable. First I'm mentioning the problems, but some
> explanations follow at the end of the email.
>
> On 11.4.2016 at 14:09, Cleber Rosa wrote:
>> Note: the same content on this message is available at:
>>
>> https://github.com/clebergnu/avocado/blob/rfc_job_api/docs/rfcs/job-api.rst
>>
>>
>> Some users may find it easier to read with a prettier formatting.
>>
>> Problem statement
>> =================
>>
>> An Avocado job is created by running the command line ``avocado``
>> application with the ``run`` command, such as::
>>
>> $ avocado run passtest.py
>>
>> But most of Avocado's power is activated by additional command line
>> arguments, such as::
>>
>> $ avocado run passtest.py --vm-domain=vm1
>> $ avocado run passtest.py --remote-hostname=machine1
>>
>> Even though Avocado supports many features, such as running tests
>> locally, on a Virtual Machine and on a remote host, only one of those
>> can be used on a given job.
>>
>> The observed limitations are:
>>
>> * Job creation is limited by the expressiveness of command line
>> arguments; this causes mutual exclusion of some features
>> * Mapping features to a subset of tests or conditions is not possible
>> * Once created, and while running, a job can not have its status
>> queried and can not be manipulated
>>
>> Even though Avocado is a young project, its current feature set
>> already exceeds its flexibility. Unfortunately, advanced users are
>> not always free to mix and match those features at will.
>>
>> Reviewing and Evaluating Avocado
>> ================================
>>
>> In light of the given problem, let's take a look at what Avocado is,
>> both by definition and based on its real world, day to day, usage.
>>
>> Avocado By Definition
>> ---------------------
>>
>> Avocado is, by definition, "a set of tools and libraries to help with
>> automated testing". Here, some points can be made about the two
>> components that Avocado is made of:
>>
>> 1. Libraries are commonly flexible enough and expose the right
>> features in a consistent way. Libraries that provide good APIs
>> allow users to solve their own problems, not always anticipated by
>> the library authors.
>>
>> 2. The majority of the Avocado library code falls into two categories:
>> utility and test APIs. Avocado's core libraries are, so far, not
>> intended to be consumed by third party code and their use is not
>> supported in any way.
>>
>> 3. Tools (as in command line applications), are commonly a lot less
>> flexible than libraries. Even the ones driven by command line
>> arguments, configuration files and environment variables fall
>> short in flexibility when compared to libraries. That is true even
>> when respecting the basic UNIX principles and features that help to
>> reuse and combine different tools in a single shell session.
>>
>> How Avocado is used
>> -------------------
>>
>> The vast majority of the observed Avocado use cases, present and
>> future, includes running tests. Given the Avocado architecture and
>> its core concepts, this means running a job.
>>
>> Avocado, with regard to its real world usage, is pretty much a job
>> (and test) runner, and there's no escaping that. It's probable that,
>> for every one hundred ``avocado run`` commands, only one other
>> ``avocado <subcommand>`` is executed.
>>
>> Proposed solution & RFC goal
>> ----------------------------
>>
>> By now, the title of this document may seem a little less
>> misleading. Still, let's attempt to make it even more clear.
>>
>> Since Avocado is mostly a job runner that needs to be more flexible,
>> the most natural approach is to turn more of it into a library. This
>> would lead to the creation of a new set of user consumable APIs,
>> albeit for a different set of users. Those APIs should allow the
>> creation of custom job executions, in ways that the Avocado authors
>> have not yet anticipated.
>>
>> Having settled on this solution to the stated problem, the primary
>> goal of this RFC is to propose how such a "Job API" can be
>> implemented.
>>
>> Analysis of a Job Environment
>> =============================
>>
>> To properly implement a Job API, it's necessary to review what
>> influences the creation and execution of a job. Currently, a Job
>> execution based on the command line is driven by, at least,
>> the following factors:
>>
>> * Configuration state
>> * Command line parameters
>> * Active plugins
>>
>> The following subsections examine how these would behave in an API
>> based approach to Job execution.
>>
>> Configuration state
>> -------------------
>>
>> Even though Avocado has a well defined `settings`_ module, it only
>> provides support for `getting the value`_ of configuration keys. It
>> lacks the ability to set configuration values at run time.
>>
>> If the configuration state allowed modifications at run time (in a
>> well defined and supported way), users could then create many types of
>> custom jobs with that "tool" alone.
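A minimal sketch of what a run-time settable configuration could look like, assuming a hypothetical `RuntimeConfig` wrapper built on Python's standard `configparser`. Neither the class nor its methods exist in Avocado; the section and key names are borrowed from the profiler example further down in the RFC.

```python
import configparser


class RuntimeConfig:
    """Hypothetical wrapper: file-style defaults plus run-time overrides."""

    def __init__(self, defaults_text=""):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(defaults_text)

    def get(self, section, key, fallback=None):
        # Returns the fallback when the section or key is missing
        return self._parser.get(section, key, fallback=fallback)

    def set(self, section, key, value):
        # configparser only stores strings, so convert explicitly
        if not self._parser.has_section(section):
            self._parser.add_section(section)
        self._parser.set(section, key, str(value))


config = RuntimeConfig("[sysinfo.collect]\nprofiler = False\n")
config.set("sysinfo.collect", "profiler", True)
print(config.get("sysinfo.collect", "profiler"))
```

Values come back as strings here; a real implementation would probably need typed getters, which this sketch leaves out.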
>>
>> Command line parameters
>> -----------------------
>>
>> The need for a strong and predictable correlation between application
>> builtin defaults, configuration keys and command line parameters is
>> also a MUST for the implementation of the Job API.
>>
>> Users writing a custom job will very often need to set a given
>> behavior that may influence different parts of the Job execution.
>>
>> Not only that, many use cases may be implemented simply by changing
>> those defaults in the midst of the job execution.
>>
>> If users know how to map command line parameters into their
>> programmable counterparts, advanced custom jobs will be created much
>> more naturally.
>>
>> Plugins
>> -------
>>
>> Avocado currently relies exclusively on setuptools `entry points`_ to
>> define the active plugins. It may be beneficial to add a secondary
>> activation and deactivation mechanism, one that is locally
>> configurable. This is a rather common pattern, and well supported by
>> the underlying stevedore library.
>>
>> Given that all pluggable components of Avocado are updated to adhere to
>> the "new plugin" standard, some use cases could be implemented simply
>> by enabling/disabling plugins (think of "driver" style plugins). This
>> can be exclusively or in addition to setting the plugin's own
>> configuration.
>>
>> Also, depending on the type of plugin, it may be useful to activate,
>> deactivate and configure those plugins per job. Thus, as part of the
>> Job state, APIs would allow for querying/setting plugins.
>>
>> Use cases
>> =========
>>
>> To aid in the design of an API that solves unforeseen needs, let's
>> think about a couple of use cases. Most of these use cases are based
>> on feedback already received and/or features already requested.
>>
>> Ordered and conditional test execution
>> --------------------------------------
>>
>> A user wants to create a custom job that only runs a benchmark test on
>> a VM if the VM installation test succeeds.
>>
>> Possible use case fulfillment
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Pseudo code::
>>
>> #!/usr/bin/env python
>> from avocado import Job
>> from avocado.resolver import resolve
>>
>> job = Job()
>>
>> vm_install = resolve(
>>     'io-github-autotest-qemu.unattended_install.cdrom.http_ks.default_install.aio_native')
>>
> what plugins are used during "resolve"?
>
>>
>> vm_disk_benchmark = resolve('io-github-autotest-qemu.autotest.bonnie')
>>
>> if job.run_test(vm_install).result == 'PASS':
>>     job.run_test(vm_disk_benchmark)
>>
>> API Requirements
>> ~~~~~~~~~~~~~~~~
>>
>> 1. Job creation API
>> 2. Test resolution API
>> 3. Single test execution API
>>
>> Run profilers on a single test
>> ------------------------------
>>
>> A user wants to create a custom job that only runs profilers for the
>> very first test. Running the same profilers for all other tests may
>> be useless to the user, or maybe consume too much I/O resources that
>> would influence the remaining tests.
>>
>> Possible use case fulfillment
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Avocado has a configuration key that controls profilers::
>>
>> [sysinfo.collect]
>> ...
>> profiler = False
>> ...
>>
>> By exposing the configuration state, the ``profiler`` key of the
>> ``sysinfo.collect`` section could be enabled for one test, and
>> disabled for all others. Pseudo code::
>>
>> #!/usr/bin/env python
>> from avocado import Job
>> from avocado.resolver import resolve
>>
>> job = Job()
>> env = job.environment # property
>>
>> env.config.set('sysinfo.collect', 'profiler', True)
> What config file is used?
>
>> job.run_test(resolve('build'))
>>
>> env.config.set('sysinfo.collect', 'profiler', False)
>> job.run_test(resolve('benchmark'))
>> job.run_test(resolve('stress'))
>> ...
>> job.run_test(resolve('netperf'))
>>
>> API Requirements
>> ~~~~~~~~~~~~~~~~
>>
>> 1. Job creation API
>> 2. Test resolution API
>> 3. Configuration API
>> 4. Single test execution API
>>
>> Multi-host test execution
>> -------------------------
>>
>> Use case description
>> ~~~~~~~~~~~~~~~~~~~~
>>
>> User needs to run the same test on different platforms. User has
>> hosts with the different platforms already setup and remotely
>> accessible.
>>
>> Possible use case fulfillment
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Avocado currently runs all tests in a job with a single runner. The
>> `default runner`_ implementation is a local test runner. Other test
>> runners include the `remote runner`_ and the `vm runner`_.
>>
>> Pseudo code such as the following could implement the (serial, for
>> simplicity) test execution on multiple different hosts::
>>
>> from avocado import Job
>> from avocado.plugin_manager import require
>> from avocado.resolver import resolve
>>
>> job = Job()
>> print('JOB ID: %s' % job.unique_id)
>> print('JOB LOG: %s' % job.log)
>>
>> runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
>> require(runner_plugin)
> What plugins are loaded by default and what needs to be required?
>
>>
>> env = job.environment # property
>> env.config.set('plugin.runner', 'default', runner_plugin)
>> env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
>> env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')
>>
>> test = resolve('hardware_validation.py:RHEL.test')
>>
>> host_list = ['rhel6.x86_64.internal',
>> ...
>> 'rhel7.ppc64.internal']
>>
>> for host in host_list:
>>     env.config.set('plugin.runner.RemoteTestRunner', 'host', host)
> Remote runner (as well as most of the plugins) does not support per-test
> granularity. I reached this problem with the multi-test RFC and we have
> to do something with it...
>
>>     job.run_test(test)
>>
>> print('JOB STATUS: %s' % job.status)
>>
>> It's actually quite simple to move from a custom Job execution to a
>> custom Job runner, for example::
>>
>> #!/usr/bin/env python
>> import sys
>> from avocado import Job
>> from avocado.plugin_manager import require
>> from avocado.resolver import resolve
>>
>> test = resolve(sys.argv[1])
>> host_list = sys.argv[2:]
>>
>> runner_plugin = 'avocado.plugins.runner:RemoteTestRunner'
>> require(runner_plugin)
>>
>> job = Job()
>> print('JOB ID: %s' % job.unique_id)
>> print('JOB LOG: %s' % job.log)
> I don't think we need to print those manually. Avocado uses `avocado.*`
> streams to communicate with outside world, it should keep using it,
> unless the user disables it (changes it ...). Note that avocado should
> not change the setting, in this usage it's the library, not the
> application.
>
>> env = job.environment # property
>> env.config.set('plugin.runner', 'default', runner_plugin)
>> env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
>> env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')
>>
>> for host in host_list:
>>     env.config.set('plugin.runner.RemoteTestRunner', 'host', host)
>>     job.run_test(test)
>>
>> print('JOB STATUS: %s' % job.status)
>>
> This took me a while to understand. The difference is that you allow
> some things to be set from the command line.
>
>> Which could be run as::
>>
>> $ multi hardware_validation.py:RHEL.test rhel{6,7}.{x86_64,ppc64}.internal
>> JOB ID: 54cacfb42f3fa9566b6307ad540fbe594f4a5fa2
>> JOB LOG: /home/<user>/avocado/job-results/job-2016-04-07T16.46-54cacfb/job.log
>> JOB STATUS: AVOCADO_ALL_OK
>>
> Brainstorming here: How about allowing people to invoke the avocado
> parser, which would modify the `config` values? We don't have this
> mapping yet, but we talked about the need for ways to unify
> config, multiplexer and args.
>
> The workflow would be:
>
> 1. parse args:
> - get urls
> - get remote-hostname from it and split it
> - [optionally] get additional arguments eg. from
> --job-params foo=bar [...]
> 2. instantiate the plugin with overridden values
> 3. run the test
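The parsing step of this workflow could be sketched with the standard `argparse` module. This is only an illustration under stated assumptions: the function name `parse_job_args`, the comma as host separator and the `KEY=VALUE` form of `--job-params` are all hypothetical and not part of Avocado's current parser.

```python
import argparse


def parse_job_args(argv):
    """Parse a hypothetical custom-job command line into plain values."""
    parser = argparse.ArgumentParser(prog="multi")
    parser.add_argument("urls", nargs="+", help="test references to resolve")
    parser.add_argument("--remote-hostname", default="",
                        help="comma separated list of hosts (assumption)")
    parser.add_argument("--job-params", nargs="*", default=[],
                        metavar="KEY=VALUE")
    args = parser.parse_args(argv)
    # Split the single option value into individual hosts
    hosts = [h for h in args.remote_hostname.split(",") if h]
    # Turn "foo=bar" items into a dictionary of overrides
    params = dict(item.split("=", 1) for item in args.job_params)
    return args.urls, hosts, params


urls, hosts, params = parse_job_args(
    ["hardware_validation.py:RHEL.test",
     "--remote-hostname", "rhel6.x86_64.internal,rhel7.ppc64.internal",
     "--job-params", "foo=bar"])
print(urls, hosts, params)
```

The returned values would then be used to instantiate the runner plugin with overridden settings (step 2) before running the test (step 3).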
>
>> API Requirements
>> ~~~~~~~~~~~~~~~~
>>
>> 1. Job creation API
>> 2. Test resolution API
>> 3. Configuration API
>> 4. Plugin Management API
>> 5. Single test execution API
>>
>> Current shortcomings
>> ~~~~~~~~~~~~~~~~~~~~
>>
>> 1. The current Avocado runner implementations do not follow the "new
>> style" plugin standard.
>>
>> 2. There's no concept of job environment
>>
>> 3. Lack of a uniform definition of plugin implementation for "driver"
>> style plugins.
>>
>> 4. Lack of automatic ownership of configuration namespace by plugin name.
>>
>>
>> Other use cases
>> ===============
>>
>> The following is a list of other valid use cases which can be
>> discussed at a later time:
>>
>> * Use the multiplexer only for some tests.
> Multiplexer creates multiple tests. What is the result of
> `run_test(test)` then?
>
>>
>> * Use the gdb or wrapper feature only for some tests.
>>
>> * Run Avocado tests and external-runner tests in the same job.
> Nitpick: This is currently possible by using simple tests with arguments.
>
>>
>> * Run tests in parallel.
> And here we go. Until now everything is doable, but this is an amazing
> and much-needed feature with lots of challenges in it, as currently:
>
> 1. plugins are per-process
> 2. plugins are usually per-job
> 3. config values are not copied for each test (we can't change them
> during execution unless we do so)
>
>>
>> * Take actions based on test results (for example, run or skip other
>> tests)
>>
>> * Post-process the logs or test results before the job is done
> I'd only say the logs. IIRC from the discussion about the multi-test RFC,
> a job must never change the results. It just runs tests and fails when
> there are failures, but it is only a container and it must never say
> "this test failed, but what the heck, let's pass".
>
>>
>> Development Milestones
>> ======================
>>
>> Since it's clear that Avocado demands many changes to be able to
>> completely fulfill all mentioned use cases, it seems like a good idea
>> to define milestones. Those milestones are not intended to set the
>> pace of development, but to allow the maximum number of real world
>> use cases to be fulfilled as soon as possible.
>>
>> Milestone 1
>> -----------
>>
>> Includes the delivery of the following APIs:
>>
>> * Job creation API
>> * Test resolution API
>> * Single test execution API
>>
>> Milestone 2
>> -----------
>>
>> Adds to the previous milestone:
>>
>> * Configuration API
>>
>> Milestone 3
>> -----------
>>
>> Adds to the previous milestone:
>>
>> * Plugin management API
>>
>> Milestone 4
>> -----------
>>
>> Introduces proper interfaces where previously Configuration and Plugin
>> management APIs were being used. For instance, where the following
>> pseudo code was being used to set the current test runner::
>>
>> env = job.environment
>> env.config.set('plugin.runner', 'default',
>> 'avocado.plugins.runner:RemoteTestRunner')
>> env.config.set('plugin.runner.RemoteTestRunner', 'username', 'root')
>> env.config.set('plugin.runner.RemoteTestRunner', 'password', '123456')
>>
>> APIs would be introduced that would allow for the following pseudo
>> code::
>>
>> job.load_runner_by_name('RemoteTestRunner')
>> if job.runner.accepts_credentials():
>>     job.runner.set_credentials(username='root', password='123456')
> I do like this.
>
>>
>> .. _settings:
>> https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py
>>
>>
>> .. _getting the value:
>> https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/settings.py#L221
>>
>>
>> .. _default runner:
>> https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/runner.py#L193
>>
>>
>> .. _remote runner:
>> https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L37
>>
>>
>> .. _vm runner:
>> https://github.com/avocado-framework/avocado/blob/0.34.0/avocado/core/remote/runner.py#L263
>>
>>
>> .. _entry points:
>> https://pythonhosted.org/setuptools/pkg_resources.html#entry-points
>>
>
> Phew, lots of thoughts. Let's start with the plugins:
>
> Currently we have
>
> 1. subcommand plugins - not related to this email (run, list, multiplex)
> 2. avocado plugins - related to whole avocado (config)
> 3. discovery - maps urls to tests (loaders)
> 4. job-related - modify the job execution (html, json, remote, sysinfo,
> vm, xunit)
> 5. test-related - allow to tweak the test execution (gdb, wrapper, sysinfo)
> 6. variants-generating - related to test, but results in several test
> variants (multiplexer)
Actually let me elaborate more on the plugins. We should IMO create
`avocado.plugins.job.*` and `avocado.plugins.test.*` plugin namespaces,
where:

* avocado.plugins.job.* - is the (4) type of plugin. It should contain the
  TestResults hooks (which define everything one might need to tweak the
  job).
* avocado.plugins.test.* - is the (5) type of plugin, which should allow
  pre and post test interaction.

I'm still struggling with the RemoteRunner. It does not fit any of
these as it's job-related, but it modifies the runner. So maybe,
similarly to the multiplexer, it deserves another category as it modifies
the test execution and requires per-test granularity (to allow running
tests on different machines):

    job.run_test(test, params=None, runner=None)

where the runner would be heavily simplified and it should support:
* setUp() # get the connection, check for avocado...
* run_test() # runs the test
* tearDown()
Example:

    job.run_test(test)

    for host in hosts:
        job.run_test(test, runner=RemoteRunner(host))

    runner = RemoteRunner(host)
    for test in [test1, test2, test3]:
        job.run_test(test, runner=runner)

Note: If you dislike passing it as an argument, we can always set it via
the environment or directly on the job. Anyway, that would require copying
it before returning from `trigger_test`, as another test might want to
modify it:

    job.runner = RemoteRunner(host)
    job.trigger_test(test)
    # the test must read/store the "runner" before returning
    job.runner = None
    job.trigger_test(test)
    job.wait()

Anyway these are details. My point is that Runner and Multiplexer are
special.
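
To make the idea more concrete, here is a rough sketch of such a simplified runner interface. All classes below are hypothetical stand-ins written for this email (they are not the existing Avocado runners), a "test" is just a callable, and `params` is accepted but ignored.

```python
class LocalRunner:
    """Hypothetical simplified runner: setUp, run_test, tearDown."""

    def setUp(self):
        # A remote runner would open its connection here
        pass

    def run_test(self, test):
        # Stand-in for real execution: call the test, map outcome to status
        try:
            test()
            return "PASS"
        except Exception:
            return "FAIL"

    def tearDown(self):
        pass


class RemoteRunner(LocalRunner):
    def __init__(self, host):
        self.host = host
        self.connected = False

    def setUp(self):
        self.connected = True  # placeholder for establishing a connection

    def tearDown(self):
        self.connected = False


class Job:
    def __init__(self):
        self.results = []

    def run_test(self, test, params=None, runner=None):
        # Fall back to local execution when no runner is given
        runner = runner or LocalRunner()
        runner.setUp()
        try:
            status = runner.run_test(test)
        finally:
            runner.tearDown()
        self.results.append((getattr(runner, "host", "localhost"), status))
        return status


def passing_test():
    pass


job = Job()
job.run_test(passing_test)
for host in ["host1", "host2"]:
    job.run_test(passing_test, runner=RemoteRunner(host))
print(job.results)
```

Note that this sketch calls setUp()/tearDown() around every single test; reusing one connection for several tests, as in the second loop of the example above, would need the job to track which runners are already set up.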
>
> Some of the (4) are related to test, rather than job, but it was not a
> big issue until now as the runner did not allow mixing them. If we want
> to allow this, then we should modify:
>
> * remote - to trigger tests, rather than jobs (benefit is that default
> runner would get per-test updates and we'd probably get rid of the
> RemoteResults)
> * sysinfo - no actual modification needed (when not running in
> parallel), but logically it should be separated to be triggered
> before/after job (belongs to category (4)) and before/after test (5)
> * vm - the same as remote
>
> Category (5) is per-test, but currently does not support
> modification at run time. There are additional problems when we want
> to support parallel execution as gdb and wrapper are set per-process.
>
> So to solve all those problems, we can either make those plugins
> test-process-aware (really ugly) or we need to instantiate the plugin
> inside the test process (the `plugin.run` would have to be executed
> inside `avocado.core.runner._run_test`).
>
>
> So to combine my thoughts, the workflow should IMO be (optional user
> steps not related to avocado are marked by '*'):
>
> * initialize logging, pop some arguments from sys.argv, ask for user
> input, ....
> 1. allow running `from avocado import parser; parser.parse()` to parse
> either a dictionary or `sys.argv` when None is given.
> 2. initialize the config (this also happens on parser.parse() along with
> updating the values from args)
> * modify Job-related config (tweak the job-plugins (4))
> 3. create a Job()
> 4. instantiate variants-generating plugin(s)
> * modify Test-related config
> * instantiate Test-related plugins
> * process logs
> * yield generated variant
> 5. run/trigger test
> ...
> 6. end job
> * whatever the user wants to do after job end
>
> Some explanations
>
> (1) - is optional and without it avocado uses the default config path
> from avocado import parser
> config = parser.parse(["--config", "/foo.ini"])
> # or parser.parse(None) to use sys.argv
>
> (2) - lazily executed when `config` is used (in step (1), or e.g. in
> resolver, or by using "config")
> from avocado import Config
> config = Config(file=None)
> config.get(...)
> config.set(...)
>
> (3) - assigns job id and invokes the job-plugins (eg pre-job hooks)
> from avocado import Job
> job = Job(config=None)
>
> (4) - gives the object which allows yielding variants (and more)
> from avocado import multiplexer
> mux = multiplexer(files=None)
> # when files=None, use the `--multiplex` value?
> params = mux.next()
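As an illustration of "an object which allows yielding variants", here is a small generator-based sketch. It is a plain Cartesian product over hypothetical axes, not Avocado's tree-based multiplexer, and the `variants` function and axis names are made up for this example.

```python
import itertools


def variants(axes):
    """Yield one params dict per combination of the given axes.

    Hypothetical stand-in for a variants-generating plugin.
    """
    names = sorted(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


mux = variants({"arch": ["x86_64", "ppc64"], "release": ["rhel6", "rhel7"]})
params = next(mux)
print(params)
```

Each yielded dictionary would then be handed to the test execution step as its `params`.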
>
> (5) - run the test. There are two ways:
> # modify the avocado environment
> job.run_test()
> # the job.run_test() instantiates the plugins, handles
> # the execution and report test results
>
> # Another way (preferred by me) is
> job.run_test(environment=None, params=None, ...)
> # basically does the same
>
> # The difference is when we use `trigger_test` to run in
> # background, where the first needs to always copy the whole
> # environment (deepcopy), while the second can rely on the user
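The difference between the two approaches can be sketched as follows, assuming a hypothetical `Job` with a `trigger_test()` that runs tests in background threads. Nothing here is existing Avocado API; the environment is just a dictionary and the "test" only records which runner it saw.

```python
import copy
import threading


class Job:
    """Hypothetical job holding a mutable, job-wide environment."""

    def __init__(self):
        self.environment = {"runner": None}
        self.results = []
        self._threads = []

    def trigger_test(self, name, environment=None):
        # No explicit environment: snapshot the shared one before returning,
        # so later changes by the caller cannot leak into this test.
        if environment is None:
            environment = copy.deepcopy(self.environment)
        thread = threading.Thread(target=self._run, args=(name, environment))
        self._threads.append(thread)
        thread.start()

    def _run(self, name, environment):
        # Stand-in for real test execution
        self.results.append((name, environment["runner"]))

    def wait(self):
        for thread in self._threads:
            thread.join()


job = Job()
job.environment["runner"] = "remote"
job.trigger_test("first")                      # implicit: deep copy taken
job.environment["runner"] = None
job.trigger_test("second", {"runner": "vm"})   # explicit: caller owns it
job.wait()
print(sorted(job.results))
```

In the implicit variant the cost of the deep copy is paid on every call, while in the explicit variant it is up to the user not to modify an environment that a running test still uses.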
>
> (6) - finishes the job including post-job plugins triggers
>
> As you can see all steps except (3), (5) and (6) are optional and use
> defaults, while allowing their modification. So milestones would be:
>
> 1. 3,5,6
> 2. 2
> 3. 4
> 4. 1
>
> Regards,
> Lukáš