[Avocado-devel] RFC: Configuration by convention

Thu Nov 21 21:23:15 UTC 2019

Hi all,

I am working on a card about "Configuration by convention", and I realized that
it would be better to consult the list first regarding a few key points.

So I would like to share this RFC with you and get your feedback.

TL;DR
#####

The number of plugins written by many different people, combined with the lack
of conventions for names, config options, and argument types, may hurt
Avocado's usability. This also makes it challenging to create a future API for
executing more complex jobs. In this RFC I would like to discuss some proposals
to improve this.

Note that, since this is a relatively big change, this RFC, if agreed upon,
could be broken down into smaller issues to facilitate its acceptance into the
master branch.

Motivation
##########

An Avocado Job is primarily executed through the `avocado run` command line.
The behavior of such an Avocado Job is determined by parsing the following
settings (listed in parsed order):

 1) Default values in source code
 2) Configuration file contents
 3) Command-line options
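
As a minimal sketch of this layering, assuming that a source parsed later
overrides one parsed earlier (the ordering above suggests this but does not
state it), and using made-up setting names that are not real Avocado options:

```python
from collections import ChainMap

# Hypothetical setting names and values, for illustration only.
defaults = {"sysinfo": True, "job_timeout": 0}
config_file = {"job_timeout": 60}
command_line = {"sysinfo": False}

# ChainMap looks keys up from left to right, so the source that is parsed
# last (the command line) has to come first to take precedence.
settings = ChainMap(command_line, config_file, defaults)

print(settings["sysinfo"])      # value given on the command line
print(settings["job_timeout"])  # value given in the configuration file
```

This only illustrates the precedence idea; it is not how Avocado currently
merges its settings.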

Currently, the Avocado config file is an .ini file parsed by Python's
`configparser` library, and its contents are broken into sections. Each
Avocado plugin has its own dedicated section.

Today, command-line options are parsed by the `argparse` library, and the
result is a dictionary that is given to the `avocado.core.job.Job()` class as
its `config` parameter.

There is no convention for the naming pattern used in either configuration
files or command-line options. Besides naming, there is also no convention for
some argument types. For instance::

 $ avocado run -d

and::

 $ avocado run --sysinfo on

Both are boolean options, but each follows a different "execution model": the
former takes no argument, while the latter needs `on` or `off` as its argument.

Since the Avocado trend is to have more and more plugins, we need to design a
naming convention for command-line arguments and settings to avoid chaos.

But, most importantly, it would be valuable for our users if Avocado provided a
Python API that lets developers write more complex jobs programmatically, while
advanced users who know the configuration entries used by jobs could still do a
quick one-off execution on the command line.

Example::

 import sys
 from avocado.core.job import Job

 config = {'references': ['tests/passtest.py:PassTest.test']}

 with Job(config) as j:
   sys.exit(j.run())

Before we address this API use case, it is important to create this convention
so that Avocado config options are intuitive to use.

.. note:: We understand that plugin developers have the flexibility to
          configure their options as desired, but inside the Avocado core and
          its plugins, settings should follow a good naming convention.

Specification
#############

Standards for Command Line Interface
------------------------------------

When it comes to the command line interface, a good starting point is the
POSIX standard's guidelines for utility arguments[1]. Avocado should try to
follow this standard and its recommendations.

The POSIX guidelines do not cover long options (starting with --). For these,
we should also embrace the GNU extension[2].

One of the goals of this extension, by introducing long options, was to make
command-line utilities more user-friendly. Another aim was to establish a norm
among different command-line utilities, so that --verbose, --debug, --version
(among other options) would behave the same way in many programs. Avocado
should, where applicable, use the GNU long options table[3] as a reference.

Many of these recommendations are obvious and already used by Avocado or
enforced by default, thanks to libraries like `argparse`.

However, those libraries do not force the developer to follow all
recommendations.

Besides the basic ones, here are some recommendations we should try to follow
and pay attention to:

  1. Option-arguments should not be optional (Guideline 7, from POSIX). So we
     should avoid this::

        avocado run --loaders [LOADERS [LOADERS ...]]

     or::

        avocado run --store-logging-stream [STREAM[:LEVEL] [STREAM[:LEVEL] ...]]

     We can have::

        avocado run --loaders LOADER,LOADER,LOADER,...

     or::

        avocado run --loader LOADER --loader LOADER --loader LOADER

  2. Use hyphens, not underscores. Long options consist of ‘--’ followed by a
     name made of alphanumeric characters and dashes. Option names are
     typically one to three words long, with hyphens to separate words. Users
     can abbreviate the option names as long as the abbreviations are unique.
     Also, an underscore sometimes gets "eaten" by a terminal border and thus
     looks like a space.

  3. When naming subcommand options, you don’t have to worry about name
     conflicts outside the subcommand scope; just keep them short, simple and
     intuitive.
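
The first two recommendations could be sketched with `argparse` roughly as
follows. The parser name `avocado-sketch` and the `comma_list` helper are
assumptions made for this example, and the options are simplified stand-ins,
not the real Avocado definitions:

```python
import argparse


def comma_list(value):
    """Split 'a,b,c' into ['a', 'b', 'c'], dropping empty items."""
    return [item for item in value.split(",") if item]


# Hypothetical parser; not how Avocado builds its command line today.
parser = argparse.ArgumentParser(prog="avocado-sketch")

# The option-argument is mandatory and the option may be repeated.
parser.add_argument("--loader", action="append", default=[], dest="loaders")

# Alternative style: one mandatory, comma-separated option-argument.
# The hyphenated name becomes the attribute `store_logging_stream`.
parser.add_argument("--store-logging-stream", type=comma_list, default=[])

args = parser.parse_args(
    ["--loader", "file", "--loader", "external",
     "--store-logging-stream", "avocado.core:DEBUG,early"]
)
print(args.loaders)
print(args.store_logging_stream)

# Unique abbreviations of long options are accepted by default.
abbreviated = parser.parse_args(["--store", "early"])
print(abbreviated.store_logging_stream)
```

With either style, leaving out the option-argument is a usage error rather
than a silent fallback to some default.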

Argument Types
~~~~~~~~~~~~~~

Basic types, like strings and integers, are straightforward to use. Here is
what to expect when using other types:

  1. **Booleans**: Boolean options should be expressed as "flag" arguments
     (without the "option-argument"). Flags, when present, should represent a
     True/Active value. This will reduce the command line size. We should
     avoid using this::

        avocado run --json-job-result {on,off}

  2. **Lists**: When an option-argument has multiple values, we should use the
     space as the separator.
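
A minimal `argparse` sketch of both argument types, assuming a hypothetical
parser in which `--json-job-result` has been turned into a flag and
`--loaders` takes one or more space-separated values:

```python
import argparse

# Hypothetical parser; option behavior here is a proposal, not current Avocado.
parser = argparse.ArgumentParser(prog="avocado-sketch")

# Boolean as a flag: absent means False, present means True.
parser.add_argument("--json-job-result", action="store_true")

# List: one or more space-separated values after the option.
parser.add_argument("--loaders", nargs="+", default=[])

args = parser.parse_args(["--json-job-result", "--loaders", "file", "external"])
print(args.json_job_result)
print(args.loaders)

# Without the flag, the boolean falls back to False.
no_flag = parser.parse_args([])
print(no_flag.json_job_result)
```

One caveat of the space-separated style in `argparse`: the list greedily
consumes the following non-option arguments, so positional arguments have to
come before the option or after a `--` separator.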

Presentation
~~~~~~~~~~~~

Finding options easily, either in the manual or in the help output, favors
usability and avoids chaos.

We can arrange the display of these options in alphabetical order within each
section.

Standards for Config File Interface
-----------------------------------

.. note:: Many other config file formats could be used here, but since that
          is a separate discussion, I'm assuming that we are going to keep
          using `configparser` for a while.

As one of the main motivations of this RFC is to create a convention to avoid
chaos and make using the job execution API as straightforward as possible, I
believe that the config file should be as close as possible to the dictionary
that will be passed to this API.

For this reason, this may be the most critical point of this RFC. We should
create a pattern that is intuitive for the developer to convert from one format
to another without much juggling.

Nested Sections
~~~~~~~~~~~~~~~

While the current `configparser` library does not support nested sections,
Avocado can use the dot character as a convention for that, e.g.
`[runner.output]`.

This convention will be important soon, when converting a dictionary into a
config file and vice-versa.
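
A rough sketch of that conversion, assuming dotted section names map directly
to nested dictionary keys. The `config_to_dict` helper and the option names
inside the sections are illustrative only:

```python
import configparser


def config_to_dict(parser):
    """Turn dotted section names into nested dictionaries."""
    result = {}
    for section in parser.sections():
        node = result
        # Walk (and create) one nesting level per dotted component.
        for part in section.split("."):
            node = node.setdefault(part, {})
        # Values stay strings, exactly as configparser stores them.
        node.update(parser.items(section))
    return result


parser = configparser.ConfigParser()
parser.read_string("""
[runner.output]
colored = true

[runner.timeout]
after_interrupted = 60
""")

nested = config_to_dict(parser)
print(nested)
```

A real implementation would also have to decide what happens when a section
such as `[runner]` has an option with the same name as a nested section.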

And since almost everything in Avocado is a plugin, each plugin section should
**not** use the "plugins" prefix and **must** respect the reserved sections
(see "Reserved Sections" below). Currently, we have a mix of sections that
start with "plugins" and sections that don't.

Plugin section name
~~~~~~~~~~~~~~~~~~~

I am not quite sure here and would like to know the opinion of those who have
been in the project the longest. Perhaps this is a slightly controversial
point, but I believe we can address it here to improve our convention.

Most plugins currently have the same name as the Python module. Example: human,
diff, tap, nrun, run, journal, replay, sysinfo, etc.

These are examples of "good" names.

However, some other plugins do not follow this convention. Ex: runnable_run,
runnable_run_recipe, task_run, task_run_recipe, archive, etc.

I believe that having a convention here helps when writing more complex tests
and config files, as well as when looking for plugins in various parts of the
project, either on a manual page or during the installation procedure.

I understand that the name of the plugin is different from the module name in
Python, but anyway, should we follow PEP8 in this case?

        From PEP8: Modules should have short, all-lowercase names. Underscores
        can be used in the module name if it improves readability. Python
        packages should also have short, all-lowercase names, although the use
        of underscores is discouraged.

Reserved Sections
~~~~~~~~~~~~~~~~~

We should reserve a few sections for Avocado's core functionality, e.g.
main, plugins, logs, job, etc.

I'm not sure here. Does this make sense?

Config Types
~~~~~~~~~~~~

`configparser` does not guess the datatypes of values in configuration files;
it always stores them internally as strings. This means that if you need other
datatypes, you have to convert them on your own.

There are a few methods in this library to help us: `getboolean()`, `getint()`
and `getfloat()`. Basic types are straightforward here as well.

Regarding boolean values, `getboolean()` can accept `yes/no`, `on/off`,
`true/false` or `1/0`. But we should adopt one style and stick with it. I
would suggest using `true/false`.
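
For reference, a short sketch of these conversions using only the standard
library. The `[sysinfo]` option names and values are made up for the example:

```python
import configparser

parser = configparser.ConfigParser()
# Hypothetical options; only the section name comes from an existing plugin.
parser.read_string("""
[sysinfo]
enabled = true
timeout = 30
""")

section = parser["sysinfo"]
raw_enabled = section["enabled"]         # stored as the string 'true'
enabled = section.getboolean("enabled")  # converted to a bool
timeout = section.getint("timeout")      # converted to an int

print(repr(raw_enabled), enabled, timeout)
```

`getboolean()` raises `ValueError` for values outside the accepted spellings,
so settling on `true/false` is about consistency, not about what the parser
can read.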

Presentation
~~~~~~~~~~~~

As the Avocado trend is to have more and more plugins, I believe that, to make
it easier for the user to find where each setting is, we should split the file
into smaller files, leaving one file for each plugin. Avocado already supports
that with the conf.d directory. What do you think?
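
To illustrate the idea with plain `configparser` (this is not Avocado's own
loading code), the sketch below writes two per-plugin files into a temporary
`conf.d` directory and reads them as one configuration. The file names, the
`.conf` extension and the option names are assumptions:

```python
import configparser
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    conf_d = Path(tmp) / "conf.d"
    conf_d.mkdir()
    # One hypothetical file per plugin.
    (conf_d / "sysinfo.conf").write_text("[sysinfo]\nenabled = true\n")
    (conf_d / "human.conf").write_text("[human]\ncolored = false\n")

    parser = configparser.ConfigParser()
    # read() accepts an iterable of paths and returns the ones it parsed.
    parsed = parser.read(sorted(conf_d.glob("*.conf")))

print(sorted(parser.sections()))
```

Reading the files in sorted order keeps the result deterministic if two files
happen to define the same option.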

Backwards Compatibility
#######################

In order to keep a good naming convention, this set of changes will probably
rename some args and/or config file options.

While some changes proposed here are simple and do not affect Avocado's
behavior, others are critical and may break Avocado jobs.

Command line syntax changes
---------------------------

If these changes are accepted, command lines that still use the old syntax
will result in a "syntax error".

We can have a transition period with a "deprecation message", but it may not
be worth it. I'm not sure yet. What do you think?

Plugin name changes
-------------------

Again, if these changes are feasible, changing the module names and/or the
'name' attribute of plugins will require changing the config files inside
Avocado as well. This will not break unless the user is using an old config
file. In that case, we can also have a "deprecation message" and accept the
old config file option for some time. Any other drawbacks that I can't see?
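
One possible shape for such a transition, sketched with the standard library
only. The `RENAMED_SECTIONS` mapping, the `load_with_renames` helper, and the
section and option names are all hypothetical:

```python
import configparser
import warnings

# Hypothetical mapping from deprecated section names to their new names.
RENAMED_SECTIONS = {"plugins.jobscripts": "jobscripts"}


def load_with_renames(text):
    """Parse config text, moving deprecated sections to their new names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    for old, new in RENAMED_SECTIONS.items():
        if not parser.has_section(old):
            continue
        warnings.warn(f"section [{old}] is deprecated, use [{new}]",
                      DeprecationWarning, stacklevel=2)
        if not parser.has_section(new):
            parser.add_section(new)
        for key, value in parser.items(old):
            # A value already set under the new name wins.
            if not parser.has_option(new, key):
                parser.set(new, key, value)
        parser.remove_section(old)
    return parser


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    migrated = load_with_renames("[plugins.jobscripts]\nwarn_non_existing_dir = true\n")

print(migrated.sections(), len(caught))
```

The same mapping could later drive a hard error once the transition period is
over.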

Security Implications
#####################

Avocado users should have the guarantee that their jobs are running in an
isolated environment.

We should consider this and keep in mind that any changes here must preserve
this guarantee.

How to Teach This
#################

We should provide a complete configuration reference guide section in our
User's Documentation.

In the future, the Job API should also be documented in enough detail that
Sphinx can generate good documentation for our Test Writer's Guide.

Besides good documentation, there is no better way to learn than by example.
If our plugins, options and settings follow a good convention, they will serve
as a template for new plugins.

If these changes are accepted by the community and implemented, this RFC could
be adapted to become a section in one of our guides, maybe something like a
Python PEP that should be followed when developing new plugins.

Open Issues
###########

.. note:: Links to open issues that are related to this.

References
##########

[1] - https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html
[2] - https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html
[3] - https://www.gnu.org/prep/standards/html_node/Option-Table.html#Option-Table

Regards,
Beraldo