[Avocado-devel] RFC: Configuration by convention

Cleber Rosa crosa at redhat.com
Tue Dec 3 02:08:53 UTC 2019


On Thu, Nov 21, 2019 at 06:23:15PM -0300, Beraldo Leal wrote:
> Hi all,
> 
> I am working on a card about "Configuration by convention", and I realized that
> it would be better to consult the list first regarding a few key points.
> 
> So I would like to share this RFC with you and get your feedback.
> 
> TL;DR
> #####
> 
> The number of plugins written by many people, combined with the lack of
> conventions for names, config options, and argument types, may hurt Avocado's
> usability.
> This also makes it challenging to create a future API for executing more
> complex jobs. I would like to discuss in this RFC some proposals to improve
> this.
>

I'd argue that even without plugins the lack of convention (or another
type of order-setting mechanism) can induce growing pains.  But yes, the
modularity of Avocado's features makes this a tougher problem.

> And note that, since this is a relatively big change, this RFC, if agreed,
> could be broken down into smaller issues to facilitate its acceptance into the
> master branch.
> 
> Motivation
> ##########
> 
> An Avocado Job is primarily executed through the `avocado run` command line.
> The behavior of such an Avocado Job is determined by parsing the following
> settings (listed in parsed order):
> 
>  1) Default values in source code

There's possibly a lack of convention/order in this item alone.  For
instance, we have "avocado/core/defaults.py" with some defaults, but
I'm sure there are other such defaults scattered around the project,
with ad-hoc names.

A good starting point for setting a convention (say one of the cards
you listed) would be to determine how to set the default on
"avocado/core/defaults.py".  Also, another action item could be to
make sure that we don't have "default worthy" variables set elsewhere.
For instance, in "avocado/core/runner.py" I see:

   DEFAULT_TIMEOUT = 86400

Which is very similar to some of the default values in defaults.py.

>  2) Configuration file contents
>  3) Command-line options
> 
> Currently, the Avocado config file is an .ini file that is parsed by Python's
> `configparser` library and this config is broken into sections. Each Avocado
> plugin has its dedicated section.
> 
> Today, the parsing of the command line options is made by `argparse` library
> and produces a dictionary that is given to the `avocado.core.job.Job()` class
> as its `config` parameter.
> 
> There is no convention on the naming pattern used either on configuration files
> or on command-line options. Besides the name convention, there is also a lack
> of convention for some argument types. For instance::
> 
>  $ avocado run -d
> 
> and::
> 
>  $ avocado run --sysinfo on
> 
> Both are boolean variables, but with different "execution model" (the former
> doesn't need arguments and the latter needs `on` or `off` as argument).
> 
> Since the Avocado trend is to have more and more plugins, we need to design a
> name convention on command-line arguments and settings to avoid chaos.
> 
> But, most important: It would be valuable for our users if Avocado provides a
> Python API in such a way that developers could write more complex jobs
> programmatically and advanced users that know the configuration entries used on
> jobs, could do a quick one-off execution on command-line.
> 
> Example::
> 
>  import sys
>  from avocado.core.job import Job
> 
>  config = {'references': ['tests/passtest.py:PassTest.test']}
> 
>  with Job(config) as j:
>    sys.exit(j.run())
> 
> Before we address this API use-case, it is important to create this convention
> so we can have an intuitive use of Avocado config options.
> 
> .. note:: We understand that plugin developers have the flexibility to
>           configure their options as desired, but inside Avocado's core and
>           plugins, settings should have a good naming convention.
> 
> 
> Specification
> #############
> 
> 
> Standards for Command Line Interface
> ------------------------------------
> 
> When it comes to the command line interface, a very interesting recommendation
> is the POSIX Standard's recommendation for arguments[1]. Avocado should try to
> follow this standard and its recommendations.
> 
> This pattern does not cover long options (starting with --). For this, we should
> also embrace the GNU extension[2].
> 
> One of the goals of this extension, by introducing long options, was to make
> command-line utilities user-friendly. Also, another aim was to try to create a
> norm among different command-line utilities. Thus, --verbose, --debug,
> --version (with other options) would have the same behavior in many programs.
> Avocado should try to, where applicable, use the GNU long options table[3] as
> reference.
> 
> Many of these recommendations are obvious and already used by Avocado or
> enforced by default, thanks to libraries like `argparse`.
> 
> However, those libraries do not force the developer to follow all
> recommendations.
> 
> Besides the basic ones, here are some recommendations we should try to follow
> and pay attention to:
> 
>   1. Option-arguments should not be optional (Guideline 7, from POSIX). So we
>      should avoid this::
>      
>         avocado run --loaders [LOADERS [LOADERS ...]]
> 
>      or::
>   
>         avocado run --store-logging-stream [STREAM[:LEVEL] [STREAM[:LEVEL] ...]]
> 
>      We can have::
> 
>         avocado run --loaders LOADER,LOADER,LOADER,...
> 
>      or::
> 
>         avocado run --loader LOADER --loader LOADER --loader LOADER
>

OK, I see and agree with the main point that an option's argument
*should* always be given.  Choosing the specific style would be the
second issue then.  Again, maybe this can be another little card.  In
such a card, besides its resolution, it would be nice to document what
developers should follow (this may be the start of either a "Developer
Guide" section or a "Plugin Developer Guide" itself).
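
To make the two styles concrete, here is a minimal argparse sketch (not
Avocado code).  The "avocado-sketch" program name, the comma_list()
helper and the loader names are placeholders chosen for illustration.
In both styles the option-argument is mandatory, as POSIX Guideline 7
asks.

```python
import argparse


def comma_list(value):
    """Split 'a,b,c' into ['a', 'b', 'c'], rejecting empty items."""
    items = [item.strip() for item in value.split(",")]
    if not all(items):
        raise argparse.ArgumentTypeError("empty item in list: %r" % value)
    return items


parser = argparse.ArgumentParser(prog="avocado-sketch")
# Style 1: a single, mandatory, comma separated option-argument
parser.add_argument("--loaders", type=comma_list, default=[])
# Style 2: the option is repeated, each time with one mandatory argument
parser.add_argument("--loader", action="append", default=[])

args = parser.parse_args(["--loaders", "file,external",
                          "--loader", "file", "--loader", "external"])
```

Both spellings end up as the same Python list, so the choice is mostly
about what reads better on the command line.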

>   2. Use hyphens not underscore: Long options consist of ‘--’ followed by a
>      name made of alphanumeric characters and dashes. Option names are
>      typically one to three words long, with hyphens to separate words. Users
>      can abbreviate the option names as long as the abbreviations are unique.
>      Also, an underscore sometimes gets "eaten" by a terminal border and
>      thus looks like a space.
>

Are you aware of any long options in Avocado that violate this?  Or
is this just supposed to be part of the guidelines?  If so, it could
be preserved in a "Developer Guide" as suggested above.

>   3. When naming subcommands options you don’t have to worry about name
>      conflicts outside the subcommand scope, just keep them short, simple and
>      intuitive.
>

Sounds good.

> Argument Types
> ~~~~~~~~~~~~~~
> 
> It is clear how to use basic types, like strings and integers. But here is a
> list of what we should expect when using other types:
> 
>   1. **Booleans**: Boolean options should be expressed as "flags" args (without
>        the "option-argument"). Flags, when present, should represent a
>        True/Active value.  This will reduce the command line size. We should
>        avoid using this::
> 
>         avocado run --json-job-result {on,off}
>

Let's suppose that the command line options take precedence over the
built-in defaults and configuration file settings.  How would you say
that the "json-job-result" feature should be disabled (taking
precedence over what's defined in the built-in defaults and
configuration file settings)?

For instance, autoconf scripts will usually have both
'--enable-$(feature)' and '--disable-$(feature)' options.  Is this
what you're proposing?
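
A minimal sketch of that enable/disable pairing with argparse, assuming
the precedence described above (command line, then configuration file,
then built-in default).  The option names and the resolve() helper are
hypothetical, not existing Avocado options or API.

```python
import argparse

parser = argparse.ArgumentParser(prog="avocado-sketch")
group = parser.add_mutually_exclusive_group()
# Both flags write to the same destination.  default=None means "not given
# on the command line", which is different from an explicit False.
group.add_argument("--enable-json-job-result", dest="json_job_result",
                   action="store_true", default=None)
group.add_argument("--disable-json-job-result", dest="json_job_result",
                   action="store_false", default=None)


def resolve(cli_value, config_value, builtin_default):
    """Return the first value that was explicitly set, by precedence."""
    for value in (cli_value, config_value):
        if value is not None:
            return value
    return builtin_default


args = parser.parse_args(["--disable-json-job-result"])
# The config file enables the feature, but the command line wins
effective = resolve(args.json_job_result, True, True)
```

The three-state value (True, False, None) is what allows a flag without
an option-argument to still override a setting that is enabled
elsewhere.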

>   2. **Lists**: When an option argument has multiple values we should use the
>        space as the separator.
>

Examples of current violations of this norm, if they exist, would
improve the discussion IMO.  But, I peeked at further responses in
this thread, and I believe examples were given, so no need to
repeat them here. :)

> 
> Presentation
> ~~~~~~~~~~~~
> 
> Finding options easily, either in the manual or in the help, favors usability
> and avoids chaos.
> 
> We can arrange the display of these options in alphabetical order within each
> section.
>

I guess you mean that we should/could... with which I agree.  But I
also wonder how.
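
One possible "how", as a sketch only: a help formatter that sorts the
actions of each group before rendering them.  It relies on
HelpFormatter.add_arguments(), which exists in CPython's argparse but
is not part of its documented API, so treat this as an assumption that
may need adjusting.  The option names are hypothetical.

```python
import argparse


class SortedHelpFormatter(argparse.HelpFormatter):
    """Help formatter that lists the actions of each group alphabetically."""

    @staticmethod
    def sort_key(action):
        # Prefer the long option ("--debug") over the short one ("-d");
        # positional arguments have no option strings, so use their dest.
        names = action.option_strings or [action.dest]
        return max(names, key=len).lstrip("-")

    def add_arguments(self, actions):
        super().add_arguments(sorted(actions, key=self.sort_key))


# The usage line is generated separately and keeps the definition order,
# so it is suppressed here to keep the output focused on the option list.
parser = argparse.ArgumentParser(prog="avocado-sketch",
                                 usage=argparse.SUPPRESS,
                                 formatter_class=SortedHelpFormatter)
parser.add_argument("--zeta", action="store_true", help="hypothetical option")
parser.add_argument("--alpha", action="store_true", help="hypothetical option")

help_text = parser.format_help()
```

With this formatter, "--alpha" is listed before "--help", which is
listed before "--zeta", regardless of the order of the add_argument()
calls.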

> 
> Standards for Config File Interface
> -----------------------------------
> 
> .. note:: Many other config file options could be used here, but since
>           this is another discussion, I'm assuming that we are going to keep
>           using `configparser` for a while.
> 
> As one of the main motivations of this RFC is to create a convention to avoid
> chaos and make the job execution API use as straightforward as possible, I
> believe that the config file should be as close as possible to the dictionary
> that will be passed to this API.
> 
> For this reason, this may be the most critical point of this RFC. We should
> create a pattern that is intuitive for the developer to convert from one format
> to another without much juggling.
>

Agreed with the problem statement and general direction.

> Nested Sections
> ~~~~~~~~~~~~~~~
> 
> While the current `configparser` library does not support nested sections,
> Avocado can use the dot character as a convention for that, e.g.
> `[runner.output]`.
> 
> This convention will be important soon, when converting a dictionary into a
> config file and vice-versa.
> 
> And since almost everything in Avocado is a plugin, each plugin section should
> **not** use the "plugins" prefix and **must** respect the reserved sections
> mentioned before. Currently, we have a mix of sections that start with
> "plugins" and sections that don't.
>

OK, I see signs of improved consistency here and I like it.
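
As an illustration of the dotted-section convention, here is a minimal
sketch that turns configparser sections into a nested dictionary.  The
config_to_dict() helper and the key names are hypothetical, and the
sketch assumes that a key name never collides with a sub-section name.

```python
import configparser


def config_to_dict(parser):
    """Build a nested dict from sections named like 'runner.output'."""
    result = {}
    for section in parser.sections():
        node = result
        # Walk (and create) one dictionary level per dotted component
        for part in section.split("."):
            node = node.setdefault(part, {})
        # configparser keeps every value as a string
        node.update(parser.items(section))
    return result


parser = configparser.ConfigParser()
parser.read_string("""
[runner]
timeout = 60

[runner.output]
colored = true
""")

nested = config_to_dict(parser)
```

Here "nested" is {"runner": {"timeout": "60", "output": {"colored":
"true"}}}.  Values stay as strings, so type conversion remains a
separate problem.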

> Plugin section name
> ~~~~~~~~~~~~~~~~~~~
> 
> I am not quite sure here and would like to know the opinion of those who have
> been in the project the longest. Perhaps this is a slightly controversial
> point, but I believe we can address it here to improve our convention.
> 
> Most plugins currently have the same name as the python module. Example: human,
> diff, tap, nrun, run, journal, replay, sysinfo, etc.
> 
> These are examples of "good" names.
> 
> However, some other plugins do not follow this convention. Ex: runnable_run,
> runnable_run_recipe, task_run, task_run_recipe, archive, etc.
>

Are you also proposing that every plugin should be implemented in a
separate Python module?  IMO it would improve consistency indeed, but
could add an overhead and increase the amount of boilerplate code
repetition.

One recent example is the "avocado/plugins/resolvers.py" file, which
initially was one file per plugin.  The reviewer noticed the amount of
duplicate imports and that the plugins were very similar in purpose
and behavior.  On that occasion, it made sense to me to follow the
reviewer's point.

And adding to that point and your comment, the various Python modules
containing plugins related to the nrunner, "runnable_run",
"runnable_run_recipe", etc, could easily be on the very same
"avocado/plugins/nrunner.py" module.

> I believe that having a convention here helps when writing more complex tests,
> configfiles, as well as easily finding plugins in various parts of the project,
> either on a manual page or during the installation procedure.
> 
> I understand that the name of the plugin is different from the module name in
> python, but anyway, should we follow PEP8 in this case?
> 
>         From PEP8: Modules should have short, all-lowercase names. Underscores
>         can be used in the module name if it improves readability. Python
>         packages should also have short, all-lowercase names, although the use
>         of underscores is discouraged.
>

I think this is a worthy goal.  I don't think the currently used
module names that follow the command line convention are a good idea
anyway (given the examples above).

> Reserved Sections
> ~~~~~~~~~~~~~~~~~
> 
> We should reserve a few sections for Avocado's core functionality,
> e.g. main, plugins, logs, job, etc.
> 
> I'm not sure here. Does this make sense?
>

TBH, I don't have a failproof and unchangeable opinion on this.  I'll
defer until later, and if necessary return to this point.  If I don't,
please remind me! :)


> Config Types
> ~~~~~~~~~~~~
> 
> `configparser` does not guess the datatypes of values in configuration files,
> always storing them internally as strings. This means that if you need other
> datatypes, you should convert them on your own.
> 
> There are a few methods in this library to help us: `getboolean()`, `getint()`
> and `getfloat()`. Basic types here are also straightforward.
> 
> Regarding boolean values, `getboolean()` can accept `yes/no`, `on/off`,
> `true/false` or `1/0`. But we should adopt one style and stick with it. I
> would suggest using `true/false`.
>

You talk about configparser only.  What about similar values given on
the command line?  Should we have a common utility library that can
check/return the right data types and be used on both configuration
file and command line parser?

One example is that
https://docs.python.org/3/library/argparse.html#type will take any
callable, and we may be tempted to use "bool" on a value here and then
getboolean() with configparser, resulting in very different behavior.

Finally, let me say that setting a simple but effective type system is
not an easy task, but it's indeed a necessary step here.
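
A short sketch of the argparse/configparser mismatch described above,
and of one possible shared converter.  The to_bool() helper, the
"--sysinfo" option taking an argument and the "[sysinfo] enabled" key
are hypothetical, used only for illustration.  BOOLEAN_STATES is the
documented configparser mapping that getboolean() uses.

```python
import argparse
import configparser


def to_bool(value):
    """Convert a string to bool, accepting what getboolean() accepts."""
    states = configparser.ConfigParser.BOOLEAN_STATES
    try:
        return states[value.strip().lower()]
    except KeyError:
        raise argparse.ArgumentTypeError("not a boolean: %r" % value)


# The pitfall: bool() of any non-empty string is True, even for "off"
naive = bool("off")

# With a shared converter the command line parser...
cli = argparse.ArgumentParser(prog="avocado-sketch")
cli.add_argument("--sysinfo", type=to_bool)
cli_value = cli.parse_args(["--sysinfo", "off"]).sysinfo

# ...and the configuration file parser agree on the result
config = configparser.ConfigParser()
config.read_string("[sysinfo]\nenabled = off\n")
config_value = config.getboolean("sysinfo", "enabled")
```

With type=bool the command line value would have been True, while the
same "off" in the configuration file is False.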

> 
> Presentation
> ------------
> 
> As the avocado trend is to have more and more plugins, I believe that to make
> it easier for the user to find where each configuration is, we should split the
> file into smaller files, leaving one file for each plugin. Avocado already
> supports that with the conf.d directory. What do you think?
> 
>

If it reinforces a convention, so that users will automatically think
of looking at a given file for a specific configuration, then I think
this is fine.

But, I have to say that I think the final parsed content of the config
files and the structure of that content are much more important.  If I
were to focus on something, I'd focus on the content structure (section
names, key names, etc), and not necessarily on how the files are named
(given that this would only be a recommendation, right?).  I mean,
if I put "[foo]" inside "/etc/avocado/conf.d/bar.conf", will it be
read and parsed?
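
For reference, configparser itself does not tie section names to file
names, as this sketch shows.  It uses a temporary directory instead of
"/etc/avocado", and it says nothing about how Avocado actually loads
its conf.d directory, which would need to be checked separately.

```python
import configparser
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conf_d = pathlib.Path(tmp) / "conf.d"
    conf_d.mkdir()
    # A "[foo]" section living inside a file named after something else
    (conf_d / "bar.conf").write_text("[foo]\nkey = value\n")

    parser = configparser.ConfigParser()
    # read() takes any number of files and merges them into one parser
    read_ok = parser.read(sorted(conf_d.glob("*.conf")))
    sections = parser.sections()
    value = parser.get("foo", "key")
```

The "[foo]" section is parsed no matter what the file is called, so a
one-file-per-plugin layout can only ever be a convention unless
something enforces it.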

> Backwards Compatibility
> #######################
> 
> In order to keep a good naming convention, this set of changes probably will
> rename some args and/or config file options.
> 
> While some changes proposed here are simple and do not affect Avocado's
> behavior, others are critical and may break Avocado jobs.
> 
> Command line syntax changes
> ---------------------------
> 
> If these changes are acceptable, these command-line conversions will lead to a
> "syntax error".
> 
> We can have a transition period with a "deprecated message" but it may not be
> worth it. I'm not sure yet. What do you think?
>

In general, unless the development cost is really prohibitive, it's a
good idea to have a transitional period.  If that's the case, we can
validate such changes, announce them really loudly on a given release,
and then apply them on the next release.
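
A transitional period could be as cheap as the following sketch, where
the old spelling keeps working but warns.  The DeprecatedAlias action
and both option names are hypothetical, not existing Avocado code.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Accept an old option spelling, but warn and point to the new one."""

    def __init__(self, option_strings, dest, new_name=None, **kwargs):
        self.new_name = new_name
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn("%s is deprecated, use %s instead"
                      % (option_string, self.new_name), DeprecationWarning)
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser(prog="avocado-sketch")
parser.add_argument("--new-name", dest="name")
# The old spelling stays out of the help output and writes to the same dest
parser.add_argument("--old_name", dest="name", action=DeprecatedAlias,
                    new_name="--new-name", help=argparse.SUPPRESS)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = parser.parse_args(["--old_name", "value"])
```

After the announced release, removing the second add_argument() call
turns the old spelling into the "syntax error" mentioned in the RFC.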

> Plugin name changes
> -------------------
> 
> Again, if these changes are feasible, changing the module names and/or the
> 'name' attribute of plugins will require changing the config files inside
> Avocado as well. This will not break unless the user is using an old config
> file. In that case, we can also have a "deprecated message" and accept the old
> config file option for some time. Any other drawbacks that I can't see?
> 
>

No, seems fine.

> Security Implications
> #####################
> 
> Avocado users should have the guarantee that their jobs are running in an
> isolated environment.
> 
> We should consider this and keep in mind that any moves here should continue
> with this assumption.
> 
> How to Teach This
> #################
> 
> We should provide a complete configuration reference guide section in our
> User's Documentation.
> 
> In the future, the Job API should also be very well detailed so sphinx could
> generate good documentation on our Test Writer's Guide.
>

This is indeed highly desirable.

> Besides good documentation, there is no better way to learn than by example.
> If our plugins, options and settings follow a good convention, they will serve
> as a template for new plugins.
>

+1.

> If these changes are accepted by the community and implemented, this RFC could
> be adapted to become a section in one of our guides, maybe something like a
> Python PEP that should be followed when developing new plugins.
>

Agreed.

Thanks a lot for the write-up.
- Cleber.

> Open Issues
> ###########
> 
> .. note:: Links to open issues that are related to this.
> 
> References
> ##########
> 
> [1] - https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html
> [2] - https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html
> [3] - https://www.gnu.org/prep/standards/html_node/Option-Table.html#Option-Table
> 
> Regards,
> Beraldo
> 
> 



