[Avocado-devel] [RFC] Improve job status

Amador Pahim apahim at redhat.com
Tue Apr 5 21:37:21 UTC 2016


Hi folks,

This is the RFC for the rework of the avocado job exit status. Some 
discussion has already happened on github, but we should still document 
the decisions and open the discussion to a broader audience as well.

Motivation
=======

Currently the job expects from the runner a list of the tests that 
failed, and uses it to determine the exit code Avocado will finish with. 
If the list is empty, the exit code is 0; otherwise, it is 1. This 
implementation is very limited, given the number of possible test end 
statuses and exit codes. The goal of this RFC is to define the internal 
API between job and runner, the relationship between the test statuses 
and the Avocado exit codes, and the meaning of each exit code.

Use cases/current issues:

- When all tests end with 'PASS' the avocado exit code is 0, which means 
"AVOCADO_ALL_OK".
- When some or all tests end with 'FAIL', avocado exit code is 1, which 
is defined as "AVOCADO_TESTS_FAIL".
- When the job is interrupted with CTRL+C: the current test is 
INTERRUPTED, avocado exit code is "AVOCADO_TESTS_FAIL".
- When the job hits the timeout before finishing the tests, we have 2 
possible results:
-- Timeout during a test: The test is interrupted, user sees the status 
ERROR (this status is buggy, it's being fixed, but it's not part of this 
RFC) for the test and next tests are skipped, avocado exit code is 
"AVOCADO_TESTS_FAIL".
-- Timeout between tests: Next tests are skipped, avocado exit code is 
"AVOCADO_ALL_OK".


Internals
======

We currently have a dictionary with the status as key and True or False 
as value for each status:

mapping = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False, 
"WARN": True, "PASS": True, "START": True, "ALERT": False, "RUNNING": 
False, "NOSTATUS": False, "INTERRUPTED": False}

That dictionary tells the runner whether a status is good or bad:

...
if not status.mapping[test_state['status']]:
     failures.append(test_state['name'])
...
return failures
...

Based on that return, the job decides between 0 or 1 as the exit code:

...
tests_status = not bool(failures)
if tests_status:
     return exit_codes.AVOCADO_ALL_OK
else:
     return exit_codes.AVOCADO_TESTS_FAIL
...
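
To make the limitation concrete, here is a minimal, self-contained 
sketch of the current behavior. It is not the actual Avocado code: the 
function name 'current_exit_code' and the sample test names are made up 
for illustration, and only the mapping and the 0/1 decision come from 
the snippets above. It reproduces the "timeout between tests" case, 
where skipped tests still yield exit code 0.

```python
# Simplified stand-in for the current job/runner logic (illustrative only).
MAPPING = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1


def current_exit_code(test_states):
    """Collect the names of 'bad' tests and map them to 0 or 1."""
    failures = [state["name"] for state in test_states
                if not MAPPING[state["status"]]]
    return AVOCADO_ALL_OK if not failures else AVOCADO_TESTS_FAIL


# Timeout between tests: the remaining tests are SKIPped, a "good" status,
# so the job cannot tell that it was cut short.
timeout_between_tests = [{"name": "test1", "status": "PASS"},
                         {"name": "test2", "status": "SKIP"}]

# Timeout during a test: the running test ends as ERROR, the rest are SKIPped.
timeout_during_test = [{"name": "test1", "status": "ERROR"},
                       {"name": "test2", "status": "SKIP"}]

print(current_exit_code(timeout_between_tests))
print(current_exit_code(timeout_during_test))
```

The two calls return AVOCADO_ALL_OK and AVOCADO_TESTS_FAIL respectively, 
even though the job hit the timeout in both cases.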

Currently the exit codes available are:

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 3
AVOCADO_JOB_INTERRUPTED = 4


Recommended Solution
===============

The runner should be able to provide more accurate information to the 
job, better representing what actually happened to the tests. After some 
discussion on github, we are currently proposing the minimum information 
the runner needs to report so the job can decide the best fit for the 
exit code:

On the runner:
- Instead of a list called 'failures', the proposal is to have a set, 
called 'summary'.
- If the job hits the timeout, being the test reported as INTERRUPTED or 
SKIP, we add the string 'INTERRUPTED' to the 'summary'.
- If the test finishes with a bad status ('False' in the mapping), we 
add the string FAIL to the 'summary'.
- If the test finishes with a good status, we don't add anything to the 
'summary'.
- If the runner crashes for some reason, 'summary' will not be returned 
and the job should handle that.
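
The runner-side rules above could be sketched roughly as follows. This 
is an illustration, not the proposed patch: 'build_summary' and its 
'timed_out' flag are hypothetical names. The RFC does not say whether a 
test reported as INTERRUPTED should also add 'FAIL'; this sketch simply 
applies the mapping, so it may add both strings.

```python
# Hypothetical runner-side sketch; names are illustrative assumptions.
MAPPING = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}


def build_summary(test_statuses, timed_out=False):
    """Return the set the job would inspect to choose an exit code."""
    summary = set()
    if timed_out:
        # Job timeout: recorded whether the affected tests were
        # reported as INTERRUPTED or as SKIP.
        summary.add("INTERRUPTED")
    for status in test_statuses:
        if not MAPPING[status]:
            # Any bad status collapses into a single 'FAIL' entry.
            summary.add("FAIL")
    # Good statuses add nothing, so an all-good run yields an empty set.
    return summary
```

Using a set means the job only learns which kinds of events happened, 
not how many tests were affected.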

On the job:
- Receive the summary and test:
-- If the string "INTERRUPTED" is there, exit with 
"AVOCADO_JOB_INTERRUPTED", regardless of whether any test failed.
-- If we don't have "INTERRUPTED" in 'summary' but still we have 
something there, exit with "AVOCADO_TESTS_FAIL".
-- Empty 'summary' means job should exit with "AVOCADO_ALL_OK".
-- 'summary' being 'None' means the runner crashed and the job should 
exit with "AVOCADO_JOB_FAIL".
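
A minimal sketch of the job-side decision, assuming the current exit 
code values listed under Internals. The function name 
'exit_code_from_summary' is hypothetical; the order of the checks 
follows the list above, with the crash case handled first so that 
'None' is never searched.

```python
# Exit code values as currently defined (see the Internals section).
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_JOB_INTERRUPTED = 4


def exit_code_from_summary(summary):
    """Map the runner's 'summary' to a job exit code (illustrative)."""
    if summary is None:
        # The runner crashed and returned nothing.
        return AVOCADO_JOB_FAIL
    if "INTERRUPTED" in summary:
        # Takes precedence over any test failure.
        return AVOCADO_JOB_INTERRUPTED
    if summary:
        # Something other than 'INTERRUPTED' is present.
        return AVOCADO_TESTS_FAIL
    return AVOCADO_ALL_OK
```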


Additional Improvements
================

There is a request for the exit codes to be ORable. To do so, we have to 
use codes different from the ones we have currently, changing them to 
numbers that have only one bit set to 1 when converted to binary:

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8

That way, the job exit status becomes a code that carries more 
information about what happened to the group of tests. Example:

Test1: PASS
Test2: FAIL
Test3: INTERRUPTED
Test4: SKIP

In the example above, we have a FAILed test, making the job use the 
AVOCADO_TESTS_FAIL code, and an INTERRUPTED test, making the job use 
AVOCADO_JOB_INTERRUPTED. PASS and SKIP are considered good statuses, so 
the final job exit code would be 9 (AVOCADO_ALL_OK | AVOCADO_TESTS_FAIL 
| AVOCADO_JOB_INTERRUPTED).
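
The arithmetic can be checked with a short sketch. The 
'STATUS_TO_CODE' table and 'combined_exit_code' function are 
hypothetical, inferred only from this example; the RFC does not define a 
per-status mapping for the remaining statuses.

```python
# Proposed ORable values: each non-zero code sets exactly one bit.
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8

# Hypothetical contribution of each status, covering only the four
# statuses that appear in the example.
STATUS_TO_CODE = {"PASS": AVOCADO_ALL_OK,
                  "SKIP": AVOCADO_ALL_OK,
                  "FAIL": AVOCADO_TESTS_FAIL,
                  "INTERRUPTED": AVOCADO_JOB_INTERRUPTED}


def combined_exit_code(statuses):
    """OR together the code contributed by each test status."""
    code = AVOCADO_ALL_OK
    for status in statuses:
        code |= STATUS_TO_CODE[status]
    return code


exit_code = combined_exit_code(["PASS", "FAIL", "INTERRUPTED", "SKIP"])
print(exit_code)  # 0 | 1 | 8 | 0 == 9

# A caller can test individual conditions with a bitwise AND.
print(bool(exit_code & AVOCADO_TESTS_FAIL))
print(bool(exit_code & AVOCADO_JOB_FAIL))
```

Because every non-zero code occupies its own bit, a script consuming the 
exit status can check for each condition independently.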

This request is quite well designed, but there is still room for 
discussion before it goes upstream.

Thanks,
--
apahim