[Avocado-devel] [RFC] Improve job status

Lukáš Doktor ldoktor at redhat.com
Fri Apr 8 10:51:03 UTC 2016


Dne 5.4.2016 v 23:37 Amador Pahim napsal(a):
> Hi folks,
>
> This is the RFC for the rework of the avocado job exit status. Some
> discussion has already happened on GitHub, but we should still document
> the decisions and open the discussion to a broader audience as well.

Some discussion has also already happened here:
https://trello.com/c/SU5fixgH/510-improve-job-statuses

>
> Motivation
> =======
>
> Currently the job expects from the runner a list of tests that failed,
> in order to determine the exit code Avocado will finish with. If the
> list is empty, the exit code is 0; otherwise it is 1. This
> implementation is very limited, given the number of possible test end
> statuses and exit codes. The goal of this RFC is to define the internal
> API between job and runner, the relationship between the test statuses
> and the Avocado exit codes, and the meaning of the exit codes.
>
> Use cases/current issues:
>
> - When all tests end with 'PASS' the avocado exit code is 0, which means
> "AVOCADO_ALL_OK".
> - When some or all tests end with 'FAIL', avocado exit code is 1, which
> is defined as "AVOCADO_TESTS_FAIL".
> - When the job is interrupted with CTRL+C: Current test is INTERRUPTED,
> avocado exit code is "AVOCADO_TESTS_FAIL".
This has been fixed by you and AVOCADO_JOB_INTERRUPTED is reported.
> - When the job hits the timeout before finishing the tests, we have 2
> possible results:
> -- Timeout during a test: The test is interrupted, user sees the status
> ERROR (this status is buggy, it's being fixed, but it's not part of this
> RFC) for the test and next tests are skipped, avocado exit code is
> "AVOCADO_TESTS_FAIL".
Also fixed by you.
> -- Timeout between tests: Next tests are skipped, avocado exit code is
> "AVOCADO_ALL_OK".
IMO this last case is a bug: it should report `AVOCADO_JOB_INTERRUPTED`,
because the job was interrupted. Unless I'm wrong, that is the
current behavior (after your fix).

>
>
> Internals
> ======
>
> We currently have a dictionary with the status as key and True or False
> as value for each status:
>
> mapping = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
> "WARN": True, "PASS": True, "START": True, "ALERT": False, "RUNNING":
> False, "NOSTATUS": False, "INTERRUPTED": False}
>
> That dictionary tells the runner whether a status is good or bad:
>
> ...
> if not status.mapping[test_state['status']]:
>      failures.append(test_state['name'])
> ...
> return failures
> ...
>
> Based on that return, the job decides between 0 or 1 as the exit code:
>
> ...
> tests_status = not bool(failures)
> if tests_status:
>      return exit_codes.AVOCADO_ALL_OK
> else:
>      return exit_codes.AVOCADO_TESTS_FAIL
> ...
>
> Currently the exit codes available are:
>
> AVOCADO_ALL_OK = 0
> AVOCADO_TESTS_FAIL = 1
> AVOCADO_JOB_FAIL = 2
> AVOCADO_FAIL = 3
> AVOCADO_JOB_INTERRUPTED = 4
>
>
> Recommended Solution
> ===============
>
> The runner should be able to provide more accurate information to the
> job, better representing what actually happened to the tests. After some
> discussion on GitHub, we are currently proposing the minimum information
> the runner needs to report so the job can decide the best fit for the
> exit code:
>
> On the runner:
> - Instead of a list called 'failures', the proposal is to have a set,
> called 'summary'.
> - If the job hits the timeout, whether the test is reported as
> INTERRUPTED or SKIP, we add the string 'INTERRUPTED' to the 'summary'.
> - If the test finishes with a bad status ('False' in the mapping), we
> add the string FAIL to the 'summary'.
> - If the test finishes with a good status, we don't add anything to the
> 'summary'.
> - If the runner somehow crashes, 'summary' will not be returned and the
> job should handle that.
>
> On the job:
> - Receive the summary and test:
> -- If the string "INTERRUPTED" is there, exit with
> "AVOCADO_JOB_INTERRUPTED", regardless of whether any test failed.
> -- If we don't have "INTERRUPTED" in 'summary' but we still have
> something there, exit with "AVOCADO_TESTS_FAIL".
> -- Empty 'summary' means the job should exit with "AVOCADO_ALL_OK".
> -- 'summary' being 'None' means the runner crashed and the job should
> exit with "AVOCADO_JOB_FAIL".
Already upstream by you.
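
For readers following along, the runner/job contract quoted above can be
sketched roughly like this. It is only an illustration, not the upstream
code: the helper names build_summary and exit_code_from_summary and the
job_timed_out flag are hypothetical, while the mapping and exit code
values are copied from the quoted text. It also assumes that a test
reported as INTERRUPTED still adds "FAIL", since that status is 'False'
in the mapping.

```python
# Exit codes and status mapping copied from the quoted RFC text.
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 3
AVOCADO_JOB_INTERRUPTED = 4

MAPPING = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}


def build_summary(test_statuses, job_timed_out=False):
    """Runner side (hypothetical helper): reduce test results to a set."""
    summary = set()
    if job_timed_out:
        # Timeout hit, whether the affected test shows INTERRUPTED or SKIP.
        summary.add("INTERRUPTED")
    for status in test_statuses:
        if not MAPPING[status]:
            # Assumption: any bad status is recorded as a plain "FAIL".
            summary.add("FAIL")
    return summary


def exit_code_from_summary(summary):
    """Job side (hypothetical helper): map the summary to an exit code."""
    if summary is None:
        # The runner crashed and never returned a summary.
        return AVOCADO_JOB_FAIL
    if "INTERRUPTED" in summary:
        # Interruption wins, regardless of whether any test failed.
        return AVOCADO_JOB_INTERRUPTED
    if summary:
        return AVOCADO_TESTS_FAIL
    return AVOCADO_ALL_OK
```

With this sketch, a timeout between tests (all executed tests PASS, the
rest SKIP, job_timed_out=True) yields AVOCADO_JOB_INTERRUPTED rather than
AVOCADO_ALL_OK.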

>
>
> Additional Improvements
> ================
>
> There is a request for the exit codes to be ORable. To do so, we have to
> use different codes from what we have currently, changing them to numbers
> that set only one bit to 1 when converted to binary:
>
> AVOCADO_ALL_OK = 0
> AVOCADO_TESTS_FAIL = 1
> AVOCADO_JOB_FAIL = 2
> AVOCADO_FAIL = 4
> AVOCADO_JOB_INTERRUPTED = 8
>
> That way, the test status should be a code that can be used to have more
> information about what happened to the group of tests. Example:
>
> Test1: PASS
> Test2: FAIL
> Test3: INTERRUPTED
> Test4: SKIP
>
> In the example above, we have a FAILed test, making the job use the
> AVOCADO_TESTS_FAIL code, and an INTERRUPTED test, making the job use
> AVOCADO_JOB_INTERRUPTED. PASS and SKIP are considered good statuses, so
> the final job exit code would be 9 (AVOCADO_ALL_OK | AVOCADO_TESTS_FAIL
> | AVOCADO_JOB_INTERRUPTED).
This is basically the idea from the trello card. I agree with it, it 
just requires deeper changes to avocado and job.
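
To make the ORable idea concrete, here is a small sketch of how the
combined code from the example could be computed and decoded again. The
exit code values come from the quoted proposal; the helper names
combined_exit_code and decode_exit_code are hypothetical, and treating
every bad status other than INTERRUPTED as AVOCADO_TESTS_FAIL is my
assumption, not something the RFC spells out.

```python
# ORable exit codes from the quoted proposal: one bit per condition.
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8

# Statuses that are 'True' in the current status mapping.
GOOD_STATUSES = {"PASS", "SKIP", "WARN", "START"}


def combined_exit_code(test_statuses):
    """Hypothetical helper: OR together the flags implied by each test."""
    code = AVOCADO_ALL_OK
    for status in test_statuses:
        if status == "INTERRUPTED":
            code |= AVOCADO_JOB_INTERRUPTED
        elif status not in GOOD_STATUSES:
            # Assumption: every other bad status counts as a test failure.
            code |= AVOCADO_TESTS_FAIL
    return code


def decode_exit_code(code):
    """Hypothetical helper: list the flag names set in an exit code."""
    flags = [("AVOCADO_TESTS_FAIL", AVOCADO_TESTS_FAIL),
             ("AVOCADO_JOB_FAIL", AVOCADO_JOB_FAIL),
             ("AVOCADO_FAIL", AVOCADO_FAIL),
             ("AVOCADO_JOB_INTERRUPTED", AVOCADO_JOB_INTERRUPTED)]
    names = [name for name, bit in flags if code & bit]
    return names or ["AVOCADO_ALL_OK"]


example = combined_exit_code(["PASS", "FAIL", "INTERRUPTED", "SKIP"])
print(example, decode_exit_code(example))
```

A caller could then test a single condition with a bitwise AND, for
instance `code & AVOCADO_JOB_INTERRUPTED`, without caring which other
flags are set.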

Regards,
Lukáš

>
> This request is quite well designed, but there is still room for
> discussion before it goes upstream.
>
> Thanks,
> --
> apahim
>