[Avocado-devel] [RFC] Improve job status

Tue Apr 12 01:19:15 UTC 2016

On 04/08/2016 10:23 PM, Olav Philipp Henschel wrote:
> Hi guys,
>
> I liked the idea of the ORable exit codes.
> Have you thought about what the exit code and behavior should be when 
> avocado is aborted with other signals, such as SIGTERM?

Probably we should use AVOCADO_JOB_INTERRUPTED for SIGTERM as well. The 
point is that we don't have a handler for it yet, so this is a good 
subject for a card.

> Is it AVOCADO_FAIL? Shouldn't it handle the signal similarly to SIGINT 
> and interrupt the current test immediately?

Yes, I think it should be AVOCADO_JOB_INTERRUPTED indeed.
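
To make the idea concrete, here is a minimal sketch of how SIGTERM could be
routed through the same path as CTRL+C. It is not the Avocado implementation:
`run_job` and `_sigterm_as_interrupt` are hypothetical names, and the job is
assumed to be any callable that returns an exit code.

```python
import os
import signal

AVOCADO_JOB_INTERRUPTED = 4  # current value, as listed later in this thread


def _sigterm_as_interrupt(signum, frame):
    # Hypothetical handler: treat SIGTERM exactly like CTRL+C (SIGINT).
    raise KeyboardInterrupt


def run_job(job):
    """Run `job` (a callable returning an exit code); map SIGINT and
    SIGTERM to AVOCADO_JOB_INTERRUPTED."""
    previous = signal.signal(signal.SIGTERM, _sigterm_as_interrupt)
    try:
        return job()
    except KeyboardInterrupt:
        return AVOCADO_JOB_INTERRUPTED
    finally:
        # Restore whatever handler was installed before the job started.
        signal.signal(signal.SIGTERM, previous)
```

A real handler would also need to interrupt the running test and clean up,
which this sketch leaves out.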

> I'm asking because I use an automated environment that calls 
> avocado, and I tried to abort it with SIGTERM. It looks like avocado 
> does not handle this signal, so I resorted to sending SIGINT, waiting 
> the 2-second interval and sending SIGINT again. This worked for me, but 
> the different behaviors seemed confusing at first.

Thank you for the feedback. The card is created: 
https://trello.com/c/6qBdaSM8/648-handle-sigterm.

Best,
--
apahim

>
> Regards,
> Olav
>
>
> On 08-04-2016 07:51, Lukáš Doktor wrote:
>> On 5.4.2016 at 23:37, Amador Pahim wrote:
>>> Hi folks,
>>>
>>> This is the RFC for the rework of the avocado job exit status. Some
>>> discussion has already happened on GitHub, but we should still document
>>> the decisions and open the discussion to a broader audience as well.
>>
>> Some discussion has also happened here: 
>> https://trello.com/c/SU5fixgH/510-improve-job-statuses
>>
>>>
>>> Motivation
>>> =======
>>>
>>> Currently the job expects from the runner a list of tests that failed,
>>> and uses it to determine the exit code Avocado will finish with. If the
>>> list is empty, the exit code is 0; otherwise, it is 1. This
>>> implementation is very limited, given the number of possible test ending
>>> statuses and exit codes. The goal of this RFC is to define the internal
>>> API between job and runner, the relationship between the test statuses
>>> and the Avocado exit codes, and the meaning of the exit codes.
>>>
>>> Use cases/current issues:
>>>
>>> - When all tests end with 'PASS', the avocado exit code is 0, which
>>> means "AVOCADO_ALL_OK".
>>> - When some or all tests end with 'FAIL', avocado exit code is 1, which
>>> is defined as "AVOCADO_TESTS_FAIL".
>>> - When the job is interrupted with CTRL+C: Current test is INTERRUPTED,
>>> avocado exit code is "AVOCADO_TESTS_FAIL".
>> This has been fixed by you, and AVOCADO_JOB_INTERRUPTED is now reported.
>>> - When the job hits the timeout before finishing the tests, we have 2
>>> possible results:
>>> -- Timeout during a test: The test is interrupted, the user sees the
>>> status ERROR for the test (this status is buggy and is being fixed, but
>>> that is not part of this RFC) and the next tests are skipped; avocado
>>> exit code is "AVOCADO_TESTS_FAIL".
>> Also fixed by you,
>>> -- Timeout between tests: Next tests are skipped, avocado exit code is
>>> "AVOCADO_ALL_OK".
>> IMO this last is a bug and it should report 
>> `AVOCADO_JOB_INTERRUPTED`, because the job was interrupted, but 
>> unless I'm wrong this is the current behavior (after your fix).
>>
>>>
>>>
>>> Internals
>>> ======
>>>
>>> We currently have a dictionary with the status as key and True or False
>>> as value for each status:
>>>
>>> mapping = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
>>> "WARN": True, "PASS": True, "START": True, "ALERT": False, "RUNNING":
>>> False, "NOSTATUS": False, "INTERRUPTED": False}
>>>
>>> That dictionary tells the runner whether a status is good or bad:
>>>
>>> ...
>>> if not status.mapping[test_state['status']]:
>>>      failures.append(test_state['name'])
>>> ...
>>> return failures
>>> ...
>>>
>>> Based on that return, the job decides between 0 or 1 as the exit code:
>>>
>>> ...
>>> tests_status = not bool(failures)
>>> if tests_status:
>>>      return exit_codes.AVOCADO_ALL_OK
>>> else:
>>>      return exit_codes.AVOCADO_TESTS_FAIL
>>> ...
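
Put together, the two fragments above behave roughly like the following
self-contained sketch. The function names `collect_failures` and
`current_exit_code` are made up for illustration and are not the actual
Avocado functions; the mapping and the two exit codes are the ones quoted in
this RFC.

```python
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1

# Status mapping as quoted above: True means "good", False means "bad".
mapping = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}


def collect_failures(test_states):
    """Runner side: names of the tests whose status maps to False."""
    return [state['name'] for state in test_states
            if not mapping[state['status']]]


def current_exit_code(failures):
    """Job side: an empty list means success, anything else means failure."""
    return AVOCADO_ALL_OK if not failures else AVOCADO_TESTS_FAIL


interrupted_run = [{'name': 't1', 'status': 'PASS'},
                   {'name': 't2', 'status': 'INTERRUPTED'}]
skipped_run = [{'name': 't1', 'status': 'PASS'},
               {'name': 't2', 'status': 'SKIP'}]

print(current_exit_code(collect_failures(interrupted_run)))  # 1
print(current_exit_code(collect_failures(skipped_run)))      # 0
```

An interrupted test and a plain failure both collapse to 1, and a run whose
remaining tests were only skipped yields 0, which is the limitation this RFC
describes.
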
>>>
>>> Currently the exit codes available are:
>>>
>>> AVOCADO_ALL_OK = 0
>>> AVOCADO_TESTS_FAIL = 1
>>> AVOCADO_JOB_FAIL = 2
>>> AVOCADO_FAIL = 3
>>> AVOCADO_JOB_INTERRUPTED = 4
>>>
>>>
>>> Recommended Solution
>>> ===============
>>>
>>> The runner should be able to provide more accurate information to the
>>> job, better representing what actually happened to the tests. After some
>>> discussion on GitHub, we are currently proposing the minimum information
>>> the runner needs to report so the job can decide the best fit for the
>>> exit code:
>>>
>>> On the runner:
>>> - Instead of a list called 'failures', the proposal is to have a set,
>>> called 'summary'.
>>> - If the job hits the timeout, whether the test is reported as
>>> INTERRUPTED or SKIP, we add the string 'INTERRUPTED' to the 'summary'.
>>> - If the test finishes with a bad status ('False' in the mapping), we
>>> add the string 'FAIL' to the 'summary'.
>>> - If the test finishes with a good status, we don't add anything to the
>>> 'summary'.
>>> - If the runner somehow crashes, 'summary' will not be returned and the
>>> job should handle that.
>>>
>>> On the job:
>>> - Receive the summary and test:
>>> -- If the string "INTERRUPTED" is there, exit with
>>> "AVOCADO_JOB_INTERRUPTED", regardless of whether any test failed.
>>> -- If we don't have "INTERRUPTED" in 'summary' but we still have
>>> something there, exit with "AVOCADO_TESTS_FAIL".
>>> -- An empty 'summary' means the job should exit with "AVOCADO_ALL_OK".
>>> -- A 'summary' of 'None' means the runner crashed and the job should
>>> exit with "AVOCADO_JOB_FAIL".
>> Already upstream by you.
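
The job-side rules listed above can be sketched as a single function. This is
only an illustration of the decision order, not the code that went upstream;
`exit_code_from_summary` is a hypothetical name and the constants use the
current (non-ORable) values from this RFC.

```python
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_JOB_INTERRUPTED = 4


def exit_code_from_summary(summary):
    """Map the runner's 'summary' (a set of strings, or None) to an exit code."""
    if summary is None:
        # The runner crashed and returned nothing.
        return AVOCADO_JOB_FAIL
    if 'INTERRUPTED' in summary:
        # Interruption wins, regardless of whether any test failed.
        return AVOCADO_JOB_INTERRUPTED
    if summary:
        # Something other than 'INTERRUPTED' is present.
        return AVOCADO_TESTS_FAIL
    return AVOCADO_ALL_OK


print(exit_code_from_summary(set()))                      # 0
print(exit_code_from_summary({'FAIL'}))                   # 1
print(exit_code_from_summary({'FAIL', 'INTERRUPTED'}))    # 4
print(exit_code_from_summary(None))                       # 2
```

The `None` check has to come first, since testing membership on `None` would
raise a TypeError.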
>>
>>>
>>>
>>> Additional Improvements
>>> ================
>>>
>>> There is a request for the exit codes to be ORable. To do so, we have to
>>> use codes different from what we have currently, changing them to numbers
>>> that set only one bit to 1 when converted to binary:
>>>
>>> AVOCADO_ALL_OK = 0
>>> AVOCADO_TESTS_FAIL = 1
>>> AVOCADO_JOB_FAIL = 2
>>> AVOCADO_FAIL = 4
>>> AVOCADO_JOB_INTERRUPTED = 8
>>>
>>> That way, the resulting status is a code that can be used to get more
>>> information about what happened to the group of tests. Example:
>>>
>>> Test1: PASS
>>> Test2: FAIL
>>> Test3: INTERRUPTED
>>> Test4: SKIP
>>>
>>> In the example above, we have a FAILed test, making the job use the
>>> AVOCADO_TESTS_FAIL code, and an INTERRUPTED test, making the job use
>>> AVOCADO_JOB_INTERRUPTED. PASS and SKIP are considered good statuses, so
>>> the final job exit code would be 9 (AVOCADO_ALL_OK | AVOCADO_TESTS_FAIL
>>> | AVOCADO_JOB_INTERRUPTED).
>> This is basically the idea from the trello card. I agree with it; it 
>> just requires deeper changes to avocado and job.
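
As a rough illustration of the ORable proposal, the four-test example can be
worked through as below. The `STATUS_TO_CODE` table and the helper names are
assumptions made for this sketch; only the constant values come from the RFC.

```python
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8

# Assumed contribution of each test status to the job exit code.
STATUS_TO_CODE = {
    'PASS': AVOCADO_ALL_OK,
    'SKIP': AVOCADO_ALL_OK,
    'FAIL': AVOCADO_TESTS_FAIL,
    'INTERRUPTED': AVOCADO_JOB_INTERRUPTED,
}


def job_exit_code(statuses):
    """OR together the code contributed by every test status."""
    code = AVOCADO_ALL_OK
    for status in statuses:
        code |= STATUS_TO_CODE[status]
    return code


def has_flag(code, flag):
    """Check whether a single-bit flag is set in a combined exit code."""
    return bool(code & flag)


code = job_exit_code(['PASS', 'FAIL', 'INTERRUPTED', 'SKIP'])
print(code)                                     # 9
print(has_flag(code, AVOCADO_TESTS_FAIL))       # True
print(has_flag(code, AVOCADO_JOB_FAIL))         # False
```

Because each non-zero code sets exactly one bit, a caller can test for any
individual condition with a bitwise AND instead of comparing against every
possible combination.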
>>
>> Regards,
>> Lukáš
>>
>>>
>>> This request is quite well designed, but there is still room for
>>> discussion before it goes upstream.
>>>
>>> Thanks,
>>> -- 
>>> apahim
>>>
>>> _______________________________________________
>>> Avocado-devel mailing list
>>> Avocado-devel at redhat.com
>>> https://www.redhat.com/mailman/listinfo/avocado-devel