[Avocado-devel] no sigterm_handler stack trace on test timeout?

Cleber Rosa crosa at redhat.com
Tue Nov 13 17:49:00 UTC 2018



On 11/13/18 10:22 AM, Brian J. Murrell wrote:
> When I run a sample test:
> 
> ```
> import time
> from avocado import Test
> 
> class TimeoutTest(Test):
> 
>     timeout = 3
> 
>     def setUp(self):
>         print "setUp"
> 
>     def tearDown(self):
>         print "tearDown"
> 
>     def test(self):
>         try:
>             sleep_time = self.params.get('sleep_time', default=5)
>             self.log.info('Sleeping for %.2f seconds (%.2f more than the timeout)',
>                           sleep_time, sleep_time - self.timeout)
>             time.sleep(sleep_time)
>         except ValueError as e:
>             print e
> ```
> 
> to demonstrate what a timeout looks like I get:
> 
> ```
> 2018-11-13 10:02:55,588 varianter        L0112 DEBUG| PARAMS (key=timeout, path=*, default=3) => 3
> 2018-11-13 10:02:55,588 test             L0240 INFO | START 1-timeouttest.py:TimeoutTest.test
> 2018-11-13 10:02:55,725 output           L0655 DEBUG| setUp
> 2018-11-13 10:02:55,726 varianter        L0112 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5
> 2018-11-13 10:02:55,726 timeouttest      L0032 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)
> 2018-11-13 10:02:58,623 stacktrace       L0039 ERROR| 
> 2018-11-13 10:02:58,624 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
> 2018-11-13 10:02:58,624 stacktrace       L0045 ERROR| Traceback (most recent call last):
> 2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|   File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test
> 2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|     time.sleep(sleep_time)
> 2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler
> 2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")
> 2018-11-13 10:02:58,626 stacktrace       L0045 ERROR| RuntimeError: Test interrupted by SIGTERM
> 2018-11-13 10:02:58,626 stacktrace       L0046 ERROR| 
> 2018-11-13 10:02:58,626 test             L0601 DEBUG| Local variables:
> 2018-11-13 10:02:58,663 test             L0604 DEBUG|  -> self <class 'timeouttest.TimeoutTest'>: 1-timeouttest.py:TimeoutTest.test
> 2018-11-13 10:02:58,663 test             L0604 DEBUG|  -> sleep_time <type 'int'>: 5
> 2018-11-13 10:02:58,663 output           L0655 DEBUG| tearDown
> 2018-11-13 10:02:58,664 test             L0729 ERROR| Traceback (most recent call last):
> 
> 2018-11-13 10:02:58,664 test             L0729 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado
>     raise test_exception
> 
> 2018-11-13 10:02:58,664 test             L0729 ERROR| RuntimeError: Test interrupted by SIGTERM
> 
> 2018-11-13 10:02:58,664 test             L0744 ERROR| ERROR 1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM
> 2018-11-13 10:02:58,664 test             L0733 INFO | 
> 2018-11-13 10:02:58,674 runner           L0239 DEBUG| Original status: {'traceback': 'Traceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n    raise test_exception\nRuntimeError: Test interrupted by SIGTERM\n', 'class_name': 'TimeoutTest', 'time_end': 1542121378.664672, 'paused': False, 'time_start': 1542121375.724386, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 2.9402859210968018, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado
>  _ncd6mf/1-timeouttest.py_TimeoutTest.test', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test interrupted by SIGTERM', 'name': '1-timeouttest.py:TimeoutTest.test', 'whiteboard': '', 'fail_class': 'RuntimeError', 'timeout': 3, 'paused_msg': ''}
> 2018-11-13 10:02:58,674 runner           L0253 ERROR| ERROR Test reported status but did not finish -> TestAbortedError: 1-timeouttest.py:TimeoutTest.test.
> 2018-11-13 10:02:58,674 runner           L0061 ERROR| Runner error occurred: Timeout reached
> Original status: ERROR
> {'whiteboard': '', 'class_name': 'TimeoutTest', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 3.099815845489502, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test', 'text_output': '10:02:55 DEBUG| setUp\n10:02:55 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5\n10:02:55 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)\n10:02:58 ERROR| \n10:02:58 ERROR| Reproduced traceback 
>  from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596\n10:02:58 ERROR| Traceback (most recent call last):\n10:02:58 ERROR|   File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test\n10:02:58 ERROR|     time.sleep(sleep_time)\n10:02:58 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler\n10:02:58 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n10:02:58 ERROR| \n10:02:58 DEBUG| Local variables:\n10:02:58 DEBUG|  -> self <class \'timeouttest.TimeoutTest\'>: 1-timeouttest.py:TimeoutTest.test\n10:02:58 DEBUG|  -> sleep_time <type \'int\'>: 5\n10:02:58 DEBUG| tearDown\n10:02:58 ERROR| Traceback (most recent call last):\n\n10:02:58 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n    raise test_exception\n\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n\n10:02:58 ERROR| ERROR 
>  1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM\n10:02:58 INFO | \n', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test reported status but did not finish', 'name': '1-timeouttest.py:TimeoutTest.test', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 3, 'paused_msg': ''}
> ```
> 
> Observing the traceback that comes from sigterm_handler, I assume that
> every test that times out should have this.  Is this a correct
> assumption?
> 

AFAICT, every test gets this handler installed.
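
For reference, the handler seen in the first traceback
(avocado/core/runner.py's sigterm_handler) boils down to raising an
exception from a signal handler.  A minimal standalone sketch of that
mechanism (not Avocado's actual code, just an illustration) would be:

```
import signal
import time

def sigterm_handler(signum, frame):
    # Same idea as runner.py: raising here makes the exception surface
    # at whatever line the test was executing when SIGTERM arrived.
    raise RuntimeError("Test interrupted by SIGTERM")

signal.signal(signal.SIGTERM, sigterm_handler)

try:
    time.sleep(60)   # send this process SIGTERM to trigger the handler
except RuntimeError as detail:
    print(detail)    # "Test interrupted by SIGTERM"
```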

> However during a test that ran in our testsuite last night I see:
> 
> ```
> 2018-11-13 00:48:14,478 varianter        L0112 DEBUG| PARAMS (key=timeout, path=*, default=None) => None
> 2018-11-13 00:48:14,478 test             L0240 INFO | START 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67
> 2018-11-13 00:48:14,618 varianter        L0112 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'
> 2018-11-13 00:48:14,632 varianter        L0112 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']
> ...
> 2018-11-13 00:48:23,420 varianter        L0112 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146
> 2018-11-13 00:48:23,420 varianter        L0112 DEBUG| PARAMS (key=setname, path=/run/createtests/createset/, default=None) => 'daos_server'
> 2018-11-13 00:48:23,421 varianter        L0112 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741
> 2018-11-13 00:48:23,421 varianter        L0112 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']
> 2018-11-13 00:48:31,701 process          L0389 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'
> 2018-11-13 00:48:32,095 process          L0499 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s
> 2018-11-13 00:48:32,104 process          L0488 DEBUG| [stdout] SUCCESS killing pool service leader
> 2018-11-13 01:18:15,511 runner           L0239 DEBUG| Original status: {}
> 2018-11-13 01:18:15,555 runner           L0253 ERROR| ERROR Test died without reporting the status. -> TestAbortedError: 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67.
> 2018-11-13 01:18:15,555 runner           L0255 WARNI| Killing hanged test process 29558
> 2018-11-13 01:18:16,663 runner           L0061 ERROR| Runner error occurred: Timeout reached
> Original status: ERROR
> {'whiteboard': '', 'class_name': 'PoolSvc', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool', 'time_elapsed': 1801.2555010318756, 'params': <AvocadoParams *: * ([]),0: /run/* (['/run/hosts/test_machine1/', '/run/server/', '/run/createtests/createmode/mode_RW/', '/run/createtests/createset/', '/run/createtests/createsize/', '/run/createtests/createsvc/svc3/'])>, 'job_logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c', 'job_unique_id': 'df25f8c77b77367c02120479c795140a2caa59bf', 'srcdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67/src', 'logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocad
>  o/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'text_output': "00:48:14 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'\n00:48:14 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']\n00:48:14 DEBUG| Host file is: /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/hostfile91180\n00:48:14 DEBUG| Start CMD>>>>/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/orterun --np 4 --hostfile /var/lib/jenkins/workspace/daos-sta
>  ck-org_daos_PR-13-centos7/install/tmp/hostfile91180 --enable-recovery -x DD_SUBSYS='' -x OFI_INTERFACE='eth0' -x DD_MASK='all' -x OFI_PORT='23350' -x CRT_ATTACH_INFO_PATH='/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp' -x D_LOG_MASK='DEBUG,RPC=ERR,MEM=ERR' -x DAOS_SINGLETON_CLI='1' -x ABT_ENV_MAX_NUM_XSTREAMS='64' -x DD_LOG='/tmp/Functional_quick/daos.log' -x CRT_PHY_ADDR_STR='ofi+sockets' -x D_LOG_FILE='/tmp/Functional_quick/daos.log' -x ABT_MAX_NUM_XSTREAMS='64' -x CRT_CTX_SHARE_ADDR='1' /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daos_server -g daos_server -c 1  -a /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/\n00:48:18 DEBUG| <SERVER>: Starting Servers\n00:48:18 DEBUG| 2018/11/13 00:48:01 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:00 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org
>  _daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:47:59 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:03 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:00 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:00 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:01 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_serv
>  er.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:01 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:47:59 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:47:59 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| DAOS server (v0.0.2) process 17338 started on rank 1 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 25654 started on rank 2 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 5135 started on rank 3 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 8903 started on rank 0 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| \n00:48:18 DEBUG| <SERVER> server started and took 3.10468387604 seconds to start\n00:48:23 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146\n00:48:23 DEBUG| PARAMS (key=setname, path=/
>  run/createtests/createset/, default=None) => 'daos_server'\n00:48:23 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741\n00:48:23 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']\n00:48:31 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'\n00:48:32 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s\n00:48:32 DEBUG| [stdout] SUCCESS killing pool service leader\n", 'running': False, 'logfile': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-creates
>  ize-svc3-test_machine1-server-aa67/debug.log', 'fail_reason': 'Test died without reporting the status.', 'name': '3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 1800, 'paused_msg': ''}
> ```
> 

I don't see a code path in which test processes don't have the sigterm
handler installed, but there are different code paths regarding the
ending of the test process.  In the first example I can see:

"ERROR Test reported status but did not finish"

While in the second I can see:

"ERROR Test died without reporting the status"

> Without any timeout stack trace, nor any evidence that the test's
> tearDown() was run.  Shouldn't I see the sigterm_handler stack trace at
> a minimum?
> 

In this second example, the process is supposed to have died already, so
it makes no sense to send it a SIGTERM.
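
The difference maps to whether the runner ever got status back from the
test process.  A hypothetical sketch of that branching (assumed logic
and illustrative names, not the actual runner.py code):

```
import os
import signal

def handle_timeout(pid, last_status):
    # Hypothetical sketch, not Avocado's real runner code.  pid is the
    # forked test process; last_status is the most recent status dict
    # the test reported back ({} if it never reported).
    if last_status:
        # The test talked to the runner but never finished: SIGTERM it,
        # so sigterm_handler raises inside the test and a traceback
        # (like the one in your first example) gets logged.
        os.kill(pid, signal.SIGTERM)
        return "Test reported status but did not finish"
    # No status ever arrived: the process is presumed dead already, so
    # there is no SIGTERM-driven traceback to be had.
    return "Test died without reporting the status."
```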

With regard to tearDown(), it's not something that gets run by the test
process's signal handler (as a cleanup stage); it's part of the general
test execution flow.  Avocado attempts to make sure tearDown() is
executed in spite of error conditions in the test() method, but not to
the point of intercepting SIGTERM and running tearDown() code at that
time.
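
In other words, tearDown() ran in your first example because the
RuntimeError raised by the handler propagated through the regular
try/finally around the test, roughly like this (a simplified sketch,
not the actual avocado/core/test.py implementation):

```
class FlowSketch(object):
    # Simplified sketch of the flow described above; the real
    # _run_avocado() in avocado/core/test.py does much more.
    def test(self):
        raise RuntimeError("Test interrupted by SIGTERM")

    def tearDown(self):
        print("tearDown")         # runs even though test() raised

    def _run_avocado(self):
        test_exception = None
        try:
            self.test()
        except Exception as details:
            test_exception = details
        finally:
            self.tearDown()       # executed in spite of the error...
        if test_exception is not None:
            raise test_exception  # ...and the failure is re-raised
```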

Is this what you're trying to achieve here?

Regards,
- Cleber.

> Cheers,
> b.
> 
> 
> 

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]
[  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]



