[Avocado-devel] no sigterm_handler stack trace on test timeout?

Brian J. Murrell brian at interlinx.bc.ca
Tue Nov 13 15:22:06 UTC 2018


When I run a sample test:

```
import time

from avocado import Test


class TimeoutTest(Test):

    timeout = 3  # seconds; the runner should interrupt the test after this

    def setUp(self):
        print("setUp")

    def tearDown(self):
        print("tearDown")

    def test(self):
        try:
            # Sleep longer than the timeout so the runner has to interrupt us.
            sleep_time = self.params.get('sleep_time', default=5)
            self.log.info('Sleeping for %.2f seconds (%.2f more than the timeout)',
                          sleep_time, sleep_time - self.timeout)
            time.sleep(sleep_time)
        except ValueError as e:
            print(e)
```

to demonstrate what a timeout looks like, I get:

```
2018-11-13 10:02:55,588 varianter        L0112 DEBUG| PARAMS (key=timeout, path=*, default=3) => 3
2018-11-13 10:02:55,588 test             L0240 INFO | START 1-timeouttest.py:TimeoutTest.test
2018-11-13 10:02:55,725 output           L0655 DEBUG| setUp
2018-11-13 10:02:55,726 varianter        L0112 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5
2018-11-13 10:02:55,726 timeouttest      L0032 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)
2018-11-13 10:02:58,623 stacktrace       L0039 ERROR| 
2018-11-13 10:02:58,624 stacktrace       L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
2018-11-13 10:02:58,624 stacktrace       L0045 ERROR| Traceback (most recent call last):
2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|   File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test
2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|     time.sleep(sleep_time)
2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler
2018-11-13 10:02:58,625 stacktrace       L0045 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")
2018-11-13 10:02:58,626 stacktrace       L0045 ERROR| RuntimeError: Test interrupted by SIGTERM
2018-11-13 10:02:58,626 stacktrace       L0046 ERROR| 
2018-11-13 10:02:58,626 test             L0601 DEBUG| Local variables:
2018-11-13 10:02:58,663 test             L0604 DEBUG|  -> self <class 'timeouttest.TimeoutTest'>: 1-timeouttest.py:TimeoutTest.test
2018-11-13 10:02:58,663 test             L0604 DEBUG|  -> sleep_time <type 'int'>: 5
2018-11-13 10:02:58,663 output           L0655 DEBUG| tearDown
2018-11-13 10:02:58,664 test             L0729 ERROR| Traceback (most recent call last):

2018-11-13 10:02:58,664 test             L0729 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado
    raise test_exception

2018-11-13 10:02:58,664 test             L0729 ERROR| RuntimeError: Test interrupted by SIGTERM

2018-11-13 10:02:58,664 test             L0744 ERROR| ERROR 1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM
2018-11-13 10:02:58,664 test             L0733 INFO | 
2018-11-13 10:02:58,674 runner           L0239 DEBUG| Original status: {'traceback': 'Traceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n    raise test_exception\nRuntimeError: Test interrupted by SIGTERM\n', 'class_name': 'TimeoutTest', 'time_end': 1542121378.664672, 'paused': False, 'time_start': 1542121375.724386, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 2.9402859210968018, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test interrupted by SIGTERM', 'name': '1-timeouttest.py:TimeoutTest.test', 'whiteboard': '', 'fail_class': 'RuntimeError', 'timeout': 3, 'paused_msg': ''}
2018-11-13 10:02:58,674 runner           L0253 ERROR| ERROR Test reported status but did not finish -> TestAbortedError: 1-timeouttest.py:TimeoutTest.test.
2018-11-13 10:02:58,674 runner           L0061 ERROR| Runner error occurred: Timeout reached
Original status: ERROR
{'whiteboard': '', 'class_name': 'TimeoutTest', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 3.099815845489502, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test', 'text_output': '10:02:55 DEBUG| setUp\n10:02:55 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5\n10:02:55 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)\n10:02:58 ERROR| \n10:02:58 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596\n10:02:58 ERROR| Traceback (most recent call last):\n10:02:58 ERROR|   File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test\n10:02:58 ERROR|     time.sleep(sleep_time)\n10:02:58 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler\n10:02:58 ERROR|     raise RuntimeError("Test interrupted by SIGTERM")\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n10:02:58 ERROR| \n10:02:58 DEBUG| Local variables:\n10:02:58 DEBUG|  -> self <class \'timeouttest.TimeoutTest\'>: 1-timeouttest.py:TimeoutTest.test\n10:02:58 DEBUG|  -> sleep_time <type \'int\'>: 5\n10:02:58 DEBUG| tearDown\n10:02:58 ERROR| Traceback (most recent call last):\n\n10:02:58 ERROR|   File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n    raise test_exception\n\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n\n10:02:58 ERROR| ERROR 1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM\n10:02:58 INFO | \n', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test reported status but did not finish', 'name': '1-timeouttest.py:TimeoutTest.test', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 3, 'paused_msg': ''}
```

Given the traceback that comes from sigterm_handler, I assume every test
that times out should produce one like it.  Is that a correct
assumption?
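
For context, my mental model of what is happening (just a sketch
reconstructed from the traceback above, not the actual
avocado/core/runner.py code) is that the runner installs a SIGTERM
handler in the test process which raises, so the resulting traceback
points at whatever line the test was executing when the timeout hit:

```
import signal

def sigterm_handler(signum, frame):
    # Raising here unwinds the test at whatever line it was executing
    # when the runner delivered SIGTERM on timeout (time.sleep() above),
    # which is where the "Reproduced traceback" comes from.
    raise RuntimeError("Test interrupted by SIGTERM")

signal.signal(signal.SIGTERM, sigterm_handler)
```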

However, for a test that ran in our test suite last night, I see:

```
2018-11-13 00:48:14,478 varianter        L0112 DEBUG| PARAMS (key=timeout, path=*, default=None) => None
2018-11-13 00:48:14,478 test             L0240 INFO | START 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67
2018-11-13 00:48:14,618 varianter        L0112 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'
2018-11-13 00:48:14,632 varianter        L0112 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']
...
2018-11-13 00:48:23,420 varianter        L0112 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146
2018-11-13 00:48:23,420 varianter        L0112 DEBUG| PARAMS (key=setname, path=/run/createtests/createset/, default=None) => 'daos_server'
2018-11-13 00:48:23,421 varianter        L0112 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741
2018-11-13 00:48:23,421 varianter        L0112 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']
2018-11-13 00:48:31,701 process          L0389 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'
2018-11-13 00:48:32,095 process          L0499 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s
2018-11-13 00:48:32,104 process          L0488 DEBUG| [stdout] SUCCESS killing pool service leader
2018-11-13 01:18:15,511 runner           L0239 DEBUG| Original status: {}
2018-11-13 01:18:15,555 runner           L0253 ERROR| ERROR Test died without reporting the status. -> TestAbortedError: 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67.
2018-11-13 01:18:15,555 runner           L0255 WARNI| Killing hanged test process 29558
2018-11-13 01:18:16,663 runner           L0061 ERROR| Runner error occurred: Timeout reached
Original status: ERROR
{'whiteboard': '', 'class_name': 'PoolSvc', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool', 'time_elapsed': 1801.2555010318756, 'params': <AvocadoParams *: * ([]),0: /run/* (['/run/hosts/test_machine1/', '/run/server/', '/run/createtests/createmode/mode_RW/', '/run/createtests/createset/', '/run/createtests/createsize/', '/run/createtests/createsvc/svc3/'])>, 'job_logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c', 'job_unique_id': 'df25f8c77b77367c02120479c795140a2caa59bf', 'srcdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67/src', 'logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'text_output': "00:48:14 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'\n00:48:14 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']\n00:48:14 DEBUG| Host file is: /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/hostfile91180\n00:48:14 DEBUG| Start CMD>>>>/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/orterun --np 4 --hostfile /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/hostfile91180 --enable-recovery -x DD_SUBSYS='' -x OFI_INTERFACE='eth0' -x DD_MASK='all' -x OFI_PORT='23350' -x CRT_ATTACH_INFO_PATH='/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp' -x D_LOG_MASK='DEBUG,RPC=ERR,MEM=ERR' -x DAOS_SINGLETON_CLI='1' -x ABT_ENV_MAX_NUM_XSTREAMS='64' -x DD_LOG='/tmp/Functional_quick/daos.log' -x CRT_PHY_ADDR_STR='ofi+sockets' -x D_LOG_FILE='/tmp/Functional_quick/daos.log' -x ABT_MAX_NUM_XSTREAMS='64' -x CRT_CTX_SHARE_ADDR='1' /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daos_server -g daos_server -c 1  -a /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/\n00:48:18 DEBUG| <SERVER>: Starting Servers\n00:48:18 DEBUG| 2018/11/13 00:48:01 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:00 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:47:59 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:03 Management 
interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:00 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:00 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:01 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:01 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:47:59 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:47:59 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| DAOS server (v0.0.2) process 17338 started on rank 1 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 25654 started on rank 2 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 5135 started on rank 3 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 8903 started on rank 0 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| \n00:48:18 DEBUG| <SERVER> server started and took 3.10468387604 seconds to start\n00:48:23 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146\n00:48:23 DEBUG| PARAMS (key=setname, path=/run/createtests/createset/, default=None) => 'daos_server'\n00:48:23 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741\n00:48:23 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']\n00:48:31 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'\n00:48:32 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader  --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s\n00:48:32 DEBUG| [stdout] SUCCESS killing pool service leader\n", 'running': False, 'logfile': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67/debug.log', 'fail_reason': 'Test died without reporting the status.', 'name': '3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 1800, 'paused_msg': ''}
```

That log has none of the timeout stack trace, nor any evidence that the
test's tearDown() was run.  Shouldn't I see the sigterm_handler stack
trace at a minimum?
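
If it helps, the only structural difference I can think of is that the
real test spends its time blocked waiting on external processes
(orterun, daosctl) rather than in time.sleep().  A minimal sketch of
that shape (hypothetical, not the actual PoolSvc test) would be
something like:

```
import subprocess

from avocado import Test


class HangTest(Test):
    """Hypothetical reproducer: block on a child process past the timeout."""

    timeout = 3

    def test(self):
        # Wait on a child process instead of time.sleep(); should the
        # sigterm_handler traceback and tearDown() still show up in
        # debug.log for this case?
        subprocess.call(["sleep", "60"])
```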

Cheers,
b.
