[Avocado-devel] no sigterm_handler stack trace on test timeout?
Cleber Rosa
crosa at redhat.com
Tue Nov 13 17:49:00 UTC 2018
On 11/13/18 10:22 AM, Brian J. Murrell wrote:
> When I run a sample test:
>
> ```
> import time
> from avocado import Test
>
> class TimeoutTest(Test):
>
>     timeout = 3
>
>     def setUp(self):
>         print "setUp"
>
>     def tearDown(self):
>         print "tearDown"
>
>     def test(self):
>         try:
>             sleep_time = self.params.get('sleep_time', default=5)
>             self.log.info('Sleeping for %.2f seconds (%.2f more than the timeout)',
>                           sleep_time, sleep_time - self.timeout)
>             time.sleep(sleep_time)
>         except ValueError as e:
>             print e
> ```
>
> to demonstrate what a timeout looks like I get:
>
> ```
> 2018-11-13 10:02:55,588 varianter L0112 DEBUG| PARAMS (key=timeout, path=*, default=3) => 3
> 2018-11-13 10:02:55,588 test L0240 INFO | START 1-timeouttest.py:TimeoutTest.test
> 2018-11-13 10:02:55,725 output L0655 DEBUG| setUp
> 2018-11-13 10:02:55,726 varianter L0112 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5
> 2018-11-13 10:02:55,726 timeouttest L0032 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)
> 2018-11-13 10:02:58,623 stacktrace L0039 ERROR|
> 2018-11-13 10:02:58,624 stacktrace L0042 ERROR| Reproduced traceback from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596
> 2018-11-13 10:02:58,624 stacktrace L0045 ERROR| Traceback (most recent call last):
> 2018-11-13 10:02:58,625 stacktrace L0045 ERROR| File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test
> 2018-11-13 10:02:58,625 stacktrace L0045 ERROR| time.sleep(sleep_time)
> 2018-11-13 10:02:58,625 stacktrace L0045 ERROR| File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler
> 2018-11-13 10:02:58,625 stacktrace L0045 ERROR| raise RuntimeError("Test interrupted by SIGTERM")
> 2018-11-13 10:02:58,626 stacktrace L0045 ERROR| RuntimeError: Test interrupted by SIGTERM
> 2018-11-13 10:02:58,626 stacktrace L0046 ERROR|
> 2018-11-13 10:02:58,626 test L0601 DEBUG| Local variables:
> 2018-11-13 10:02:58,663 test L0604 DEBUG| -> self <class 'timeouttest.TimeoutTest'>: 1-timeouttest.py:TimeoutTest.test
> 2018-11-13 10:02:58,663 test L0604 DEBUG| -> sleep_time <type 'int'>: 5
> 2018-11-13 10:02:58,663 output L0655 DEBUG| tearDown
> 2018-11-13 10:02:58,664 test L0729 ERROR| Traceback (most recent call last):
>
> 2018-11-13 10:02:58,664 test L0729 ERROR| File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado
> raise test_exception
>
> 2018-11-13 10:02:58,664 test L0729 ERROR| RuntimeError: Test interrupted by SIGTERM
>
> 2018-11-13 10:02:58,664 test L0744 ERROR| ERROR 1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM
> 2018-11-13 10:02:58,664 test L0733 INFO |
> 2018-11-13 10:02:58,674 runner L0239 DEBUG| Original status: {'traceback': 'Traceback (most recent call last):\n File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n raise test_exception\nRuntimeError: Test interrupted by SIGTERM\n', 'class_name': 'TimeoutTest', 'time_end': 1542121378.664672, 'paused': False, 'time_start': 1542121375.724386, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 2.9402859210968018, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado
> _ncd6mf/1-timeouttest.py_TimeoutTest.test', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test interrupted by SIGTERM', 'name': '1-timeouttest.py:TimeoutTest.test', 'whiteboard': '', 'fail_class': 'RuntimeError', 'timeout': 3, 'paused_msg': ''}
> 2018-11-13 10:02:58,674 runner L0253 ERROR| ERROR Test reported status but did not finish -> TestAbortedError: 1-timeouttest.py:TimeoutTest.test.
> 2018-11-13 10:02:58,674 runner L0061 ERROR| Runner error occurred: Timeout reached
> Original status: ERROR
> {'whiteboard': '', 'class_name': 'TimeoutTest', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/home/brian/daos/daos/src/tests/ftest', 'time_elapsed': 3.099815845489502, 'params': <AvocadoParams *: * ([]),0: / (['//']),1: r ([]),2: u ([]),3: n ([])>, 'job_logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0', 'job_unique_id': '2629ff06bbecdbf6e3c6f49019d72fad50fdd190', 'srcdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test/src', 'logdir': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_ncd6mf/1-timeouttest.py_TimeoutTest.test', 'text_output': '10:02:55 DEBUG| setUp\n10:02:55 DEBUG| PARAMS (key=sleep_time, path=*, default=5) => 5\n10:02:55 INFO | Sleeping for 5.00 seconds (2.00 more than the timeout)\n10:02:58 ERROR| \n10:02:58 ERROR| Reproduced traceback
> from: /usr/lib/python2.7/site-packages/avocado/core/test.py:596\n10:02:58 ERROR| Traceback (most recent call last):\n10:02:58 ERROR| File "/home/brian/daos/daos/src/tests/ftest/timeouttest.py", line 33, in test\n10:02:58 ERROR| time.sleep(sleep_time)\n10:02:58 ERROR| File "/usr/lib/python2.7/site-packages/avocado/core/runner.py", line 312, in sigterm_handler\n10:02:58 ERROR| raise RuntimeError("Test interrupted by SIGTERM")\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n10:02:58 ERROR| \n10:02:58 DEBUG| Local variables:\n10:02:58 DEBUG| -> self <class \'timeouttest.TimeoutTest\'>: 1-timeouttest.py:TimeoutTest.test\n10:02:58 DEBUG| -> sleep_time <type \'int\'>: 5\n10:02:58 DEBUG| tearDown\n10:02:58 ERROR| Traceback (most recent call last):\n\n10:02:58 ERROR| File "/usr/lib/python2.7/site-packages/avocado/core/test.py", line 667, in _run_avocado\n raise test_exception\n\n10:02:58 ERROR| RuntimeError: Test interrupted by SIGTERM\n\n10:02:58 ERROR| ERROR
> 1-timeouttest.py:TimeoutTest.test -> RuntimeError: Test interrupted by SIGTERM\n10:02:58 INFO | \n', 'running': False, 'logfile': '/home/brian/daos/daos/src/tests/ftest/avocado/job-results/job-2018-11-13T10.02-2629ff0/test-results/1-timeouttest.py:TimeoutTest.test/debug.log', 'fail_reason': 'Test reported status but did not finish', 'name': '1-timeouttest.py:TimeoutTest.test', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 3, 'paused_msg': ''}
> ```
>
> Observing the traceback that comes from sigterm_handler, I assume that
> every test that times out should have this. Is this a correct
> assumption?
>
AFAICT, every test gets this handler installed.
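Based on the traceback in your output (runner.py's sigterm_handler), the mechanism is roughly the following. This is just a sketch of the idea, not Avocado's actual code:

```python
import os
import signal

# Sketch of the idea behind the sigterm_handler seen in the traceback
# above (the handler name is from your log; the surrounding code here is
# illustrative only): the runner installs a handler in the test process
# that turns SIGTERM into an ordinary Python exception.
def sigterm_handler(signum, frame):
    raise RuntimeError("Test interrupted by SIGTERM")

signal.signal(signal.SIGTERM, sigterm_handler)

# A SIGTERM delivered to this process now surfaces wherever the
# interpreter happens to be -- e.g. inside time.sleep() in your test --
# which is why the traceback points at the sleeping line.
caught = None
try:
    os.kill(os.getpid(), signal.SIGTERM)
except RuntimeError as e:
    caught = str(e)

print(caught)  # -> Test interrupted by SIGTERM
```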
> However during a test that ran in our testsuite last night I see:
>
> ```
> 2018-11-13 00:48:14,478 varianter L0112 DEBUG| PARAMS (key=timeout, path=*, default=None) => None
> 2018-11-13 00:48:14,478 test L0240 INFO | START 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67
> 2018-11-13 00:48:14,618 varianter L0112 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'
> 2018-11-13 00:48:14,632 varianter L0112 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']
> ...
> 2018-11-13 00:48:23,420 varianter L0112 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146
> 2018-11-13 00:48:23,420 varianter L0112 DEBUG| PARAMS (key=setname, path=/run/createtests/createset/, default=None) => 'daos_server'
> 2018-11-13 00:48:23,421 varianter L0112 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741
> 2018-11-13 00:48:23,421 varianter L0112 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']
> 2018-11-13 00:48:31,701 process L0389 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'
> 2018-11-13 00:48:32,095 process L0499 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s
> 2018-11-13 00:48:32,104 process L0488 DEBUG| [stdout] SUCCESS killing pool service leader
> 2018-11-13 01:18:15,511 runner L0239 DEBUG| Original status: {}
> 2018-11-13 01:18:15,555 runner L0253 ERROR| ERROR Test died without reporting the status. -> TestAbortedError: 3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67.
> 2018-11-13 01:18:15,555 runner L0255 WARNI| Killing hanged test process 29558
> 2018-11-13 01:18:16,663 runner L0061 ERROR| Runner error occurred: Timeout reached
> Original status: ERROR
> {'whiteboard': '', 'class_name': 'PoolSvc', 'time_end': -1, 'paused': False, 'time_start': -1, 'early_status': True, 'basedir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool', 'time_elapsed': 1801.2555010318756, 'params': <AvocadoParams *: * ([]),0: /run/* (['/run/hosts/test_machine1/', '/run/server/', '/run/createtests/createmode/mode_RW/', '/run/createtests/createset/', '/run/createtests/createsize/', '/run/createtests/createsvc/svc3/'])>, 'job_logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c', 'job_unique_id': 'df25f8c77b77367c02120479c795140a2caa59bf', 'srcdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67/src', 'logdir': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocad
> o/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'status': 'ERROR', 'workdir': '/var/tmp/avocado_9uWWRj/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py_PoolSvc.test_poolsvc_mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'text_output': "00:48:14 DEBUG| PARAMS (key=server_group, path=/server/, default=daos_server) => 'daos_server'\n00:48:14 DEBUG| PARAMS (key=test_machines, path=/run/hosts/*, default=None) => ['wolf-53vm2', 'wolf-53vm3', 'wolf-53vm4', 'wolf-53vm5']\n00:48:14 DEBUG| Host file is: /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/hostfile91180\n00:48:14 DEBUG| Start CMD>>>>/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/orterun --np 4 --hostfile /var/lib/jenkins/workspace/daos-sta
> ck-org_daos_PR-13-centos7/install/tmp/hostfile91180 --enable-recovery -x DD_SUBSYS='' -x OFI_INTERFACE='eth0' -x DD_MASK='all' -x OFI_PORT='23350' -x CRT_ATTACH_INFO_PATH='/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp' -x D_LOG_MASK='DEBUG,RPC=ERR,MEM=ERR' -x DAOS_SINGLETON_CLI='1' -x ABT_ENV_MAX_NUM_XSTREAMS='64' -x DD_LOG='/tmp/Functional_quick/daos.log' -x CRT_PHY_ADDR_STR='ofi+sockets' -x D_LOG_FILE='/tmp/Functional_quick/daos.log' -x ABT_MAX_NUM_XSTREAMS='64' -x CRT_CTX_SHARE_ADDR='1' /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daos_server -g daos_server -c 1 -a /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/tmp/\n00:48:18 DEBUG| <SERVER>: Starting Servers\n00:48:18 DEBUG| 2018/11/13 00:48:01 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:00 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org
> _daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:47:59 DAOS config read from /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/daos_server.yml\n00:48:18 DEBUG| 2018/11/13 00:48:03 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:03 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:00 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:00 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:48:01 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_serv
> er.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:48:01 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| 2018/11/13 00:47:59 Active config saved to /var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/etc/.daos_server.active.yml (read-only)\n00:48:18 DEBUG| 2018/11/13 00:47:59 Management interface listening on: 0.0.0.0:10000\n00:48:18 DEBUG| DAOS server (v0.0.2) process 17338 started on rank 1 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 25654 started on rank 2 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 5135 started on rank 3 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| DAOS server (v0.0.2) process 8903 started on rank 0 (out of 4) with 1 xstream(s)\n00:48:18 DEBUG| \n00:48:18 DEBUG| <SERVER> server started and took 3.10468387604 seconds to start\n00:48:23 DEBUG| PARAMS (key=mode, path=/run/createtests/createmode/*/, default=None) => 146\n00:48:23 DEBUG| PARAMS (key=setname, path=/
> run/createtests/createset/, default=None) => 'daos_server'\n00:48:23 DEBUG| PARAMS (key=size, path=/run/createtests/createsize/, default=None) => 1073741\n00:48:23 DEBUG| PARAMS (key=svc, path=/run/createtests/createsvc/*/, default=None) => [3, 'PASS']\n00:48:31 INFO | Running '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader --uuid=7E461A34-F049-4925-847A-5DA1D9D35530'\n00:48:32 INFO | Command '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/install/bin/daosctl kill-leader --uuid=7E461A34-F049-4925-847A-5DA1D9D35530' finished with 0 after 0.216053962708s\n00:48:32 DEBUG| [stdout] SUCCESS killing pool service leader\n", 'running': False, 'logfile': '/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/avocado/job-results/job-2018-11-13T00.47-df25f8c/test-results/3-_var_lib_jenkins_workspace_daos-stack-org_daos_PR-13-centos7_src_tests_ftest_pool_PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-creates
> ize-svc3-test_machine1-server-aa67/debug.log', 'fail_reason': 'Test died without reporting the status.', 'name': '3-/var/lib/jenkins/workspace/daos-stack-org_daos_PR-13-centos7/src/tests/ftest/pool/PoolSvc.py:PoolSvc.test_poolsvc;mode_RW-createset-createsize-svc3-test_machine1-server-aa67', 'traceback': 'Traceback not available', 'fail_class': 'type', 'timeout': 1800, 'paused_msg': ''}
> ```
>
I don't see a code path in which test processes don't get the sigterm
handler installed, but there are different code paths regarding how the
test process ends. In the first example I can see:
"ERROR Test reported status but did not finish"
while in the second I can see:
"ERROR Test died without reporting the status"
> Without any of the timeout stack trace nor any evidence that the test's
> tearDown() was run. Shouldn't I see the sigterm_handler stack trace at
> minimum?
>
In this second example, the process is supposed to have died already, so
it makes no sense to send it a SIGTERM.
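The distinction matters because a Python-level signal handler can only
produce a traceback if the process is still alive to run it. A quick
illustration (plain POSIX Python, nothing Avocado-specific):

```python
import os
import signal
import time

def spawn_sleeper():
    """Fork a child that installs a SIGTERM handler and then sleeps."""
    pid = os.fork()
    if pid == 0:  # child
        # The handler exits with a recognizable status instead of
        # raising, so the parent can observe that it actually ran.
        signal.signal(signal.SIGTERM, lambda s, f: os._exit(42))
        time.sleep(30)
        os._exit(0)
    return pid

# SIGTERM: the child's handler runs, so it exits with status 42.
pid = spawn_sleeper()
time.sleep(0.2)  # give the child time to install its handler
os.kill(pid, signal.SIGTERM)
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # -> 42

# SIGKILL: no handler ever runs, so a process killed this way (or one
# that already died on its own) produces no traceback and gets no
# chance at any Python-level cleanup.
pid = spawn_sleeper()
time.sleep(0.2)
os.kill(pid, signal.SIGKILL)
_, status = os.waitpid(pid, 0)
print(os.WIFSIGNALED(status))  # -> True
```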
With regards to tearDown(), it's not something that gets run by the test
process's signal handler (as a cleanup stage); rather, it's part of the
general test execution flow. Avocado attempts to make sure tearDown()
is executed in spite of error conditions in the test() method, but not
to the point of intercepting SIGTERM and running tearDown() code at that
time.
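In other words, the flow is conceptually a try/finally around the test
method. A minimal sketch with made-up names (not Avocado's actual
implementation):

```python
# Hypothetical class, not Avocado's real Test: tearDown() runs because
# the execution flow guards test() with try/finally, not because any
# signal handler invokes it.
class FakeTest:
    def __init__(self):
        self.events = []

    def setUp(self):
        self.events.append("setUp")

    def test(self):
        self.events.append("test")
        # Stands in for the RuntimeError injected by the SIGTERM handler.
        raise RuntimeError("Test interrupted by SIGTERM")

    def tearDown(self):
        self.events.append("tearDown")

    def run(self):
        self.setUp()
        try:
            self.test()
        finally:
            self.tearDown()  # reached even though test() raised

t = FakeTest()
try:
    t.run()
except RuntimeError:
    pass
print(t.events)  # -> ['setUp', 'test', 'tearDown']
```

If the process is killed outright (or dies on its own) in the middle of
test(), the interpreter is gone and the finally block never executes,
which matches what you saw in the second log.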
Is this what you're trying to achieve here?
Regards,
- Cleber.
> Cheers,
> b.
>
>
>
--
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]
[ 7ABB 96EB 8B46 B94D 5E0F E9BB 657E 8D33 A5F2 09F3 ]