Stopped detach/attach status

Oleg Nesterov oleg at redhat.com
Mon Oct 5 02:32:08 UTC 2009


(add Roland)

On 10/01, Jan Kratochvil wrote:
>
> the ptrace-testsuite
> 	http://sourceware.org/systemtap/wiki/utrace/tests
>
> currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for:
> 	FAIL: detach-stopped
> 	FAIL: stopped-attach-transparency
>
> Do you agree with the testcases and is it planned to fix them for F12?

I do not know. I'd leave this to Roland. I mean, if he thinks this
should be fixed, I'll try to fix it.

But. This all looks unfixable to me. In my opinion, the kernel is
obviously wrong, and the test-cases are wrong too. And any fix in this
area is user-visible and can break the current expectations.

As for the kernel, I have lost any hope of understanding what the
_supposed_ behaviour is.

As for user-space, I don't really understand the second test-case,
this again means I don't understand the supposed behaviour.


Firstly, I think we should un-revert edaba2c5334492f82d39ec35637c6dea5176a977.
This unconditional wakeup is hopelessly wrong imho, and it is removed
from utrace-ptrace code. But this breaks another test-case,
attach-wait-on-stopped. I still think this test-case is wrong.
We had a lengthy discussion about this.

Now, this patch

	--- TTT_32/kernel/signal.c~PT_STOP	2009-10-04 04:08:36.000000000 +0200
	+++ TTT_32/kernel/signal.c	2009-10-05 03:17:39.000000000 +0200
	@@ -1708,7 +1708,7 @@ static int do_signal_stop(int signr)
		 */
		if (sig->group_stop_count) {
			if (!--sig->group_stop_count)
	-			sig->flags = SIGNAL_STOP_STOPPED;
	+			sig->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED;
			current->exit_code = sig->group_exit_code;
			__set_current_state(TASK_STOPPED);
		}

fixes the tests above. Of course this change is not enough, I did
it just to verify I really understand what happens.

Except, stopped-attach-transparency prints

	Excessive waiting SIGSTOP after the second attach/detach

afaics the test-case is not right here. attach_detach() leaves the
traced threads in STOPPED state, so why should pid_notifying_sigstop()
fail?


But as I said, I do not really understand what this test-case tries
to do. What should ptrace(PTRACE_DETACH, SIGSTOP) mean? I think that
ptrace(PTRACE_DETACH, signr) should mean the tracee proceeds
with this signal, as if it was sent by, say, kill.

In this case, I don't understand why stopped-attach-transparency
"sends" SIGSTOP to every sub-thread. If the tracer wants to stop
the thread group after detach, it can do

	ptrace(PTRACE_DETACH, anythread, SIGSTOP);
	for_each_other_thread(pid)
		ptrace(PTRACE_DETACH, pid, 0);

or just

	kill(SIGSTOP);
	for_each_thread(pid)
		ptrace(PTRACE_DETACH, pid, 0);

I do not say this will really work with the current implementation,
we have other bugs/races. I mean I'd expect this to be the right
way to do detach+stop.
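
For concreteness, the two detach+stop variants above could be written out
roughly as below. This is only a sketch of the expected semantics, not a
claim about what any particular kernel does. The function names, the tids
array and the tgid argument are hypothetical; the caller is assumed to
already be attached to every listed thread, with each of them in a ptrace
stop, and tids[0] is simply "any thread".

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Variant 1 (hypothetical helper): hand SIGSTOP to one thread at detach
 * time and detach all the others with no signal.
 * Returns 0 on success, -1 on the first failing ptrace() call.
 */
int detach_stop_via_detach(const pid_t *tids, size_t n)
{
	assert(tids != NULL && n > 0);

	/* The data argument of PTRACE_DETACH is the signal to deliver. */
	if (ptrace(PTRACE_DETACH, tids[0], NULL, (void *)(long)SIGSTOP) == -1)
		return -1;

	for (size_t i = 1; i < n; i++)
		if (ptrace(PTRACE_DETACH, tids[i], NULL, NULL) == -1)
			return -1;
	return 0;
}

/*
 * Variant 2 (hypothetical helper): queue a process-wide SIGSTOP with
 * kill() first, then detach every thread with no signal.
 * Returns 0 on success, -1 on the first failing call.
 */
int detach_stop_via_kill(pid_t tgid, const pid_t *tids, size_t n)
{
	assert(tids != NULL && n > 0);

	if (kill(tgid, SIGSTOP) == -1)
		return -1;

	for (size_t i = 0; i < n; i++)
		if (ptrace(PTRACE_DETACH, tids[i], NULL, NULL) == -1)
			return -1;
	return 0;
}
```

If the former tracer is also the parent, it can check the result with
waitpid(tgid, &status, WUNTRACED) and expect WIFSTOPPED(status) with
WSTOPSIG(status) == SIGSTOP.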


And. Currently PTRACE_CONT/PTRACE_DETACH/etc wake up the tracee even
if the thread group is stopped. This is obviously not right, but
utrace-ptrace does the same. I guess we can't fix this without breaking
existing applications.


In short: I don't know what to do ;)

Oleg.



