Stopped detach/attach status
Oleg Nesterov
oleg at redhat.com
Mon Oct 5 02:32:08 UTC 2009
(add Roland)
On 10/01, Jan Kratochvil wrote:
>
> the ptrace-testsuite
> http://sourceware.org/systemtap/wiki/utrace/tests
>
> currently FAILs (also) on Fedora 12 kernel-2.6.31.1-48.fc12.x86_64 for:
> FAIL: detach-stopped
> FAIL: stopped-attach-transparency
>
> Do you agree with the testcases and is it planned to fix them for F12?
I do not know. I'd leave this to Roland. I mean, if he thinks this
should be fixed - I'll try to fix.
But. This all looks unfixable to me. In my opinion, the kernel is
obviously wrong, and the test-cases are wrong too. And any fix in this
area is user-visible and can break the current expectations.
As for the kernel, I have lost all hope of understanding what the
_supposed_ behaviour is.
As for user-space, I don't really understand the second test-case,
this again means I don't understand the supposed behaviour.
Firstly, I think we should un-revert edaba2c5334492f82d39ec35637c6dea5176a977.
This unconditional wakeup is hopelessly wrong imho, and it is removed
from utrace-ptrace code. But this breaks another test-case,
attach-wait-on-stopped. I still think this test-case is wrong.
We had a lengthy discussion about this.
Now, this patch
--- TTT_32/kernel/signal.c~PT_STOP 2009-10-04 04:08:36.000000000 +0200
+++ TTT_32/kernel/signal.c 2009-10-05 03:17:39.000000000 +0200
@@ -1708,7 +1708,7 @@ static int do_signal_stop(int signr)
 	 */
 	if (sig->group_stop_count) {
 		if (!--sig->group_stop_count)
-			sig->flags = SIGNAL_STOP_STOPPED;
+			sig->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED;
 		current->exit_code = sig->group_exit_code;
 		__set_current_state(TASK_STOPPED);
 	}
fixes the tests above. Of course this change is not enough; I did
it just to verify that I really understand what happens.
Except, stopped-attach-transparency prints
Excessive waiting SIGSTOP after the second attach/detach
afaics the test-case is not right here. attach_detach() leaves the
traced threads in STOPPED state, so why should pid_notifying_sigstop()
fail?
But as I said, I do not really understand what this test-case tries
to do. What should ptrace(PTRACE_DETACH, SIGSTOP) mean? I think that
ptrace(PTRACE_DETACH, signr) should mean the tracee should proceed
with this signal, as if it was sent by, say, kill.
In this case, I don't understand why stopped-attach-transparency
"sends" SIGSTOP to every sub-thread. If the tracer wants to stop
the thread group after detach, it can do
	ptrace(PTRACE_DETACH, anythread, SIGSTOP);
	for_each_other_thread(pid)
		ptrace(PTRACE_DETACH, pid, 0);
or just
	kill(SIGSTOP);
	for_each_thread(pid)
		ptrace(PTRACE_DETACH, pid, 0);
I am not saying this will really work with the current implementation,
we have other bugs/races. I mean I'd expect this to be the right
way to do detach+stop.
And. Currently PTRACE_CONT/PTRACE_DETACH/etc wake up the tracee even
if the thread group is stopped. This is obviously not right, but
utrace-ptrace does the same. I guess we can't fix this without breaking
existing applications.
In short: I don't know what to do ;)
Oleg.