systemtap PR10171, itrace crashes

Frank Ch. Eigler fche at redhat.com
Sat Nov 21 19:47:47 UTC 2009


Hi -

itrace is a utrace engine included in the systemtap runtime that
activates UTRACE_SINGLESTEP / BLOCKSTEP on targeted userspace
processes.  It seems to work OK on ppc64, but crashes lots on recent
fedora (inc. rawhide) level utrace on x86/x86-64.  (Lately I've been
unable to boot a utrace/master branch version of the kernel to try it
against.)

The systemtap code is at git://sourceware.org/git/systemtap.git,
master branch for the near-original code, pr10653-itrace-crashes
branch for a partial rewrite.  The key file is runtime/itrace.c.
tapsets-itrace.cxx generates code that calls into this for new or
dying threads.  The crash is now intermittent (an "improvement"
over a near-100% crash on rawhide x86 machines), and is triggered
with a systemtap testsuite run thusly:

% configure ... ; make; make install   # build systemtap
% sudo make installcheck RUNTESTFLAGS=itrace.exp 

While hacking up my near-rewritten form of runtime/itrace.c, I came
across several cases where utrace made calls on our engine seemingly
long after a UTRACE_DETACH was to take effect.  This included cases
where a signal or other reporting callback had already returned
UTRACE_DETACH, or after a debugging task seemingly completed a
utrace_control(DETACH).

So I added hack after hack, using engine->data as an indication that
someone else has already requested a DETACH, and to keep echoing that
back to utrace until it "takes".  (Maybe it's a bug in x86
single-stepping logic that didn't appear on ppc.  Dunno.  We've also
had to disable itrace on ia64 for suspicious crashes that may also be
the same problem.  sourceware.org/PR10171 and /PR10653.)

Could you guys take a look at old/new runtime/itrace.c, to see what
the code's trying to do with utrace?  Do you have a
utrace/master-level kernel handy to try it on?  I don't understand why
any of these hacks are supposed to be necessary.


A typical kernel crash log follows (3-cpu virtual machine).  Sometimes
this triggers somewhat *after* the systemtap session has completed, up
to and including unloading of the runtime/itrace.c-containing module.

[  634.676957] stap_198c84f334a07f5b8157be1fd1be6bec_1461: systemtap: 1.0/0.143, base: fa97b000, memory: 16305+20172+1344+13600 data+text+ctx+net, probes: 5
[  634.926718] 
[  634.926719] =============================================
[  634.927113] [ INFO: possible recursive locking detected ]
[  634.927113] 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  634.927113] ---------------------------------------------
[  634.927113] ls_5066/5705 is trying to acquire lock:
[  634.927113]  (&sighand->siglock){......}, at: [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[  634.927113] 
[  634.927113] but task is already holding lock:
[  634.927113]  (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d
[  634.927113] 
[  634.927113] other info that might help us debug this:
[  634.927113] 1 lock held by ls_5066/5705:
[  634.927113]  #0:  (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d
[  634.927113] 
[  634.927113] stack backtrace:
[  634.927113] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  634.927113] Call Trace:
[  634.927113]  [<c07b3339>] ? printk+0x14/0x1b
[  634.927113]  [<c04690df>] __lock_acquire+0x833/0xb6f
[  634.927113]  [<c048407d>] ? utrace_get_signal+0x50b/0x53d
[  634.927113]  [<c04694ae>] lock_acquire+0x93/0xb1
[  634.927113]  [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344
[  634.927113]  [<c07b578b>] _spin_lock_irq+0x2e/0x5e
[  634.927113]  [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344
[  634.927113]  [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[  634.927113]  [<c04028dc>] do_signal+0x5f/0x703
[  634.927113]  [<c044e0b3>] ? force_sig_info+0xa2/0xac
[  634.927113]  [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[  634.927113]  [<c040acee>] ? send_sigtrap+0x6d/0x74
[  634.927113]  [<c0402fa7>] do_notify_resume+0x27/0x77
[  634.927113]  [<c0403718>] work_notifysig+0x13/0x1b
[  639.001030] BUG: spinlock lockup on CPU#3, ls_5066/5705, ec094704 (Not tainted)
[  818.430111] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  818.433097] Call Trace:
[  818.435096]  [<c07b3339>] ? printk+0x14/0x1b
[  818.437097]  [<c05d451a>] _raw_spin_lock+0x102/0x128
[  818.439096]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  818.441097]  [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[  818.541097]  [<c04028dc>] do_signal+0x5f/0x703
[  818.543096]  [<c044e0b3>] ? force_sig_info+0xa2/0xac
[  818.545097]  [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[  818.547096]  [<c040acee>] ? send_sigtrap+0x6d/0x74
[  818.549097]  [<c0402fa7>] do_notify_resume+0x27/0x77
[  818.551312]  [<c0403718>] work_notifysig+0x13/0x1b
[  818.553097] sending NMI to all CPUs:
[  818.555096] NMI backtrace for cpu 3
[  818.555096] 
[  818.555096] Pid: 5705, comm: ls_5066 Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) 
[  818.555096] EIP: 0060:[<c0418cb1>] EFLAGS: 00210046 CPU: 3
[  818.555096] EIP is at __default_send_IPI_dest_field+0x5f/0x64
[  818.555096] EAX: 00000c00 EBX: 0f000000 ECX: 00000800 EDX: fffff000
[  818.555096] ESI: 00000002 EDI: 00000c00 EBP: eb601e08 ESP: eb601dfc
[  818.555096]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  818.555096] CR0: 8005003b CR2: 00ac3eac CR3: 2a44d000 CR4: 000006d0
[  818.555096] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  818.555096] DR6: ffff4ff0 DR7: 00000400
[  818.555096] Call Trace:
[  818.555096]  [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac
[  818.555096]  [<c0418ad3>] default_send_IPI_all+0x27/0x68
[  818.555096]  [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54
[  818.555096]  [<c05d451f>] _raw_spin_lock+0x107/0x128
[  818.555096]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  818.555096]  [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[  818.555096]  [<c04028dc>] do_signal+0x5f/0x703
[  818.555096]  [<c044e0b3>] ? force_sig_info+0xa2/0xac
[  818.555096]  [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[  818.555096]  [<c040acee>] ? send_sigtrap+0x6d/0x74
[  818.555096]  [<c0402fa7>] do_notify_resume+0x27/0x77
[  818.555096]  [<c0403718>] work_notifysig+0x13/0x1b
[  818.555096] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  818.555096] Call Trace:
[  818.555096]  [<c0402656>] ? show_regs+0x24/0x2a
[  818.555096]  [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[  818.555096]  [<c07b619f>] do_nmi+0xa8/0x289
[  818.555096]  [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[  818.555096]  [<c041007b>] ? do_machine_check+0x683/0x6ec
[  818.555096]  [<c04100e0>] ? do_machine_check+0x6e8/0x6ec
[  818.555096]  [<c0418cb1>] ? __default_send_IPI_dest_field+0x5f/0x64
[  818.555096]  [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac
[  818.555096]  [<c0418ad3>] default_send_IPI_all+0x27/0x68
[  818.555096]  [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54
[  818.555096]  [<c05d451f>] _raw_spin_lock+0x107/0x128
[  818.555096]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  818.555096]  [<c044e6e0>] get_signal_to_deliver+0x5c/0x344
[  818.555096]  [<c04028dc>] do_signal+0x5f/0x703
[  818.555096]  [<c044e0b3>] ? force_sig_info+0xa2/0xac
[  818.555096]  [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c
[  818.555096]  [<c040acee>] ? send_sigtrap+0x6d/0x74
[  818.555096]  [<c0402fa7>] do_notify_resume+0x27/0x77
[  818.555096]  [<c0403718>] work_notifysig+0x13/0x1b
[  818.556078] NMI backtrace for cpu 2
[  818.556078] 
[  818.556078] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) 
[  818.556078] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 2
[  818.556078] EIP is at native_safe_halt+0xa/0xc
[  818.556078] EAX: c0409645 EBX: c0a3301c ECX: f6d02ba0 EDX: 00000000
[  818.556078] ESI: 00000002 EDI: 00000000 EBP: f6cfff78 ESP: f6cfff78
[  818.556078]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  818.556078] CR0: 8005003b CR2: b76be000 CR3: 27f2e000 CR4: 000006d0
[  818.556078] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  818.556078] DR6: ffff0ff0 DR7: 00000400
[  818.556078] Call Trace:
[  818.556078]  [<c040964a>] default_idle+0x44/0x6e
[  818.556078]  [<c04026f7>] cpu_idle+0x9b/0xb4
[  818.556078]  [<c07afcd0>] start_secondary+0x204/0x242
[  818.556078] Pid: 0, comm: swapper Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  818.556078] Call Trace:
[  818.556078]  [<c0402656>] ? show_regs+0x24/0x2a
[  818.556078]  [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[  818.556078]  [<c07b619f>] do_nmi+0xa8/0x289
[  818.556078]  [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[  818.556078]  [<c0409645>] ? default_idle+0x3f/0x6e
[  818.556078]  [<c0409645>] ? default_idle+0x3f/0x6e
[  818.556078]  [<c04201dc>] ? native_safe_halt+0xa/0xc
[  818.556078]  [<c040964a>] default_idle+0x44/0x6e
[  818.556078]  [<c04026f7>] cpu_idle+0x9b/0xb4
[  818.556078]  [<c07afcd0>] start_secondary+0x204/0x242
[  634.927053] NMI backtrace for cpu 1
[  634.927053] 
[  634.927053] Pid: 5698, comm: stapio Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) 
[  634.927053] EIP: 0060:[<c05d0ad5>] EFLAGS: 00200096 CPU: 1
[  634.927053] EIP is at delay_tsc+0x1f/0x66
[  634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e
[  634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c
[  634.927053]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0
[  634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  634.927053] DR6: ffff0ff0 DR7: 00000400
[  634.927053] Call Trace:
[  634.927053]  [<c05d0a58>] __delay+0xe/0x10
[  634.927053]  [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[  634.927053]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  634.927053]  [<c0484239>] utrace_do_stop+0x1e/0x75
[  634.927053]  [<c0484359>] utrace_control+0xc9/0x1b8
[  634.927053]  [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[  634.927053]  [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[  634.927053]  [<c059242f>] ? security_file_permission+0x14/0x16
[  634.927053]  [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[  634.927053]  [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e5a74>] vfs_write+0x85/0xe4
[  634.927053]  [<c04e5b71>] sys_write+0x40/0x62
[  634.927053]  [<c0403610>] syscall_call+0x7/0xb
[  634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e
[  634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c
[  634.927053]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0
[  634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  634.927053] DR6: ffff0ff0 DR7: 00000400
[  634.927053] Call Trace:
[  634.927053]  [<c05d0a58>] __delay+0xe/0x10
[  634.927053]  [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[  634.927053]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  634.927053]  [<c0484239>] utrace_do_stop+0x1e/0x75
[  634.927053]  [<c0484359>] utrace_control+0xc9/0x1b8
[  634.927053]  [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[  634.927053]  [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[  634.927053]  [<c059242f>] ? security_file_permission+0x14/0x16
[  634.927053]  [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[  634.927053]  [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e5a74>] vfs_write+0x85/0xe4
[  634.927053]  [<c04e5b71>] sys_write+0x40/0x62
[  634.927053]  [<c0403610>] syscall_call+0x7/0xb
[  634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1
[  634.927053] Call Trace:
[  634.927053]  [<c0402656>] ? show_regs+0x24/0x2a
[  634.927053]  [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171
[  634.927053]  [<c07b619f>] do_nmi+0xa8/0x289
[  634.927053]  [<c07b5d5c>] nmi_stack_correct+0x2f/0x34
[  634.927053]  [<c046007b>] ? ktime_get_ts+0x33/0xa2
[  634.927053]  [<c05d0ad5>] ? delay_tsc+0x1f/0x66
[  634.927053]  [<c05d0a58>] __delay+0xe/0x10
[  634.927053]  [<c05d44d3>] _raw_spin_lock+0xbb/0x128
[  634.927053]  [<c07b57aa>] _spin_lock_irq+0x4d/0x5e
[  634.927053]  [<c0484239>] utrace_do_stop+0x1e/0x75
[  634.927053]  [<c0484359>] utrace_control+0xc9/0x1b8
[  634.927053]  [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e667e>] ? rcu_read_unlock+0x0/0x1e
[  634.927053]  [<c059b0ed>] ? selinux_file_permission+0x4f/0x96
[  634.927053]  [<c059242f>] ? security_file_permission+0x14/0x16
[  634.927053]  [<c04e56a5>] ? rw_verify_area+0x9d/0xc0
[  634.927053]  [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461]
[  634.927053]  [<c04e5a74>] vfs_write+0x85/0xe4
[  634.927053]  [<c04e5b71>] sys_write+0x40/0x62
[  634.927053]  [<c0403610>] syscall_call+0x7/0xb
[  819.261347] NMI backtrace for cpu 0
[  819.261347] 
[  819.261347] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) 
[  819.261347] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 0
[  819.261347] EIP is at native_safe_hal




More information about the utrace-devel mailing list