utrace bug

Wed Oct 17 06:45:13 UTC 2007

Hi,
 I was running Alexey's testcase cited in
 (http://marc.info/?l=linux-kernel&m=117128445312243&w=2) to test utrace, and
 the system crashed when I pressed Ctrl+C.

 Environment: 2.6.23-rc7, ppc64.

6:mon> e
cpu 0x6: Vector: 300 (Data Access) at [c00000002510b650]
    pc: c00000000038b0f8: ._spin_lock+0x20/0x88
    lr: c0000000000b1e78: .get_utrace_lock_attached+0x50/0xc0
    sp: c00000002510b8d0
   msr: 8000000000009032
   dar: 7f9d0000419e0058
 dsisr: 40000000
  current = 0xc000000035f800b0
  paca    = 0xc00000000058d900
    pid   = 23108, comm = a.out

 On further analysis, I made the following observations:
 1) When a process dies, it walks its list of tracees and detaches the engine
    from each one, as in ptrace_exit():

                list_for_each_safe_rcu(pos, n, &tsk->ptracees) {
                        state = list_entry(pos, struct ptrace_state, entry);
                        error = utrace_detach(state->task, state->engine);

6:mon> t
[c00000002510b950] c0000000000b1e78 .get_utrace_lock_attached+0x50/0xc0
[c00000002510b9e0] c0000000000b331c .utrace_detach+0x30/0x148
[c00000002510ba80] c0000000000b778c .ptrace_exit+0xa0/0x1c8
[c00000002510bb20] c000000000071848 .do_exit+0x188/0xa54
[c00000002510bbc0] c0000000000721e8 .sys_exit_group+0x0/0x8
[c00000002510bc50] c00000000007d6f8 .get_signal_to_deliver+0x480/0x4f4
[c00000002510bd00] c0000000000126d4 .do_signal+0x68/0x32c
[c00000002510be30] c000000000008af0 do_work+0x28/0x2c
--- Exception: c00 (System Call) at 000000000ff18d2c
SP (ffe0f030) is in userspace

 2) But when the process accesses "state->task", it looks like state->task
 has already been released, and all of its fields hold invalid values.

6:mon> r
R00 = 0000000080000006   R16 = 0000000000000000
R01 = c00000002510b8d0   R17 = 0000000000000000
R02 = c00000000067a808   R18 = 0000000000000000
R03 = 7f9d0000419e0058   R19 = 0000000000000000
R04 = c00000007f86b740   R20 = 0000000000000000
R05 = 8000000000c24000   R21 = 0000000000000000
R06 = 8000000000000000   R22 = 0000000000000000
R07 = 000000007fffffff   R23 = 0000000000000000
R08 = c000000008133408   R24 = c000000035f807b8
R09 = c00000009f14aa30   R25 = c00000002510bea0
R10 = c000000000574e84   R26 = c00000002510bd90
R11 = fffffffffffffffd   R27 = ffffffffffffffff
R12 = 4000000000000000   R28 = c00000007f86b740
R13 = c00000000058d900   R29 = c0000000279f00b0
R14 = 0000000000000000   R30 = c000000000616430
R15 = 0000000000000000   R31 = 7f9d0000419e0058
pc  = c00000000038b0f8 ._spin_lock+0x20/0x88
lr  = c0000000000b1e78 .get_utrace_lock_attached+0x50/0xc0
msr = 8000000000009032   cr  = 22000448
ctr = 800000000014dcd0   xer = 0000000000000000   trap =  300
dar = 7f9d0000419e0058   dsisr = 40000000

task->utrace is at r29 + 1904 (0x770):
6:mon> d c0000000279f0820
c0000000279f0820 7f9d0000419e0038 7c0018287c005800  |....A..8|..(|.X.|
c0000000279f0830 4082000c7d20192d 40c2fff04c00012c  |@...} .-@...L..,|
c0000000279f0840 2f80000040de0044 813f004893a90008  |/...@..D.?.H....|
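
 The arithmetic tying the dumps together can be checked with a small
 user-space sketch. All constants are copied from the register and memory
 dumps in this report; the macro and function names are made up for
 illustration. That the faulting address (dar) lies 0x20 past the value read
 from task->utrace would be consistent with the lock sitting at that offset
 inside the utrace structure, but that layout is an assumption, not something
 confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the dumps in this report; names are illustrative. */
#define R29_TASK         UINT64_C(0xc0000000279f00b0) /* task pointer (r29) */
#define UTRACE_FIELD_OFF UINT64_C(1904)               /* offset of task->utrace */
#define UTRACE_PTR_READ  UINT64_C(0x7f9d0000419e0038) /* first 8 bytes dumped there */
#define DAR              UINT64_C(0x7f9d0000419e0058) /* faulting data address */

/* Address of the task->utrace field inside the (stale) task structure. */
static uint64_t utrace_field_addr(void)
{
    return R29_TASK + UTRACE_FIELD_OFF;
}

/* Distance between the bogus utrace pointer and the faulting address. */
static uint64_t fault_offset(void)
{
    return DAR - UTRACE_PTR_READ;
}

/* Cross-check the sketch against the addresses shown by the monitor. */
static void check_dump_arithmetic(void)
{
    assert(utrace_field_addr() == UINT64_C(0xc0000000279f0820));
    assert(fault_offset() == UINT64_C(0x20));
}
```

 The value read from task->utrace is not a kernel address (kernel pointers in
 this dump start with c000...), which fits the observation that the task
 structure no longer holds valid data.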

 3) A possible reason for this error: while the parent process (the reader)
 was walking the RCU tracees list, a writer (another thread from the same
 group, via ptrace_detach()) deleted the entry (state->entry) from that list.
 The parent (reader), still holding a reference to the old list entry, then
 accesses state->task (which has already been released) and the system crashes.
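
 To illustrate the ordering problem described in 3), here is a minimal
 user-space sketch of one common way to guard against it: the reader pins the
 task with a reference before using it, so a concurrent detach cannot release
 it underneath the reader. This is not the utrace code and not a proposed
 patch. Every name below (toy_task, toy_state, toy_task_tryget, and so on) is
 hypothetical, and the model is single-threaded: real kernel code would need
 atomic reference counts and RCU grace periods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct toy_task {
    int refcount; /* number of holders keeping the task alive */
    bool freed;   /* set instead of calling free(), so the state can be inspected */
};

struct toy_state {
    struct toy_task *task; /* plays the role of state->task */
};

/* Take a reference only if the task is still live. */
static bool toy_task_tryget(struct toy_task *t)
{
    if (t == NULL || t->refcount == 0)
        return false;
    t->refcount++;
    return true;
}

/* Drop a reference; the last put marks the task as released. */
static void toy_task_put(struct toy_task *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        t->freed = true;
}

/* Writer side: unlink the entry and drop the reference the list held. */
static void toy_detach(struct toy_state *s)
{
    struct toy_task *t = s->task;

    s->task = NULL;
    if (t != NULL)
        toy_task_put(t);
}

/* Reader side: pin the task before touching it, unpin afterwards. */
static int toy_reader_visit(struct toy_state *s)
{
    struct toy_task *t = s->task;

    if (!toy_task_tryget(t))
        return -1; /* entry already detached; nothing safe to use */
    /* ... t may be used here; a detach in this window cannot release it ... */
    toy_task_put(t);
    return 0;
}
```

 In this model, a detach that happens after the reader has pinned the task
 only drops the count; the task is marked released when the reader puts its
 reference. A reader that arrives after the detach gets -1 instead of a stale
 pointer.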

 4) Since the reader and the writer must run in parallel to trigger this
 issue, the bug is very hard to reproduce.

 This leads me to suspect a possible issue with the usage of RCU in utrace. 

 Please let me know your comments.