From oleg at redhat.com Mon Jan 4 15:52:25 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Mon, 4 Jan 2010 16:52:25 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <20100104155225.GA16650@redhat.com>

Hi!

We have some strange problems with utrace on s390, and so far this _looks_
like a s390 problem.

Looks like, on any CPU user_enable_single_step() does not "work" until at
least one thread with per_info.single_step = 1 does the context switch.

This doesn't matter with the old ptrace implementation, but with utrace
the tracee itself does user_enable_single_step(current) and returns to
user-mode. Until it does at least one context switch the single-stepping
doesn't work, after that everything works fine till the next reboot.

To rule out the possible problems with ptrace or utrace, I did the trivial
patch:

--- K/kernel/sys.c~	2009-12-29 10:45:25.787198223 -0500
+++ K/kernel/sys.c	2010-01-03 13:04:00.485591316 -0500
@@ -1444,6 +1444,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 
 	error = 0;
 	switch (option) {
+		case 666:
+			user_enable_single_step(current);
+			break;
+
+		case 777:
+			/* same as 666, but force the context switch
+			 * after user_enable_single_step() */
+			user_enable_single_step(current);
+			schedule_timeout_interruptible(HZ/10);
+			break;
+
 		case PR_SET_PDEATHSIG:
 			if (!valid_signal(arg2)) {
 				error = -EINVAL;
--- K/arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
+++ K/arch/s390/kernel/traps.c	2009-12-30 10:31:12.985266686 -0500
@@ -378,11 +378,14 @@ static inline void __user *get_check_add
 
 void __kprobes do_single_step(struct pt_regs *regs)
 {
+	printk("SS enter\n");
+
 	if (notify_die(DIE_SSTEP, "sstep", regs, 0, 0,
 					SIGTRAP) == NOTIFY_STOP){
+		printk(KERN_INFO "SS cancelled ???\n");
 		return;
 	}
-	if (tracehook_consider_fatal_signal(current, SIGTRAP))
+//	if (tracehook_consider_fatal_signal(current, SIGTRAP))
 		force_sig(SIGTRAP, current);
 }

-------------------------------------------------------------------------------

The change in do_single_step() just removes "is it traced" check and adds a
couple of printk's. With this patch I assume that the task which does
prctl(666) should be killed by SIGTRAP, but this doesn't happen:

	# taskset -c 0 perl -le 'syscall 172,666 and die $!'
	# taskset -c 0 perl -le 'syscall 172,666 and die $!'
	# taskset -c 0 perl -le 'syscall 172,666 and die $!'

(syscall 172,666 == prctl(666))

the task exits normally, there is nothing in dmesg. However,

	# taskset -c 0 perl -le 'syscall 172,777 and die $!'
	Trace/breakpoint trap

Now prctl(777)->user_enable_single_step() does work, the task is killed by
do_single_step()->force_sig(SIGTRAP). Now prctl(666) works too on CPU 0

	# taskset -c 0 perl -le 'syscall 172,666 and die $!'
	Trace/breakpoint trap
	# taskset -c 0 perl -le 'syscall 172,666 and die $!'
	Trace/breakpoint trap
	# taskset -c 0 perl -le 'syscall 172,666 and die $!'
	Trace/breakpoint trap

And please note "# taskset -c 0", we can repeat the same on another CPU:

	# taskset -c 1 perl -le 'syscall 172,666 and die $!'
	# taskset -c 1 perl -le 'syscall 172,666 and die $!'

doesn't work, but

	# taskset -c 1 perl -le 'syscall 172,777 and die $!'
	Trace/breakpoint trap

magically "fixes" user_enable_single_step(), now we can use prctl(666) on
CPU 1.

The kernel is 2.6.32.2 plus ca633fd006486ed2c2d3b542283067aab61e6dc8,
could you help?

Oleg.

From schwidefsky at de.ibm.com Mon Jan 4 16:16:26 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Mon, 4 Jan 2010 17:16:26 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100104155225.GA16650@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
Message-ID: <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>

On Mon, 4 Jan 2010 16:52:25 +0100
Oleg Nesterov wrote:

> Hi!
>
> We have some strange problems with utrace on s390, and so far this _looks_
> like a s390 problem.
>
> Looks like, on any CPU user_enable_single_step() does not "work" until at
> least one thread with per_info.single_step = 1 does the context switch.
>
> This doesn't matter with the old ptrace implementation, but with utrace
> the tracee itself does user_enable_single_step(current) and returns to
> user-mode. Until it does at least one context switch the single-stepping
> doesn't work, after that everything works fine till the next reboot.

The PER control registers only get reloaded on task switch. Can you test
if this patch fixes your problem?

--
Subject: [PATCH] fix loading of PER control registers for utrace.

From: Martin Schwidefsky

If the current task enables / disables PER tracing for itself the
PER control registers need to be loaded in FixPerRegisters.

Signed-off-by: Martin Schwidefsky
---
 arch/s390/kernel/ptrace.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/s390/kernel/ptrace.c
+++ b/arch/s390/kernel/ptrace.c
@@ -98,6 +98,9 @@ FixPerRegisters(struct task_struct *task
 		per_info->control_regs.bits.storage_alt_space_ctl = 1;
 	else
 		per_info->control_regs.bits.storage_alt_space_ctl = 0;
+
+	if (task == current)
+		__ctl_load(per_info->control_regs.words, 9, 11);
 }
 
 void user_enable_single_step(struct task_struct *task)

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

From oleg at redhat.com Mon Jan 4 18:14:12 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Mon, 4 Jan 2010 19:14:12 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100104181412.GA21146@redhat.com>

On 01/04, Martin Schwidefsky wrote:
>
> On Mon, 4 Jan 2010 16:52:25 +0100
> Oleg Nesterov wrote:
>
> > We have some strange problems with utrace on s390, and so far this _looks_
> > like a s390 problem.
> >
> > Looks like, on any CPU user_enable_single_step() does not "work" until at
> > least one thread with per_info.single_step = 1 does the context switch.
>
> The PER control registers only get reloaded on task switch. Can you test
> if this patch fixes your problem?
>
> --
> Subject: [PATCH] fix loading of PER control registers for utrace.
>
> From: Martin Schwidefsky
>
> If the current task enables / disables PER tracing for itself the
> PER control registers need to be loaded in FixPerRegisters.
>
> Signed-off-by: Martin Schwidefsky
> ---
>  arch/s390/kernel/ptrace.c |    3 +++
>  1 file changed, 3 insertions(+)
>
> --- a/arch/s390/kernel/ptrace.c
> +++ b/arch/s390/kernel/ptrace.c
> @@ -98,6 +98,9 @@ FixPerRegisters(struct task_struct *task
>  		per_info->control_regs.bits.storage_alt_space_ctl = 1;
>  	else
>  		per_info->control_regs.bits.storage_alt_space_ctl = 0;
> +
> +	if (task == current)
> +		__ctl_load(per_info->control_regs.words, 9, 11);
>  }

Yes it does fix the problem! Thanks a lot Martin.

However. Could you please look at 6580807da14c423f0d0a708108e6df6ebc8bc83d ?
I am worried, perhaps this commit is not enough for s390. OK, do_single_step()
tracehook_consider_fatal_signal(), this means the forked thread will not
be killed by SIGTRAP if it is not auto-attached, but still this may be
wrong.

IOW. I think this problem is minor and probably can be ignored, but if
I remove tracehook_consider_fatal_signal() check from do_single_step(),

--- a/arch/s390/kernel/traps.c
+++ b/arch/s390/kernel/traps.c
@@ -382,8 +382,7 @@ void __kprobes do_single_step(struct pt_
 					SIGTRAP) == NOTIFY_STOP){
 		return;
 	}
-	if (tracehook_consider_fatal_signal(current, SIGTRAP))
-		force_sig(SIGTRAP, current);
+	force_sig(SIGTRAP, current);
 }
 
 static void default_trap_handler(struct pt_regs * regs, long interruption_code)
-------------------------------------------------------------------

then the test-case from 6580807da14c423f0d0a708108e6df6ebc8bc83d
fails.
This probably means that copy_process()->user_disable_single_step()
is not enough to clear the "this task wants single-stepping" copied
from parent.

Thanks!

Oleg.

From oleg at redhat.com Mon Jan 4 19:30:44 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Mon, 4 Jan 2010 20:30:44 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100104181412.GA21146@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
Message-ID: <20100104193044.GB21146@redhat.com>

On 01/04, Oleg Nesterov wrote:
>
> IOW. I think this problem is minor and probably can be ignored,

Or may be not... Even if the child is not killed by SIGTRAP, it can
get a lot of unnecessary traps.

To verify, I did another trivial patch (below), and the test case
from 6580807da14c423f0d0a708108e6df6ebc8bc83d does trigger a lot of
"false step" printks.

Hmm. And sometimes there is nothing in dmesg, but the test-case
needs a lot of time to complete. "taskset -c" seems to always
trigger printk's. Magic.

Oleg.

--- arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
+++ arch/s390/kernel/traps.c	2010-01-04 13:19:51.038187586 -0500
@@ -384,6 +384,8 @@ void __kprobes do_single_step(struct pt_
 	}
 	if (tracehook_consider_fatal_signal(current, SIGTRAP))
 		force_sig(SIGTRAP, current);
+	else
+		printk("false step\n");
 }
 
 static void default_trap_handler(struct pt_regs * regs, long interruption_code)

From roland at redhat.com Mon Jan 4 20:46:56 2010
From: roland at redhat.com (Roland McGrath)
Date: Mon, 4 Jan 2010 12:46:56 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Monday, 4 January 2010 17:16:26 +0100 <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100104204656.B2396D532@magilla.sf.frob.com>

> The PER control registers only get reloaded on task switch. Can you test
> if this patch fixes your problem?

Long ago when I first worked with David Wilder on s390 arch code, I
remember we made this change. It seems to have been forgotten in the later
rounds of reworking and merging.

Thanks,
Roland

From roland at redhat.com Mon Jan 4 21:11:47 2010
From: roland at redhat.com (Roland McGrath)
Date: Mon, 4 Jan 2010 13:11:47 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Oleg Nesterov's message of Monday, 4 January 2010 19:14:12 +0100 <20100104181412.GA21146@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
Message-ID: <20100104211147.4CC94D532@magilla.sf.frob.com>

> This probably means that copy_process()->user_disable_single_step()
> is not enough to clear the "this task wants single-stepping" copied
> from parent.

I would suspect s390's TIF_SINGLE_STEP flag here. That flag means "a
single-step trap occurred". This is what causes do_single_step to be
called before returning to user mode, rather than the machine trap doing it
directly as is done in the other arch implementations.

If I'm right, then "this task wants single-stepping" is not the problem,
and that really is fully cleared. In fact, looking at s390's copy_thread
(arch/s390/kernel/process.c) it clears out all the state that is actually
touched by user_enable_single_step and user_disable_single_step. So for
s390 the new fork.c call is actually superfluous AFAICT.

The problem is that the copied parent state includes the "this task has a
pending single-step to report" flag. IMHO it clearly makes sense for
s390's copy_thread to clear this flag in a new task, which it does not do now.

An alternative to that would be just to make its user_disable_single_step
clear the flag. That could in theory also have an effect on e.g. the
(authentic) pending step report of a tracee that was stopped with
TIF_SINGLE_STEP set when its tracer detached.
This might be considered a
good thing, but since every other arch posts the SIGTRAP immediately they
all have the equivalent issue and s390 doesn't need to be any "better" than
they are before we have a generic resolution to the whole subject of
tracer-induced signals (which we won't get into now). I'm not even sure
from my insufficient reading of the s390 assembly code whether this path is
even possible, i.e. do_signal called before do_single_step.

Martin, I suggest having copy_thread clear TIF_SINGLE_STEP.
That bit is always task-private state that should not be copied.

Btw, given the complexity of FixPerRegisters (and its new additional cost
on task==current), you might want to make user_*_single_step bail out if
per_info.single_step is already set/clear on entry.

Thanks,
Roland

From schwidefsky at de.ibm.com Tue Jan 5 09:26:06 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Tue, 5 Jan 2010 10:26:06 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100104181412.GA21146@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
Message-ID: <20100105102606.4f223990@mschwide.boeblingen.de.ibm.com>

On Mon, 4 Jan 2010 19:14:12 +0100
Oleg Nesterov wrote:

> On 01/04, Martin Schwidefsky wrote:
> > Subject: [PATCH] fix loading of PER control registers for utrace.
> >
> > From: Martin Schwidefsky
> >
> > If the current task enables / disables PER tracing for itself the
> > PER control registers need to be loaded in FixPerRegisters.
> >
> > Signed-off-by: Martin Schwidefsky
> > ---
> >  arch/s390/kernel/ptrace.c |    3 +++
> >  1 file changed, 3 insertions(+)
> >
> > --- a/arch/s390/kernel/ptrace.c
> > +++ b/arch/s390/kernel/ptrace.c
> > @@ -98,6 +98,9 @@ FixPerRegisters(struct task_struct *task
> >  		per_info->control_regs.bits.storage_alt_space_ctl = 1;
> >  	else
> >  		per_info->control_regs.bits.storage_alt_space_ctl = 0;
> > +
> > +	if (task == current)
> > +		__ctl_load(per_info->control_regs.words, 9, 11);
> >  }
>
> Yes it does fix the problem! Thanks a lot Martin.

Ok, I will add that patch to the git390 queue.

> However. Could you please look at 6580807da14c423f0d0a708108e6df6ebc8bc83d ?
> I am worried, perhaps this commit is not enough for s390. OK, do_single_step()
> tracehook_consider_fatal_signal(), this means the forked thread will not
> be killed by SIGTRAP if it is not auto-attached, but still this may be
> wrong.
>
> IOW. I think this problem is minor and probably can be ignored, but if
> I remove tracehook_consider_fatal_signal() check from do_single_step(),
>
> --- a/arch/s390/kernel/traps.c
> +++ b/arch/s390/kernel/traps.c
> @@ -382,8 +382,7 @@ void __kprobes do_single_step(struct pt_
>  					SIGTRAP) == NOTIFY_STOP){
>  		return;
>  	}
> -	if (tracehook_consider_fatal_signal(current, SIGTRAP))
> -		force_sig(SIGTRAP, current);
> +	force_sig(SIGTRAP, current);
>  }
>
>  static void default_trap_handler(struct pt_regs * regs, long interruption_code)
> -------------------------------------------------------------------
>
> then the test-case from 6580807da14c423f0d0a708108e6df6ebc8bc83d
> fails. This probably means that copy_process()->user_disable_single_step()
> is not enough to clear the "this task wants single-stepping" copied
> from parent.

user_disable_single_step() does not remove the TIF_SINGLE_STEP bit from the
forked task. Perhaps we should just clear the bit in the function.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
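
The stale-register effect fixed by the FixPerRegisters patch quoted above can be imitated in plain userspace C. This is only a toy sketch under stated assumptions: the toy_* types and functions are hypothetical and are not kernel APIs. A boolean stands in for the PER control registers, and "loading" them is modelled as copying the per-task flag into a per-CPU field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_task {
	bool single_step;		/* stands in for per_info.single_step */
};

struct toy_cpu {
	struct toy_task *current;	/* task currently running on this CPU */
	bool per_loaded;		/* stands in for the loaded PER control registers */
};

/* Models a task switch, where the control registers are reloaded. */
static void toy_context_switch(struct toy_cpu *cpu, struct toy_task *next)
{
	assert(cpu != NULL && next != NULL);
	cpu->current = next;
	cpu->per_loaded = next->single_step;
}

/* Without the fix: only the saved per-task state changes. */
static void toy_enable_single_step_old(struct toy_cpu *cpu, struct toy_task *task)
{
	(void)cpu;
	task->single_step = true;
}

/* With the fix: reload at once when the task modifies itself. */
static void toy_enable_single_step_fixed(struct toy_cpu *cpu, struct toy_task *task)
{
	task->single_step = true;
	if (task == cpu->current)
		cpu->per_loaded = task->single_step;
}
```

In this model a task that calls toy_enable_single_step_old() on itself keeps running with per_loaded == false until the next toy_context_switch(), while toy_enable_single_step_fixed() takes effect immediately. The model does not try to explain why stepping kept working afterwards on the real machine.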

From schwidefsky at de.ibm.com Tue Jan 5 09:50:30 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Tue, 5 Jan 2010 10:50:30 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100104211147.4CC94D532@magilla.sf.frob.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
	<20100104211147.4CC94D532@magilla.sf.frob.com>
Message-ID: <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>

On Mon, 4 Jan 2010 13:11:47 -0800 (PST)
Roland McGrath wrote:

> > This probably means that copy_process()->user_disable_single_step()
> > is not enough to clear the "this task wants single-stepping" copied
> > from parent.
>
> I would suspect s390's TIF_SINGLE_STEP flag here. That flag means "a
> single-step trap occurred". This is what causes do_single_step to be
> called before returning to user mode, rather than the machine trap doing it
> directly as is done in the other arch implementations.

Just my thinking as well.

> If I'm right, then "this task wants single-stepping" is not the problem,
> and that really is fully cleared. In fact, looking at s390's copy_thread
> (arch/s390/kernel/process.c) it clears out all the state that is actually
> touched by user_enable_single_step and user_disable_single_step. So for
> s390 the new fork.c call is actually superfluous AFAICT.

	/* Don't copy debug registers */
	memset(&p->thread.per_info, 0, sizeof(p->thread.per_info));

Yep, the call from fork.c is indeed superfluous.

> The problem is that the copied parent state includes the "this task has a
> pending single-step to report" flag. IMHO it clearly makes sense for
> s390's copy_thread to clear this flag in a new task, which it does not do now.
>
> An alternative to that would be just to make its user_disable_single_step
> clear the flag. That could in theory also have an effect on e.g. the
> (authentic) pending step report of a tracee that was stopped with
> TIF_SINGLE_STEP set when its tracer detached. This might be considered a
> good thing, but since every other arch posts the SIGTRAP immediately they
> all have the equivalent issue and s390 doesn't need to be any "better" than
> they are before we have a generic resolution to the whole subject of
> tracer-induced signals (which we won't get into now). I'm not even sure
> from my insufficient reading of the s390 assembly code whether this path is
> even possible, i.e. do_signal called before do_single_step.

do_signal is called before do_single_step. The order of checks of the TIF_
bits is 1) machine checks, 2) need resched, 3) signal pending, 4) notify
resume, 5) restarting system call, 6) single step.
But why is that important ? If the TIF_SINGLE_STEP bit is set the order of
do_signal vs. do_single_step does not seem to be important to me. There
will be a SIGTRAP if TIF_SINGLE_STEP is set, no ?
But I agree, it is probably better to make all arches look the same in
regard to that pending step report.

> Martin, I suggest having copy_thread clear TIF_SINGLE_STEP.
> That bit is always task-private state that should not be copied.

Then let us do this.

> Btw, given the complexity of FixPerRegisters (and its new additional cost
> on task==current), you might want to make user_*_single_step bail out if
> per_info.single_step is already set/clear on entry.

The LCTLG of multiple control registers is rather expensive. Does it happen
often that user_*_single_step is called without need? For gdb it doesn't
matter, the cost to switch between tracer and tracee is already large, the
cycles added to FixPerRegisters won't matter much. For utrace things might
be different.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
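
The copy_thread suggestion agreed on above can be pictured with a small userspace sketch. It is a toy model, not the s390 code: the toy_* names and the bit values are hypothetical, and only the idea is taken from the thread, namely that the child starts as a copy of the parent, the debug state is zeroed, and the "pending single-step to report" bit is treated as task-private and dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define TOY_TIF_OTHER		(1u << 0)	/* some unrelated flag that is inherited */
#define TOY_TIF_SINGLE_STEP	(1u << 1)	/* "a single-step trap is pending" */

struct toy_task {
	unsigned int flags;		/* stands in for the thread flags */
	int per_single_step;		/* stands in for per_info.single_step */
};

/* Toy fork path: copy the parent, then scrub task-private debug state. */
static void toy_copy_thread(const struct toy_task *parent, struct toy_task *child)
{
	assert(parent != NULL && child != NULL);
	*child = *parent;			/* child begins as a copy */
	child->per_single_step = 0;		/* like the memset of per_info */
	child->flags &= ~TOY_TIF_SINGLE_STEP;	/* proposed: drop the pending report */
}
```

With a parent that has both bits set, the child ends up with only TOY_TIF_OTHER, so it would not report a step it never took; the parent's own flags are left untouched.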

From oleg at redhat.com Tue Jan 5 15:36:33 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 5 Jan 2010 16:36:33 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
	<20100104211147.4CC94D532@magilla.sf.frob.com>
	<20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100105153633.GA9376@redhat.com>

On 01/05, Martin Schwidefsky wrote:
>
> On Mon, 4 Jan 2010 13:11:47 -0800 (PST)
> Roland McGrath wrote:
>
> > > This probably means that copy_process()->user_disable_single_step()
> > > is not enough to clear the "this task wants single-stepping" copied
> > > from parent.
> >
> > I would suspect s390's TIF_SINGLE_STEP flag here. That flag means "a
> > single-step trap occurred". This is what causes do_single_step to be
> > called before returning to user mode, rather than the machine trap doing it
> > directly as is done in the other arch implementations.
>
> Just my thinking as well.

Oh, I am not sure. But I don't understand TIF_SINGLE_STEP on s390,
absolutely.

For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do

--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -500,18 +500,10 @@ void do_signal(struct pt_regs *regs)
 				clear_thread_flag(TIF_RESTORE_SIGMASK);
 
 			/*
-			 * If we would have taken a single-step trap
-			 * for a normal instruction, act like we took
-			 * one for the handler setup.
-			 */
-			if (current->thread.per_info.single_step)
-				set_thread_flag(TIF_SINGLE_STEP);
-
-			/*
 			 * Let tracing know that we've done the handler setup.
 			 */
 			tracehook_signal_handler(signr, &info, &ka, regs,
-					test_thread_flag(TIF_SINGLE_STEP));
+					current->thread.per_info.single_step);
 		}
 		return;
 	}

?

Apart from arch/s390/signal.c, TIF_SINGLE_STEP is used by entry.S
but I don't understand this asm at all.

Anyway. I modified the debugging patch a bit:

--- K/arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
+++ K/arch/s390/kernel/traps.c	2010-01-05 09:49:19.541792379 -0500
@@ -384,6 +384,8 @@ void __kprobes do_single_step(struct pt_
 	}
 	if (tracehook_consider_fatal_signal(current, SIGTRAP))
 		force_sig(SIGTRAP, current);
+	else
+		printk("XXX: %d %d\n", current->pid, test_thread_flag(TIF_SINGLE_STEP));
 }
 
 static void default_trap_handler(struct pt_regs * regs, long interruption_code)
-------------------------------------------------------------------------------

Now, when I run this test-case

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>

int main(void)
{
	int pid, status;

	if (!(pid = fork())) {
		assert(ptrace(PTRACE_TRACEME) == 0);
		kill(getpid(), SIGSTOP);

		if (!fork())
			return 43;

		wait(&status);
		return WEXITSTATUS(status);
	}


	for (;;) {
		assert(pid == wait(&status));
		if (WIFEXITED(status))
			break;
		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
	}

	assert(WEXITSTATUS(status) == 43);
	return 0;
}

dmesg shows 799 lines of

	XXX: 2389 0


The kernel is 2.6.32.2 + utrace, but CONFIG_UTRACE is not set.

Oleg.

From schwidefsky at de.ibm.com Tue Jan 5 15:46:10 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Tue, 5 Jan 2010 16:46:10 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100105153633.GA9376@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
	<20100104211147.4CC94D532@magilla.sf.frob.com>
	<20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
	<20100105153633.GA9376@redhat.com>
Message-ID: <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com>

On Tue, 5 Jan 2010 16:36:33 +0100
Oleg Nesterov wrote:

> On 01/05, Martin Schwidefsky wrote:
> >
> > On Mon, 4 Jan 2010 13:11:47 -0800 (PST)
> > Roland McGrath wrote:
> >
> > > > This probably means that copy_process()->user_disable_single_step()
> > > > is not enough to clear the "this task wants single-stepping" copied
> > > > from parent.
> > >
> > > I would suspect s390's TIF_SINGLE_STEP flag here. That flag means "a
> > > single-step trap occurred". This is what causes do_single_step to be
> > > called before returning to user mode, rather than the machine trap doing it
> > > directly as is done in the other arch implementations.
> >
> > Just my thinking as well.
>
> Oh, I am not sure. But I don't understand TIF_SINGLE_STEP on s390,
> absolutely.
>
> For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do
>
> --- a/arch/s390/kernel/signal.c
> +++ b/arch/s390/kernel/signal.c
> @@ -500,18 +500,10 @@ void do_signal(struct pt_regs *regs)
>  				clear_thread_flag(TIF_RESTORE_SIGMASK);
>
>  			/*
> -			 * If we would have taken a single-step trap
> -			 * for a normal instruction, act like we took
> -			 * one for the handler setup.
> -			 */
> -			if (current->thread.per_info.single_step)
> -				set_thread_flag(TIF_SINGLE_STEP);
> -
> -			/*
>  			 * Let tracing know that we've done the handler setup.
>  			 */
>  			tracehook_signal_handler(signr, &info, &ka, regs,
> -					test_thread_flag(TIF_SINGLE_STEP));
> +					current->thread.per_info.single_step);
>  		}
>  		return;
>  	}
>
> ?

The reason why we set the TIF_SINGLE_STEP bit in do_signal is that we
want to be able to stop the debugged program before the first instruction
of the signal handler has been executed. The PER single step causes a
trap after an instruction has been executed. That first instruction can
do bad things to the arguments of the signal handler.

> Apart from arch/s390/signal.c, TIF_SINGLE_STEP is used by entry.S
> but I don't understand this asm at all.
>
> Anyway. I modified the debugging patch a bit:
>
> --- K/arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
> +++ K/arch/s390/kernel/traps.c	2010-01-05 09:49:19.541792379 -0500
> @@ -384,6 +384,8 @@ void __kprobes do_single_step(struct pt_
>  	}
>  	if (tracehook_consider_fatal_signal(current, SIGTRAP))
>  		force_sig(SIGTRAP, current);
> +	else
> +		printk("XXX: %d %d\n", current->pid, test_thread_flag(TIF_SINGLE_STEP));
>  }
>
>  static void default_trap_handler(struct pt_regs * regs, long interruption_code)
> -------------------------------------------------------------------------------
>
> Now, when I run this test-case
>
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <sys/ptrace.h>
> #include <sys/wait.h>
> #include <assert.h>
>
> int main(void)
> {
> 	int pid, status;
>
> 	if (!(pid = fork())) {
> 		assert(ptrace(PTRACE_TRACEME) == 0);
> 		kill(getpid(), SIGSTOP);
>
> 		if (!fork())
> 			return 43;
>
> 		wait(&status);
> 		return WEXITSTATUS(status);
> 	}
>
>
> 	for (;;) {
> 		assert(pid == wait(&status));
> 		if (WIFEXITED(status))
> 			break;
> 		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
> 	}
>
> 	assert(WEXITSTATUS(status) == 43);
> 	return 0;
> }
>
> dmesg shows 799 lines of
>
> 	XXX: 2389 0
>
>
> The kernel is 2.6.32.2 + utrace, but CONFIG_UTRACE is not set.

With or without my bug fix ?

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

From oleg at redhat.com Tue Jan 5 15:47:25 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 5 Jan 2010 16:47:25 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100105153633.GA9376@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<20100104155225.GA16650@redhat.com>
	<20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com>
	<20100104181412.GA21146@redhat.com>
	<20100104211147.4CC94D532@magilla.sf.frob.com>
	<20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
	<20100105153633.GA9376@redhat.com>
Message-ID: <20100105154725.GB9376@redhat.com>

On 01/05, Oleg Nesterov wrote:
>
> Anyway. I modified the debugging patch a bit:
>
> --- K/arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
> +++ K/arch/s390/kernel/traps.c	2010-01-05 09:49:19.541792379 -0500
> @@ -384,6 +384,8 @@ void __kprobes do_single_step(struct pt_
>  	}
>  	if (tracehook_consider_fatal_signal(current, SIGTRAP))
>  		force_sig(SIGTRAP, current);
> +	else
> +		printk("XXX: %d %d\n", current->pid, test_thread_flag(TIF_SINGLE_STEP));
>  }
>
>  static void default_trap_handler(struct pt_regs * regs, long interruption_code)
> -------------------------------------------------------------------------------

Ah, please ignore. I guess TIF_SINGLE_STEP was already cleared by the caller
in entry.S

Oleg.
From schwidefsky at de.ibm.com Tue Jan 5 15:50:53 2010 From: schwidefsky at de.ibm.com (Martin Schwidefsky) Date: Tue, 5 Jan 2010 16:50:53 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100105154725.GB9376@redhat.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105154725.GB9376@redhat.com> Message-ID: <20100105165053.3e75e438@mschwide.boeblingen.de.ibm.com> On Tue, 5 Jan 2010 16:47:25 +0100 Oleg Nesterov wrote: > On 01/05, Oleg Nesterov wrote: > > > > Anyway. I modified the debugging patch a bit: > > > > --- K/arch/s390/kernel/traps.c~ 2009-12-22 10:41:52.909174198 -0500 > > +++ K/arch/s390/kernel/traps.c 2010-01-05 09:49:19.541792379 -0500 > > @@ -384,6 +384,8 @@ void __kprobes do_single_step(struct pt_ > > } > > if (tracehook_consider_fatal_signal(current, SIGTRAP)) > > force_sig(SIGTRAP, current); > > + else > > + printk("XXX: %d %d\n", current->pid, test_thread_flag(TIF_SINGLE_STEP)); > > } > > > > static void default_trap_handler(struct pt_regs * regs, long interruption_code) > > ------------------------------------------------------------------------------- > > Ah, please ignore. I guess TIF_SINGLE_STEP was already cleared by the caller > in entry.S Yes, TIF_SINGLE_STEP is checked in entry.S and cleared before do_signal is called. That is the "ni" instruction at sysc_singlestep and sysc_sigpending. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. 
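A minimal user-space sketch of the test-and-clear ordering described above, which may help explain why the debugging printk always reports 0. It is only a model: the struct, the bit value and every model_* name are hypothetical and are not taken from entry.S or traps.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit value; the real TIF_SINGLE_STEP number is not modelled. */
#define MODEL_TIF_SINGLE_STEP (1UL << 0)

struct model_task {
	unsigned long flags;     /* stands in for the thread_info flags */
	int seen_by_handler;     /* flag value the C handler observed, -1 if it never ran */
};

/* Stand-in for the C handler: records what it sees in the flag word. */
static void model_do_single_step(struct model_task *t)
{
	t->seen_by_handler = (t->flags & MODEL_TIF_SINGLE_STEP) != 0;
}

/*
 * Stand-in for the exit path: test the bit, clear it, and only then call
 * the handler. Returns 1 if the handler was called, 0 otherwise.
 */
static int model_exit_work(struct model_task *t)
{
	assert(t != NULL);
	if (!(t->flags & MODEL_TIF_SINGLE_STEP))
		return 0;
	t->flags &= ~MODEL_TIF_SINGLE_STEP;  /* cleared before the handler runs */
	model_do_single_step(t);
	return 1;
}
```

With the flag set on entry, model_exit_work() calls the handler exactly once, and the handler sees the flag as already cleared, which matches the "XXX: 2389 0" lines in this model.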
From oleg at redhat.com Tue Jan 5 15:59:13 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 5 Jan 2010 16:59:13 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> Message-ID: <20100105155913.GA10652@redhat.com> On 01/05, Martin Schwidefsky wrote: > > On Tue, 5 Jan 2010 16:36:33 +0100 > Oleg Nesterov wrote: > > > For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do > > > > --- a/arch/s390/kernel/signal.c > > +++ b/arch/s390/kernel/signal.c > > @@ -500,18 +500,10 @@ void do_signal(struct pt_regs *regs) > > clear_thread_flag(TIF_RESTORE_SIGMASK); > > > > /* > > - * If we would have taken a single-step trap > > - * for a normal instruction, act like we took > > - * one for the handler setup. > > - */ > > - if (current->thread.per_info.single_step) > > - set_thread_flag(TIF_SINGLE_STEP); > > - > > - /* > > * Let tracing know that we've done the handler setup. > > */ > > tracehook_signal_handler(signr, &info, &ka, regs, > > - test_thread_flag(TIF_SINGLE_STEP)); > > + current->thread.per_info.single_step); > > } > > return; > > } > > > > ? > > The reason why we set the TIF_SINGLE_STEP bit in do_signal is that we > want to be able to stop the debugged program before the first > instruction of the signal handler has been executed. The PER single > step causes a trap after an instruction has been executed. 
That first > instruction can do bad things to the arguments of the signal handler.. Yes, but afaics all we need is to pass the correct "int stepping" arg to tracehook_signal_handler(). If it is true, the tracee does ptrace_notify() before it returns to user-mode. > > dmesg shows 799 lines of > > > > XXX: 2389 0 > > > > > > The kernel is 2.6.32.2 + utrace, but CONFIG_UTRACE is not set. > > With or without my bug fix ? With, but please see another email. I'll add clear_bit(TIF_SINGLE_STEP) into do_fork() path and re-test. Oleg. From oleg at redhat.com Tue Jan 5 17:03:01 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 5 Jan 2010 18:03:01 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100105155913.GA10652@redhat.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> Message-ID: <20100105170301.GA13641@redhat.com> On 01/05, Oleg Nesterov wrote: > > I'll add clear_bit(TIF_SINGLE_STEP) into do_fork() path and re-test. Hmm. This patch --- kernel/fork.c~ 2009-12-22 10:41:53.188084961 -0500 +++ kernel/fork.c 2010-01-05 11:42:58.370636323 -0500 @@ -1206,6 +1206,8 @@ static struct task_struct *copy_process( * of CLONE_PTRACE. */ clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); + clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + user_disable_single_step(p); #ifdef TIF_SYSCALL_EMU clear_tsk_thread_flag(p, TIF_SYSCALL_EMU); #endif doesn't help, I still see the same XXX's in dmesg... Oleg. 
From envoi at bdop89.info Tue Jan 5 18:02:22 2010 From: envoi at bdop89.info (E-Marketing Paris 2010 par MediaMailing) Date: Tue, 5 Jan 2010 20:02:22 +0200 Subject: =?iso-8859-1?Q?E-Marketing_Paris_2010_:_2_jours_pour_aiguiser_votre_strat?= =?iso-8859-1?Q?=E9gie?= Message-ID: <9a1dc0adb1edd1b293b701820bc0cc1c@om4.market-products.com> An HTML attachment was scrubbed... URL: From oleg at redhat.com Tue Jan 5 19:58:18 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 5 Jan 2010 20:58:18 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100105170301.GA13641@redhat.com> References: <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> Message-ID: <20100105195818.GA20358@redhat.com> On 01/05, Oleg Nesterov wrote: > > On 01/05, Oleg Nesterov wrote: > > > > I'll add clear_bit(TIF_SINGLE_STEP) into do_fork() path and re-test. > > Hmm. This patch > > --- kernel/fork.c~ 2009-12-22 10:41:53.188084961 -0500 > +++ kernel/fork.c 2010-01-05 11:42:58.370636323 -0500 > @@ -1206,6 +1206,8 @@ static struct task_struct *copy_process( > * of CLONE_PTRACE. > */ > clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); > + clear_tsk_thread_flag(p, TIF_SINGLE_STEP); > + user_disable_single_step(p); > #ifdef TIF_SYSCALL_EMU > clear_tsk_thread_flag(p, TIF_SYSCALL_EMU); > #endif > > doesn't help, I still see the same XXX's in dmesg... Oh, now I am totally confused. I reverted your fix from https://www.redhat.com/archives/utrace-devel/2010-January/msg00006.html and now there is nothing in dmesg. 
I decided to re-test this all with vanilla 2.6.33-rc2. It is really
amazing how long it takes to recompile/install the kernel! I spent
the rest of this day fighting with this rhts machine. Result - it
doesn't boot:

	00: zIPL v1.8.2-5.el6 interactive boot menu
	00:
	00: 0. default (2.6.33-rc2)
	00:
	00: 1. 2.6.33-rc2
	00: 2. 2.6.32.2-14.s390x.el6.s390x
	00: 3. 2.6.32.2-14.el6.s390x
	00: 4. linux
	00:
	00: Note: VM users please use '#cp vi vmsg '
	00:
	00: Please choose (default will boot in 15 seconds):
	00: Booting default (2.6.33-rc2)...
	[<000000000011c4fc>] sysc_return+0x0/0x8
	[<00000000003cc0c6>] selinux_sb_copy_data+0x17e/0x238
	([<00000000003cbf94>] selinux_sb_copy_data+0x4c/0x238)
	[<00000000003b62a6>] security_sb_copy_data+0x4e/0x60
	[<0000000000280338>] vfs_kern_mount+0x19c/0x1f4
	[<00000000002803de>] kern_mount_data+0x4e/0x5c
	[<0000000000ae1908>] devtmpfs_init+0x60/0xbc
	[<0000000000ae1818>] driver_init+0x28/0x6c
	[<0000000000abe582>] kernel_init+0x32a/0x3d8
	[<000000000010b8c2>] kernel_thread_starter+0x6/0xc
	[<000000000010b8bc>] kernel_thread_starter+0x0/0xc
	INFO: lockdep is turned off.
	00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00114BD4
	00:
	00:
	00:
	00:
	00:
	00:

Cai, any chance you can help? Using /usr/bin/console I can't choose another
kernel at the "00: Please choose (default will boot in 15 seconds):" stage,
how can I do this???

Oleg.

From heiko.carstens at de.ibm.com Wed Jan 6 14:59:08 2010
From: heiko.carstens at de.ibm.com (Heiko Carstens)
Date: Wed, 6 Jan 2010 15:59:08 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100105195818.GA20358@redhat.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com>
Message-ID: <20100106145908.GA5621@osiris.boeblingen.de.ibm.com>

On Tue, Jan 05, 2010 at 08:58:18PM +0100, Oleg Nesterov wrote:
> I decided to re-test this all with vanilla 2.6.33-rc2. It is really
> amazing how long it takes to recompile/install the kernel!

Then either you have a lot of steal time or an old machine (pre z10)?
You could also try to define more cpus if you have a virtual machine:

#cp define cpu -

...that is if your admin gave you permissions to do that.

> I spent
> the rest of this day fighting with this rhts machine. Result - it
> doesn't boot:
>
> 00: zIPL v1.8.2-5.el6 interactive boot menu
> 00:
> 00: 0. default (2.6.33-rc2)
> 00:
> 00: 1. 2.6.33-rc2
> 00: 2. 2.6.32.2-14.s390x.el6.s390x
> 00: 3. 2.6.32.2-14.el6.s390x
> 00: 4. linux
> 00:
> 00: Note: VM users please use '#cp vi vmsg '
> 00:
> 00: Please choose (default will boot in 15 seconds):
> 00: Booting default (2.6.33-rc2)...
> [<000000000011c4fc>] sysc_return+0x0/0x8
> [<00000000003cc0c6>] selinux_sb_copy_data+0x17e/0x238
> ([<00000000003cbf94>] selinux_sb_copy_data+0x4c/0x238)
> [<00000000003b62a6>] security_sb_copy_data+0x4e/0x60
> [<0000000000280338>] vfs_kern_mount+0x19c/0x1f4
> [<00000000002803de>] kern_mount_data+0x4e/0x5c
> [<0000000000ae1908>] devtmpfs_init+0x60/0xbc
> [<0000000000ae1818>] driver_init+0x28/0x6c
> [<0000000000abe582>] kernel_init+0x32a/0x3d8
> [<000000000010b8c2>] kernel_thread_starter+0x6/0xc
> [<000000000010b8bc>] kernel_thread_starter+0x0/0xc
> INFO: lockdep is turned off.
> 00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00114BD4
>
> Cai, any chance you can help? Using /usr/bin/console I can't choose another
> kernel at the "00: Please choose (default will boot in 15 seconds):" stage,
> how can I do this???

You did enter something like

#cp vi vmsg 2

and not just '2'?

From caiqian at redhat.com Wed Jan 6 15:33:10 2010
From: caiqian at redhat.com (caiqian at redhat.com)
Date: Wed, 6 Jan 2010 10:33:10 -0500 (EST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <1158952983.251101262791902387.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <1126133396.251251262791990053.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

> Cai, any chance you can help? Using /usr/bin/console I can't choose
> another kernel at the "00: Please choose (default will boot in 15 seconds):"
> stage, how can I do this???

As Heiko mentioned, I did manage to enter,

#cp vi vmsg 2

during the prompt to get to the second kernel to boot...

Thanks,
CAI Qian

From oleg at redhat.com Wed Jan 6 20:09:44 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 6 Jan 2010 21:09:44 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <1126133396.251251262791990053.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References: <1158952983.251101262791902387.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1126133396.251251262791990053.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <20100106200944.GA26204@redhat.com>

On 01/06, caiqian at redhat.com wrote:
>
> > Cai, any chance you can help? Using /usr/bin/console I can't choose
> > another kernel at the "00: Please choose (default will boot in 15 seconds):"
> > stage, how can I do this???
>
> As Heiko mentioned, I did manage to enter,
>
> #cp vi vmsg 2

if only I knew about this magic ;)

Thanks Cai and Heiko!

Oleg.

From oleg at redhat.com Wed Jan 6 20:17:22 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 6 Jan 2010 21:17:22 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100105195818.GA20358@redhat.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com>
Message-ID: <20100106201722.GB26204@redhat.com>

On 01/05, Oleg Nesterov wrote:
>
> On 01/05, Oleg Nesterov wrote:
> >
> > On 01/05, Oleg Nesterov wrote:
> > >
> > > I'll add clear_bit(TIF_SINGLE_STEP) into do_fork() path and re-test.
> >
> > Hmm.
This patch > > > > --- kernel/fork.c~ 2009-12-22 10:41:53.188084961 -0500 > > +++ kernel/fork.c 2010-01-05 11:42:58.370636323 -0500 > > @@ -1206,6 +1206,8 @@ static struct task_struct *copy_process( > > * of CLONE_PTRACE. > > */ > > clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); > > + clear_tsk_thread_flag(p, TIF_SINGLE_STEP); > > + user_disable_single_step(p); > > #ifdef TIF_SYSCALL_EMU > > clear_tsk_thread_flag(p, TIF_SYSCALL_EMU); > > #endif > > > > doesn't help, I still see the same XXX's in dmesg... > > Oh, now I am totally confused. > > I reverted your fix from > https://www.redhat.com/archives/utrace-devel/2010-January/msg00006.html > and now there is nothing in dmesg. I take this back. I re-tested this all under 2.6.32.2 + utrace, and I see nothing in dmesg. I don't know what I did wrong, most probably I forgot to do zipl or something like this... I'll try to summarize. Martin's patch from https://www.redhat.com/archives/utrace-devel/2010-January/msg00006.html fixes the problems with utrace. However, with or without CONFIG_UTRACE, 6580807da14c423f0d0a708108e6df6ebc8bc83d is needed on s390 too, otherwise the child gets unnecessary traps. Oleg. 
From oleg at redhat.com Wed Jan 6 20:23:47 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 6 Jan 2010 21:23:47 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> Message-ID: <20100106202347.GC26204@redhat.com> On 01/05, Martin Schwidefsky wrote: > > On Mon, 4 Jan 2010 13:11:47 -0800 (PST) > Roland McGrath wrote: > > > > This probably means that copy_process()->user_disable_single_step() > > > is not enough to clear the "this task wants single-stepping" copied > > > from parent. > > > > I would suspect s390's TIF_SINGLE_STEP flag here. That flag means "a > > single-step trap occurred". This is what causes do_single_step to be > > called before returning to user mode, rather than the machine trap doing it > > directly as is done in the other arch implementations. > > Just my thinking as well. > > > If I'm right, then "this task wants single-stepping" is not the problem, > > and that really is fully cleared. In fact, looking at s390's copy_thread > > (arch/s390/kernel/process.c) it clears out all the state that is actually > > touched by user_enable_single_step and user_disable_single_step. So for > > s390 the new fork.c call is actually superfluous AFAICT. > > /* Don't copy debug registers */ > memset(&p->thread.per_info, 0, sizeof(p->thread.per_info)); > > Yep, the call from fork.c is indeed superfluous. 

I can't explain this, but if I remove copy_process()->user_disable_single_step(),
the test-case below triggers "XXX" printk's from do_single_step() with or
without CONFIG_UTRACE. The patch is

--- arch/s390/kernel/traps.c~	2009-12-22 10:41:52.909174198 -0500
+++ arch/s390/kernel/traps.c	2010-01-05 11:03:55.006487697 -0500
@@ -384,6 +384,9 @@ void __kprobes do_single_step(struct pt_
 	}
 	if (tracehook_consider_fatal_signal(current, SIGTRAP))
 		force_sig(SIGTRAP, current);
+	else
+		printk("XXX: %s/%d %d\n", current->comm, current->pid,
+			test_thread_flag(TIF_SINGLE_STEP));
 }
 
 static void default_trap_handler(struct pt_regs * regs, long interruption_code)

Oleg.

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>

int main(void)
{
	int pid, status;

	if (!(pid = fork())) {
		assert(ptrace(PTRACE_TRACEME) == 0);
		kill(getpid(), SIGSTOP);

		if (!fork())
			return 43;

		wait(&status);
		return WEXITSTATUS(status);
	}

	for (;;) {
		assert(pid == wait(&status));
		if (WIFEXITED(status))
			break;
		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
	}

	assert(WEXITSTATUS(status) == 43);
	return 0;
}

From roland at redhat.com Wed Jan 6 20:56:33 2010
From: roland at redhat.com (Roland McGrath)
Date: Wed, 6 Jan 2010 12:56:33 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Tuesday, 5 January 2010 10:50:30 +0100 <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100106205633.700CC134D@magilla.sf.frob.com>

> do_signal is called before do_single_step.
The order of checks of the
> TIF_ bits is 1) machine checks, 2) need resched, 3) signal pending, 4)
> notify resume, 5) restarting system call, 6) single step.

I see. Then the potential issue I raised would exist.

> But why is that important ? If the TIF_SINGLE_STEP bit is set the order
> of do_signal vs. do_single_step does not seem to be important to me.
> There will be a SIGTRAP if TIF_SINGLE_STEP is set, no ?

Right. It only becomes relevant if something else clears TIF_SINGLE_STEP,
which does not happen now. I was discussing the scenario of having
user_disable_single_step clear it. That might happen inside a stop,
i.e. inside do_signal (get_signal_to_deliver). So given that order of
checks, it becomes possible for user_disable_single_step to affect the
pending-step-should-SIGTRAP situation.

> But I agree, it is probably better to make all arches look the same in
> regard to that pending step report.

Right. That means we should leave the status quo of not clearing
TIF_SINGLE_STEP in user_disable_single_step.

> > Martin, I suggest having copy_thread clear TIF_SINGLE_STEP.
> > That bit is always task-private state that should not be copied.
>
> Then let us do this.

Yes, good.

> The LCTLG of multiple control registers is rather expensive. Does it
> happen often that user_*_single_step is called without need? For gdb it
> doesn't matter, the cost to switch between tracer and tracee is already
> large, the cycles added to FixPerRegisters won't matter much. For
> utrace things might be different.

In old (current) ptrace, user_*_single_step is never called on current.
In utrace, it's always called on current, so utrace-based ptrace alone
adds this second reload overhead beyond the same context-switch overhead
old ptrace has. Indeed that addition may be negligible.

In other circumstances with utrace, it is very possible to wind up with user_disable_single_step being called superfluously when there was no stop (and so not necessarily any context switch or other high overhead). On other machines, user_disable_single_step is pretty cheap even where user_enable_single_step is quite costly. Given how simple and cheap it is to short-circuit the excess work on s390, I think it is worthwhile. It looks like the context-switch code already checks some magic bits in per_info to decide whether to do the costly reload. So it seems vaguely consistent with that to conditionalize FixPerRegisters similarly. The single_step cases are a special case with an easy one-bit check to short-circuit, so skipping all of FixPerRegisters seems worthwhile IMHO. To be really optimization-happy about it, you'd also hack FixPerRegisters itself to do the reload on current only if PSW_MASK_PER is or was set (if I'm following the code correctly). Or perhaps it should check PER_EM_MASK instead to match what __switch_to does. (I don't understand the distinction between those two tests, if there is one.) Extra frobbity would be to leave the old state too when clearing PSW_MASK_PER, and then just have the trap handler lazily ignore a hit when current doesn't have it set. Then in a case where there is no hit before context switch, you haven't done anything. But that is probably both excessive and not even a win if the PER use is single-step and so there will really very likely be a hit before context switch. 
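
A minimal user-space sketch of the one-bit short-circuit described above, assuming a hypothetical model_thread structure and a counter in place of the real control-register reload. None of the model_* names come from the s390 code; the sketch only shows how a superfluous enable or disable call can skip the costly step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-thread single-step state. */
struct model_thread {
	int single_step;      /* stands in for per_info.single_step */
	unsigned int reloads; /* counts simulated control-register reloads */
};

/* Stand-in for FixPerRegisters(): each call counts as one costly reload. */
static void model_fix_per_registers(struct model_thread *t)
{
	t->reloads++;
}

static void model_enable_single_step(struct model_thread *t)
{
	assert(t != NULL);
	if (t->single_step)
		return;               /* already stepping, nothing to reload */
	t->single_step = 1;
	model_fix_per_registers(t);
}

static void model_disable_single_step(struct model_thread *t)
{
	assert(t != NULL);
	if (!t->single_step)
		return;               /* superfluous call, skip the reload */
	t->single_step = 0;
	model_fix_per_registers(t);
}
```

In this model a disable call on a thread that was never stepping costs nothing, and repeated enable calls trigger only one reload.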
Thanks, Roland From roland at redhat.com Wed Jan 6 21:08:12 2010 From: roland at redhat.com (Roland McGrath) Date: Wed, 6 Jan 2010 13:08:12 -0800 (PST) Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: Oleg Nesterov's message of Tuesday, 5 January 2010 16:36:33 +0100 <20100105153633.GA9376@redhat.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> Message-ID: <20100106210812.E03A1134D@magilla.sf.frob.com> > Oh, I am not sure. But I don't understand TIF_SINGLE_STEP on s390, > absolutely. > > For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do I think we could. That would be more consistent with other machines. On s390, once we set TIF_SINGLE_STEP, we are going to post a SIGTRAP eventually before going to user mode. But then tracehook_signal_handler() also gets stepping=1 and the expected meaning of this is that the arch code is not itself simulating a single-step for the handler setup. So the tracehook (i.e. ptrace/utrace) code does what it does for "need a fake single-step". In ptrace (including utrace-based ptrace), this winds up with sending a SIGTRAP. So when we finally do get out of do_signal and TIF_SINGLE_STEP causes a second SIGTRAP, it's already pending and the second one makes no difference. But for the general case of utrace, we'll have the UTRACE_SIGNAL_HANDLER report, followed by a SIGTRAP that appears to be an authentic single-step trap, but takes place on the same instruction. 
If the resumption after the UTRACE_SIGNAL_HANDLER report didn't use
stepping, then this is an entirely unexpected extra SIGTRAP. If we do
continue stepping, then we are expecting the SIGTRAP, but this gets us a
spurious and erroneous report that looks like the instruction right
before the handler's entry point in memory was just executed.

[Martin:]
> The reason why we set the TIF_SINGLE_STEP bit in do_signal is that we
> want to be able to stop the debugged program before the first
> instruction of the signal handler has been executed. The PER single
> step causes a trap after an instruction has been executed. That first
> instruction can do bad things to the arguments of the signal handler..

That's what tracehook_signal_handler is for. You're both doing it
yourself in the arch code (by setting TIF_SINGLE_STEP), and then telling
the generic code to do it (by passing stepping=1 to
tracehook_signal_handler).

Thanks,
Roland

From roland at redhat.com Wed Jan 6 21:13:29 2010
From: roland at redhat.com (Roland McGrath)
Date: Wed, 6 Jan 2010 13:13:29 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Oleg Nesterov's message of Wednesday, 6 January 2010 21:17:22 +0100 <20100106201722.GB26204@redhat.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com>
Message-ID: <20100106211329.DB4F5134D@magilla.sf.frob.com>

> However, with or without CONFIG_UTRACE, 6580807da14c423f0d0a708108e6df6ebc8bc83d
> is needed on s390 too, otherwise the child gets unnecessary traps.

This confuses me.
user_disable_single_step on non-current doesn't do anything not already done by the memset in copy_thread. Ooh, except perhaps it does not clear PSW_MASK_PER. Maybe that matters. That's the only thing I can think of. Maybe Martin can make sense of it. Thanks, Roland From roland at redhat.com Wed Jan 6 21:15:36 2010 From: roland at redhat.com (Roland McGrath) Date: Wed, 6 Jan 2010 13:15:36 -0800 (PST) Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: Martin Schwidefsky's message of Tuesday, 5 January 2010 10:26:06 +0100 <20100105102606.4f223990@mschwide.boeblingen.de.ibm.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100105102606.4f223990@mschwide.boeblingen.de.ibm.com> Message-ID: <20100106211536.0F6AC134D@magilla.sf.frob.com> > > then the test-case from 6580807da14c423f0d0a708108e6df6ebc8bc83d > > fails. This probably means that copy_process()->user_disable_single_step() > > is not enough to clear the "this task wants single-stepping" copied > > from parent. > > user_disable_single_step() does not remove the TIF_SINGLE_STEP bit from the > forked task. Perhaps we should just clear the bit in the function. If that were to fix this test case, I think it would be incidental rather than meaning the right thing. The "this task wants single-stepping" state should not have anything to do with TIF_SINGLE_STEP. It means "this task recently had single-stepping", which is a separate moving part. Thanks, Roland From news at standalgarve.com Thu Jan 7 02:55:45 2010 From: news at standalgarve.com (J&B) Date: Thu, 7 Jan 2010 02:55:45 +0000 Subject: Ganha viagens a Londres com J&B! Message-ID: <196620684962574317120@pcmail> An HTML attachment was scrubbed... 
From schwidefsky at de.ibm.com Thu Jan 7 09:00:50 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Thu, 7 Jan 2010 10:00:50 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100106205633.700CC134D@magilla.sf.frob.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100106205633.700CC134D@magilla.sf.frob.com>
Message-ID: <20100107100050.31724463@mschwide.boeblingen.de.ibm.com>

On Wed, 6 Jan 2010 12:56:33 -0800 (PST)
Roland McGrath wrote:

> > do_signal is called before do_single_step. The order of checks of the
> > TIF_ bits is 1) machine checks, 2) need resched, 3) signal pending, 4)
> > notify resume, 5) restarting system call, 6) single step.
>
> I see. Then the potential issue I raised would exist.
>
> > But why is that important ? If the TIF_SINGLE_STEP bit is set the order
> > of do_signal vs. do_single_step does not seem to be important to me.
> > There will be a SIGTRAP if TIF_SINGLE_STEP is set, no ?
>
> Right. It only becomes relevant if something else clears TIF_SINGLE_STEP,
> which does not happen now. I was discussing the scenario of having
> user_disable_single_step clear it. That might happen inside a stop,
> i.e. inside do_signal (get_signal_to_deliver). So given that order of
> checks, it becomes possible for user_disable_single_step to affect the
> pending-step-should-SIGTRAP situation.

That was the idea about the TIF_SINGLE_STEP bit. It can be set and later unset if we don't want to deliver the SIGTRAP after all.
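The ordering concern discussed here can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the WORK_* flag names, struct step_model and the model_* helpers are hypothetical stand-ins, not the s390 entry code, and the two boolean knobs exist only to compare the status quo with the variant in which user_disable_single_step clears the pending bit during a stop inside do_signal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work flags, listed in the check order quoted in the thread. */
enum {
	WORK_MCCK_PENDING  = 1 << 0,	/* 1) machine checks */
	WORK_NEED_RESCHED  = 1 << 1,	/* 2) need resched */
	WORK_SIGPENDING    = 1 << 2,	/* 3) signal pending */
	WORK_NOTIFY_RESUME = 1 << 3,	/* 4) notify resume */
	WORK_RESTART_SVC   = 1 << 4,	/* 5) restarting system call */
	WORK_SINGLE_STEP   = 1 << 5	/* 6) single step */
};

struct step_model {
	unsigned int flags;
	bool disable_clears_flag;	/* variant: disable also clears the pending bit */
	bool tracer_disables_step;	/* tracer disables stepping during the stop */
	int sigtraps_posted;
};

/* Stand-in for do_signal: the tracer may act while the task is stopped here. */
static void model_do_signal(struct step_model *t)
{
	t->flags &= ~WORK_SIGPENDING;
	if (t->tracer_disables_step && t->disable_clears_flag)
		t->flags &= ~WORK_SINGLE_STEP;
}

/* Stand-in for do_single_step: consume the pending bit, post one SIGTRAP. */
static void model_do_single_step(struct step_model *t)
{
	assert(t->flags & WORK_SINGLE_STEP);
	t->flags &= ~WORK_SINGLE_STEP;
	t->sigtraps_posted++;
}

/* Walk the pending work in the order given in the thread. */
static void model_return_to_user(struct step_model *t)
{
	t->flags &= ~(WORK_MCCK_PENDING | WORK_NEED_RESCHED);
	if (t->flags & WORK_SIGPENDING)
		model_do_signal(t);
	t->flags &= ~(WORK_NOTIFY_RESUME | WORK_RESTART_SVC);
	if (t->flags & WORK_SINGLE_STEP)
		model_do_single_step(t);
}
```

With disable_clears_flag left false the pending step still produces its SIGTRAP after the stop; with it set to true the SIGTRAP is lost, which is the behavioural difference between architectures that the thread wants to avoid.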
> > But I agree, it is probably better to make all arches look the same in
> > regard to that pending step report.
>
> Right. That means we should leave the status quo of not clearing
> TIF_SINGLE_STEP in user_disable_single_step.

Ok, although it seems a bit strange not to do it. Perhaps I should add a comment about it.

> > The LCTLG of multiple control registers is rather expensive. Does it
> > happen often that user_*_single_step is called without need? For gdb it
> > doesn't matter, the cost to switch between tracer and tracee is already
> > large, the cycles added to FixPerRegisters won't matter much. For
> > utrace things might be different.
>
> In old (current) ptrace, user_*_single_step is never called on current.
> In utrace, it's always called on current, so utrace-based ptrace alone
> adds this second reload overhead beyond the same context-switch overhead
> old ptrace has. Indeed that addition may be negligible.

So after everything has been converted to utrace we always will load the control registers in FixPerRegisters.

> In other circumstances with utrace, it is very possible to wind up with
> user_disable_single_step being called superfluously when there was no
> stop (and so not necessarily any context switch or other high overhead).
> On other machines, user_disable_single_step is pretty cheap even where
> user_enable_single_step is quite costly. Given how simple and cheap it
> is to short-circuit the excess work on s390, I think it is worthwhile.

We could use the same compare of the control registers as the code in __switch_to. See below.

> It looks like the context-switch code already checks some magic bits in
> per_info to decide whether to do the costly reload. So it seems vaguely
> consistent with that to conditionalize FixPerRegisters similarly. The
> single_step cases are a special case with an easy one-bit check to
> short-circuit, so skipping all of FixPerRegisters seems worthwhile IMHO.
What the magic code in __switch_to does is to check if the next process wants to use PER and do the load of the control registers only if the current set of control registers 9 to 11 differ from the set for the next process. The check if the next process wants to use PER is done with a test-under-mask (TM) instruction and a mask of 0xe8. This checks for 4 bits: em_branching, em_instruction_fetch, em_storage_alteration and em_store_real_address. If one of the bits is set then the current set of control registers is stored, compared with the set for the next process and only if they differ is the lctlg done. The store of control registers is cheap (n cycles for n registers), the load is expensive for specific control registers. For 9 to 11 it costs more than 100 cycles.

> To be really optimization-happy about it, you'd also hack FixPerRegisters
> itself to do the reload on current only if PSW_MASK_PER is or was set (if
> I'm following the code correctly). Or perhaps it should check PER_EM_MASK
> instead to match what __switch_to does. (I don't understand the
> distinction between those two tests, if there is one.) Extra frobbity
> would be to leave the old state too when clearing PSW_MASK_PER, and then
> just have the trap handler lazily ignore a hit when current doesn't have
> it set. Then in a case where there is no hit before context switch,
> you haven't done anything. But that is probably both excessive and
> not even a win if the PER use is single-step and so there will really
> very likely be a hit before context switch.

The PSW_MASK_PER is the "global" PER enablement switch, the PER_EM_MASK bits enable the different PER events. We check for the PER_EM_MASK bits because it is easier to access in __switch_to. The return PSW is stored in the pt_regs structure, we would have to get a pointer to it (what "regs = task_pt_regs(task)" does in FixPerRegisters). In FixPerRegisters we can as well use the PSW_MASK_PER bit.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

From schwidefsky at de.ibm.com Thu Jan 7 09:16:19 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Thu, 7 Jan 2010 10:16:19 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100106210812.E03A1134D@magilla.sf.frob.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com>
Message-ID: <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com>

On Wed, 6 Jan 2010 13:08:12 -0800 (PST)
Roland McGrath wrote:

> > Oh, I am not sure. But I don't understand TIF_SINGLE_STEP on s390,
> > absolutely.
> >
> > For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do
>
> I think we could. That would be more consistent with other machines. On
> s390, once we set TIF_SINGLE_STEP, we are going to post a SIGTRAP
> eventually before going to user mode. But then tracehook_signal_handler()
> also gets stepping=1 and the expected meaning of this is that the arch code
> is not itself simulating a single-step for the handler setup. So the
> tracehook (i.e. ptrace/utrace) code does what it does for "need a fake
> single-step".

Hmm, the comment for tracehook_signal_handler says this for stepping:
 @stepping:		nonzero if debugger single-step or block-step in use

> In ptrace (including utrace-based ptrace), this winds up with sending a
> SIGTRAP. So when we finally do get out of do_signal and TIF_SINGLE_STEP
> causes a second SIGTRAP, it's already pending and the second one makes no
> difference.

So we have been lucky so far.
> But for the general case of utrace, we'll have the UTRACE_SIGNAL_HANDLER
> report, followed by a SIGTRAP that appears to be an authentic single-step
> trap, but takes place on the same instruction. If the resumption after the
> UTRACE_SIGNAL_HANDLER report didn't use stepping, then this is an entirely
> unexpected extra SIGTRAP. If we do continue stepping, then we are
> expecting the SIGTRAP, but this gets us a spurious and erroneous report
> that looks like the instruction right before the handler's entry point in
> memory was just executed.
>
> [Martin:]
> > The reason why we set the TIF_SINGLE_STEP bit in do_signal is that we
> > want to be able to stop the debugged program before the first
> > instruction of the signal handler has been executed. The PER single
> > step causes a trap after an instruction has been executed. That first
> > instruction can do bad things to the arguments of the signal handler..
>
> That's what tracehook_signal_handler is for. You're both doing it yourself
> in the arch code (by setting TIF_SINGLE_STEP), and then telling the generic
> code to do it (by passing stepping=1 to tracehook_signal_handler).

Ok, so with the full utrace the semantics of tracehook_signal_handler is more than just causing a SIGTRAP. It is an indication for a signal AND a SIGTRAP if single-stepping is active. To make both cases work we should stop setting TIF_SINGLE_STEP in do_signal and pass current->thread.per_info.single_step to tracehook_signal_handler instead of test_thread_flag(TIF_SINGLE_STEP).

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
From schwidefsky at de.ibm.com Thu Jan 7 09:18:55 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Thu, 7 Jan 2010 10:18:55 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100106211329.DB4F5134D@magilla.sf.frob.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com>
Message-ID: <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com>

On Wed, 6 Jan 2010 13:13:29 -0800 (PST)
Roland McGrath wrote:

> > However, with or without CONFIG_UTRACE, 6580807da14c423f0d0a708108e6df6ebc8bc83d
> > is needed on s390 too, otherwise the child gets unnecessary traps.
>
> This confuses me. user_disable_single_step on non-current doesn't do
> anything not already done by the memset in copy_thread. Ooh, except
> perhaps it does not clear PSW_MASK_PER. Maybe that matters. That's
> the only thing I can think of. Maybe Martin can make sense of it.

The additional traps should not happen anymore with this patch:
--
Subject: [PATCH] clear TIF_SINGLE_STEP for new process.

From: Martin Schwidefsky

Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is
not auto-attached by the tracer it is wrong to deliver SIGTRAP to
the new process.
Signed-off-by: Martin Schwidefsky
---

 arch/s390/kernel/process.c |    1 +
 1 file changed, 1 insertion(+)

diff -urpN linux-2.6/arch/s390/kernel/process.c linux-2.6-patched/arch/s390/kernel/process.c
--- linux-2.6/arch/s390/kernel/process.c	2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6-patched/arch/s390/kernel/process.c	2010-01-07 09:25:53.000000000 +0100
@@ -217,6 +217,7 @@ int copy_thread(unsigned long clone_flag
 	p->thread.mm_segment = get_fs();
 	/* Don't copy debug registers */
 	memset(&p->thread.per_info, 0, sizeof(p->thread.per_info));
+	clear_tsk_thread_flag(p, TIF_SINGLE_STEP);
 	/* Initialize per thread user and system timer values */
 	ti = task_thread_info(p);
 	ti->user_timer = 0;

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
From oleg at redhat.com Thu Jan 7 17:54:46 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 7 Jan 2010 18:54:46 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com>
References: <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100107175446.GA13300@redhat.com>

Martin, sorry for delay,

On 01/07, Martin Schwidefsky wrote:
>
> On Wed, 6 Jan 2010 13:13:29 -0800 (PST)
> Roland McGrath wrote:
>
> > > However, with or without CONFIG_UTRACE, 6580807da14c423f0d0a708108e6df6ebc8bc83d
> > > is needed on s390 too, otherwise the child gets unnecessary traps.
> >
> > This confuses me. user_disable_single_step on non-current doesn't do
> > anything not already done by the memset in copy_thread. Ooh, except
> > perhaps it does not clear PSW_MASK_PER. Maybe that matters. That's
> > the only thing I can think of. Maybe Martin can make sense of it.

I am confused as well. Yes, I thought about regs->psw.mask change too, but I don't understand why it helps..

> The additional traps should not happen anymore with this patch:
> --
> Subject: [PATCH] clear TIF_SINGLE_STEP for new process.
>
> From: Martin Schwidefsky
>
> Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is
> not auto-attached by the tracer it is wrong to deliver SIGTRAP to
> the new process.
>
> Signed-off-by: Martin Schwidefsky
> ---
>
>  arch/s390/kernel/process.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff -urpN linux-2.6/arch/s390/kernel/process.c linux-2.6-patched/arch/s390/kernel/process.c
> --- linux-2.6/arch/s390/kernel/process.c	2009-12-03 04:51:21.000000000 +0100
> +++ linux-2.6-patched/arch/s390/kernel/process.c	2010-01-07 09:25:53.000000000 +0100
> @@ -217,6 +217,7 @@ int copy_thread(unsigned long clone_flag
>  	p->thread.mm_segment = get_fs();
>  	/* Don't copy debug registers */
>  	memset(&p->thread.per_info, 0, sizeof(p->thread.per_info));
> +	clear_tsk_thread_flag(p, TIF_SINGLE_STEP);

Even if I don't understand s390, I think this patch makes sense anyway. Or, user_disable_single_step() can clear this bit.

But. According to the testing I did (unless I did something wrong again) this patch doesn't make any difference in this particular case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does.

And. Please note that the test-case triggers 799 "false step", but TIF_SINGLE_STEP is surely cleared (by the caller) after the first invocation of do_single_step().

Oleg.

From oleg at redhat.com Thu Jan 7 18:11:37 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 7 Jan 2010 19:11:37 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100106210812.E03A1134D@magilla.sf.frob.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com>
Message-ID: <20100107181137.GB13300@redhat.com>

On 01/06, Roland McGrath wrote:
>
> > Oh, I am not sure. But I don't understand TIF_SINGLE_STEP on s390,
> > absolutely.
> >
> > For example, why do_signal() sets TIF_SINGLE_STEP? Why can't we do
>
> I think we could. That would be more consistent with other machines. On
> s390, once we set TIF_SINGLE_STEP, we are going to post a SIGTRAP
> eventually before going to user mode. But then tracehook_signal_handler()
> also gets stepping=1 and the expected meaning of this is that the arch code
> is not itself simulating a single-step for the handler setup. So the
> tracehook (i.e. ptrace/utrace) code does what it does for "need a fake
> single-step".
>
> In ptrace (including utrace-based ptrace), this winds up with sending a
> SIGTRAP. So when we finally do get out of do_signal and TIF_SINGLE_STEP
> causes a second SIGTRAP, it's already pending and the second one makes no
> difference.

Confused again, perhaps I just misunderstood what you mean...

Without utrace, tracehook_signal_handler() doesn't send SIGTRAP, it merely does ptrace_notify(SIGTRAP), this means that

> But for the general case of utrace, we'll have the UTRACE_SIGNAL_HANDLER
> report, followed by a SIGTRAP that appears to be an authentic single-step
> trap, but takes place on the same instruction. If the resumption after the
> UTRACE_SIGNAL_HANDLER report didn't use stepping, then this is an entirely
> unexpected extra SIGTRAP.

even without utrace we can have unexpected SIGTRAP.

Oleg.
From oleg at redhat.com Thu Jan 7 18:16:32 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 7 Jan 2010 19:16:32 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com> <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100107181632.GC13300@redhat.com>

On 01/07, Martin Schwidefsky wrote:
>
> On Wed, 6 Jan 2010 13:08:12 -0800 (PST)
> Roland McGrath wrote:
>
> > That's what tracehook_signal_handler is for. You're both doing it yourself
> > in the arch code (by setting TIF_SINGLE_STEP), and then telling the generic
> > code to do it (by passing stepping=1 to tracehook_signal_handler).
>
> Ok, so with the full utrace the semantics of tracehook_signal_handler
> is more than just causing a SIGTRAP. It is an indication for a signal
> AND a SIGTRAP if single-stepping is active. To make both cases work we
> should stop setting TIF_SINGLE_STEP in do_signal and pass
> current->thread.per_info.single_step to tracehook_signal_handler
> instead of test_thread_flag(TIF_SINGLE_STEP).

Can't understand why do we need TIF_SINGLE_STEP at all.

Just pass current->thread.per_info.single_step to tracehook_signal_handler() ?

Oleg.

--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -504,14 +504,8 @@ void do_signal(struct pt_regs *regs)
 			 * for a normal instruction, act like we took
 			 * one for the handler setup.
 			 */
-			if (current->thread.per_info.single_step)
-				set_thread_flag(TIF_SINGLE_STEP);
-
-			/*
-			 * Let tracing know that we've done the handler setup.
-			 */
 			tracehook_signal_handler(signr, &info, &ka, regs,
-					test_thread_flag(TIF_SINGLE_STEP));
+					current->thread.per_info.single_step);
 		}
 		return;
 	}

From roland at redhat.com Thu Jan 7 21:32:33 2010
From: roland at redhat.com (Roland McGrath)
Date: Thu, 7 Jan 2010 13:32:33 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Thursday, 7 January 2010 10:00:50 +0100 <20100107100050.31724463@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100106205633.700CC134D@magilla.sf.frob.com> <20100107100050.31724463@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100107213233.B49807300@magilla.sf.frob.com>

> > Right. That means we should leave the status quo of not clearing
> > TIF_SINGLE_STEP in user_disable_single_step.
>
> Ok, although it seems a bit strange not to do it. Perhaps I should add a
> comment about it.

It doesn't seem strange to me, but then I've just been through all this. user_*_step is about what the task will do next. TIF_SINGLE_STEP is about what the task has done recently. Of course more good comments always help.

I might be inclined to change the name of TIF_SINGLE_STEP so that its true purpose is more obvious. AFAICT, in fact it is not even about single-step per se. It means some PER trap happened and should produce SIGTRAP.
Don't you get the same thing if you haven't used single-step, but instead used PTRACE_POKEUSR to set up per_struct with bits that say to trigger for some other reason? How about calling it TIF_PER_PENDING?

> So after everything has been converted to utrace we always will load the
> control registers in FixPerRegisters.

Right. (This could well still change in the future. But that's how it is in utrace now. And regardless of possible future implementation changes it will always be the case that sometimes it will be called on current.)

> We could use the same compare of the control registers as the code in
> __switch_to. See below.

Yes, sounds good.

> The PSW_MASK_PER is the "global" PER enablement switch, the PER_EM_MASK
> bits enable the different PER events. We check for the PER_EM_MASK bits
> because it is easier to access in __switch_to. The return PSW is stored
> in the pt_regs structure, we would have to get a pointer to it (what
> "regs = task_pt_regs(task)" does in FixPerRegisters). In
> FixPerRegisters we can as well use the PSW_MASK_PER bit.

I see. Thanks for the explanation.
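The compare-before-reload scheme agreed on above can be sketched in user-space C. This is a toy model under stated assumptions, not the s390 implementation: struct per_cregs, struct cpu_model and per_reload_if_needed are hypothetical names, the "live" field merely stands in for hardware control registers 9 to 11, and the counter stands in for the expensive LCTLG. Only the 0xe8 event mask is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define PER_EVENT_MASK 0xe8u	/* the four PER event bits tested with TM */

/* Hypothetical image of control registers 9 to 11. */
struct per_cregs {
	unsigned long cr9, cr10, cr11;
};

struct cpu_model {
	struct per_cregs live;	/* stands in for the hardware registers */
	unsigned int loads;	/* counts simulated LCTLG operations */
};

static bool per_cregs_equal(const struct per_cregs *a, const struct per_cregs *b)
{
	return a->cr9 == b->cr9 && a->cr10 == b->cr10 && a->cr11 == b->cr11;
}

/*
 * Reload the PER control registers only when the task uses PER at all
 * and the wanted values differ from what is currently loaded.
 * Returns true if a (simulated) reload was performed.
 */
static bool per_reload_if_needed(struct cpu_model *cpu,
				 unsigned char event_bits,
				 const struct per_cregs *wanted)
{
	struct per_cregs cur;

	assert(cpu && wanted);
	if (!(event_bits & PER_EVENT_MASK))
		return false;		/* no PER event enabled: skip everything */

	cur = cpu->live;		/* cheap: models storing the registers */
	if (per_cregs_equal(&cur, wanted))
		return false;		/* already loaded: skip the reload */

	cpu->live = *wanted;		/* expensive: models LCTLG 9,11 */
	cpu->loads++;
	return true;
}
```

Calling per_reload_if_needed twice in a row with the same wanted values performs at most one simulated load, which is the saving the thread is after for superfluous user_disable_single_step calls on current.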
Thanks,
Roland

From roland at redhat.com Thu Jan 7 21:41:41 2010
From: roland at redhat.com (Roland McGrath)
Date: Thu, 7 Jan 2010 13:41:41 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Thursday, 7 January 2010 10:16:19 +0100 <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com> <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100107214141.7500B7300@magilla.sf.frob.com>

> Hmm, the comment for tracehook_signal_handler says this for stepping:
>  @stepping:		nonzero if debugger single-step or block-step in use

Are you saying you would like me to clarify that wording somehow? It's meant to be implicit that the arch code is not doing any special fakery about single-step for signal handlers, only processing real single-step traps (and faking them for a syscall instruction if the arch requires that). No other arch does it, so it didn't occur to me that s390 would. Before tracehook some had ptrace_notify calls there, and the call to tracehook_signal_handler replaced that call.

> > In ptrace (including utrace-based ptrace), this winds up with sending a
> > SIGTRAP. So when we finally do get out of do_signal and TIF_SINGLE_STEP
> > causes a second SIGTRAP, it's already pending and the second one makes no
> > difference.
>
> So we have been lucky so far.

Actually, Oleg rightly points out:

> Confused again, perhaps I just misunderstood what you mean...
>
> Without utrace, tracehook_signal_handler() doesn't send SIGTRAP, it
> merely does ptrace_notify(SIGTRAP), this means that
[...]
> even without utrace we can have unexpected SIGTRAP.

That is quite true, and I just misremembered when writing that paragraph. So indeed we have been lucky, but it's not the luck of the problem not happening on s390, but the luck of nobody ever caring. :-)

> Ok, so with the full utrace the semantics of tracehook_signal_handler
> is more than just causing a SIGTRAP. It is an indication for a signal
> AND a SIGTRAP if single-stepping is active.

In short, it is the indication of a signal handler having been set up, just like its kerneldoc description says. Whatever that should mean to tracing (SIGTRAP or otherwise) is in the purview of the generic tracing layer, not the arch layer.

> To make both cases work we
> should stop setting TIF_SINGLE_STEP in do_signal and pass
> current->thread.per_info.single_step to tracehook_signal_handler
> instead of test_thread_flag(TIF_SINGLE_STEP).

Correct.
Thanks,
Roland

From roland at redhat.com Thu Jan 7 21:44:29 2010
From: roland at redhat.com (Roland McGrath)
Date: Thu, 7 Jan 2010 13:44:29 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Oleg Nesterov's message of Thursday, 7 January 2010 19:16:32 +0100 <20100107181632.GC13300@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com> <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com> <20100107181632.GC13300@redhat.com>
Message-ID: <20100107214429.6388A7300@magilla.sf.frob.com>

> Can't understand why do we need TIF_SINGLE_STEP at all.

I think you mean "why we need to set it in do_signal" here, not "why do we need it to exist at all".

> Just pass current->thread.per_info.single_step to
> tracehook_signal_handler() ?

Yes. I believe this is what Martin meant, and it's what I meant to endorse. do_signal should not do anything with TIF_SINGLE_STEP at all. Its only purpose should be to communicate from the low-level trap assembly code up to the return-to-user assembly code so it calls do_single_step.
Thanks,
Roland

From roland at redhat.com Thu Jan 7 21:46:42 2010
From: roland at redhat.com (Roland McGrath)
Date: Thu, 7 Jan 2010 13:46:42 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Thursday, 7 January 2010 10:18:55 +0100 <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100107214642.579F27300@magilla.sf.frob.com>

> Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is
> not auto-attached by the tracer it is wrong to deliver SIGTRAP to
> the new process.

The change is right, but this log entry is confusing. "auto-attached" has nothing to do with it, nor does anything about tracing the new process or not. The new process has not experienced a PER trap of its own, so it is wrong to deliver a SIGTRAP that is meant for its creator.
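The point about the child not inheriting its creator's pending trap can be illustrated with a toy model in user-space C. Everything here is a hypothetical stand-in (struct fork_model, MODEL_TIF_SINGLE_STEP, model_copy_thread); it only mirrors the shape of the one-line copy_thread change quoted in this thread and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_TIF_SINGLE_STEP (1u << 0)	/* "a PER trap is pending" */

/* Hypothetical per-task debug state. */
struct per_info_model {
	bool single_step;		/* "step this task next" request */
	unsigned long control[3];
};

struct fork_model {
	unsigned int ti_flags;		/* thread-info style flag word */
	struct per_info_model per_info;
};

/*
 * Model of the fork path: the child starts as a copy of the parent,
 * then both the stepping request and the pending-trap flag are dropped,
 * because neither describes anything the child itself has done.
 */
static void model_copy_thread(const struct fork_model *parent,
			      struct fork_model *child)
{
	assert(parent && child);
	*child = *parent;
	/* Don't copy debug registers */
	memset(&child->per_info, 0, sizeof(child->per_info));
	/* The pending trap belongs to the parent, not the child */
	child->ti_flags &= ~MODEL_TIF_SINGLE_STEP;
}
```

In this model the parent keeps its pending flag and still gets its own SIGTRAP, while the child starts with no pending trap regardless of whether it is traced.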
Thanks,
Roland

From roland at redhat.com Thu Jan 7 21:48:21 2010
From: roland at redhat.com (Roland McGrath)
Date: Thu, 7 Jan 2010 13:48:21 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Oleg Nesterov's message of Thursday, 7 January 2010 18:54:46 +0100 <20100107175446.GA13300@redhat.com>
References: <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com> <20100107175446.GA13300@redhat.com>
Message-ID: <20100107214821.94FF97300@magilla.sf.frob.com>

> I am confused as well. Yes, I thought about regs->psw.mask change too,
> but I don't understand why it helps..
[...]
> But. According to the testing I did (unless I did something wrong
> again) this patch doesn't make any difference in this particular
> case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does.

Those results are quite mysterious to me. I think we'll have to get Martin to sort it out definitively.

Thanks,
Roland
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com> <20100107214642.579F27300@magilla.sf.frob.com>
Message-ID: <20100108093025.28ec0907@mschwide.boeblingen.de.ibm.com>

On Thu, 7 Jan 2010 13:46:42 -0800 (PST)
Roland McGrath wrote:

> > Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is
> > not auto-attached by the tracer it is wrong to deliver SIGTRAP to
> > the new process.
>
> The change is right, but this log entry is confusing. "auto-attached" has
> nothing to do with it, nor does anything about tracing the new process or
> not. The new process has not experienced a PER trap of its own, so it is
> wrong to deliver a SIGTRAP that is meant for its creator.

Ok, I changed the wording slightly:

Clear the TIF_SINGLE_STEP bit in copy_thread. The new process did not get
a PER event of its own. It is wrong to deliver a SIGTRAP that was meant for
the parent process.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
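
The rule in the reworded log entry can be modelled in plain user-space C. What follows is only a minimal sketch under stated assumptions: `demo_thread`, `DEMO_TIF_SINGLE_STEP`, `DEMO_TIF_OTHER`, `demo_copy_thread` and `demo_sigtrap_pending` are hypothetical names invented for illustration, and none of this is the s390 `copy_thread` code. It shows a child inheriting its creator's flags except the pending single-step indication, because the child took no PER event of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel thread flags; not the real s390 values. */
enum {
	DEMO_TIF_SINGLE_STEP = 1u << 0,	/* a single-step event is pending: deliver SIGTRAP */
	DEMO_TIF_OTHER       = 1u << 1	/* some unrelated flag that the child may inherit */
};

/* Hypothetical, heavily reduced model of per-thread state. */
struct demo_thread {
	unsigned int flags;
};

/*
 * Model of the fork-time copy: the child starts as a copy of its creator,
 * but the pending single-step indication is cleared, since the event it
 * records happened in the parent, not in the child.
 */
static void demo_copy_thread(const struct demo_thread *parent,
			     struct demo_thread *child)
{
	assert(parent != NULL && child != NULL);
	*child = *parent;
	child->flags &= ~(unsigned int)DEMO_TIF_SINGLE_STEP;
}

/* Would this thread get a SIGTRAP on its way back to user mode? */
static bool demo_sigtrap_pending(const struct demo_thread *t)
{
	return (t->flags & DEMO_TIF_SINGLE_STEP) != 0;
}
```

In this model, a parent that forks while the flag is set keeps its own pending SIGTRAP, and the child starts without one.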
From schwidefsky at de.ibm.com Fri Jan 8 08:34:52 2010
From: schwidefsky at de.ibm.com (Martin Schwidefsky)
Date: Fri, 8 Jan 2010 09:34:52 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100107181632.GC13300@redhat.com>
References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100106210812.E03A1134D@magilla.sf.frob.com> <20100107101619.0877cf67@mschwide.boeblingen.de.ibm.com> <20100107181632.GC13300@redhat.com>
Message-ID: <20100108093452.15101939@mschwide.boeblingen.de.ibm.com>

On Thu, 7 Jan 2010 19:16:32 +0100
Oleg Nesterov wrote:

> On 01/07, Martin Schwidefsky wrote:
> >
> > On Wed, 6 Jan 2010 13:08:12 -0800 (PST)
> > Roland McGrath wrote:
> >
> > > That's what tracehook_signal_handler is for. You're both doing it yourself
> > > in the arch code (by setting TIF_SINGLE_STEP), and then telling the generic
> > > code to do it (by passing stepping=1 to tracehook_signal_handler).
> >
> > Ok, so with the full utrace the semantics of tracehook_signal_handler
> > is more than just causing a SIGTRAP. It is an indication for a signal
> > AND a SIGTRAP if single-stepping is active. To make both cases work we
> > should stop setting TIF_SINGLE_STEP in do_signal and pass
> > current->thread.per_info.single_step to tracehook_signal_handler
> > instead of test_thread_flag(TIF_SINGLE_STEP).
>
> Can't understand why do we need TIF_SINGLE_STEP at all.
>
> Just pass current->thread.per_info.single_step to
> tracehook_signal_handler() ?
>
> Oleg.
>
> --- a/arch/s390/kernel/signal.c
> +++ b/arch/s390/kernel/signal.c
> @@ -504,14 +504,8 @@ void do_signal(struct pt_regs *regs)
>  			 * for a normal instruction, act like we took
>  			 * one for the handler setup.
>  			 */
> -			if (current->thread.per_info.single_step)
> -				set_thread_flag(TIF_SINGLE_STEP);
> -
> -			/*
> -			 * Let tracing know that we've done the handler setup.
> -			 */
>  			tracehook_signal_handler(signr, &info, &ka, regs,
> -					test_thread_flag(TIF_SINGLE_STEP));
> +					current->thread.per_info.single_step);
>  		}
>  		return;
>  	}
>

That is what I meant in the other mail. The patch on my local disk looks
almost the same but it removes the comment prior to the TIF_SINGLE_STEP
if statement as well.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

From roland at redhat.com Fri Jan 8 10:25:39 2010
From: roland at redhat.com (Roland McGrath)
Date: Fri, 8 Jan 2010 02:25:39 -0800 (PST)
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: Martin Schwidefsky's message of Friday, 8 January 2010 09:30:25 +0100 <20100108093025.28ec0907@mschwide.boeblingen.de.ibm.com>
References: <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com> <20100107214642.579F27300@magilla.sf.frob.com> <20100108093025.28ec0907@mschwide.boeblingen.de.ibm.com>
Message-ID: <20100108102539.2E747105F6@magilla.sf.frob.com>

> Ok, I changed the wording slightly:
>
> Clear the TIF_SINGLE_STEP bit in copy_thread. The new process did not get
> a PER event of its own. It is wrong to deliver a SIGTRAP that was meant for
> the parent process.

Very good!

Thanks,
Roland
From caiqian at redhat.com Mon Jan 11 09:59:25 2010
From: caiqian at redhat.com (CAI Qian)
Date: Mon, 11 Jan 2010 04:59:25 -0500 (EST)
Subject: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set ->ops = utrace_detached_ops lockless)
In-Reply-To: <20091209181241.GA20475@redhat.com>
Message-ID: <253423212.42121263203965061.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

Hi Jan,

Looks like the following patch from Oleg has not been checked in to the
ptrace testsuite yet.

Thanks,
CAI Qian

----- "Oleg Nesterov" wrote:

> On 12/09, Oleg Nesterov wrote:
> >
> > Cai, Ananth, thank you.
> >
> > So. I think we can forget about the possible kernel problems (and
> > in any case we can rule out utrace).
> >
> > The test-case is just wrong and should be fixed. The tracee can't execute
> > the function descriptor in data section, that is why it gets SIGSEGV.
> >
> > > while the '.func_name' is the text address.
> >
> > tried to change the code to
> >
> > 	REGS_ACCESS (regs, nip) = (unsigned long) .raise_sigusr2
> >
> > but gcc doesn't like this ;)
> >
> > > (See
> > > handle_rt_signal64 in arch/powerpc/kernel/signal_64.c and
> > > kprobe_lookup_name in arch/powerpc/include/asm/kprobes.h.
> >
> > Thanks... looking at handle_rt_signal64(), looks like we should
> > also set regs->gpr[2] = funct_desc_ptr->toc if we change regs->nip
> >
> >
> > I hope someone who understands powerpc could fix the test-case ;)
>
> Yes, I verified the patch below fixes step-jump-cont.c on
> ibm-js20-02.lab.bos.redhat.com.
>
> Oleg.
>
> --- step-jump-cont.c~	2009-12-09 12:17:04.367733643 -0500
> +++ step-jump-cont.c	2009-12-09 13:12:50.708535770 -0500
> @@ -153,12 +153,19 @@ raise_sigusr2 (void)
>    assert (0);
>  }
>  
> +typedef struct {
> +	unsigned long entry;
> +	unsigned long toc;
> +	unsigned long env;
> +} func_descr_t;
> +
>  int main (void)
>  {
>    long l;
>    int status;
>    pid_t pid;
>    REGS_TYPE (regs);
> +  func_descr_t *fp;
>  
>    setbuf (stdout, NULL);
>    atexit (cleanup);
> @@ -214,7 +221,12 @@ int main (void)
>  #elif defined __x86_64__
>    REGS_ACCESS (regs, rip) = (unsigned long) raise_sigusr2;
>  #elif defined __powerpc__
> -  REGS_ACCESS (regs, nip) = (unsigned long) raise_sigusr2;
> +
> +	fp = (void*)raise_sigusr2;
> +
> +	REGS_ACCESS(regs, nip) = fp->entry;
> +	REGS_ACCESS(regs, gpr[2]) = fp->toc;
> +
>  #else
>  # error "Check outer #ifdef"
>  #endif

From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:29 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 11 Jan 2010 17:55:29 +0530
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com>

User Space Breakpoint Assistance Layer (UBP)

The user-space breakpointing infrastructure provides kernel subsystems
with an architecture-independent interface to establish breakpoints in
user applications. This patch provides the core implementation of ubp
and also wrappers for architecture-dependent methods.

UBP currently supports both single-stepping inline and execution
out of line strategies. Two different probepoints in the same process
can have two different strategies.

You need to follow this up with the UBP patch for your architecture.

Signed-off-by: Jim Keniston
Signed-off-by: Srikar Dronamraju
---
 arch/Kconfig        |   12 +
 include/linux/ubp.h |  282 ++++++++++++++++++++++++++++++
 kernel/Makefile     |    1
 kernel/ubp_core.c   |  479 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 774 insertions(+)

Index: new_uprobes.git/arch/Kconfig
===================================================================
--- new_uprobes.git.orig/arch/Kconfig
+++ new_uprobes.git/arch/Kconfig
@@ -57,6 +57,15 @@ config KPROBES
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+config UBP
+	bool "User-space breakpoint assistance (EXPERIMENTAL)"
+	depends on MODULES
+	depends on HAVE_UBP
+	help
+	  Ubp enables kernel subsystems to establish breakpoints
+	  in user applications. This service is used by components
+	  such as uprobes. If in doubt, say "N".
+
 config HAVE_EFFICIENT_UNALIGNED_ACCESS
 	bool
 	help
@@ -90,6 +99,9 @@ config USER_RETURN_NOTIFIER
 	  Provide a kernel-internal notification when a cpu is about to
 	  switch to user mode.
 
+config HAVE_UBP
+	def_bool n
+
 config HAVE_IOREMAP_PROT
 	bool
 
Index: new_uprobes.git/include/linux/ubp.h
===================================================================
--- /dev/null
+++ new_uprobes.git/include/linux/ubp.h
@@ -0,0 +1,282 @@
+#ifndef _LINUX_UBP_H
+#define _LINUX_UBP_H
+/*
+ * User-space BreakPoint support (ubp)
+ * include/linux/ubp.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2008, 2009
+ */
+
+#include
+struct task_struct;
+struct pt_regs;
+
+/**
+ * Strategy hints:
+ *
+ * %UBP_HNT_INLINE: Specifies that the instruction must
+ * be single-stepped inline. Can be set by the caller of
+ * @arch->analyze_insn() -- e.g., if caller is out of XOL slots --
+ * or by @arch->analyze_insn() if there's no viable XOL strategy
+ * for that instruction. Set in arch->strategies if the architecture
+ * doesn't implement XOL.
+ *
+ * %UBP_HNT_PERMSL: Specifies that the instruction slot whose
+ * address is @ubp->xol_vaddr is assigned to @ubp for the life of
+ * the process. Can be used by @arch->analyze_insn() to simplify
+ * XOL in some cases. Ignored in @arch->strategies.
+ *
+ * %UBP_HNT_TSKINFO: Set in @arch->strategies if the architecture's
+ * XOL handling requires the preservation of special
+ * task-specific info between the calls to @arch->pre_xol()
+ * and @arch->post_xol(). (E.g., XOL of x86_64 rip-relative
+ * instructions uses a scratch register, whose value is saved
+ * by pre_xol() and restored by post_xol().) The caller
+ * of @arch->analyze_insn() should set %UBP_HNT_TSKINFO in
+ * @ubp->strategy if it's set in @arch->strategies and the caller
+ * can maintain a @ubp_task_arch_info object for each probed task.
+ * @arch->analyze_insn() should leave this flag set in @ubp->strategy
+ * if it needs to use the per-task @ubp_task_arch_info object.
+ */
+#define UBP_HNT_INLINE 0x1  /* Single-step this insn inline. */
+#define UBP_HNT_TSKINFO 0x2 /* XOL requires ubp_task_arch_info */
+#define UBP_HNT_PERMSL 0x4  /* XOL slot assignment is permanent */
+
+#define UBP_HNT_MASK 0x7
+
+/**
+ * struct ubp_bkpt - user-space breakpoint/probepoint
+ *
+ * @vaddr: virtual address of probepoint
+ * @xol_vaddr: virtual address of XOL slot assigned to this probepoint
+ * @opcode: copy of opcode at @vaddr
+ * @insn: typically a copy of the instruction at @vaddr. More
+ *	precisely, this is the instruction (stream) that will be
+ *	executed in place of the original instruction.
+ * @strategy: hints about how this instruction will be executed
+ * @fixups: set of fixups to be executed by @arch->post_xol()
+ * @arch_info: architecture-specific info about this probepoint
+ */
+struct ubp_bkpt {
+	unsigned long vaddr;
+	unsigned long xol_vaddr;
+	ubp_opcode_t opcode;
+	u8 insn[UBP_XOL_SLOT_BYTES];
+	u16 strategy;
+	u16 fixups;
+	struct ubp_bkpt_arch_info arch_info;
+};
+
+/* Post-execution fixups. Some architectures may define others. */
+#define UPB_FIX_NONE 0x0 /* No fixup needed */
+#define UBP_FIX_IP 0x1   /* Adjust IP back to vicinity of actual insn */
+#define UBP_FIX_CALL 0x2 /* Adjust the return address of a call insn */
+
+#ifndef UPB_FIX_DEFAULT
+#define UPB_FIX_DEFAULT UBP_FIX_IP
+#endif
+
+#if defined(CONFIG_UBP)
+extern int ubp_init(u16 *strategies);
+extern int ubp_insert_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern unsigned long ubp_get_bkpt_addr(struct pt_regs *regs);
+extern int ubp_pre_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs);
+extern int ubp_post_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp,
+		struct ubp_task_arch_info *tskinfo, struct pt_regs *regs);
+extern int ubp_cancel_xol(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern int ubp_remove_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp);
+extern int ubp_validate_insn_addr(struct task_struct *tsk,
+		unsigned long vaddr);
+extern void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr);
+#else	/* CONFIG_UBP */
+static inline int ubp_init(u16 *strategies)
+{
+	return -ENOSYS;
+}
+static inline int ubp_insert_bkpt(struct task_struct *tsk,
+		struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline unsigned long ubp_get_bkpt_addr(struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_pre_sstep(struct task_struct *tsk,
+		struct ubp_bkpt *ubp, struct ubp_task_arch_info *tskinfo,
+		struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_post_sstep(struct task_struct *tsk,
+		struct ubp_bkpt *ubp, struct ubp_task_arch_info *tskinfo,
+		struct pt_regs *regs)
+{
+	return -ENOSYS;
+}
+static inline int ubp_cancel_xol(struct task_struct *tsk,
+		struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline int ubp_remove_bkpt(struct task_struct *tsk,
+		struct ubp_bkpt *ubp)
+{
+	return -ENOSYS;
+}
+static inline int ubp_validate_insn_addr(struct task_struct *tsk,
+		unsigned long vaddr)
+{
+	return -ENOSYS;
+}
+static inline void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr)
+{
+}
+#endif	/* CONFIG_UBP */
+
+#ifdef UBP_IMPLEMENTATION
+/**
+ * struct ubp_arch_info - architecture-specific parameters and functions
+ *
+ * Most architectures can use the default versions of @read_opcode(),
+ * @set_bkpt(), @set_orig_insn(), and @is_bkpt_insn(); ia64 is an
+ * exception. All functions (including @validate_address()) can assume
+ * that the caller has verified that the probepoint's virtual address
+ * resides in an executable VM area.
+ *
+ * @bkpt_insn:
+ *	The architecture's breakpoint instruction. This is used by
+ *	the default versions of @set_bkpt(), @set_orig_insn(), and
+ *	@is_bkpt_insn().
+ * @ip_advancement_by_bkpt_insn:
+ *	The number of bytes the instruction pointer is advanced by
+ *	this architecture's breakpoint instruction. For example, after
+ *	the powerpc trap instruction executes, the ip still points to the
+ *	breakpoint instruction (ip_advancement_by_bkpt_insn = 0); but the
+ *	x86 int3 instruction (1 byte) advances the ip past the int3
+ *	(ip_advancement_by_bkpt_insn = 1).
+ * @max_insn_bytes:
+ *	The maximum length, in bytes, of an instruction in this
+ *	architecture. This must be <= UBP_XOL_SLOT_BYTES;
+ * @strategies:
+ *	Bit-map of %UBP_HNT_* values recognized by this architecture.
+ *	Include %UBP_HNT_INLINE iff this architecture doesn't support
+ *	execution out of line. Include %UBP_HNT_TSKINFO if
+ *	XOL of at least some instructions requires communication of
+ *	per-task state between @pre_xol() and @post_xol().
+ * @set_ip:
+ *	Set the instruction pointer in @regs to @vaddr.
+ * @validate_address:
+ *	Return 0 if @vaddr is a valid instruction address, or a negative
+ *	errno (typically -%EINVAL) otherwise. If you don't provide
+ *	@validate_address(), any address will be accepted. Caller
+ *	guarantees that @vaddr is in an executable VM area. This
+ *	function typically just enforces arch-specific instruction
+ *	alignment.
+ * @read_opcode:
+ *	For task @tsk, read the opcode at @vaddr and store it in
+ *	@opcode. Return 0 (success) or a negative errno. Defaults to
+ *	@ubp_read_opcode().
+ * @set_bkpt:
+ *	For task @tsk, store @bkpt_insn at @ubp->vaddr. Return 0
+ *	(success) or a negative errno. Defaults to @ubp_set_bkpt().
+ * @set_orig_insn:
+ *	For task @tsk, restore the original opcode (@ubp->opcode) at
+ *	@ubp->vaddr. If @check is true, first verify that there's
+ *	actually a breakpoint instruction there. Return 0 (success) or
+ *	a negative errno. Defaults to @ubp_set_orig_insn().
+ * @is_bkpt_insn:
+ *	Return %true if @ubp->opcode is @bkpt_insn. Defaults to
+ *	@ubp_is_bkpt_insn(), which just tests (ubp->opcode ==
+ *	arch->bkpt_insn).
+ * @analyze_insn:
+ *	Analyze @ubp->insn. Return 0 if @ubp->insn is an instruction
+ *	you can probe, or a negative errno (typically -%EPERM)
+ *	otherwise. The caller sets @ubp->strategy to %UBP_HNT_INLINE
+ *	to suppress XOL for this instruction (e.g., because we're
+ *	out of XOL slots). If the instruction can be probed but
+ *	can't be executed out of line, set @ubp->strategy to
+ *	%UBP_HNT_INLINE. Otherwise, determine what sort of XOL-related
+ *	fixups @post_xol() (and possibly @pre_xol()) will need
+ *	to do for this instruction, and annotate @ubp accordingly.
+ *	You may modify @ubp->insn (e.g., the x86_64 port does this
+ *	for rip-relative instructions), but if you do so, you should
+ *	retain a copy in @ubp->arch_info in case you have to revert
+ *	to single-stepping inline (see @cancel_xol()).
+ * @pre_xol:
+ *	Called just before executing the instruction associated
+ *	with @ubp out of line. @ubp->xol_vaddr is the address in
+ *	@tsk's virtual address space where @ubp->insn has been copied.
+ *	@pre_xol() should at least set the instruction pointer in
+ *	@regs to @ubp->xol_vaddr -- which is what the default,
+ *	@ubp_pre_xol(), does. If @ubp->strategy includes the
+ *	%UBP_HNT_TSKINFO flag, then @tskinfo points to a per-task
+ *	copy of struct ubp_task_arch_info.
+ * @post_xol:
+ *	Called after executing the instruction associated with
+ *	@ubp out of line. @post_xol() should perform the fixups
+ *	specified in @ubp->fixups, which includes ensuring that the
+ *	instruction pointer in @regs points at the next instruction in
+ *	the probed instruction stream. @tskinfo is as for @pre_xol().
+ *	You must provide this function.
+ * @cancel_xol:
+ *	The instruction associated with @ubp cannot be executed
+ *	out of line after all. (This can happen when XOL slots
+ *	are lazily assigned, and we run out of slots before we
+ *	hit this breakpoint. This function should never be called
+ *	if @analyze_insn() was previously called for @ubp with a
+ *	non-zero value of @ubp->xol_vaddr and with %UBP_HNT_PERMSL
+ *	set in @ubp->strategy.) Adjust @ubp as needed so it can be
+ *	single-stepped inline. Omit this function if you don't need it.
+ */ + +struct ubp_arch_info { + ubp_opcode_t bkpt_insn; + u8 ip_advancement_by_bkpt_insn; + u8 max_insn_bytes; + u16 strategies; + void (*set_ip)(struct pt_regs *regs, unsigned long vaddr); + int (*validate_address)(struct task_struct *tsk, unsigned long vaddr); + int (*read_opcode)(struct task_struct *tsk, unsigned long vaddr, + ubp_opcode_t *opcode); + int (*set_bkpt)(struct task_struct *tsk, struct ubp_bkpt *ubp); + int (*set_orig_insn)(struct task_struct *tsk, + struct ubp_bkpt *ubp, bool check); + bool (*is_bkpt_insn)(struct ubp_bkpt *ubp); + int (*analyze_insn)(struct task_struct *tsk, struct ubp_bkpt *ubp); + int (*pre_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, + struct pt_regs *regs); + int (*post_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, + struct pt_regs *regs); + void (*cancel_xol)(struct task_struct *tsk, struct ubp_bkpt *ubp); +}; + +/* Unexported functions & macros for use by arch-specific code */ +#define ubp_opcode_sz ((unsigned int)(sizeof(ubp_opcode_t))) +extern int ubp_read_vm(struct task_struct *tsk, unsigned long vaddr, + void *kbuf, int nbytes); +extern int ubp_write_data(struct task_struct *tsk, unsigned long vaddr, + const void *kbuf, int nbytes); + +extern struct ubp_arch_info ubp_arch_info; + +#endif /* UBP_IMPLEMENTATION */ + +#endif /* _LINUX_UBP_H */ Index: new_uprobes.git/kernel/Makefile =================================================================== --- new_uprobes.git.orig/kernel/Makefile +++ new_uprobes.git/kernel/Makefile @@ -102,6 +102,7 @@ obj-$(CONFIG_SLOW_WORK_DEBUG) += slow-wo obj-$(CONFIG_PERF_EVENTS) += perf_event.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o +obj-$(CONFIG_UBP) += ubp_core.o ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y) # According to Alan Modra , the -fno-omit-frame-pointer is Index: new_uprobes.git/kernel/ubp_core.c 
=================================================================== --- /dev/null +++ new_uprobes.git/kernel/ubp_core.c @@ -0,0 +1,479 @@ +/* + * User-space BreakPoint support (ubp) + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2008, 2009 + */ + +#define UBP_IMPLEMENTATION 1 + +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * TODO: Resolve verbosity. ubp_insert_bkpt() is the only function + * that reports failures via printk. + */ + +static struct ubp_arch_info *arch = &ubp_arch_info; + +static bool ubp_uses_xol(u16 strategy) +{ + return !(strategy & UBP_HNT_INLINE); +} + +static bool validate_strategy(u16 strategy, u16 valid_bits) +{ + return ((strategy & (~valid_bits)) == 0); +} + +/** + * ubp_init - initialize the ubp data structures + * @strategies indicates which breakpoint-related strategies are + * supported by the client: + * %UBP_HNT_INLINE: Client supports only single-stepping inline. + * Otherwise client must provide an instruction slot + * (UBP_XOL_SLOT_BYTES bytes) in the probed process's address + * space for each instruction to be executed out of line. + * %UBP_HNT_TSKINFO: Client can provide and maintain one + * @ubp_task_arch_info object for each probed task. 
(Failure to + * support this will prevent XOL of rip-relative instructions on + * x86_64, at least.) + * Upon return, @strategies is updated to reflect those strategies + * required by this particular architecture's implementation of ubp: + * %UBP_HNT_INLINE: Architecture or client supports only + * single-stepping inline. + * %UBP_HNT_TSKINFO: Architecture uses @ubp_task_arch_info, and will + * expect it to be passed to @ubp_pre_sstep() and @ubp_post_sstep() + * as needed (see @ubp_insert_bkpt()). + * Possible errors: + * -%ENOSYS: ubp not supported for this architecture. + * -%EINVAL: unrecognized flags in @strategies + */ +int ubp_init(u16 *strategies) +{ + u16 inline_bit, tskinfo_bit; + u16 client_strategies = *strategies; + + if (!validate_strategy(client_strategies, + UBP_HNT_INLINE | UBP_HNT_TSKINFO)) + return -EINVAL; + + inline_bit = (client_strategies | arch->strategies) & UBP_HNT_INLINE; + tskinfo_bit = (client_strategies & arch->strategies) & UBP_HNT_TSKINFO; + *strategies = (inline_bit | tskinfo_bit); + return 0; +} + +/* + * Read @nbytes at @vaddr from @tsk into @kbuf. Return number of bytes read. + * Not exported, but available for use by arch-specific ubp code. + */ +int ubp_read_vm(struct task_struct *tsk, unsigned long vaddr, + void *kbuf, int nbytes) +{ + if (tsk == current) { + int nleft = copy_from_user(kbuf, (void __user *) vaddr, + nbytes); + return nbytes - nleft; + } else + return access_process_vm(tsk, vaddr, kbuf, nbytes, 0); +} + +/* + * Write @nbytes from @kbuf at @vaddr in @tsk. Return number of bytes written. + * Can be used to write to stack or data VM areas, but not instructions. + * Not exported, but available for use by arch-specific ubp code. 
+ */ +int ubp_write_data(struct task_struct *tsk, unsigned long vaddr, + const void *kbuf, int nbytes) +{ + int nleft; + + if (tsk == current) { + nleft = copy_to_user((void __user *) vaddr, kbuf, nbytes); + return nbytes - nleft; + } else + return access_process_vm(tsk, vaddr, (void *) kbuf, + nbytes, 1); +} + +static int ubp_write_opcode(struct task_struct *tsk, unsigned long vaddr, + ubp_opcode_t opcode) +{ + int result; + + result = access_process_vm(tsk, vaddr, &opcode, ubp_opcode_sz, 1); + return (result == ubp_opcode_sz ? 0 : -EFAULT); +} + +/* Default implementation of arch->read_opcode */ +static int ubp_read_opcode(struct task_struct *tsk, unsigned long vaddr, + ubp_opcode_t *opcode) +{ + int bytes_read; + + bytes_read = ubp_read_vm(tsk, vaddr, opcode, ubp_opcode_sz); + return (bytes_read == ubp_opcode_sz ? 0 : -EFAULT); +} + +/* Default implementation of arch->set_bkpt */ +static int ubp_set_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + return ubp_write_opcode(tsk, ubp->vaddr, arch->bkpt_insn); +} + +/* Default implementation of arch->set_orig_insn */ +static int ubp_set_orig_insn(struct task_struct *tsk, struct ubp_bkpt *ubp, + bool check) +{ + if (check) { + ubp_opcode_t opcode; + int result = arch->read_opcode(tsk, ubp->vaddr, &opcode); + if (result) + return result; + if (opcode != arch->bkpt_insn) + return -EINVAL; + } + return ubp_write_opcode(tsk, ubp->vaddr, ubp->opcode); +} + +/* Return 0 if vaddr is in an executable VM area, or -EINVAL otherwise. */ +static inline int ubp_check_vma(struct task_struct *tsk, unsigned long vaddr) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + int ret = -EINVAL; + + mm = get_task_mm(tsk); + if (!mm) + return -EINVAL; + down_read(&mm->mmap_sem); + vma = find_vma(mm, vaddr); + if (vma && vaddr >= vma->vm_start && (vma->vm_flags & VM_EXEC)) + ret = 0; + up_read(&mm->mmap_sem); + mmput(mm); + return ret; +} + +/** + * ubp_validate_insn_addr - Validate that an instruction address lies in an + * executable vma.
+ * Returns 0 if the vaddr is a valid instruction address. + * @tsk: the probed task + * @vaddr: virtual address of the instruction to be verified. + * + * Possible errors: + * -%EINVAL: Instruction passed is not a valid instruction address. + */ +int ubp_validate_insn_addr(struct task_struct *tsk, unsigned long vaddr) +{ + int result; + + result = ubp_check_vma(tsk, vaddr); + if (result != 0) + return result; + if (arch->validate_address) + result = arch->validate_address(tsk, vaddr); + return result; +} + +static void ubp_bkpt_insertion_failed(struct task_struct *tsk, + struct ubp_bkpt *ubp, const char *why) +{ + printk(KERN_ERR "Can't place breakpoint at pid %d vaddr %#lx: %s\n", + tsk->pid, ubp->vaddr, why); +} + +/** + * ubp_insert_bkpt - insert breakpoint + * Insert a breakpoint into the process that includes @tsk, at the + * virtual address @ubp->vaddr. + * + * @ubp->strategy affects how this breakpoint will be handled: + * %UBP_HNT_INLINE: Probed instruction will be single-stepped inline. + * %UBP_HNT_TSKINFO: As above. + * %UBP_HNT_PERMSL: An XOL instruction slot in the probed process's + * address space has been allocated to this probepoint, and will + * remain so allocated as long as it's needed. @ubp->xol_vaddr is + * its address. (This slot can be reallocated if + * @ubp_insert_bkpt() fails.) The client is NOT required to + * allocate an instruction slot before calling @ubp_insert_bkpt(). + * @ubp_insert_bkpt() updates @ubp->strategy as needed: + * %UBP_HNT_INLINE: Architecture or client cannot do XOL for this + * probepoint. + * %UBP_HNT_TSKINFO: @ubp_task_arch_info will be used for this + * probepoint. + * + * All threads of the probed process must be stopped while + * @ubp_insert_bkpt() runs. 
+ * + * Possible errors: + * -%ENOSYS: ubp not supported for this architecture + * -%EINVAL: unrecognized/invalid strategy flags + * -%EINVAL: invalid instruction address + * -%EEXIST: breakpoint instruction already exists at that address + * -%EPERM: cannot probe this instruction + * -%EFAULT: failed to insert breakpoint instruction + * [TBD: Validate xol_vaddr?] + */ +int ubp_insert_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + int result, len; + + BUG_ON(!tsk || !ubp); + if (!validate_strategy(ubp->strategy, UBP_HNT_MASK)) + return -EINVAL; + + result = ubp_validate_insn_addr(tsk, ubp->vaddr); + if (result != 0) + return result; + + /* + * If ubp_read_vm() transfers fewer bytes than the maximum + * instruction size, assume that the probed instruction is smaller + * than the max and near the end of the last page of instructions. + * But there must be room at least for a breakpoint-size instruction. + */ + len = ubp_read_vm(tsk, ubp->vaddr, ubp->insn, arch->max_insn_bytes); + if (len < ubp_opcode_sz) { + ubp_bkpt_insertion_failed(tsk, ubp, + "error reading original instruction"); + return -EFAULT; + } + memcpy(&ubp->opcode, ubp->insn, ubp_opcode_sz); + if (arch->is_bkpt_insn(ubp)) { + ubp_bkpt_insertion_failed(tsk, ubp, + "bkpt already exists at that addr"); + return -EEXIST; + } + + result = arch->analyze_insn(tsk, ubp); + if (result < 0) { + ubp_bkpt_insertion_failed(tsk, ubp, + "instruction type cannot be probed"); + return result; + } + + result = arch->set_bkpt(tsk, ubp); + if (result < 0) { + ubp_bkpt_insertion_failed(tsk, ubp, + "failed to insert bkpt instruction"); + return result; + } + return 0; +} + +/** + * ubp_pre_sstep - prepare to single-step the probed instruction + * @tsk: the probed task + * @ubp: the probepoint information, as returned by @ubp_insert_bkpt(). 
+ * Unless the %UBP_HNT_INLINE flag is set in @ubp->strategy, + * @ubp->xol_vaddr must be the address of an XOL instruction slot + * that is allocated to this probepoint at least until after the + * completion of @ubp_post_sstep(), and populated with the contents + * of @ubp->insn. [Need to be more precise here to account for + * untimely exit or UBP_HNT_BOOSTED.] + * @tskinfo: points to a @ubp_task_arch_info object for @tsk, if + * the %UBP_HNT_TSKINFO flag is set in @ubp->strategy. + * @regs: reflects the saved user state of @tsk. @ubp_pre_sstep() + * adjusts this. In particular, the instruction pointer is set + * to the instruction to be single-stepped. + * Possible errors: + * -%EFAULT: Failed to read or write @tsk's address space as needed. + * + * The client must ensure that the contents of @ubp are not + * changed during the single-step operation -- i.e., between when + * @ubp_pre_sstep() is called and when @ubp_post_sstep() returns. + * Additionally, if single-stepping inline is used for this probepoint, + * the client must serialize the single-step operation (so multiple + * threads don't step on each other while the opcode replacement is + * taking place). + */ +int ubp_pre_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, struct pt_regs *regs) +{ + int result; + + BUG_ON(!tsk || !ubp || !regs); + if (ubp_uses_xol(ubp->strategy)) { + BUG_ON(!ubp->xol_vaddr); + return arch->pre_xol(tsk, ubp, tskinfo, regs); + } + + /* + * Single-step this instruction inline. Replace the breakpoint + * with the original opcode. 
+ */ + result = arch->set_orig_insn(tsk, ubp, false); + if (result == 0) + arch->set_ip(regs, ubp->vaddr); + return result; +} + +/** + * ubp_post_sstep - prepare to resume execution after single-step + * @tsk: the probed task + * @ubp: the probepoint information, as with @ubp_pre_sstep() + * @tskinfo: the @ubp_task_arch_info object, if any, passed to + * @ubp_pre_sstep() + * @regs: reflects the saved state of @tsk after the single-step + * operation. @ubp_post_sstep() adjusts @tsk's state as needed, + * including pointing the instruction pointer at the instruction + * following the probed instruction. + * Possible errors: + * -%EFAULT: Failed to read or write @tsk's address space as needed. + */ +int ubp_post_sstep(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, struct pt_regs *regs) +{ + BUG_ON(!tsk || !ubp || !regs); + if (ubp_uses_xol(ubp->strategy)) + return arch->post_xol(tsk, ubp, tskinfo, regs); + + /* + * Single-stepped this instruction inline. Put the breakpoint + * instruction back. + */ + return arch->set_bkpt(tsk, ubp); +} + +/** + * ubp_cancel_xol - cancel XOL for this probepoint + * @tsk: a task in the probed process + * @ubp: the probepoint information + * Switch @ubp's single-stepping strategy from out-of-line to inline. + * If the client employs lazy XOL-slot allocation, it can call + * this function if it determines that it can't provide an XOL + * slot for @ubp. @ubp_cancel_xol() adjusts @ubp appropriately. + * + * @ubp_cancel_xol()'s behavior is undefined if @ubp_pre_sstep() has + * already been called for @ubp. + * + * Possible errors: + * Can't think of any yet. + */ +int ubp_cancel_xol(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + if (arch->cancel_xol) + arch->cancel_xol(tsk, ubp); + ubp->strategy |= UBP_HNT_INLINE; + return 0; +} + +/** + * ubp_get_bkpt_addr - compute address of bkpt given post-bkpt regs + * @regs: Reflects the saved state of the task after it has hit a breakpoint + * instruction. 
Return the address of the breakpoint instruction. + */ +unsigned long ubp_get_bkpt_addr(struct pt_regs *regs) +{ + return instruction_pointer(regs) - arch->ip_advancement_by_bkpt_insn; +} + +/** + * ubp_remove_bkpt - remove breakpoint + * For the process that includes @tsk, remove the breakpoint specified + * by @ubp, restoring the original opcode. + * + * Possible errors: + * -%EINVAL: @ubp->vaddr is not a valid instruction address. + * -%ENOENT: There is no breakpoint instruction at @ubp->vaddr. + * -%EFAULT: Failed to read/write @tsk's address space as needed. + */ +int ubp_remove_bkpt(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + if (ubp_validate_insn_addr(tsk, ubp->vaddr) != 0) + return -EINVAL; + return arch->set_orig_insn(tsk, ubp, true); +} + +void ubp_set_ip(struct pt_regs *regs, unsigned long vaddr) +{ + arch->set_ip(regs, vaddr); +} + +/* Default implementation of arch->is_bkpt_insn */ +static bool ubp_is_bkpt_insn(struct ubp_bkpt *ubp) +{ + return (ubp->opcode == arch->bkpt_insn); +} + +/* Default implementation of arch->pre_xol */ +static int ubp_pre_xol(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, struct pt_regs *regs) +{ + arch->set_ip(regs, ubp->xol_vaddr); + return 0; +} + +/* Validate arch-specific info during ubp initialization. */ + +static int ubp_bad_arch_param(const char *param_name, int value) +{ + printk(KERN_ERR "ubp: bad value %d/%#x for parameter %s" + " in ubp_arch_info\n", value, value, param_name); + return -ENOSYS; +} + +static int ubp_missing_arch_func(const char *func_name) +{ + printk(KERN_ERR "ubp: ubp_arch_info lacks required function: %s\n", + func_name); + return -ENOSYS; +} + +static int __init init_ubp(void) +{ + int result = 0; + + /* Accept any value of bkpt_insn. 
*/ + if (arch->max_insn_bytes < 1) + result = ubp_bad_arch_param("max_insn_bytes", + arch->max_insn_bytes); + if (arch->ip_advancement_by_bkpt_insn > arch->max_insn_bytes) + result = ubp_bad_arch_param("ip_advancement_by_bkpt_insn", + arch->ip_advancement_by_bkpt_insn); + /* Accept any value of strategies. */ + if (!arch->set_ip) + result = ubp_missing_arch_func("set_ip"); + /* Null validate_address() is OK. */ + if (!arch->read_opcode) + arch->read_opcode = ubp_read_opcode; + if (!arch->set_bkpt) + arch->set_bkpt = ubp_set_bkpt; + if (!arch->set_orig_insn) + arch->set_orig_insn = ubp_set_orig_insn; + if (!arch->is_bkpt_insn) + arch->is_bkpt_insn = ubp_is_bkpt_insn; + if (!arch->analyze_insn) + result = ubp_missing_arch_func("analyze_insn"); + if (!arch->pre_xol) + arch->pre_xol = ubp_pre_xol; + if (ubp_uses_xol(arch->strategies) && !arch->post_xol) + result = ubp_missing_arch_func("post_xol"); + /* Null cancel_xol() is OK. */ + return result; +} + +module_init(init_ubp); From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:37 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Mon, 11 Jan 2010 17:55:37 +0530 Subject: [RFC] [PATCH 2/7] x86 support for UBP In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> Message-ID: <20100111122537.22050.43529.sendpatchset@srikar.in.ibm.com> x86 support for the user breakpoint infrastructure. This patch provides the x86-specific implementation of user-space breakpoint assistance (ubp). It requires the "x86: instruction decoder API" patch.
http://lkml.org/lkml/2009/6/1/459 Signed-off-by: Jim Keniston Signed-off-by: Srikar Dronamraju --- arch/x86/Kconfig | 1 arch/x86/include/asm/ubp.h | 40 +++ arch/x86/kernel/Makefile | 2 arch/x86/kernel/ubp_x86.c | 577 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 620 insertions(+) Index: new_uprobes.git/arch/x86/Kconfig =================================================================== --- new_uprobes.git.orig/arch/x86/Kconfig +++ new_uprobes.git/arch/x86/Kconfig @@ -50,6 +50,7 @@ config X86 select HAVE_KERNEL_BZIP2 select HAVE_KERNEL_LZMA select HAVE_HW_BREAKPOINT + select HAVE_UBP select HAVE_ARCH_KMEMCHECK select HAVE_USER_RETURN_NOTIFIER Index: new_uprobes.git/arch/x86/include/asm/ubp.h =================================================================== --- /dev/null +++ new_uprobes.git/arch/x86/include/asm/ubp.h @@ -0,0 +1,40 @@ +#ifndef _ASM_UBP_H +#define _ASM_UBP_H +/* + * User-space BreakPoint support (ubp) for x86 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + * Copyright (C) IBM Corporation, 2008, 2009 + */ + +typedef u8 ubp_opcode_t; +#define MAX_UINSN_BYTES 16 +#define UBP_XOL_SLOT_BYTES (MAX_UINSN_BYTES) + +#ifdef CONFIG_X86_64 +struct ubp_bkpt_arch_info { + unsigned long rip_target_address; + u8 orig_insn[MAX_UINSN_BYTES]; +}; +struct ubp_task_arch_info { + unsigned long saved_scratch_register; +}; +#else +struct ubp_bkpt_arch_info {}; +struct ubp_task_arch_info {}; +#endif + +#endif /* _ASM_UBP_H */ Index: new_uprobes.git/arch/x86/kernel/Makefile =================================================================== --- new_uprobes.git.orig/arch/x86/kernel/Makefile +++ new_uprobes.git/arch/x86/kernel/Makefile @@ -116,6 +116,8 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o +obj-$(CONFIG_UBP) += ubp_x86.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) Index: new_uprobes.git/arch/x86/kernel/ubp_x86.c =================================================================== --- /dev/null +++ new_uprobes.git/arch/x86/kernel/ubp_x86.c @@ -0,0 +1,577 @@ +/* + * User-space BreakPoint support (ubp) for x86 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + * Copyright (C) IBM Corporation, 2008, 2009 + */ + +#define UBP_IMPLEMENTATION 1 + +#include +#include +#include +#include +#include + +#ifdef CONFIG_X86_32 +#define is_32bit_app(tsk) 1 +#else +#define is_32bit_app(tsk) (test_tsk_thread_flag(tsk, TIF_IA32)) +#endif + +#define UBP_FIX_RIP_AX 0x8000 +#define UBP_FIX_RIP_CX 0x4000 + +/* Adaptations for mhiramat x86 decoder v14. */ +#define OPCODE1(insn) ((insn)->opcode.bytes[0]) +#define OPCODE2(insn) ((insn)->opcode.bytes[1]) +#define OPCODE3(insn) ((insn)->opcode.bytes[2]) +#define MODRM_REG(insn) X86_MODRM_REG(insn->modrm.value) + +static void set_ip(struct pt_regs *regs, unsigned long vaddr) +{ + regs->ip = vaddr; +} + +#ifdef CONFIG_X86_64 +static bool is_riprel_insn(struct ubp_bkpt *ubp) +{ + return ((ubp->fixups & (UBP_FIX_RIP_AX | UBP_FIX_RIP_CX)) != 0); +} + +static void cancel_xol(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + if (is_riprel_insn(ubp)) { + /* + * We rewrote ubp->insn to use indirect addressing rather + * than rip-relative addressing for XOL. For + * single-stepping inline, put back the original instruction. 
+ */ + memcpy(ubp->insn, ubp->arch_info.orig_insn, MAX_UINSN_BYTES); + ubp->strategy &= ~UBP_HNT_TSKINFO; + } +} +#endif /* CONFIG_X86_64 */ + +#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\ + (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \ + (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \ + (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) | \ + (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf)) \ + << (row % 32)) + +static const u32 good_insns_64[256 / 32] = { + /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ + /* ---------------------------------------------- */ + W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */ + W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */ + W(0x20, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 20 */ + W(0x30, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 30 */ + W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ + W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ + W(0x60, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */ + W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */ + W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ + W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */ + W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */ + W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ + W(0xc0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */ + W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ + W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */ + W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */ + /* ---------------------------------------------- */ + /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ +}; + +/* Good-instruction tables for 32-bit apps -- copied from i386 uprobes */ + +static const u32 good_insns_32[256 / 32] = { + /* 0 1 
2 3 4 5 6 7 8 9 a b c d e f */ + /* ---------------------------------------------- */ + W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */ + W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */ + W(0x20, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* 20 */ + W(0x30, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) , /* 30 */ + W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */ + W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ + W(0x60, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */ + W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */ + W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ + W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */ + W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */ + W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ + W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */ + W(0xd0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ + W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */ + W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */ + /* ---------------------------------------------- */ + /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ +}; + +/* Using this for both 64-bit and 32-bit apps */ +static const u32 good_2byte_insns[256 / 32] = { + /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ + /* ---------------------------------------------- */ + W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1) | /* 00 */ + W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* 10 */ + W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */ + W(0x30, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 30 */ + W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */ + W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */ + W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 60 */ + W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 
0, 0, 0, 0, 0, 0, 1, 1) , /* 70 */ + W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ + W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */ + W(0xa0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* a0 */ + W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* b0 */ + W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* c0 */ + W(0xd0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ + W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* e0 */ + W(0xf0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* f0 */ + /* ---------------------------------------------- */ + /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ +}; + +/* + * opcodes we'll probably never support: + * 6c-6d, e4-e5, ec-ed - in + * 6e-6f, e6-e7, ee-ef - out + * cc, cd - int3, int + * cf - iret + * d6 - illegal instruction + * f1 - int1/icebp + * f4 - hlt + * fa, fb - cli, sti + * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2 + * + * invalid opcodes in 64-bit mode: + * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5 + * + * 63 - we support this opcode in x86_64 but not in i386. + * + * opcodes we may need to refine support for: + * 0f - 2-byte instructions: For many of these instructions, the validity + * depends on the prefix and/or the reg field. On such instructions, we + * just consider the opcode combination valid if it corresponds to any + * valid instruction. + * 8f - Group 1 - only reg = 0 is OK + * c6-c7 - Group 11 - only reg = 0 is OK + * d9-df - fpu insns with some illegal encodings + * f2, f3 - repnz, repz prefixes. These are also the first byte for + * certain floating-point instructions, such as addsd. + * fe - Group 4 - only reg = 0 or 1 is OK + * ff - Group 5 - only reg = 0-6 is OK + * + * others -- Do we need to support these? + * 0f - (floating-point?) 
prefetch instructions + * 07, 17, 1f - pop es, pop ss, pop ds + * 26, 2e, 36, 3e - es:, cs:, ss:, ds: segment prefixes -- + * but 64 and 65 (fs: and gs:) seem to be used, so we support them + * 67 - addr16 prefix + * ce - into + * f0 - lock prefix + */ + +/* + * TODO: + * - Where necessary, examine the modrm byte and allow only valid instructions + * in the different Groups and fpu instructions. + */ + +static bool is_prefix_bad(struct insn *insn) +{ + int i; + + for (i = 0; i < insn->prefixes.nbytes; i++) { + switch (insn->prefixes.bytes[i]) { + case 0x26: /*INAT_PFX_ES */ + case 0x2E: /*INAT_PFX_CS */ + case 0x36: /*INAT_PFX_SS */ + case 0x3E: /*INAT_PFX_DS */ + case 0xF0: /*INAT_PFX_LOCK */ + return true; + } + } + return false; +} + +static void report_bad_prefix(void) +{ + printk(KERN_ERR "ubp does not currently support probing " + "instructions with any of the following prefixes: " + "cs:, ds:, es:, ss:, lock:\n"); +} + +static void report_bad_1byte_opcode(int mode, ubp_opcode_t op) +{ + printk(KERN_ERR "In %d-bit apps, " + "ubp does not currently support probing " + "instructions whose first byte is 0x%2.2x\n", mode, op); +} + +static void report_bad_2byte_opcode(ubp_opcode_t op) +{ + printk(KERN_ERR "ubp does not currently support probing " + "instructions with the 2-byte opcode 0x0f 0x%2.2x\n", op); +} + +static int validate_insn_32bits(struct ubp_bkpt *ubp, struct insn *insn) +{ + insn_init(insn, ubp->insn, false); + + /* Skip good instruction prefixes; reject "bad" ones.
*/ + insn_get_opcode(insn); + if (is_prefix_bad(insn)) { + report_bad_prefix(); + return -EPERM; + } + if (test_bit(OPCODE1(insn), (unsigned long *) good_insns_32)) + return 0; + if (insn->opcode.nbytes == 2) { + if (test_bit(OPCODE2(insn), + (unsigned long *) good_2byte_insns)) + return 0; + report_bad_2byte_opcode(OPCODE2(insn)); + } else + report_bad_1byte_opcode(32, OPCODE1(insn)); + return -EPERM; +} + +static int validate_insn_64bits(struct ubp_bkpt *ubp, struct insn *insn) +{ + insn_init(insn, ubp->insn, true); + + /* Skip good instruction prefixes; reject "bad" ones. */ + insn_get_opcode(insn); + if (is_prefix_bad(insn)) { + report_bad_prefix(); + return -EPERM; + } + if (test_bit(OPCODE1(insn), (unsigned long *) good_insns_64)) + return 0; + if (insn->opcode.nbytes == 2) { + if (test_bit(OPCODE2(insn), + (unsigned long *) good_2byte_insns)) + return 0; + report_bad_2byte_opcode(OPCODE2(insn)); + } else + report_bad_1byte_opcode(64, OPCODE1(insn)); + return -EPERM; +} + +/* + * Figure out which fixups post_xol() will need to perform, and annotate + * ubp->fixups accordingly. To start with, ubp->fixups is either zero or + * it reflects rip-related fixups. + */ +static void prepare_fixups(struct ubp_bkpt *ubp, struct insn *insn) +{ + bool fix_ip = true, fix_call = false; /* defaults */ + insn_get_opcode(insn); /* should be a nop */ + + switch (OPCODE1(insn)) { + case 0xc3: /* ret/lret */ + case 0xcb: + case 0xc2: + case 0xca: + /* ip is correct */ + fix_ip = false; + break; + case 0xe8: /* call relative - Fix return addr */ + fix_call = true; + break; + case 0x9a: /* call absolute - Fix return addr, not ip */ + fix_call = true; + fix_ip = false; + break; + case 0xff: + { + int reg; + insn_get_modrm(insn); + reg = MODRM_REG(insn); + if (reg == 2 || reg == 3) { + /* call or lcall, indirect */ + /* Fix return addr; ip is correct. */ + fix_call = true; + fix_ip = false; + } else if (reg == 4 || reg == 5) { + /* jmp or ljmp, indirect */ + /* ip is correct. 
*/ + fix_ip = false; + } + break; + } + case 0xea: /* jmp absolute -- ip is correct */ + fix_ip = false; + break; + default: + break; + } + if (fix_ip) + ubp->fixups |= UBP_FIX_IP; + if (fix_call) + ubp->fixups |= UBP_FIX_CALL; +} + +#ifdef CONFIG_X86_64 +static int handle_riprel_insn(struct ubp_bkpt *ubp, struct insn *insn); +#endif + +static int analyze_insn(struct task_struct *tsk, struct ubp_bkpt *ubp) +{ + int ret; + struct insn insn; + + ubp->fixups = 0; +#ifdef CONFIG_X86_64 + ubp->arch_info.rip_target_address = 0x0; +#endif + + if (is_32bit_app(tsk)) { + ret = validate_insn_32bits(ubp, &insn); + if (ret != 0) + return ret; + } else { + ret = validate_insn_64bits(ubp, &insn); + if (ret != 0) + return ret; + } + if (ubp->strategy & UBP_HNT_INLINE) + return 0; +#ifdef CONFIG_X86_64 + ret = handle_riprel_insn(ubp, &insn); + if (ret == -1) + /* rip-relative; can't XOL */ + return 0; + else if (ret == 0) + /* not rip-relative */ + ubp->strategy &= ~UBP_HNT_TSKINFO; +#endif + prepare_fixups(ubp, &insn); + return 0; +} + +#ifdef CONFIG_X86_64 +/* + * If ubp->insn doesn't use rip-relative addressing, return 0. Otherwise, + * rewrite the instruction so that it accesses its memory operand + * indirectly through a scratch register. Set ubp->fixups and + * ubp->arch_info.rip_target_address accordingly. (The contents of the + * scratch register will be saved before we single-step the modified + * instruction, and restored afterward.) Return 1. + * + * (... except if the client doesn't support our UBP_HNT_TSKINFO strategy, + * we must suppress XOL for rip-relative instructions: return -1.) + * + * We do this because a rip-relative instruction can access only a + * relatively small area (+/- 2 GB from the instruction), and the XOL + * area typically lies beyond that area. At least for instructions + * that store to memory, we can't execute the original instruction + * and "fix things up" later, because the misdirected store could be + * disastrous. 
+ * + * Some useful facts about rip-relative instructions: + * - There's always a modrm byte. + * - There's never a SIB byte. + * - The displacement is always 4 bytes. + */ +static int handle_riprel_insn(struct ubp_bkpt *ubp, struct insn *insn) +{ + u8 *cursor; + u8 reg; + + if (!insn_rip_relative(insn)) + return 0; + + /* + * We have a rip-relative instruction. To allow this instruction + * to be single-stepped out of line, the client must provide us + * with a per-task ubp_task_arch_info object. + */ + if (!(ubp->strategy & UBP_HNT_TSKINFO)) { + ubp->strategy |= UBP_HNT_INLINE; + return -1; + } + memcpy(ubp->arch_info.orig_insn, ubp->insn, MAX_UINSN_BYTES); + + /* + * Point cursor at the modrm byte. The next 4 bytes are the + * displacement. Beyond the displacement, for some instructions, + * is the immediate operand. + */ + cursor = ubp->insn + insn->prefixes.nbytes + insn->rex_prefix.nbytes + + insn->opcode.nbytes; + insn_get_length(insn); + + /* + * Convert from rip-relative addressing to indirect addressing + * via a scratch register. Change the r/m field from 0x5 (%rip) + * to 0x0 (%rax) or 0x1 (%rcx), and squeeze out the offset field. + */ + reg = MODRM_REG(insn); + if (reg == 0) { + /* + * The register operand (if any) is either the A register + * (%rax, %eax, etc.) or (if the 0x4 bit is set in the + * REX prefix) %r8. In any case, we know the C register + * is NOT the register operand, so we use %rcx (register + * #1) for the scratch register. + */ + ubp->fixups = UBP_FIX_RIP_CX; + /* Change modrm from 00 000 101 to 00 000 001. */ + *cursor = 0x1; + } else { + /* Use %rax (register #0) for the scratch register. 
*/ + ubp->fixups = UBP_FIX_RIP_AX; + /* Change modrm from 00 xxx 101 to 00 xxx 000 */ + *cursor = (reg << 3); + } + + /* Target address = address of next instruction + (signed) offset */ + ubp->arch_info.rip_target_address = (long) ubp->vaddr + + insn->length + insn->displacement.value; + /* Displacement field is gone; slide immediate field (if any) over. */ + if (insn->immediate.nbytes) { + cursor++; + memmove(cursor, cursor + insn->displacement.nbytes, + insn->immediate.nbytes); + } + return 1; +} + +/* + * If we're emulating a rip-relative instruction, save the contents + * of the scratch register and store the target address in that register. + */ +static int pre_xol(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, struct pt_regs *regs) +{ + BUG_ON(!ubp->xol_vaddr); + regs->ip = ubp->xol_vaddr; + if (ubp->fixups & UBP_FIX_RIP_AX) { + tskinfo->saved_scratch_register = regs->ax; + regs->ax = ubp->arch_info.rip_target_address; + } else if (ubp->fixups & UBP_FIX_RIP_CX) { + tskinfo->saved_scratch_register = regs->cx; + regs->cx = ubp->arch_info.rip_target_address; + } + return 0; +} +#endif + +/* + * Called by post_xol() to adjust the return address pushed by a call + * instruction executed out of line. + */ +static int adjust_ret_addr(struct task_struct *tsk, unsigned long sp, + long correction) +{ + int rasize, ncopied; + long ra = 0; + + if (is_32bit_app(tsk)) + rasize = 4; + else + rasize = 8; + ncopied = ubp_read_vm(tsk, sp, &ra, rasize); + if (unlikely(ncopied != rasize)) + goto fail; + ra += correction; + ncopied = ubp_write_data(tsk, sp, &ra, rasize); + if (unlikely(ncopied != rasize)) + goto fail; + return 0; + +fail: + printk(KERN_ERR + "ubp: Failed to adjust return address after" + " single-stepping call instruction;" + " pid=%d, sp=%#lx\n", tsk->pid, sp); + return -EFAULT; +} + +/* + * Called after single-stepping. 
ubp->vaddr is the address of the + * instruction whose first byte has been replaced by the "int3" + * instruction. To avoid the SMP problems that can occur when we + * temporarily put back the original opcode to single-step, we + * single-stepped a copy of the instruction. The address of this + * copy is ubp->xol_vaddr. + * + * This function prepares to resume execution after the single-step. + * We have to fix things up as follows: + * + * Typically, the new ip is relative to the copied instruction. We need + * to make it relative to the original instruction (FIX_IP). Exceptions + * are return instructions and absolute or indirect jump or call instructions. + * + * If the single-stepped instruction was a call, the return address that + * is atop the stack is the address following the copied instruction. We + * need to make it the address following the original instruction (FIX_CALL). + * + * If the original instruction was a rip-relative instruction such as + * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent + * instruction using a scratch register -- e.g., "movl %edx,(%rax)". + * We need to restore the contents of the scratch register and adjust + * the ip, keeping in mind that the instruction we executed is 4 bytes + * shorter than the original instruction (since we squeezed out the offset + * field). (FIX_RIP_AX or FIX_RIP_CX) + */ +static int post_xol(struct task_struct *tsk, struct ubp_bkpt *ubp, + struct ubp_task_arch_info *tskinfo, struct pt_regs *regs) +{ + /* Typically, the XOL vma is at a high addr, so correction < 0. */ + long correction = (long) (ubp->vaddr - ubp->xol_vaddr); + int result = 0; + +#ifdef CONFIG_X86_64 + if (is_riprel_insn(ubp)) { + if (ubp->fixups & UBP_FIX_RIP_AX) + regs->ax = tskinfo->saved_scratch_register; + else + regs->cx = tskinfo->saved_scratch_register; + /* + * The original instruction includes a displacement, and so + * is 4 bytes longer than what we've just single-stepped. 
+		 * Fall through to handle stuff like "jmpq *...(%rip)" and
+		 * "callq *...(%rip)".
+		 */
+		correction += 4;
+	}
+#endif
+	if (ubp->fixups & UBP_FIX_IP)
+		regs->ip += correction;
+	if (ubp->fixups & UBP_FIX_CALL)
+		result = adjust_ret_addr(tsk, regs->sp, correction);
+	return result;
+}
+
+struct ubp_arch_info ubp_arch_info = {
+	.bkpt_insn = 0xcc,
+	.ip_advancement_by_bkpt_insn = 1,
+	.max_insn_bytes = MAX_UINSN_BYTES,
+#ifdef CONFIG_X86_32
+	.strategies = 0x0,
+#else
+	/* rip-relative instructions require special handling. */
+	.strategies = UBP_HNT_TSKINFO,
+	.pre_xol = pre_xol,
+	.cancel_xol = cancel_xol,
+#endif
+	.set_ip = set_ip,
+	.analyze_insn = analyze_insn,
+	.post_xol = post_xol,
+};

From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:45 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 11 Jan 2010 17:55:45 +0530
Subject: [RFC] [PATCH 3/7] Execution out of line (XOL)
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com>

Execution out of line (XOL)

Slot allocation mechanism for the Execution Out of Line (XOL) strategy in
the user-space breakpoint infrastructure.

This patch provides a slot allocation mechanism for the execution-out-of-line
strategy, for use with the user-space breakpoint infrastructure. It requires
utrace support in the kernel.

This patch provides five functions: xol_get_insn_slot(), xol_free_insn_slot(),
xol_put_area(), xol_get_area() and xol_validate_vaddr().

Current slot allocation mechanism:
1. Allocate one dedicated slot per user breakpoint.
2. If the allocated vma is completely used, expand the current vma.
3. If we can't expand the vma, allocate a new vma.
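
The allocation order listed above can be sketched in user space roughly as
follows. This is only an illustration, not the code in this patch: heap
arrays stand in for vmas, a byte per slot stands in for the bitmap, and
every name here (struct region, struct area, area_take_slot(),
SLOTS_PER_PAGE, MAX_PAGES, the "region * 1000 + slot" id encoding) is
hypothetical. MAX_PAGES is an artificial cap so that the "can't expand"
case is reachable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS_PER_PAGE	4	/* hypothetical; the patch derives this from PAGE_SIZE */
#define MAX_PAGES	2	/* hypothetical cap so that expansion can fail */
#define MAX_REGIONS	4	/* hypothetical limit on simulated vmas */

/* One simulated XOL vma: a run of pages with one in-use flag per slot. */
struct region {
	unsigned char *used;	/* 0 = free slot */
	int npages;
	int nslots;
};

struct area {
	struct region regions[MAX_REGIONS];
	int nregions;
};

/* Add a fresh one-page region; return its index, or -1 on failure. */
static int area_add_region(struct area *a)
{
	struct region *r;

	if (a->nregions == MAX_REGIONS)
		return -1;
	r = &a->regions[a->nregions];
	r->used = calloc(SLOTS_PER_PAGE, 1);
	if (!r->used)
		return -1;
	r->npages = 1;
	r->nslots = SLOTS_PER_PAGE;
	return a->nregions++;
}

/* Grow a region by one page; return 0 on success, -1 if it cannot grow. */
static int region_expand(struct region *r)
{
	unsigned char *bigger;
	int new_nslots;

	if (r->npages >= MAX_PAGES)
		return -1;
	new_nslots = (r->npages + 1) * SLOTS_PER_PAGE;
	bigger = calloc(new_nslots, 1);
	if (!bigger)
		return -1;
	memcpy(bigger, r->used, r->nslots);	/* keep existing allocations */
	free(r->used);
	r->used = bigger;
	r->npages++;
	r->nslots = new_nslots;
	return 0;
}

/* Return a slot id (region * 1000 + slot index), or -1 if none is available. */
static int area_take_slot(struct area *a)
{
	struct region *last;
	int i, s, idx;

	/* First, reuse any free slot in the existing regions. */
	for (i = 0; i < a->nregions; i++) {
		for (s = 0; s < a->regions[i].nslots; s++) {
			if (!a->regions[i].used[s]) {
				a->regions[i].used[s] = 1;
				return i * 1000 + s;
			}
		}
	}

	/* All slots are used: try to expand the last region by one page. */
	if (a->nregions > 0) {
		last = &a->regions[a->nregions - 1];
		s = last->nslots;	/* first of the newly added slots */
		if (region_expand(last) == 0) {
			last->used[s] = 1;
			return (a->nregions - 1) * 1000 + s;
		}
	}

	/* Expansion failed (or no region exists yet): add a new region. */
	idx = area_add_region(a);
	if (idx < 0)
		return -1;
	a->regions[idx].used[0] = 1;
	return idx * 1000;
}

/* Mark a previously taken slot as free again. */
static void area_free_slot(struct area *a, int id)
{
	assert(id >= 0 && id / 1000 < a->nregions);
	a->regions[id / 1000].used[id % 1000] = 0;
}
```

Starting from a zero-initialized struct area, repeated calls to
area_take_slot() fill the first page, then grow the region to its page
cap, and only then open a second region. A slot released with
area_free_slot() is the first one handed out again.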

Signed-off-by: Jim Keniston
Signed-off-by: Srikar Dronamraju
---
 arch/Kconfig            |    4
 include/linux/ubp_xol.h |   56 ++++
 kernel/Makefile         |    1
 kernel/ubp_xol.c        |  644 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 705 insertions(+)

Index: new_uprobes.git/arch/Kconfig
===================================================================
--- new_uprobes.git.orig/arch/Kconfig
+++ new_uprobes.git/arch/Kconfig
@@ -102,6 +102,10 @@ config USER_RETURN_NOTIFIER
 config HAVE_UBP
 	def_bool n
 
+config UBP_XOL
+	def_bool y
+	depends on UBP && UTRACE
+
 config HAVE_IOREMAP_PROT
 	bool
 
Index: new_uprobes.git/include/linux/ubp_xol.h
===================================================================
--- /dev/null
+++ new_uprobes.git/include/linux/ubp_xol.h
@@ -0,0 +1,56 @@
+#ifndef _LINUX_XOL_H
+#define _LINUX_XOL_H
+/*
+ * User-space BreakPoint support (ubp) -- Allocation of instruction
+ * slots for execution out of line (XOL)
+ * include/linux/ubp_xol.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ * + * Copyright (C) IBM Corporation, 2009 + */ + + +#if defined(CONFIG_UBP_XOL) +extern unsigned long xol_get_insn_slot(struct ubp_bkpt *ubp, void *xol_area); +extern void xol_free_insn_slot(unsigned long, void *xol_area); +extern int xol_validate_vaddr(struct pid *pid, unsigned long vaddr, + void *xol_area); +extern void *xol_get_area(struct pid *pid); +extern void xol_put_area(void *xol_area); +#else /* CONFIG_UBP_XOL */ +static inline unsigned long xol_get_insn_slot(struct ubp_bkpt *ubp, + void *xol_area) +{ + return 0; +} +static inline void xol_free_insn_slot(unsigned long slot_addr, void *xol_area) +{ +} +static inline int xol_validate_vaddr(struct pid *pid, unsigned long vaddr, + void *xol_area) +{ + return -ENOSYS; +} +static inline void *xol_get_area(struct pid *pid) +{ + return NULL; +} +static inline void xol_put_area(void *xol_area) +{ +} +#endif /* CONFIG_UBP_XOL */ + +#endif /* _LINUX_XOL_H */ Index: new_uprobes.git/kernel/Makefile =================================================================== --- new_uprobes.git.orig/kernel/Makefile +++ new_uprobes.git/kernel/Makefile @@ -103,6 +103,7 @@ obj-$(CONFIG_PERF_EVENTS) += perf_event. 
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o obj-$(CONFIG_UBP) += ubp_core.o +obj-$(CONFIG_UBP_XOL) += ubp_xol.o ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y) # According to Alan Modra , the -fno-omit-frame-pointer is Index: new_uprobes.git/kernel/ubp_xol.c =================================================================== --- /dev/null +++ new_uprobes.git/kernel/ubp_xol.c @@ -0,0 +1,644 @@ +/* + * User-space BreakPoint support (ubp) -- Allocation of instruction + * slots for execution out of line (XOL) + * kernel/ubp_xol.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2009 + */ + +/* + * Every probepoint gets its own slot. Once it's assigned a slot, it + * keeps that slot until the probepoint goes away. If we run out of + * slots in the XOL vma, we try to expand it by one page. If we can't + * expand it, we allocate an additional vma. Only the probed process + * itself can add or expand vmas. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define UINSNS_PER_PAGE (PAGE_SIZE/UBP_XOL_SLOT_BYTES) + +struct ubp_xol_vma { + struct list_head list; + unsigned long *bitmap; /* 0 = free slot */ + + /* + * We keep the vma's vm_start rather than a pointer to the vma + * itself. The probed process or a naughty kernel module could make + * the vma go away, and we must handle that reasonably gracefully. + */ + unsigned long vaddr; /* Page(s) of instruction slots */ + int npages; + int nslots; +}; + +struct ubp_xol_area { + struct list_head vmas; + struct mutex mutex; /* Serializes access to list of vmas */ + + /* + * We ref-count threads and clients. The xol_report_* callbacks + * are all about noticing when the last thread goes away. + */ + struct kref kref; + struct ubp_xol_vma *last_vma; + pid_t tgid; + bool can_expand; +}; + +static const struct utrace_engine_ops xol_engine_ops; +static void xol_free_area(struct kref *kref); + +/* + * xol_mutex allows creation of unique ubp_xol_area. + * Critical region for xol_mutex includes creation and initialization + * of ubp_xol_area and attaching an exclusive engine with + * xol_engine_ops for the thread whose pid is thread group id. + */ +static DEFINE_MUTEX(xol_mutex); + +/** + * xol_put_area - release a reference to ubp_xol_area. + * If this happens to be the last reference, free the ubp_xol_area. + * @xol_area: unique per process ubp_xol_area for this process. + */ +void xol_put_area(void *xol_area) +{ + struct ubp_xol_area *area = (struct ubp_xol_area *) xol_area; + + if (unlikely(!area)) + return; + kref_put(&area->kref, xol_free_area); +} + +/* + * Need unique ubp_xol_area. This is achieved by using utrace engines. + * However code using utrace could be avoided if mm_struct / + * mm_context_t had a pointer to ubp_xol_area. + */ + +/* + * xol_create_engine - add a thread to watch + * xol_create_engine can return these values: + * 0: successfully created an engine. 
+ * -EEXIST: don't bother because an engine already exists for this + * thread. + * -ESRCH: Process or thread is exiting; don't need to create an + * engine. + * -ENOMEM: utrace can't allocate memory for the engine + * + * This function is called holding a reference to pid. + */ +static int xol_create_engine(struct pid *pid, struct ubp_xol_area *area) +{ + struct utrace_engine *engine; + int result; + + engine = utrace_attach_pid(pid, UTRACE_ATTACH_CREATE | + UTRACE_ATTACH_EXCLUSIVE | UTRACE_ATTACH_MATCH_OPS, + &xol_engine_ops, area); + if (IS_ERR(engine)) { + put_pid(pid); + return PTR_ERR(engine); + } + result = utrace_set_events_pid(pid, engine, + UTRACE_EVENT(EXEC) | UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT)); + /* + * Since this is the first and only time we set events for this + * engine, there shouldn't be any callbacks in progress. + */ + WARN_ON(result == -EINPROGRESS); + kref_get(&area->kref); + put_pid(pid); + utrace_engine_put(engine); + return 0; +} + +/* + * If a thread clones while xol_get_area() is running, it's possible + * for xol_create_engine() to be called both from there and from + * here. No problem, since xol_create_engine() refuses to create (or + * ref-count) a second engine for the same task. + */ +static u32 xol_report_clone(u32 action, + struct utrace_engine *engine, + unsigned long clone_flags, + struct task_struct *child) +{ + if (clone_flags & CLONE_THREAD) { + struct pid *child_pid = get_pid(task_pid(child)); + + BUG_ON(!child_pid); + (void)xol_create_engine(child_pid, + (struct ubp_xol_area *) engine->data); + } + return UTRACE_RESUME; +} + +/* + * When a multithreaded app execs, the exec-ing thread reports the + * exec, and the other threads report exit. 
+ */ +static u32 xol_report_exec(u32 action, + struct utrace_engine *engine, + const struct linux_binfmt *fmt, + const struct linux_binprm *bprm, + struct pt_regs *regs) +{ + xol_put_area((struct ubp_xol_area *)engine->data); + return UTRACE_DETACH; +} + +static u32 xol_report_exit(u32 action, struct utrace_engine *engine, + long orig_code, long *code) +{ + xol_put_area((struct ubp_xol_area *)engine->data); + return UTRACE_DETACH; +} + +static const struct utrace_engine_ops xol_engine_ops = { + .report_exit = xol_report_exit, + .report_clone = xol_report_clone, + .report_exec = xol_report_exec +}; + +/* + * @start_pid is the pid for a thread in the traced process. + * Creating engines for a hugely multithreaded process can be + * time consuming. Hence engines for other threads are created + * outside the critical region. + */ +static void create_engine_sibling_threads(struct pid *start_pid, + struct ubp_xol_area *area) +{ + struct task_struct *t, *start; + struct utrace_engine *engine; + struct pid *pid = NULL; + + rcu_read_lock(); + start = pid_task(start_pid, PIDTYPE_PID); + t = start; + if (t) { + do { + if (t->exit_state) { + t = next_thread(t); + continue; + } + + /* + * This doesn't sleep, does minimal error checking. + */ + engine = utrace_attach_task(t, + UTRACE_ATTACH_MATCH_OPS, + &xol_engine_ops, NULL); + if (PTR_ERR(engine) == -ENOENT) { + pid = get_pid(task_pid(t)); + (void)xol_create_engine(pid, area); + } else if (!IS_ERR(engine)) + utrace_engine_put(engine); + + t = next_thread(t); + } while (t != start); + } + rcu_read_unlock(); +} + +/** + * xol_get_area - Get a reference to process's ubp_xol_area. + * If an ubp_xol_area doesn't exist for @tg_leader's process, create + * one. In any case, increment its refcount and return a pointer + * to it. 
+ * @tg_leader: pointer to struct pid of a thread whose tid is the + * thread group id + */ +void *xol_get_area(struct pid *tg_leader) +{ + struct ubp_xol_area *area = NULL; + struct utrace_engine *engine; + struct pid *pid; + int ret; + + pid = get_pid(tg_leader); + mutex_lock(&xol_mutex); + engine = utrace_attach_pid(tg_leader, UTRACE_ATTACH_MATCH_OPS, + &xol_engine_ops, NULL); + if (!IS_ERR(engine)) { + area = engine->data; + utrace_engine_put(engine); + mutex_unlock(&xol_mutex); + goto found_area; + } + + area = kzalloc(sizeof(*area), GFP_USER); + if (unlikely(!area)) { + mutex_unlock(&xol_mutex); + return NULL; + } + mutex_init(&area->mutex); + kref_init(&area->kref); + area->last_vma = NULL; + area->can_expand = true; + area->tgid = pid_task(tg_leader, PIDTYPE_PID)->tgid; + INIT_LIST_HEAD(&area->vmas); + ret = xol_create_engine(pid, area); + mutex_unlock(&xol_mutex); + + if (ret != 0) { + kfree(area); + return NULL; + } + create_engine_sibling_threads(pid, area); + +found_area: + if (likely(area)) + kref_get(&area->kref); + return (void *) area; +} + +static void xol_free_area(struct kref *kref) +{ + struct ubp_xol_vma *usv, *tmp; + struct ubp_xol_area *area; + + area = container_of(kref, struct ubp_xol_area, kref); + list_for_each_entry_safe(usv, tmp, &area->vmas, list) { + kfree(usv->bitmap); + kfree(usv); + } + kfree(area); +} + +/* + * Allocate a bitmap for a new vma, or expand an existing bitmap. + * if old_bitmap is non-NULL, xol_realloc_bitmap() never returns + * old_bitmap. + */ +static unsigned long *xol_realloc_bitmap(unsigned long *old_bitmap, + int old_nslots, int new_nslots) +{ + unsigned long *new_bitmap; + + BUG_ON(new_nslots < old_nslots); + + new_bitmap = kzalloc(BITS_TO_LONGS(new_nslots) * sizeof(long), + GFP_USER); + if (!new_bitmap) { + printk(KERN_ERR "ubp_xol: cannot %sallocate bitmap for XOL " + "area for pid/tgid %d/%d\n", (old_bitmap ? 
"re" : ""), + current->pid, current->tgid); + return NULL; + } + if (old_bitmap) + memcpy(new_bitmap, old_bitmap, + BITS_TO_LONGS(old_nslots) * sizeof(long)); + return new_bitmap; +} + +static struct ubp_xol_vma *xol_alloc_vma(void) +{ + struct ubp_xol_vma *usv; + + usv = kzalloc(sizeof(struct ubp_xol_vma), GFP_USER); + if (!usv) { + printk(KERN_ERR "ubp_xol: cannot allocate kmem for XOL vma" + " for pid/tgid %d/%d\n", current->pid, current->tgid); + return NULL; + } + usv->bitmap = xol_realloc_bitmap(NULL, 0, UINSNS_PER_PAGE); + if (!usv->bitmap) { + kfree(usv); + return NULL; + } + return usv; +} + +static inline struct ubp_xol_vma *xol_add_vma(struct ubp_xol_area *area) +{ + struct vm_area_struct *vma; + struct ubp_xol_vma *usv; + struct mm_struct *mm; + struct file *file; + unsigned long addr; + + mm = get_task_mm(current); + if (!mm) + return ERR_PTR(-ESRCH); + + usv = xol_alloc_vma(); + if (!usv) { + mmput(mm); + return ERR_PTR(-ENOMEM); + } + + down_write(&mm->mmap_sem); + /* + * Find the end of the top mapping and skip a page. + * If there is no space for PAGE_SIZE above + * that, mmap will ignore our address hint. + * + * We allocate a "fake" unlinked shmem file because + * anonymous memory might not be granted execute + * permission when the selinux security hooks have + * their way. 
+ */ + vma = rb_entry(rb_last(&mm->mm_rb), struct vm_area_struct, vm_rb); + addr = vma->vm_end + PAGE_SIZE; + file = shmem_file_setup("uprobes/ssol", PAGE_SIZE, VM_NORESERVE); + if (!file) { + printk(KERN_ERR "ubp_xol failed to setup shmem_file while " + "allocating vma for pid/tgid %d/%d for " + "single-stepping out of line.\n", + current->pid, current->tgid); + goto fail; + } + addr = do_mmap_pgoff(file, addr, PAGE_SIZE, PROT_EXEC, MAP_PRIVATE, 0); + fput(file); + + if (addr & ~PAGE_MASK) { + printk(KERN_ERR "ubp_xol failed to allocate a vma for pid/tgid" + " %d/%d for single-stepping out of line.\n", + current->pid, current->tgid); + goto fail; + } + vma = find_vma(mm, addr); + BUG_ON(!vma); + + /* Don't expand vma on mremap(). */ + vma->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY; + usv->vaddr = vma->vm_start; + up_write(&mm->mmap_sem); + mmput(mm); + usv->npages = 1; + usv->nslots = UINSNS_PER_PAGE; + INIT_LIST_HEAD(&usv->list); + list_add_tail(&usv->list, &area->vmas); + area->last_vma = usv; + return usv; + +fail: + up_write(&mm->mmap_sem); + mmput(mm); + kfree(usv->bitmap); + kfree(usv); + return ERR_PTR(-ENOMEM); +} + +/* Runs with area->mutex locked */ +static long xol_expand_vma(struct ubp_xol_vma *usv) +{ + struct vm_area_struct *vma; + unsigned long *new_bitmap; + struct mm_struct *mm; + unsigned long new_length, result; + int new_nslots; + + new_length = PAGE_SIZE * (usv->npages + 1); + new_nslots = (int) ((usv->npages + 1) * UINSNS_PER_PAGE); + + /* xol_realloc_bitmap() never returns usv->bitmap. 
*/ + new_bitmap = xol_realloc_bitmap(usv->bitmap, usv->nslots, new_nslots); + if (!new_bitmap) + return -ENOMEM; + + mm = get_task_mm(current); + if (!mm) + return -ESRCH; + + down_write(&mm->mmap_sem); + vma = find_vma(mm, usv->vaddr); + if (!vma) { + printk(KERN_ERR "pid/tgid %d/%d: ubp XOL vma at %#lx" + " has disappeared!\n", current->pid, current->tgid, + usv->vaddr); + result = -ENOMEM; + goto fail; + } + if (vma_pages(vma) != usv->npages || vma->vm_start != usv->vaddr) { + printk(KERN_ERR "pid/tgid %d/%d: ubp XOL vma has been" + " altered: %#lx/%ld pages; should be %#lx/%d pages\n", + current->pid, current->tgid, vma->vm_start, + vma_pages(vma), usv->vaddr, usv->npages); + result = -ENOMEM; + goto fail; + } + vma->vm_flags &= ~VM_DONTEXPAND; + result = do_mremap(usv->vaddr, usv->npages*PAGE_SIZE, new_length, 0, 0); + vma->vm_flags |= VM_DONTEXPAND; + if (IS_ERR_VALUE(result)) { + printk(KERN_WARNING "ubp_xol failed to expand the vma " + "for pid/tgid %d/%d for single-stepping out of line.\n", + current->pid, current->tgid); + goto fail; + } + BUG_ON(result != usv->vaddr); + up_write(&mm->mmap_sem); + + kfree(usv->bitmap); + usv->bitmap = new_bitmap; + usv->nslots = new_nslots; + usv->npages++; + return 0; + +fail: + up_write(&mm->mmap_sem); + mmput(mm); + kfree(new_bitmap); + return result; +} + +/* + * Find a slot + * - searching in existing vmas for a free slot. + * - If no free slot in existing vmas, try expanding the last vma. + * - If unable to expand a vma, try adding a new vma. + * + * Runs with area->mutex locked. + */ +static unsigned long xol_take_insn_slot(struct ubp_xol_area *area) +{ + struct ubp_xol_vma *usv; + unsigned long slot_addr; + int slot_nr; + + list_for_each_entry(usv, &area->vmas, list) { + slot_nr = find_first_zero_bit(usv->bitmap, usv->nslots); + if (slot_nr < usv->nslots) { + set_bit(slot_nr, usv->bitmap); + slot_addr = usv->vaddr + + (slot_nr * UBP_XOL_SLOT_BYTES); + return slot_addr; + } + } + + /* + * All out of space. 
Need to allocate a new page. + * Only the probed process itself can add or expand vmas. + */ + if (!area->can_expand || (area->tgid != current->tgid)) + goto fail; + + usv = area->last_vma; + if (usv) { + /* Expand vma, take first of newly added slots. */ + slot_nr = usv->nslots; + if (xol_expand_vma(usv) != 0) { + printk(KERN_WARNING "Allocating additional vma.\n"); + usv = NULL; + } + } + if (!usv) { + slot_nr = 0; + usv = xol_add_vma(area); + if (IS_ERR(usv)) + goto cant_expand; + } + + /* Take first slot of new page. */ + set_bit(slot_nr, usv->bitmap); + slot_addr = usv->vaddr + (slot_nr * UBP_XOL_SLOT_BYTES); + return slot_addr; + +cant_expand: + area->can_expand = false; +fail: + return 0; +} + +/** + * xol_get_insn_slot - If ubp was not allocated a slot, then + * allocate a slot. If ubp_insert_bkpt is already called, (i.e + * ubp.vaddr != 0) then copy the instruction into the slot. + * Allocating a free slot could result in + * - using a free slot in the current vma or + * - expanding the last vma or + * - adding a new vma. + * Returns the allocated slot address or 0. + * @ubp: probepoint information + * @xol_area refers the unique per process ubp_xol_area for + * this process. + */ +unsigned long xol_get_insn_slot(struct ubp_bkpt *ubp, void *xol_area) +{ + struct ubp_xol_area *area = (struct ubp_xol_area *) xol_area; + int len; + + if (unlikely(!area)) + return 0; + mutex_lock(&area->mutex); + if (likely(!ubp->xol_vaddr)) { + ubp->xol_vaddr = xol_take_insn_slot(area); + /* + * Initialize the slot if ubp->vaddr points to valid + * instruction slot. 
+		 */
+		if (likely(ubp->xol_vaddr) && ubp->vaddr) {
+			len = access_process_vm(current, ubp->xol_vaddr,
+				ubp->insn, UBP_XOL_SLOT_BYTES, 1);
+			if (unlikely(len < UBP_XOL_SLOT_BYTES))
+				printk(KERN_ERR "Failed to copy instruction"
+					" at %#lx len = %d\n",
+					ubp->vaddr, len);
+		}
+	}
+	mutex_unlock(&area->mutex);
+	return ubp->xol_vaddr;
+}
+
+/**
+ * xol_free_insn_slot - If slot was earlier allocated by
+ * xol_get_insn_slot(), make the slot available for
+ * subsequent requests.
+ * @slot_addr: slot address as returned by
+ * xol_get_insn_slot().
+ * @xol_area: the unique per-process ubp_xol_area for
+ * this process.
+ */
+void xol_free_insn_slot(unsigned long slot_addr, void *xol_area)
+{
+	struct ubp_xol_area *area = (struct ubp_xol_area *) xol_area;
+	struct ubp_xol_vma *usv;
+	int found = 0;
+
+	if (unlikely(!slot_addr || IS_ERR_VALUE(slot_addr)))
+		return;
+	if (unlikely(!area))
+		return;
+	mutex_lock(&area->mutex);
+	list_for_each_entry(usv, &area->vmas, list) {
+		unsigned long vma_end = usv->vaddr + usv->npages*PAGE_SIZE;
+		if (usv->vaddr <= slot_addr && slot_addr < vma_end) {
+			int slot_nr;
+			unsigned long offset = slot_addr - usv->vaddr;
+			BUG_ON(offset % UBP_XOL_SLOT_BYTES);
+			slot_nr = offset / UBP_XOL_SLOT_BYTES;
+			BUG_ON(slot_nr >= usv->nslots);
+			clear_bit(slot_nr, usv->bitmap);
+			found = 1;
+		}
+	}
+	mutex_unlock(&area->mutex);
+	if (!found)
+		printk(KERN_ERR "%s: no XOL vma for slot address %#lx\n",
+			__func__, slot_addr);
+}
+
+/**
+ * xol_validate_vaddr - Verify if the specified address is in an
+ * executable vma, but not in an XOL vma.
+ * - Return 0 if the specified virtual address is in an
+ * executable vma, but not in an XOL vma.
+ * - Return 1 if the specified virtual address is in an
+ * XOL vma.
+ * - Return -EINTR otherwise (i.e., a non-executable vma or
+ * an invalid address).
+ * @pid: the probed process
+ * @vaddr: virtual address of the instruction to be validated.
+ * @xol_area: the unique per-process ubp_xol_area for
+ * this process.
+ */
+int xol_validate_vaddr(struct pid *pid, unsigned long vaddr, void *xol_area)
+{
+	struct ubp_xol_area *area = (struct ubp_xol_area *) xol_area;
+	struct ubp_xol_vma *usv;
+	struct task_struct *tsk;
+	int result;
+
+	tsk = pid_task(pid, PIDTYPE_PID);
+	result = ubp_validate_insn_addr(tsk, vaddr);
+	if (result != 0)
+		return result;
+
+	if (unlikely(!area))
+		return 0;
+	mutex_lock(&area->mutex);
+	list_for_each_entry(usv, &area->vmas, list) {
+		unsigned long vma_end = usv->vaddr + usv->npages*PAGE_SIZE;
+		if (usv->vaddr <= vaddr && vaddr < vma_end) {
+			result = 1;
+			break;
+		}
+	}
+	mutex_unlock(&area->mutex);
+	return result;
+}

From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:53 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 11 Jan 2010 17:55:53 +0530
Subject: [RFC] [PATCH 4/7] Uprobes Implementation
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com>

Uprobes Implementation

The uprobes infrastructure enables a user to dynamically establish
probepoints in user applications and to collect information by executing
handler functions when the probepoints are hit. Please refer to
Documentation/uprobes.txt for more details.

This patch provides the core implementation of uprobes. It builds on the
utrace infrastructure.

You need to follow this up with the uprobes patch for your architecture.
Signed-off-by: Jim Keniston Signed-off-by: Srikar Dronamraju --- arch/Kconfig | 12 include/linux/uprobes.h | 292 ++++++ kernel/Makefile | 1 kernel/uprobes_core.c | 2017 ++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 2322 insertions(+) Index: new_uprobes.git/arch/Kconfig =================================================================== --- new_uprobes.git.orig/arch/Kconfig +++ new_uprobes.git/arch/Kconfig @@ -66,6 +66,16 @@ config UBP in user applications. This service is used by components such as uprobes. If in doubt, say "N". +config UPROBES + bool "User-space probes (EXPERIMENTAL)" + depends on UTRACE && MODULES && UBP + depends on HAVE_UPROBES + help + Uprobes enables kernel modules to establish probepoints + in user applications and execute handler functions when + the probepoints are hit. For more information, refer to + Documentation/uprobes.txt. If in doubt, say "N". + config HAVE_EFFICIENT_UNALIGNED_ACCESS bool help @@ -115,6 +125,8 @@ config HAVE_KPROBES config HAVE_KRETPROBES bool +config HAVE_UPROBES + def_bool n # # An arch should select this if it provides all these things: # Index: new_uprobes.git/include/linux/uprobes.h =================================================================== --- /dev/null +++ new_uprobes.git/include/linux/uprobes.h @@ -0,0 +1,292 @@ +#ifndef _LINUX_UPROBES_H +#define _LINUX_UPROBES_H +/* + * Userspace Probes (UProbes) + * include/linux/uprobes.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2006, 2009 + */ +#include +#include + +struct pt_regs; + +/* This is what the user supplies us. */ +struct uprobe { + /* + * The pid of the probed process. Currently, this can be the + * thread ID (task->pid) of any active thread in the process. + */ + pid_t pid; + + /* Location of the probepoint */ + unsigned long vaddr; + + /* Handler to run when the probepoint is hit */ + void (*handler)(struct uprobe*, struct pt_regs*); + + /* + * This function, if non-NULL, will be called upon completion of + * an ASYNCHRONOUS registration (i.e., one initiated by a uprobe + * handler). reg = 1 for register, 0 for unregister. + */ + void (*registration_callback)(struct uprobe *u, int reg, int result); + + /* Reserved for use by uprobes */ + void *kdata; +}; + +#if defined(CONFIG_UPROBES) +extern int register_uprobe(struct uprobe *u); +extern void unregister_uprobe(struct uprobe *u); +#else +static inline int register_uprobe(struct uprobe *u) +{ + return -ENOSYS; +} +static inline void unregister_uprobe(struct uprobe *u) +{ +} +#endif /* CONFIG_UPROBES */ + +#ifdef UPROBES_IMPLEMENTATION + +#include +#include +#include +#include +#include +#include +#include + +struct utrace_engine; +struct task_struct; +struct pid; + +enum uprobe_probept_state { + UPROBE_INSERTING, /* process quiescing prior to insertion */ + UPROBE_BP_SET, /* breakpoint in place */ + UPROBE_REMOVING, /* process quiescing prior to removal */ + UPROBE_DISABLED /* removal completed */ +}; + +enum uprobe_task_state { + UPTASK_QUIESCENT, + UPTASK_SLEEPING, /* See utask_fake_quiesce(). 
*/ + UPTASK_RUNNING, + UPTASK_BP_HIT, + UPTASK_SSTEP +}; + +enum uprobe_ssil_state { + SSIL_DISABLE, + SSIL_CLEAR, + SSIL_SET +}; + +#define UPROBE_HASH_BITS 5 +#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS) + +/* + * uprobe_process -- not a user-visible struct. + * A uprobe_process represents a probed process. A process can have + * multiple probepoints (each represented by a uprobe_probept) and + * one or more threads (each represented by a uprobe_task). + */ +struct uprobe_process { + /* + * rwsem is write-locked for any change to the uprobe_process's + * graph (including uprobe_tasks, uprobe_probepts, and uprobe_kimgs) -- + * e.g., due to probe [un]registration or special events like exit. + * It's read-locked during the whole time we process a probepoint hit. + */ + struct rw_semaphore rwsem; + + /* Table of uprobe_probepts registered for this process */ + /* TODO: Switch to list_head[] per Ingo. */ + struct hlist_head uprobe_table[UPROBE_TABLE_SIZE]; + + /* List of uprobe_probepts awaiting insertion or removal */ + struct list_head pending_uprobes; + + /* List of uprobe_tasks in this task group */ + struct list_head thread_list; + int nthreads; + int n_quiescent_threads; + + /* this goes on the uproc_table */ + struct hlist_node hlist; + + /* + * All threads (tasks) in a process share the same uprobe_process. + */ + struct pid *tg_leader; + pid_t tgid; + + /* Threads in UPTASK_SLEEPING state wait here to be roused. */ + wait_queue_head_t waitq; + + /* + * We won't free the uprobe_process while... + * - any register/unregister operations on it are in progress; or + * - any uprobe_report_* callbacks are running; or + * - uprobe_table[] is not empty; or + * - any tasks are UPTASK_SLEEPING in the waitq; + * refcount reflects this. We do NOT ref-count tasks (threads), + * since once the last thread has exited, the rest is academic.
+ */ + atomic_t refcount; + + /* + * finished = 1 means the process is execing or the last thread + * is exiting, and we're cleaning up the uproc. If the execed + * process is probed, a new uproc will be created. + */ + bool finished; + + /* + * 1 to single-step out of line; 0 for inline. This can drop to + * 0 if we can't set up the XOL area, but never goes from 0 to 1. + */ + bool sstep_out_of_line; + + /* + * Manages slots for instruction-copies to be single-stepped + * out of line. + */ + void *xol_area; +}; + +/* + * uprobe_kimg -- not a user-visible struct. + * Holds implementation-only per-uprobe data. + * uprobe->kdata points to this. + */ +struct uprobe_kimg { + struct uprobe *uprobe; + struct uprobe_probept *ppt; + + /* + * -EBUSY while we're waiting for all threads to quiesce so the + * associated breakpoint can be inserted or removed. + * 0 if the insert/remove operation has succeeded, or -errno + * otherwise. + */ + int status; + + /* on ppt's list */ + struct list_head list; +}; + +/* + * uprobe_probept -- not a user-visible struct. + * A probepoint, at which several uprobes can be registered. + * Guarded by uproc->rwsem. + */ +struct uprobe_probept { + /* breakpoint/XOL details */ + struct ubp_bkpt ubp; + + /* The uprobe_kimg(s) associated with this uprobe_probept */ + struct list_head uprobe_list; + + enum uprobe_probept_state state; + + /* The parent uprobe_process */ + struct uprobe_process *uproc; + + /* + * ppt goes in the uprobe_process->uprobe_table when registered -- + * even before the breakpoint has been inserted. + */ + struct hlist_node ut_node; + + /* + * ppt sits in the uprobe_process->pending_uprobes queue while + * awaiting insertion or removal of the breakpoint.
+ */ + struct list_head pd_node; + + /* [un]register_uprobe() waits 'til bkpt inserted/removed */ + wait_queue_head_t waitq; + + /* + * ssil_lock, ssilq and ssil_state are used to serialize + * single-stepping inline, so threads don't clobber each other + * swapping the breakpoint instruction in and out. This helps + * prevent crashing the probed app, but it does NOT prevent + * probe misses while the breakpoint is swapped out. + * ssilq - threads wait for their chance to single-step inline. + */ + spinlock_t ssil_lock; + wait_queue_head_t ssilq; + enum uprobe_ssil_state ssil_state; +}; + +/* + * uprobe_task -- not a user-visible struct. + * Corresponds to a thread in a probed process. + * Guarded by uproc->rwsem. + */ +struct uprobe_task { + /* Lives in the global utask_table */ + struct hlist_node hlist; + + /* Lives on the thread_list for the uprobe_process */ + struct list_head list; + + struct task_struct *tsk; + struct pid *pid; + + /* The utrace engine for this task */ + struct utrace_engine *engine; + + /* Back pointer to the associated uprobe_process */ + struct uprobe_process *uproc; + + enum uprobe_task_state state; + + /* + * quiescing = 1 means this task has been asked to quiesce. + * It may not be able to comply immediately if it's hit a bkpt. + */ + bool quiescing; + + /* Set before running handlers; cleared after single-stepping. */ + struct uprobe_probept *active_probe; + + /* Saved address of copied original instruction */ + long singlestep_addr; + + struct ubp_task_arch_info arch_info; + + /* + * Unexpected error in probepoint handling has left task's + * text or stack corrupted. Kill task ASAP. + */ + bool doomed; + + /* [un]registrations initiated by handlers must be asynchronous. */ + struct list_head deferred_registrations; + + /* Delay handler-destined signals 'til after single-step done.
*/ + struct list_head delayed_signals; +}; + +#endif /* UPROBES_IMPLEMENTATION */ + +#endif /* _LINUX_UPROBES_H */ Index: new_uprobes.git/kernel/Makefile =================================================================== --- new_uprobes.git.orig/kernel/Makefile +++ new_uprobes.git/kernel/Makefile @@ -104,6 +104,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_b obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o obj-$(CONFIG_UBP) += ubp_core.o obj-$(CONFIG_UBP_XOL) += ubp_xol.o +obj-$(CONFIG_UPROBES) += uprobes_core.o ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y) # According to Alan Modra , the -fno-omit-frame-pointer is Index: new_uprobes.git/kernel/uprobes_core.c =================================================================== --- /dev/null +++ new_uprobes.git/kernel/uprobes_core.c @@ -0,0 +1,2017 @@ +/* + * Userspace Probes (UProbes) + * kernel/uprobes_core.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + * Copyright (C) IBM Corporation, 2006, 2009 + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#define UPROBES_IMPLEMENTATION 1 +#include +#include +#include +#include +#include +#include + +#define UPROBE_SET_FLAGS 1 +#define UPROBE_CLEAR_FLAGS 0 + +#define MAX_XOL_SLOTS 1024 + +static int utask_fake_quiesce(struct uprobe_task *utask); +static int uprobe_post_ssout(struct uprobe_task *utask, + struct uprobe_probept *ppt, struct pt_regs *regs); + +typedef void (*uprobe_handler_t)(struct uprobe*, struct pt_regs*); + +/* + * Table of currently probed processes, hashed by task-group leader's + * struct pid. + */ +static struct hlist_head uproc_table[UPROBE_TABLE_SIZE]; + +/* Protects uproc_table during uprobe (un)registration */ +static DEFINE_MUTEX(uproc_mutex); + +/* Table of uprobe_tasks, hashed by task_struct pointer. */ +static struct hlist_head utask_table[UPROBE_TABLE_SIZE]; +static DEFINE_SPINLOCK(utask_table_lock); + +/* p_uprobe_utrace_ops = &uprobe_utrace_ops. Fwd refs are a pain w/o this. */ +static const struct utrace_engine_ops *p_uprobe_utrace_ops; + +struct deferred_registration { + struct list_head list; + struct uprobe *uprobe; + int regflag; /* 0 - unregister, 1 - register */ +}; + +/* + * Calling a signal handler cancels single-stepping, so uprobes delays + * calling the handler, as necessary, until after single-stepping is completed. 
+ */ +struct delayed_signal { + struct list_head list; + siginfo_t info; +}; + +static u16 ubp_strategies; + +static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk) +{ + struct hlist_head *head; + struct hlist_node *node; + struct uprobe_task *utask; + unsigned long flags; + + head = &utask_table[hash_ptr(tsk, UPROBE_HASH_BITS)]; + spin_lock_irqsave(&utask_table_lock, flags); + hlist_for_each_entry(utask, node, head, hlist) { + if (utask->tsk == tsk) { + spin_unlock_irqrestore(&utask_table_lock, flags); + return utask; + } + } + spin_unlock_irqrestore(&utask_table_lock, flags); + return NULL; +} + +static void uprobe_hash_utask(struct uprobe_task *utask) +{ + struct hlist_head *head; + unsigned long flags; + + INIT_HLIST_NODE(&utask->hlist); + head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)]; + spin_lock_irqsave(&utask_table_lock, flags); + hlist_add_head(&utask->hlist, head); + spin_unlock_irqrestore(&utask_table_lock, flags); +} + +static void uprobe_unhash_utask(struct uprobe_task *utask) +{ + unsigned long flags; + + spin_lock_irqsave(&utask_table_lock, flags); + hlist_del(&utask->hlist); + spin_unlock_irqrestore(&utask_table_lock, flags); +} + +static inline void uprobe_get_process(struct uprobe_process *uproc) +{ + atomic_inc(&uproc->refcount); +} + +/* + * Decrement uproc's refcount in a situation where we "know" it can't + * reach zero. It's OK to call this with uproc locked. Compare with + * uprobe_put_process(). + */ +static inline void uprobe_decref_process(struct uprobe_process *uproc) +{ + if (atomic_dec_and_test(&uproc->refcount)) + BUG(); +} + +/* + * Runs with the uproc_mutex held. Returns with uproc ref-counted and + * write-locked. + * + * Around exec time, briefly, it's possible to have one (finished) uproc + * for the old image and one for the new image. We find the latter. 
+ */ +static struct uprobe_process *uprobe_find_process(struct pid *tg_leader) +{ + struct uprobe_process *uproc; + struct hlist_head *head; + struct hlist_node *node; + + head = &uproc_table[hash_ptr(tg_leader, UPROBE_HASH_BITS)]; + hlist_for_each_entry(uproc, node, head, hlist) { + if (uproc->tg_leader == tg_leader && !uproc->finished) { + uprobe_get_process(uproc); + down_write(&uproc->rwsem); + return uproc; + } + } + return NULL; +} + +/* + * In the given uproc's hash table of probepoints, find the one with the + * specified virtual address. Runs with uproc->rwsem locked. + */ +static struct uprobe_probept *uprobe_find_probept(struct uprobe_process *uproc, + unsigned long vaddr) +{ + struct uprobe_probept *ppt; + struct hlist_node *node; + struct hlist_head *head = &uproc->uprobe_table[hash_long(vaddr, + UPROBE_HASH_BITS)]; + + hlist_for_each_entry(ppt, node, head, ut_node) { + if (ppt->ubp.vaddr == vaddr && ppt->state != UPROBE_DISABLED) + return ppt; + } + return NULL; +} + +/* + * Save a copy of the original instruction (so it can be single-stepped + * out of line), insert the breakpoint instruction, and awake + * register_uprobe(). + */ +static void uprobe_insert_bkpt(struct uprobe_probept *ppt, + struct task_struct *tsk) +{ + struct uprobe_kimg *uk; + int result; + + if (tsk) + result = ubp_insert_bkpt(tsk, &ppt->ubp); + else + /* No surviving tasks associated with ppt->uproc */ + result = -ESRCH; + ppt->state = (result ? UPROBE_DISABLED : UPROBE_BP_SET); + list_for_each_entry(uk, &ppt->uprobe_list, list) + uk->status = result; + wake_up_all(&ppt->waitq); +} + +/* + * Check if task has just stepped on a trap instruction at the + * indicated address. If it has indeed stepped on that address, + * then reset Instruction Pointer for the task. + * + * tsk should either be current thread or already quiesced thread. 
+ */ +static inline void reset_thread_ip(struct task_struct *tsk, + struct pt_regs *regs, unsigned long addr) +{ + if ((ubp_get_bkpt_addr(regs) == addr) && + !test_tsk_thread_flag(tsk, TIF_SINGLESTEP)) + ubp_set_ip(regs, addr); +} + +/* + * ppt's breakpoint has been removed. If any threads are in the middle of + * single-stepping at this probepoint, fix things up so they can proceed. + * If any threads have just hit breakpoint but are yet to start + * pre-processing, reset their instruction pointers. + * + * Runs with all of ppt->uproc's threads quiesced and ppt->uproc->rwsem + * write-locked + */ +static inline void adjust_trapped_thread_ip(struct uprobe_probept *ppt) +{ + struct uprobe_process *uproc = ppt->uproc; + struct uprobe_task *utask; + struct pt_regs *regs; + + list_for_each_entry(utask, &uproc->thread_list, list) { + regs = task_pt_regs(utask->tsk); + if (utask->active_probe != ppt) { + reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr); + continue; + } + + /* + * Current thread cannot have an active breakpoint + * and still request for a breakpoint removal. The + * above case is handled by utask_fake_quiesce(). + */ + BUG_ON(utask->tsk == current); + +#ifdef CONFIG_UBP_XOL + if (instruction_pointer(regs) == ppt->ubp.xol_vaddr) + /* adjust the ip to breakpoint addr. */ + ubp_set_ip(regs, ppt->ubp.vaddr); + else + /* adjust the ip to next instruction. */ + uprobe_post_ssout(utask, ppt, regs); +#endif + } +} + +static void uprobe_remove_bkpt(struct uprobe_probept *ppt, + struct task_struct *tsk) +{ + if (tsk) { + if (ubp_remove_bkpt(tsk, &ppt->ubp) != 0) { + printk(KERN_ERR + "Error removing uprobe at pid %d vaddr %#lx:" + " can't restore original instruction\n", + tsk->tgid, ppt->ubp.vaddr); + /* + * This shouldn't happen, since we were previously + * able to write the breakpoint at that address. + * There's not much we can do besides let the + * process die with a SIGTRAP the next time the + * breakpoint is hit. 
+ */ + } + adjust_trapped_thread_ip(ppt); + if (ppt->ubp.strategy & UBP_HNT_INLINE) { + unsigned long flags; + spin_lock_irqsave(&ppt->ssil_lock, flags); + ppt->ssil_state = SSIL_DISABLE; + wake_up_all(&ppt->ssilq); + spin_unlock_irqrestore(&ppt->ssil_lock, flags); + } + } + /* Wake up unregister_uprobe(). */ + ppt->state = UPROBE_DISABLED; + wake_up_all(&ppt->waitq); +} + +/* + * Runs with all of uproc's threads quiesced and uproc->rwsem write-locked. + * As specified, insert or remove the breakpoint instruction for each + * uprobe_probept on uproc's pending list. + * tsk = one of the tasks associated with uproc -- NULL if there are + * no surviving threads. + * It's OK for uproc->pending_uprobes to be empty here. It can happen + * if a register and an unregister are requested (by different probers) + * simultaneously for the same pid/vaddr. + */ +static void handle_pending_uprobes(struct uprobe_process *uproc, + struct task_struct *tsk) +{ + struct uprobe_probept *ppt, *tmp; + + list_for_each_entry_safe(ppt, tmp, &uproc->pending_uprobes, pd_node) { + switch (ppt->state) { + case UPROBE_INSERTING: + uprobe_insert_bkpt(ppt, tsk); + break; + case UPROBE_REMOVING: + uprobe_remove_bkpt(ppt, tsk); + break; + default: + BUG(); + } + list_del(&ppt->pd_node); + } +} + +static void utask_adjust_flags(struct uprobe_task *utask, int set, + unsigned long flags) +{ + unsigned long newflags, oldflags; + + oldflags = utask->engine->flags; + newflags = oldflags; + if (set) + newflags |= flags; + else + newflags &= ~flags; + /* + * utrace_barrier[_pid] is not appropriate here. If we're + * adjusting current, it's not needed. And if we're adjusting + * some other task, we're holding utask->uproc->rwsem, which + * could prevent that task from completing the callback we'd + * be waiting on. + */ + if (newflags != oldflags) { + if (utrace_set_events_pid(utask->pid, utask->engine, + newflags) != 0) + /* We don't care. 
*/ + ; + } +} + +static inline void clear_utrace_quiesce(struct uprobe_task *utask, bool resume) +{ + utask_adjust_flags(utask, UPROBE_CLEAR_FLAGS, UTRACE_EVENT(QUIESCE)); + if (resume) { + if (utrace_control_pid(utask->pid, utask->engine, + UTRACE_RESUME) != 0) + /* We don't care. */ + ; + } +} + +/* Opposite of quiesce_all_threads(). Same locking applies. */ +static void rouse_all_threads(struct uprobe_process *uproc) +{ + struct uprobe_task *utask; + + list_for_each_entry(utask, &uproc->thread_list, list) { + if (utask->quiescing) { + utask->quiescing = false; + if (utask->state == UPTASK_QUIESCENT) { + utask->state = UPTASK_RUNNING; + uproc->n_quiescent_threads--; + clear_utrace_quiesce(utask, true); + } + } + } + /* Wake any threads that decided to sleep rather than quiesce. */ + wake_up_all(&uproc->waitq); +} + +/* + * If all of uproc's surviving threads have quiesced, do the necessary + * breakpoint insertions or removals, un-quiesce everybody, and return 1. + * tsk is a surviving thread, or NULL if there is none. Runs with + * uproc->rwsem write-locked. + */ +static int check_uproc_quiesced(struct uprobe_process *uproc, + struct task_struct *tsk) +{ + if (uproc->n_quiescent_threads >= uproc->nthreads) { + handle_pending_uprobes(uproc, tsk); + rouse_all_threads(uproc); + return 1; + } + return 0; +} + +/* Direct the indicated thread to quiesce. */ +static void uprobe_stop_thread(struct uprobe_task *utask) +{ + int result; + + /* + * As with utask_adjust_flags, calling utrace_barrier_pid below + * could deadlock. + */ + BUG_ON(utask->tsk == current); + result = utrace_control_pid(utask->pid, utask->engine, UTRACE_STOP); + if (result == 0) { + /* Already stopped. */ + utask->state = UPTASK_QUIESCENT; + utask->uproc->n_quiescent_threads++; + } else if (result == -EINPROGRESS) { + if (utask->tsk->state & TASK_INTERRUPTIBLE) { + /* + * Task could be in interruptible wait for a long + * time -- e.g., if stopped for I/O. 
But we know + * it's not going to run user code before all + * threads quiesce, so pretend it's quiesced. + * This avoids terminating a system call via + * UTRACE_INTERRUPT. + */ + utask->state = UPTASK_QUIESCENT; + utask->uproc->n_quiescent_threads++; + } else { + /* + * Task will eventually stop, but it may be a long time. + * Don't wait. + */ + result = utrace_control_pid(utask->pid, utask->engine, + UTRACE_INTERRUPT); + if (result != 0) + /* We don't care. */ + ; + } + } +} + +/* + * Quiesce all threads in the specified process -- e.g., prior to + * breakpoint insertion. Runs with uproc->rwsem write-locked. + * Returns false if all threads have died. + */ +static bool quiesce_all_threads(struct uprobe_process *uproc, + struct uprobe_task **cur_utask_quiescing) +{ + struct uprobe_task *utask; + struct task_struct *survivor = NULL; /* any survivor */ + bool survivors = false; + + *cur_utask_quiescing = NULL; + list_for_each_entry(utask, &uproc->thread_list, list) { + if (!survivors) { + survivor = pid_task(utask->pid, PIDTYPE_PID); + if (survivor) + survivors = true; + } + if (!utask->quiescing) { + /* + * If utask is currently handling a probepoint, it'll + * check utask->quiescing and quiesce when it's done. + */ + utask->quiescing = true; + if (utask->tsk == current) + *cur_utask_quiescing = utask; + else if (utask->state == UPTASK_RUNNING) { + utask_adjust_flags(utask, UPROBE_SET_FLAGS, + UTRACE_EVENT(QUIESCE)); + uprobe_stop_thread(utask); + } + } + } + /* + * If all the (other) threads are already quiesced, it's up to the + * current thread to do the necessary work. + */ + check_uproc_quiesced(uproc, survivor); + return survivors; +} + +/* Called with utask->uproc write-locked. 
*/ +static void uprobe_free_task(struct uprobe_task *utask, bool in_callback) +{ + struct deferred_registration *dr, *d; + struct delayed_signal *ds, *ds2; + int result; + + if (utask->engine && (utask->tsk != current || !in_callback)) { + /* + * No other tasks in this process should be running + * uprobe_report_* callbacks. (If they are, utrace_barrier() + * here could deadlock.) + */ + result = utrace_control_pid(utask->pid, utask->engine, + UTRACE_DETACH); + BUG_ON(result == -EINPROGRESS); + } + put_pid(utask->pid); /* null pid OK */ + + uprobe_unhash_utask(utask); + list_del(&utask->list); + list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) { + list_del(&dr->list); + kfree(dr); + } + + list_for_each_entry_safe(ds, ds2, &utask->delayed_signals, list) { + list_del(&ds->list); + kfree(ds); + } + + kfree(utask); +} + +/* + * Dismantle uproc and all its remaining uprobe_tasks. + * in_callback = 1 if the caller is a uprobe_report_* callback who will + * handle the UTRACE_DETACH operation. + * Runs with uproc_mutex held; called with uproc->rwsem write-locked. + */ +static void uprobe_free_process(struct uprobe_process *uproc, int in_callback) +{ + struct uprobe_task *utask, *tmp; + + if (!hlist_unhashed(&uproc->hlist)) + hlist_del(&uproc->hlist); + list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) + uprobe_free_task(utask, in_callback); + put_pid(uproc->tg_leader); + if (uproc->xol_area) + xol_put_area(uproc->xol_area); + up_write(&uproc->rwsem); /* So kfree doesn't complain */ + kfree(uproc); +} + +/* + * Decrement uproc's ref count. If it's zero, free uproc and return + * 1. Else return 0. If uproc is locked, don't call this; use + * uprobe_decref_process(). 
+ */ +static int uprobe_put_process(struct uprobe_process *uproc, bool in_callback) +{ + int freed = 0; + + if (atomic_dec_and_test(&uproc->refcount)) { + mutex_lock(&uproc_mutex); + down_write(&uproc->rwsem); + if (unlikely(atomic_read(&uproc->refcount) != 0)) { + /* + * This works because uproc_mutex is held any + * time the ref count can go from 0 to 1 -- e.g., + * register_uprobe() sneaks in with a new probe. + */ + up_write(&uproc->rwsem); + } else { + uprobe_free_process(uproc, in_callback); + freed = 1; + } + mutex_unlock(&uproc_mutex); + } + return freed; +} + +static struct uprobe_kimg *uprobe_mk_kimg(struct uprobe *u) +{ + struct uprobe_kimg *uk = kzalloc(sizeof *uk, + GFP_USER); + + if (unlikely(!uk)) + return ERR_PTR(-ENOMEM); + u->kdata = uk; + uk->uprobe = u; + uk->ppt = NULL; + INIT_LIST_HEAD(&uk->list); + uk->status = -EBUSY; + return uk; +} + +/* + * Allocate a uprobe_task object for p and add it to uproc's list. + * Called with p "got" and uproc->rwsem write-locked. Called in one of + * the following cases: + * - before setting the first uprobe in p's process + * - we're in uprobe_report_clone() and p is the newly added thread + * Returns: + * - pointer to new uprobe_task on success + * - NULL if t dies before we can utrace_attach it + * - negative errno otherwise + */ +static struct uprobe_task *uprobe_add_task(struct pid *p, + struct uprobe_process *uproc) +{ + struct uprobe_task *utask; + struct utrace_engine *engine; + struct task_struct *t = pid_task(p, PIDTYPE_PID); + + if (!t) + return NULL; + utask = kzalloc(sizeof *utask, GFP_USER); + if (unlikely(utask == NULL)) + return ERR_PTR(-ENOMEM); + + utask->pid = p; + utask->tsk = t; + utask->state = UPTASK_RUNNING; + utask->quiescing = false; + utask->uproc = uproc; + utask->active_probe = NULL; + utask->doomed = false; + INIT_LIST_HEAD(&utask->deferred_registrations); + INIT_LIST_HEAD(&utask->delayed_signals); + INIT_LIST_HEAD(&utask->list); + list_add_tail(&utask->list, &uproc->thread_list); +
uprobe_hash_utask(utask); + + engine = utrace_attach_pid(p, UTRACE_ATTACH_CREATE, + p_uprobe_utrace_ops, utask); + if (IS_ERR(engine)) { + long err = PTR_ERR(engine); + printk("uprobes: utrace_attach_task failed, returned %ld\n", + err); + uprobe_free_task(utask, 0); + if (err == -ESRCH) + return NULL; + return ERR_PTR(err); + } + utask->engine = engine; + /* + * Always watch for traps, clones, execs and exits. Caller must + * set any other engine flags. + */ + utask_adjust_flags(utask, UPROBE_SET_FLAGS, + UTRACE_EVENT(SIGNAL) | UTRACE_EVENT(SIGNAL_IGN) | + UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(EXEC) | + UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT)); + /* + * Note that it's OK if t dies just after utrace_attach, because + * with the engine in place, the appropriate report_* callback + * should handle it after we release uproc->rwsem. + */ + utrace_engine_put(engine); + return utask; +} + +/* + * start_pid is the pid for a thread in the probed process. Find the + * next thread that doesn't have a corresponding uprobe_task yet. Return + * a ref-counted pid for that task, if any, else NULL. + */ +static struct pid *find_next_thread_to_add(struct uprobe_process *uproc, + struct pid *start_pid) +{ + struct task_struct *t, *start; + struct uprobe_task *utask; + struct pid *pid = NULL; + + rcu_read_lock(); + start = pid_task(start_pid, PIDTYPE_PID); + t = start; + if (t) { + do { + if (unlikely(t->flags & PF_EXITING)) + goto dont_add; + list_for_each_entry(utask, &uproc->thread_list, list) { + if (utask->tsk == t) + /* Already added */ + goto dont_add; + } + /* Found thread/task to add. */ + pid = get_pid(task_pid(t)); + break; +dont_add: + t = next_thread(t); + } while (t != start); + } + rcu_read_unlock(); + return pid; +} + +/* Runs with uproc_mutex held; returns with uproc->rwsem write-locked. 
*/ +static struct uprobe_process *uprobe_mk_process(struct pid *tg_leader) +{ + struct uprobe_process *uproc; + struct uprobe_task *utask; + struct pid *add_me; + int i; + long err; + + uproc = kzalloc(sizeof *uproc, GFP_USER); + if (unlikely(uproc == NULL)) + return ERR_PTR(-ENOMEM); + + /* Initialize fields */ + atomic_set(&uproc->refcount, 1); + init_rwsem(&uproc->rwsem); + down_write(&uproc->rwsem); + init_waitqueue_head(&uproc->waitq); + for (i = 0; i < UPROBE_TABLE_SIZE; i++) + INIT_HLIST_HEAD(&uproc->uprobe_table[i]); + INIT_LIST_HEAD(&uproc->pending_uprobes); + INIT_LIST_HEAD(&uproc->thread_list); + uproc->nthreads = 0; + uproc->n_quiescent_threads = 0; + INIT_HLIST_NODE(&uproc->hlist); + uproc->tg_leader = get_pid(tg_leader); + uproc->tgid = pid_task(tg_leader, PIDTYPE_PID)->tgid; + uproc->finished = false; + +#ifdef CONFIG_UBP_XOL + if (!(ubp_strategies & UBP_HNT_INLINE)) + uproc->sstep_out_of_line = true; + else +#endif + uproc->sstep_out_of_line = false; + + /* + * Create and populate one utask per thread in this process. We + * can't call uprobe_add_task() while holding RCU lock, so we: + * 1. rcu_read_lock() + * 2. Find the next thread, add_me, in this process that's not + * already on uproc's thread_list. + * 3. rcu_read_unlock() + * 4. uprobe_add_task(add_me, uproc) + * Repeat 1-4 'til we have utasks for all threads. + */ + add_me = tg_leader; + while ((add_me = find_next_thread_to_add(uproc, add_me)) != NULL) { + utask = uprobe_add_task(add_me, uproc); + if (IS_ERR(utask)) { + err = PTR_ERR(utask); + goto fail; + } + if (utask) + uproc->nthreads++; + } + + if (uproc->nthreads == 0) { + /* All threads -- even p -- are dead. */ + err = -ESRCH; + goto fail; + } + return uproc; + +fail: + uprobe_free_process(uproc, 0); + return ERR_PTR(err); +} + +/* + * Creates a uprobe_probept and connects it to uk and uproc. Runs with + * uproc->rwsem write-locked. 
+ */ +static struct uprobe_probept *uprobe_add_probept(struct uprobe_kimg *uk, + struct uprobe_process *uproc) +{ + struct uprobe_probept *ppt; + + ppt = kzalloc(sizeof *ppt, GFP_USER); + if (unlikely(ppt == NULL)) + return ERR_PTR(-ENOMEM); + init_waitqueue_head(&ppt->waitq); + init_waitqueue_head(&ppt->ssilq); + spin_lock_init(&ppt->ssil_lock); + ppt->ssil_state = SSIL_CLEAR; + + /* Connect to uk. */ + INIT_LIST_HEAD(&ppt->uprobe_list); + list_add_tail(&uk->list, &ppt->uprobe_list); + uk->ppt = ppt; + uk->status = -EBUSY; + ppt->ubp.vaddr = uk->uprobe->vaddr; + ppt->ubp.xol_vaddr = 0; + + /* Connect to uproc. */ + if (!uproc->sstep_out_of_line) + ppt->ubp.strategy = UBP_HNT_INLINE; + else + ppt->ubp.strategy = ubp_strategies; + ppt->state = UPROBE_INSERTING; + ppt->uproc = uproc; + INIT_LIST_HEAD(&ppt->pd_node); + list_add_tail(&ppt->pd_node, &uproc->pending_uprobes); + INIT_HLIST_NODE(&ppt->ut_node); + hlist_add_head(&ppt->ut_node, + &uproc->uprobe_table[hash_long(ppt->ubp.vaddr, + UPROBE_HASH_BITS)]); + uprobe_get_process(uproc); + return ppt; +} + +/* + * Runs with ppt->uproc write-locked. Frees ppt and decrements the ref + * count on ppt->uproc (but ref count shouldn't hit 0). + */ +static void uprobe_free_probept(struct uprobe_probept *ppt) +{ + struct uprobe_process *uproc = ppt->uproc; + + xol_free_insn_slot(ppt->ubp.xol_vaddr, uproc->xol_area); + hlist_del(&ppt->ut_node); + kfree(ppt); + uprobe_decref_process(uproc); +} + +static void uprobe_free_kimg(struct uprobe_kimg *uk) +{ + uk->uprobe->kdata = NULL; + kfree(uk); +} + +/* + * Runs with uprobe_process write-locked. + * Note that we never free uk->uprobe, because the user owns that. + */ +static void purge_uprobe(struct uprobe_kimg *uk) +{ + struct uprobe_probept *ppt = uk->ppt; + + list_del(&uk->list); + uprobe_free_kimg(uk); + if (list_empty(&ppt->uprobe_list)) + uprobe_free_probept(ppt); +} + +/* + * Runs with utask->uproc locked. + * read lock if called from uprobe handler. + * else write lock. 
+ * Returns -EINPROGRESS on success. + * Returns -EBUSY if an identical deferred request already exists. + * Returns 0 if a deferred register and a deferred unregister for the + * same uprobe cancel each other out. + * + */ +static int defer_registration(struct uprobe *u, int regflag, + struct uprobe_task *utask) +{ + struct deferred_registration *dr, *d; + + /* Check if we already have such a defer request */ + list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) { + if (dr->uprobe == u) { + if (dr->regflag != regflag) { + /* same as successful register + unregister */ + list_del(&dr->list); + kfree(dr); + return 0; + } else + /* we already have identical request */ + return -EBUSY; + } + } + + /* We have a new unique request */ + dr = kmalloc(sizeof(struct deferred_registration), GFP_USER); + if (!dr) + return -ENOMEM; + dr->uprobe = u; + dr->regflag = regflag; + INIT_LIST_HEAD(&dr->list); + list_add_tail(&dr->list, &utask->deferred_registrations); + return -EINPROGRESS; +} + +/* + * Given a numeric thread ID, return a ref-counted struct pid for the + * task-group-leader thread. + */ +static struct pid *uprobe_get_tg_leader(pid_t p) +{ + struct pid *pid = NULL; + + rcu_read_lock(); + if (current->nsproxy) + pid = find_vpid(p); + if (pid) { + struct task_struct *t = pid_task(pid, PIDTYPE_PID); + if (t) + pid = task_tgid(t); + else + pid = NULL; + } + rcu_read_unlock(); + return get_pid(pid); /* null pid OK here */ +} + +/* See Documentation/uprobes.txt.
*/ +int register_uprobe(struct uprobe *u) +{ + struct uprobe_task *cur_utask, *cur_utask_quiescing = NULL; + struct uprobe_process *uproc; + struct uprobe_probept *ppt; + struct uprobe_kimg *uk; + struct pid *p; + int ret = 0, uproc_is_new = 0; + bool survivors; +#ifndef CONFIG_UBP_XOL + struct task_struct *tsk; +#endif + + if (!u || !u->handler) + return -EINVAL; + + p = uprobe_get_tg_leader(u->pid); + if (!p) + return -ESRCH; + + cur_utask = uprobe_find_utask(current); + if (cur_utask && cur_utask->active_probe) { + /* + * Called from handler; cur_utask->uproc is read-locked. + * Do this registration later. + */ + put_pid(p); + return defer_registration(u, 1, cur_utask); + } + + /* Get the uprobe_process for this pid, or make a new one. */ + mutex_lock(&uproc_mutex); + uproc = uprobe_find_process(p); + + if (uproc) { + struct uprobe_task *utask; + + mutex_unlock(&uproc_mutex); + list_for_each_entry(utask, &uproc->thread_list, list) { + if (!utask->active_probe) + continue; + /* + * utask is at a probepoint, but has dropped + * uproc->rwsem to single-step. If utask is + * stopped, then it's probably because some + * other engine has asserted UTRACE_STOP; + * that engine may not allow UTRACE_RESUME + * until register_uprobe() returns. But, for + * reasons we won't go into here, utask wants + * to finish with utask->active_probe before + * allowing handle_pending_uprobes() to run + * (via utask_fake_quiesce()). So we defer this + * registration operation; it will be run after + * utask->active_probe is taken care of. + */ + BUG_ON(utask->state != UPTASK_SSTEP); + if (task_is_stopped_or_traced(utask->tsk)) { + ret = defer_registration(u, 1, utask); + goto fail_uproc; + } + } + } else { + uproc = uprobe_mk_process(p); + if (IS_ERR(uproc)) { + ret = (int) PTR_ERR(uproc); + mutex_unlock(&uproc_mutex); + goto fail_tsk; + } + /* Hold uproc_mutex until we've added uproc to uproc_table. 
*/ + uproc_is_new = 1; + } + +#ifdef CONFIG_UBP_XOL + ret = xol_validate_vaddr(p, u->vaddr, uproc->xol_area); +#else + tsk = pid_task(p, PIDTYPE_PID); + ret = ubp_validate_insn_addr(tsk, u->vaddr); +#endif + if (ret < 0) + goto fail_uproc; + + if (u->kdata) { + /* + * Probe is already/still registered. This is the only + * place we return -EBUSY to the user. + */ + ret = -EBUSY; + goto fail_uproc; + } + + uk = uprobe_mk_kimg(u); + if (IS_ERR(uk)) { + ret = (int) PTR_ERR(uk); + goto fail_uproc; + } + + /* See if we already have a probepoint at the vaddr. */ + ppt = (uproc_is_new ? NULL : uprobe_find_probept(uproc, u->vaddr)); + if (ppt) { + /* Breakpoint is already in place, or soon will be. */ + uk->ppt = ppt; + list_add_tail(&uk->list, &ppt->uprobe_list); + switch (ppt->state) { + case UPROBE_INSERTING: + uk->status = -EBUSY; /* in progress */ + if (uproc->tg_leader == task_tgid(current)) { + cur_utask_quiescing = cur_utask; + BUG_ON(!cur_utask_quiescing); + } + break; + case UPROBE_REMOVING: + /* Wait! Don't remove that bkpt after all! */ + ppt->state = UPROBE_BP_SET; + /* Remove from pending list. */ + list_del(&ppt->pd_node); + /* Wake unregister_uprobe(). 
*/ + wake_up_all(&ppt->waitq); + /*FALLTHROUGH*/ + case UPROBE_BP_SET: + uk->status = 0; + break; + default: + BUG(); + } + up_write(&uproc->rwsem); + put_pid(p); + if (uk->status == 0) { + uprobe_decref_process(uproc); + return 0; + } + goto await_bkpt_insertion; + } else { + ppt = uprobe_add_probept(uk, uproc); + if (IS_ERR(ppt)) { + ret = (int) PTR_ERR(ppt); + goto fail_uk; + } + } + + if (uproc_is_new) { + hlist_add_head(&uproc->hlist, + &uproc_table[hash_ptr(uproc->tg_leader, + UPROBE_HASH_BITS)]); + mutex_unlock(&uproc_mutex); + } + put_pid(p); + survivors = quiesce_all_threads(uproc, &cur_utask_quiescing); + + if (!survivors) { + purge_uprobe(uk); + up_write(&uproc->rwsem); + uprobe_put_process(uproc, false); + return -ESRCH; + } + up_write(&uproc->rwsem); + +await_bkpt_insertion: + if (cur_utask_quiescing) + /* Current task is probing its own process. */ + (void) utask_fake_quiesce(cur_utask_quiescing); + else + wait_event(ppt->waitq, ppt->state != UPROBE_INSERTING); + ret = uk->status; + if (ret != 0) { + down_write(&uproc->rwsem); + purge_uprobe(uk); + up_write(&uproc->rwsem); + } + uprobe_put_process(uproc, false); + return ret; + +fail_uk: + uprobe_free_kimg(uk); + +fail_uproc: + if (uproc_is_new) { + uprobe_free_process(uproc, 0); + mutex_unlock(&uproc_mutex); + } else { + up_write(&uproc->rwsem); + uprobe_put_process(uproc, false); + } + +fail_tsk: + put_pid(p); + return ret; +} +EXPORT_SYMBOL_GPL(register_uprobe); + +/* See Documentation/uprobes.txt. 
*/ +void unregister_uprobe(struct uprobe *u) +{ + struct pid *p; + struct uprobe_process *uproc; + struct uprobe_kimg *uk; + struct uprobe_probept *ppt; + struct uprobe_task *cur_utask, *cur_utask_quiescing = NULL; + struct uprobe_task *utask; + + if (!u) + return; + p = uprobe_get_tg_leader(u->pid); + if (!p) + return; + + cur_utask = uprobe_find_utask(current); + if (cur_utask && cur_utask->active_probe) { + /* Called from handler; uproc is read-locked; do this later */ + put_pid(p); + (void) defer_registration(u, 0, cur_utask); + return; + } + + /* + * Lock uproc before walking the graph, in case the process we're + * probing is exiting. + */ + mutex_lock(&uproc_mutex); + uproc = uprobe_find_process(p); + mutex_unlock(&uproc_mutex); + put_pid(p); + if (!uproc) + return; + + list_for_each_entry(utask, &uproc->thread_list, list) { + if (!utask->active_probe) + continue; + + /* See comment in register_uprobe(). */ + BUG_ON(utask->state != UPTASK_SSTEP); + if (task_is_stopped_or_traced(utask->tsk)) { + (void) defer_registration(u, 0, utask); + goto done; + } + } + uk = (struct uprobe_kimg *)u->kdata; + if (!uk) + /* + * This probe was never successfully registered, or + * has already been unregistered. + */ + goto done; + if (uk->status == -EBUSY) + /* Looks like register or unregister is already in progress. */ + goto done; + ppt = uk->ppt; + + list_del(&uk->list); + uprobe_free_kimg(uk); + + if (!list_empty(&ppt->uprobe_list)) + goto done; + + /* + * The last uprobe at ppt's probepoint is being unregistered. + * Queue the breakpoint for removal. + */ + ppt->state = UPROBE_REMOVING; + list_add_tail(&ppt->pd_node, &uproc->pending_uprobes); + + (void) quiesce_all_threads(uproc, &cur_utask_quiescing); + up_write(&uproc->rwsem); + if (cur_utask_quiescing) + /* Current task is probing its own process. 
*/ + (void) utask_fake_quiesce(cur_utask_quiescing); + else + wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING); + + if (likely(ppt->state == UPROBE_DISABLED)) { + down_write(&uproc->rwsem); + uprobe_free_probept(ppt); + /* else somebody else's register_uprobe() resurrected ppt. */ + up_write(&uproc->rwsem); + } + uprobe_put_process(uproc, false); + return; + +done: + up_write(&uproc->rwsem); + uprobe_put_process(uproc, false); +} +EXPORT_SYMBOL_GPL(unregister_uprobe); + +/* Find a surviving thread in uproc. Runs with uproc->rwsem locked. */ +static struct task_struct *find_surviving_thread(struct uprobe_process *uproc) +{ + struct uprobe_task *utask; + + list_for_each_entry(utask, &uproc->thread_list, list) { + if (!(utask->tsk->flags & PF_EXITING)) + return utask->tsk; + } + return NULL; +} + +/* + * Run all the deferred_registrations previously queued by the current utask. + * Runs with no locks or mutexes held. The current utask's uprobe_process + * is ref-counted, so it won't disappear as the result of unregister_u*probe() + * called here. + */ +static void uprobe_run_def_regs(struct list_head *drlist) +{ + struct deferred_registration *dr, *d; + + list_for_each_entry_safe(dr, d, drlist, list) { + int result = 0; + struct uprobe *u = dr->uprobe; + + if (dr->regflag) + result = register_uprobe(u); + else + unregister_uprobe(u); + if (u && u->registration_callback) + u->registration_callback(u, dr->regflag, result); + list_del(&dr->list); + kfree(dr); + } +} + +/* + * utrace engine report callbacks + */ + +/* + * We've been asked to quiesce, but aren't in a position to do so. + * This could happen in either of the following cases: + * + * 1) Our own thread is doing a register or unregister operation -- + * e.g., as called from a uprobe handler or a non-uprobes utrace + * callback. We can't wait_event() for ourselves in [un]register_uprobe(). + * + * 2) We've been asked to quiesce, but we hit a probepoint first. 
Now + * we're in the report_signal callback, having handled the probepoint. + * We'd like to just turn on UTRACE_EVENT(QUIESCE) and coast into + * quiescence. Unfortunately, it's possible to hit a probepoint again + * before we quiesce. When processing the SIGTRAP, utrace would call + * uprobe_report_quiesce(), which must decline to take any action so + * as to avoid removing the uprobe just hit. As a result, we could + * keep hitting breakpoints and never quiescing. + * + * So here we do essentially what we'd prefer to do in uprobe_report_quiesce(). + * If we're the last thread to quiesce, handle_pending_uprobes() and + * rouse_all_threads(). Otherwise, pretend we're quiescent and sleep until + * the last quiescent thread handles that stuff and then wakes us. + * + * Called and returns with no mutexes held. Returns 1 if we free utask->uproc, + * else 0. + */ +static int utask_fake_quiesce(struct uprobe_task *utask) +{ + struct uprobe_process *uproc = utask->uproc; + enum uprobe_task_state prev_state = utask->state; + + down_write(&uproc->rwsem); + + /* In case we're somehow set to quiesce for real... */ + clear_utrace_quiesce(utask, false); + + if (uproc->n_quiescent_threads == uproc->nthreads-1) { + /* We're the last thread to "quiesce." */ + handle_pending_uprobes(uproc, utask->tsk); + rouse_all_threads(uproc); + up_write(&uproc->rwsem); + return 0; + } else { + utask->state = UPTASK_SLEEPING; + uproc->n_quiescent_threads++; + up_write(&uproc->rwsem); + /* We ref-count sleepers. */ + uprobe_get_process(uproc); + + wait_event(uproc->waitq, !utask->quiescing); + + down_write(&uproc->rwsem); + utask->state = prev_state; + uproc->n_quiescent_threads--; + up_write(&uproc->rwsem); + + /* + * If uproc's last uprobe has been unregistered, and + * unregister_uprobe() woke up before we did, it's up + * to us to free uproc. + */ + return uprobe_put_process(uproc, false); + } +} + +/* Prepare to single-step ppt's probed instruction inline. 
*/ +static void uprobe_pre_ssin(struct uprobe_task *utask, + struct uprobe_probept *ppt, struct pt_regs *regs) +{ + unsigned long flags; + + if (unlikely(ppt->ssil_state == SSIL_DISABLE)) { + reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr); + return; + } + spin_lock_irqsave(&ppt->ssil_lock, flags); + while (ppt->ssil_state == SSIL_SET) { + spin_unlock_irqrestore(&ppt->ssil_lock, flags); + up_read(&utask->uproc->rwsem); + wait_event(ppt->ssilq, ppt->ssil_state != SSIL_SET); + down_read(&utask->uproc->rwsem); + spin_lock_irqsave(&ppt->ssil_lock, flags); + } + if (unlikely(ppt->ssil_state == SSIL_DISABLE)) { + /* + * While waiting to single step inline, breakpoint has + * been removed. Thread continues as if nothing happened. + */ + spin_unlock_irqrestore(&ppt->ssil_lock, flags); + reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr); + return; + } + ppt->ssil_state = SSIL_SET; + spin_unlock_irqrestore(&ppt->ssil_lock, flags); + + if (unlikely(ubp_pre_sstep(utask->tsk, &ppt->ubp, + &utask->arch_info, regs) != 0)) { + printk(KERN_ERR "Failed to temporarily restore original " + "instruction for single-stepping: " + "pid/tgid=%d/%d, vaddr=%#lx\n", + utask->tsk->pid, utask->tsk->tgid, ppt->ubp.vaddr); + utask->doomed = true; + } +} + +/* Prepare to continue execution after single-stepping inline. 
*/ +static void uprobe_post_ssin(struct uprobe_task *utask, + struct uprobe_probept *ppt, struct pt_regs *regs) +{ + unsigned long flags; + + if (unlikely(ubp_post_sstep(utask->tsk, &ppt->ubp, + &utask->arch_info, regs) != 0)) + printk("Couldn't restore bp: pid/tgid=%d/%d, addr=%#lx\n", + utask->tsk->pid, utask->tsk->tgid, ppt->ubp.vaddr); + spin_lock_irqsave(&ppt->ssil_lock, flags); + if (likely(ppt->ssil_state == SSIL_SET)) { + ppt->ssil_state = SSIL_CLEAR; + wake_up(&ppt->ssilq); + } + spin_unlock_irqrestore(&ppt->ssil_lock, flags); +} + +#ifdef CONFIG_UBP_XOL +/* + * This architecture wants to do single-stepping out of line, but now we've + * discovered that it can't -- typically because we couldn't set up the XOL + * vma. Make all probepoints use inline single-stepping. + */ +static void uproc_cancel_xol(struct uprobe_process *uproc) +{ + down_write(&uproc->rwsem); + if (likely(uproc->sstep_out_of_line)) { + /* No other task beat us to it. */ + int i; + struct uprobe_probept *ppt; + struct hlist_node *node; + struct hlist_head *head; + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { + head = &uproc->uprobe_table[i]; + hlist_for_each_entry(ppt, node, head, ut_node) { + if (!(ppt->ubp.strategy & UBP_HNT_INLINE)) + ubp_cancel_xol(current, &ppt->ubp); + } + } + /* Do this last, so other tasks don't proceed too soon. */ + uproc->sstep_out_of_line = false; + } + up_write(&uproc->rwsem); +} + +/* Prepare to single-step ppt's probed instruction out of line. */ +static int uprobe_pre_ssout(struct uprobe_task *utask, + struct uprobe_probept *ppt, struct pt_regs *regs) +{ + if (!ppt->ubp.xol_vaddr) + ppt->ubp.xol_vaddr = xol_get_insn_slot(&ppt->ubp, + ppt->uproc->xol_area); + if (unlikely(!ppt->ubp.xol_vaddr)) { + ubp_cancel_xol(utask->tsk, &ppt->ubp); + return -1; + } + utask->singlestep_addr = ppt->ubp.xol_vaddr; + return ubp_pre_sstep(utask->tsk, &ppt->ubp, &utask->arch_info, regs); +} + +/* Prepare to continue execution after single-stepping out of line. 
*/ +static int uprobe_post_ssout(struct uprobe_task *utask, + struct uprobe_probept *ppt, struct pt_regs *regs) +{ + int ret; + + ret = ubp_post_sstep(utask->tsk, &ppt->ubp, &utask->arch_info, regs); + return ret; +} +#endif + +/* + * If this thread is supposed to be quiescing, mark it quiescent; and + * if it was the last thread to quiesce, do the work we quiesced for. + * Runs with utask->uproc->rwsem write-locked. Returns true if we can + * let this thread resume. + */ +static bool utask_quiesce(struct uprobe_task *utask) +{ + if (utask->quiescing) { + if (utask->state != UPTASK_QUIESCENT) { + utask->state = UPTASK_QUIESCENT; + utask->uproc->n_quiescent_threads++; + } + return check_uproc_quiesced(utask->uproc, current); + } else { + clear_utrace_quiesce(utask, false); + return true; + } +} + +/* + * Delay delivery of the indicated signal until after single-step. + * Otherwise single-stepping will be cancelled as part of calling + * the signal handler. + */ +static void uprobe_delay_signal(struct uprobe_task *utask, siginfo_t *info) +{ + struct delayed_signal *ds; + + ds = kmalloc(sizeof(*ds), GFP_USER); + if (ds) { + ds->info = *info; + INIT_LIST_HEAD(&ds->list); + list_add_tail(&ds->list, &utask->delayed_signals); + } +} + +static void uprobe_inject_delayed_signals(struct list_head *delayed_signals) +{ + struct delayed_signal *ds, *tmp; + + list_for_each_entry_safe(ds, tmp, delayed_signals, list) { + send_sig_info(ds->info.si_signo, &ds->info, current); + list_del(&ds->list); + kfree(ds); + } +} + +/* + * Verify from Instruction Pointer if singlestep has indeed occurred. + * If Singlestep has occurred, then do post singlestep fix-ups. + */ +static bool validate_and_post_sstep(struct uprobe_task *utask, + struct pt_regs *regs, + struct uprobe_probept *ppt) +{ + unsigned long vaddr = instruction_pointer(regs); + + if (ppt->ubp.strategy & UBP_HNT_INLINE) { + /* + * If we have singlestepped, Instruction pointer cannot + * be same as virtual address of probepoint. 
+ */ + if (vaddr == ppt->ubp.vaddr) + return false; + uprobe_post_ssin(utask, ppt, regs); +#ifdef CONFIG_UBP_XOL + } else { + /* + * If we have executed out of line, Instruction pointer + * cannot be same as virtual address of XOL slot. + */ + if (vaddr == ppt->ubp.xol_vaddr) + return false; + uprobe_post_ssout(utask, ppt, regs); +#endif + } + return true; +} + +/* + * Helper routine for uprobe_report_signal(). + * We get called here with: + * state = UPTASK_RUNNING => we are here due to a breakpoint hit + * - Read-lock the process + * - Figure out which probepoint, based on regs->IP + * - Set state = UPTASK_BP_HIT + * - Invoke handler for each uprobe at this probepoint + * - Reset regs->IP to beginning of the insn, if necessary + * - Start watching for quiesce events, in case another + * engine cancels our UTRACE_SINGLESTEP with a + * UTRACE_STOP. + * - Set singlestep in motion (UTRACE_SINGLESTEP), + * with state = UPTASK_SSTEP + * - Read-unlock the process + * + * state = UPTASK_SSTEP => here after single-stepping + * - Read-lock the process + * - Validate we are here per the state machine + * - Clean up after single-stepping + * - Set state = UPTASK_RUNNING + * - Read-unlock the process + * - If it's time to quiesce, take appropriate action. + * - If the handler(s) we ran called [un]register_uprobe(), + * complete those via uprobe_run_def_regs(). + * + * state = ANY OTHER STATE + * - Not our signal, pass it on (UTRACE_RESUME) + */ +static u32 uprobe_handle_signal(u32 action, + struct uprobe_task *utask, + struct pt_regs *regs, + siginfo_t *info, + const struct k_sigaction *orig_ka) +{ + struct uprobe_probept *ppt; + struct uprobe_process *uproc; + struct uprobe_kimg *uk; + unsigned long probept; + enum utrace_resume_action resume_action; + enum utrace_signal_action signal_action = utrace_signal_action(action); + + uproc = utask->uproc; + + /* + * We may need to re-assert UTRACE_SINGLESTEP if this signal + * is not associated with the breakpoint. 
+ */ + if (utask->state == UPTASK_SSTEP) + resume_action = UTRACE_SINGLESTEP; + else + resume_action = UTRACE_RESUME; + /* + * This might be UTRACE_SIGNAL_REPORT request but some other + * engine's callback might have changed the signal action to + * something other than UTRACE_SIGNAL_REPORT. Use orig_ka to figure + * out such cases. + */ + if (unlikely(signal_action == UTRACE_SIGNAL_REPORT) || !orig_ka) { + /* This thread was quiesced using UTRACE_INTERRUPT. */ + bool done_quiescing; + if (utask->active_probe) + /* + * We'll fake quiescence after we're done + * processing the probepoint. + */ + return UTRACE_SIGNAL_IGN | resume_action; + + down_write(&uproc->rwsem); + done_quiescing = utask_quiesce(utask); + up_write(&uproc->rwsem); + if (done_quiescing) + resume_action = UTRACE_RESUME; + else + resume_action = UTRACE_STOP; + return UTRACE_SIGNAL_IGN | resume_action; + } + + /* + * info will be null if we're called with action=UTRACE_SIGNAL_HANDLER, + * which means that single-stepping has been disabled so a signal + * handler can be called in the probed process. That should never + * happen because we intercept and delay handled signals (action = + * UTRACE_RESUME) until after we're done single-stepping. + */ + BUG_ON(!info); + if (signal_action == UTRACE_SIGNAL_DELIVER && utask->active_probe && + info->si_signo != SSTEP_SIGNAL) { + uprobe_delay_signal(utask, info); + return UTRACE_SIGNAL_IGN | UTRACE_SINGLESTEP; + } + + if (info->si_signo != BREAKPOINT_SIGNAL && + info->si_signo != SSTEP_SIGNAL) + goto no_interest; + + switch (utask->state) { + case UPTASK_RUNNING: + if (info->si_signo != BREAKPOINT_SIGNAL) + goto no_interest; + +#ifdef CONFIG_UBP_XOL + /* + * Set up the XOL area if it's not already there. We + * do this here because we have to do it before + * handling the first probepoint hit, the probed + * process has to do it, and this may be the first + * time our probed process runs uprobes code. 
We need + * the XOL area for the uretprobe trampoline even if + * this architecture doesn't single-step out of line. + */ + if (uproc->sstep_out_of_line && !uproc->xol_area) { + uproc->xol_area = xol_get_area(uproc->tg_leader); + if (unlikely(uproc->sstep_out_of_line) && + unlikely(!uproc->xol_area)) + uproc_cancel_xol(uproc); + } +#endif + + down_read(&uproc->rwsem); + /* Don't quiesce while running handlers. */ + clear_utrace_quiesce(utask, false); + probept = ubp_get_bkpt_addr(regs); + ppt = uprobe_find_probept(uproc, probept); + if (!ppt) { + up_read(&uproc->rwsem); + goto no_interest; + } + utask->active_probe = ppt; + utask->state = UPTASK_BP_HIT; + + if (likely(ppt->state == UPROBE_BP_SET)) { + list_for_each_entry(uk, &ppt->uprobe_list, list) { + struct uprobe *u = uk->uprobe; + if (u->handler) + u->handler(u, regs); + } + } + +#ifdef CONFIG_UBP_XOL + if ((ppt->ubp.strategy & UBP_HNT_INLINE) || + uprobe_pre_ssout(utask, ppt, regs) != 0) +#endif + uprobe_pre_ssin(utask, ppt, regs); + if (unlikely(utask->doomed)) { + utask->active_probe = NULL; + utask->state = UPTASK_RUNNING; + up_read(&uproc->rwsem); + goto no_interest; + } + utask->state = UPTASK_SSTEP; + /* In case another engine cancels our UTRACE_SINGLESTEP... */ + utask_adjust_flags(utask, UPROBE_SET_FLAGS, + UTRACE_EVENT(QUIESCE)); + /* Don't deliver this signal to the process. */ + resume_action = UTRACE_SINGLESTEP; + signal_action = UTRACE_SIGNAL_IGN; + + up_read(&uproc->rwsem); + break; + + case UPTASK_SSTEP: + if (info->si_signo != SSTEP_SIGNAL) + goto no_interest; + + down_read(&uproc->rwsem); + ppt = utask->active_probe; + BUG_ON(!ppt); + + /* + * Haven't single-stepped yet? Then re-assert + * UTRACE_SINGLESTEP. + */ + if (!validate_and_post_sstep(utask, regs, ppt)) { + up_read(&uproc->rwsem); + goto no_interest; + } + + /* No further need to re-assert UTRACE_SINGLESTEP.
*/ + clear_utrace_quiesce(utask, false); + + utask->active_probe = NULL; + utask->state = UPTASK_RUNNING; + if (unlikely(utask->doomed)) { + up_read(&uproc->rwsem); + goto no_interest; + } + + if (utask->quiescing) { + int uproc_freed; + up_read(&uproc->rwsem); + uproc_freed = utask_fake_quiesce(utask); + BUG_ON(uproc_freed); + } else + up_read(&uproc->rwsem); + + /* + * We hold a ref count on uproc, so this should never + * make utask or uproc disappear. + */ + uprobe_run_def_regs(&utask->deferred_registrations); + + uprobe_inject_delayed_signals(&utask->delayed_signals); + + resume_action = UTRACE_RESUME; + signal_action = UTRACE_SIGNAL_IGN; + break; + default: + goto no_interest; + } + +no_interest: + return signal_action | resume_action; +} + +/* + * Signal callback: + */ +static u32 uprobe_report_signal(u32 action, + struct utrace_engine *engine, + struct pt_regs *regs, + siginfo_t *info, + const struct k_sigaction *orig_ka, + struct k_sigaction *return_ka) +{ + struct uprobe_task *utask; + struct uprobe_process *uproc; + bool doomed; + enum utrace_resume_action report_action; + + utask = (struct uprobe_task *)rcu_dereference(engine->data); + BUG_ON(!utask); + uproc = utask->uproc; + + /* Keep uproc intact until just before we return. */ + uprobe_get_process(uproc); + report_action = uprobe_handle_signal(action, utask, regs, info, + orig_ka); + doomed = utask->doomed; + + if (uprobe_put_process(uproc, true)) + report_action = utrace_signal_action(report_action) | + UTRACE_DETACH; + if (doomed) + do_exit(SIGSEGV); + return report_action; +} + +/* + * Quiesce callback: The associated process has one or more breakpoint + * insertions or removals pending. If we're the last thread in this + * process to quiesce, do the insertion(s) and/or removal(s). 
+ */ +static u32 uprobe_report_quiesce(u32 action, + struct utrace_engine *engine, + unsigned long event) +{ + struct uprobe_task *utask; + struct uprobe_process *uproc; + bool done_quiescing = false; + + utask = (struct uprobe_task *)rcu_dereference(engine->data); + BUG_ON(!utask); + + if (utask->state == UPTASK_SSTEP) + /* + * We got a breakpoint trap and tried to single-step, + * but somebody else's report_signal callback overrode + * our UTRACE_SINGLESTEP with a UTRACE_STOP. Try again. + */ + return UTRACE_SINGLESTEP; + + BUG_ON(utask->active_probe); + uproc = utask->uproc; + down_write(&uproc->rwsem); + done_quiescing = utask_quiesce(utask); + up_write(&uproc->rwsem); + return done_quiescing ? UTRACE_RESUME : UTRACE_STOP; +} + +/* + * uproc's process is exiting or exec-ing. Runs with uproc->rwsem + * write-locked. Caller must ref-count uproc before calling this + * function, to ensure that uproc doesn't get freed in the middle of + * this. + */ +static void uprobe_cleanup_process(struct uprobe_process *uproc) +{ + struct hlist_node *pnode1, *pnode2; + struct uprobe_kimg *uk, *unode; + struct uprobe_probept *ppt; + struct hlist_head *head; + int i; + + uproc->finished = true; + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { + head = &uproc->uprobe_table[i]; + hlist_for_each_entry_safe(ppt, pnode1, pnode2, head, ut_node) { + if (ppt->state == UPROBE_INSERTING || + ppt->state == UPROBE_REMOVING) { + /* + * This task is (exec/exit)ing with + * a [un]register_uprobe pending. + * [un]register_uprobe will free ppt. 
+ */ + ppt->state = UPROBE_DISABLED; + list_del(&ppt->pd_node); + list_for_each_entry_safe(uk, unode, + &ppt->uprobe_list, list) + uk->status = -ESRCH; + wake_up_all(&ppt->waitq); + } else if (ppt->state == UPROBE_BP_SET) { + list_for_each_entry_safe(uk, unode, + &ppt->uprobe_list, list) { + list_del(&uk->list); + uprobe_free_kimg(uk); + } + uprobe_free_probept(ppt); + /* else */ + /* + * If ppt is UPROBE_DISABLED, assume that + * [un]register_uprobe() has been notified + * and will free it soon. + */ + } + } + } +} + +static u32 uprobe_exec_exit(struct utrace_engine *engine, + struct task_struct *tsk, int exit) +{ + struct uprobe_process *uproc; + struct uprobe_probept *ppt; + struct uprobe_task *utask; + bool utask_quiescing; + + utask = (struct uprobe_task *)rcu_dereference(engine->data); + uproc = utask->uproc; + uprobe_get_process(uproc); + + ppt = utask->active_probe; + if (ppt) { + printk(KERN_WARNING "Task handler called %s while at uprobe" + " probepoint: pid/tgid = %d/%d, probepoint" + " = %#lx\n", (exit ? "exit" : "exec"), + tsk->pid, tsk->tgid, ppt->ubp.vaddr); + /* + * Mutex cleanup depends on where do_execve()/do_exit() was + * called and on ubp strategy (XOL vs. SSIL). + */ + if (ppt->ubp.strategy & UBP_HNT_INLINE) { + switch (utask->state) { + unsigned long flags; + case UPTASK_SSTEP: + spin_lock_irqsave(&ppt->ssil_lock, flags); + ppt->ssil_state = SSIL_CLEAR; + wake_up(&ppt->ssilq); + spin_unlock_irqrestore(&ppt->ssil_lock, flags); + break; + default: + break; + } + } + if (utask->state == UPTASK_BP_HIT) { + /* uprobe handler called do_exit()/do_execve(). */ + up_read(&uproc->rwsem); + uprobe_decref_process(uproc); + } + } + + down_write(&uproc->rwsem); + utask_quiescing = utask->quiescing; + uproc->nthreads--; + if (utrace_set_events_pid(utask->pid, engine, 0)) + /* We don't care. */ + ; + uprobe_free_task(utask, 1); + if (uproc->nthreads) { + /* + * In case other threads are waiting for us to quiesce... 
+ */ + if (utask_quiescing) + (void) check_uproc_quiesced(uproc, + find_surviving_thread(uproc)); + } else + /* + * We were the last remaining thread - clean up the uprobe + * remnants a la unregister_uprobe(). We don't have to + * remove the breakpoints, though. + */ + uprobe_cleanup_process(uproc); + + up_write(&uproc->rwsem); + uprobe_put_process(uproc, true); + return UTRACE_DETACH; +} + +/* + * Exit callback: The associated task/thread is exiting. + */ +static u32 uprobe_report_exit(u32 action, + struct utrace_engine *engine, + long orig_code, long *code) +{ + return uprobe_exec_exit(engine, current, 1); +} +/* + * Clone callback: The current task has spawned a thread/process. + * Utrace guarantees that parent and child pointers will be valid + * for the duration of this callback. + * + * NOTE: For now, we don't pass on uprobes from the parent to the + * child. We now do the necessary clearing of breakpoints in the + * child's address space. + * + * TODO: + * - Provide option for child to inherit uprobes. + */ +static u32 uprobe_report_clone(u32 action, + struct utrace_engine *engine, + unsigned long clone_flags, + struct task_struct *child) +{ + struct uprobe_process *uproc; + struct uprobe_task *ptask, *ctask; + + ptask = (struct uprobe_task *)rcu_dereference(engine->data); + uproc = ptask->uproc; + + /* + * Lock uproc so no new uprobes can be installed 'til all + * report_clone activities are completed. + */ + mutex_lock(&uproc_mutex); + down_write(&uproc->rwsem); + + if (clone_flags & CLONE_THREAD) { + /* New thread in the same process. */ + ctask = uprobe_find_utask(child); + if (unlikely(ctask)) { + /* + * uprobe_mk_process() ran just as this clone + * happened, and has already accounted for the + * new child. 
+ */ + } else { + struct pid *child_pid = get_pid(task_pid(child)); + BUG_ON(!child_pid); + ctask = uprobe_add_task(child_pid, uproc); + BUG_ON(!ctask); + if (IS_ERR(ctask)) + goto done; + uproc->nthreads++; + /* + * FIXME: Handle the case where uproc is quiescing + * (assuming it's possible to clone while quiescing). + */ + } + } else { + /* + * New process spawned by parent. Remove the probepoints + * in the child's text. + * + * It's not necessary to quiesce the child as we are assured + * by utrace that this callback happens *before* the child + * gets to run userspace. + * + * We also hold the uproc->rwsem for the parent - so no + * new uprobes will be registered 'til we return. + */ + int i; + struct uprobe_probept *ppt; + struct hlist_node *node; + struct hlist_head *head; + + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { + head = &uproc->uprobe_table[i]; + hlist_for_each_entry(ppt, node, head, ut_node) { + if (ubp_remove_bkpt(child, &ppt->ubp) != 0) { + /* Ratelimit this? */ + printk(KERN_ERR "Pid %d forked %d;" + " failed to remove probepoint" + " at %#lx in child\n", + current->pid, child->pid, + ppt->ubp.vaddr); + } + } + } + } + +done: + up_write(&uproc->rwsem); + mutex_unlock(&uproc_mutex); + return UTRACE_RESUME; +} + +/* + * Exec callback: The associated process called execve() or friends + * + * The new program is about to start running and so there is no + * possibility of a uprobe from the previous user address space + * to be hit. + * + * NOTE: + * Typically, this process would have passed through the clone + * callback, where the necessary action *should* have been + * taken. However, if we still end up at this callback: + * - We don't have to clear the uprobes - memory image + * will be overlaid. + * - We have to free up uprobe resources associated with + * this process.
+ */ +static u32 uprobe_report_exec(u32 action, + struct utrace_engine *engine, + const struct linux_binfmt *fmt, + const struct linux_binprm *bprm, + struct pt_regs *regs) +{ + return uprobe_exec_exit(engine, current, 0); +} + +static const struct utrace_engine_ops uprobe_utrace_ops = { + .report_quiesce = uprobe_report_quiesce, + .report_signal = uprobe_report_signal, + .report_exit = uprobe_report_exit, + .report_clone = uprobe_report_clone, + .report_exec = uprobe_report_exec +}; + +static int __init init_uprobes(void) +{ + int ret, i; + + ubp_strategies = UBP_HNT_TSKINFO; + ret = ubp_init(&ubp_strategies); + if (ret != 0) { + printk(KERN_ERR "Can't start uprobes: ubp_init() returned %d\n", + ret); + return ret; + } + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { + INIT_HLIST_HEAD(&uproc_table[i]); + INIT_HLIST_HEAD(&utask_table[i]); + } + + p_uprobe_utrace_ops = &uprobe_utrace_ops; + return 0; +} + +static void __exit exit_uprobes(void) +{ +} + +module_init(init_uprobes); +module_exit(exit_uprobes); From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:58 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Mon, 11 Jan 2010 17:55:58 +0530 Subject: [RFC] [PATCH 5/7] X86 Support for Uprobes In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> Message-ID: <20100111122558.22050.431.sendpatchset@srikar.in.ibm.com> [PATCH] x86 support for Uprobes Signed-off-by: Jim Keniston --- arch/x86/Kconfig | 1 + arch/x86/include/asm/uprobes.h | 27 +++++++++++++++++++++++++++ 2 files changed, 28 insertions(+) Index: new_uprobes.git/arch/x86/Kconfig =================================================================== --- new_uprobes.git.orig/arch/x86/Kconfig +++ new_uprobes.git/arch/x86/Kconfig @@ -51,6 +51,7 @@ config X86 select HAVE_KERNEL_LZMA select HAVE_HW_BREAKPOINT select HAVE_UBP + select HAVE_UPROBES select HAVE_ARCH_KMEMCHECK select HAVE_USER_RETURN_NOTIFIER Index: 
new_uprobes.git/arch/x86/include/asm/uprobes.h =================================================================== --- /dev/null +++ new_uprobes.git/arch/x86/include/asm/uprobes.h @@ -0,0 +1,27 @@ +#ifndef _ASM_UPROBES_H +#define _ASM_UPROBES_H +/* + * Userspace Probes (UProbes) + * uprobes.h + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + * Copyright (C) IBM Corporation, 2008, 2009 + */ +#include + +#define BREAKPOINT_SIGNAL SIGTRAP +#define SSTEP_SIGNAL SIGTRAP +#endif /* _ASM_UPROBES_H */ From srikar at linux.vnet.ibm.com Mon Jan 11 12:26:03 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Mon, 11 Jan 2010 17:56:03 +0530 Subject: [RFC] [PATCH 6/7] Uprobes Documentation In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> Message-ID: <20100111122603.22050.1039.sendpatchset@srikar.in.ibm.com> Uprobes documentation Signed-off-by: Jim Keniston --- Documentation/uprobes.txt | 460 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 460 insertions(+) Index: new_uprobes.git/Documentation/uprobes.txt =================================================================== --- /dev/null +++ new_uprobes.git/Documentation/uprobes.txt @@ -0,0 +1,460 @@ +Title : User-Space Probes (Uprobes) +Author : Jim Keniston + +CONTENTS + +1. Concepts: Uprobes +2. Architectures Supported +3. Configuring Uprobes +4. API Reference +5. Uprobes Features and Limitations +6. Interoperation with Kprobes +7. Interoperation with Utrace +8. Probe Overhead +9. TODO +10. Uprobes Team +11. Uprobes Example + +1. Concepts: Uprobes + +Uprobes enables you to dynamically break into any routine in a +user application and collect debugging and performance information +non-disruptively. You can trap at any code address, specifying a +kernel handler routine to be invoked when the breakpoint is hit. + +A uprobe can be inserted on any instruction in the application's +virtual address space. The registration function +register_uprobe() specifies which process is to be probed, where +the probe is to be inserted, and what handler is to be called when +the probe is hit. + +Typically, Uprobes-based instrumentation is packaged as a kernel +module. 
In the simplest case, the module's init function installs +("registers") one or more probes, and the exit function unregisters +them. However, probes can be registered or unregistered in response +to other events as well. For example: +- A probe handler itself can register and/or unregister probes. +- You can establish Utrace callbacks to register and/or unregister +probes when a particular process forks, clones a thread, +execs, enters a system call, receives a signal, exits, etc. +See the utrace documentation in Documentation/DocBook. + +1.1 How Does a Uprobe Work? + +When a uprobe is registered, Uprobes makes a copy of the probed +instruction, stops the probed application, replaces the first byte(s) +of the probed instruction with a breakpoint instruction (e.g., int3 +on i386 and x86_64), and allows the probed application to continue. +(When inserting the breakpoint, Uprobes uses the same copy-on-write +mechanism that ptrace uses, so that the breakpoint affects only that +process, and not any other process running that program. This is +true even if the probed instruction is in a shared library.) + +When a CPU hits the breakpoint instruction, a trap occurs, the CPU's +user-mode registers are saved, and a SIGTRAP signal is generated. +Uprobes intercepts the SIGTRAP and finds the associated uprobe. +It then executes the handler associated with the uprobe, passing the +handler the addresses of the uprobe struct and the saved registers. +The handler may block, but keep in mind that the probed thread remains +stopped while your handler runs. + +Next, Uprobes single-steps its copy of the probed instruction and +resumes execution of the probed process at the instruction following +the probepoint. (It would be simpler to single-step the actual +instruction in place, but then Uprobes would have to temporarily +remove the breakpoint instruction. This would create problems in a +multithreaded application. 
For example, it would open a time window +when another thread could sail right past the probepoint.) + +Instruction copies to be single-stepped are stored in a per-process +"single-step out of line (XOL) area," which is a little VM area +created by Uprobes in each probed process's address space. + +1.2 The Role of Utrace + +When a probe is registered on a previously unprobed process, +Uprobes establishes a tracing "engine" with Utrace (see +Documentation/utrace.txt) for each thread (task) in the process. +Uprobes uses the Utrace "quiesce" mechanism to stop all the threads +prior to insertion or removal of a breakpoint. Utrace also notifies +Uprobes of breakpoint and single-step traps and of other interesting +events in the lifetime of the probed process, such as fork, clone, +exec, and exit. + +1.3 Multithreaded Applications + +Uprobes supports the probing of multithreaded applications. Uprobes +imposes no limit on the number of threads in a probed application. +All threads in a process use the same text pages, so every probe +in a process affects all threads; of course, each thread hits the +probepoint (and runs the handler) independently. Multiple threads +may run the same handler simultaneously. If you want a particular +thread or set of threads to run a particular handler, your handler +should check current or current->pid to determine which thread has +hit the probepoint. + +When a process clones a new thread, that thread automatically shares +all current and future probes established for that process. + +Keep in mind that when you register or unregister a probe, the +breakpoint is not inserted or removed until Utrace has stopped all +threads in the process. The register/unregister function returns +after the breakpoint has been inserted/removed (but see the next +section). + +1.5 Registering Probes within Probe Handlers + +A uprobe handler can call [un]register_uprobe() functions. +A handler can even unregister its own probe. 
However, when invoked +from a handler, the actual [un]register operations do not take +place immediately. Rather, they are queued up and executed after +all handlers for that probepoint have been run. In the handler, +the [un]register call returns -EINPROGRESS. If you set the +registration_callback field in the uprobe object, that callback will +be called when the [un]register operation completes. + +2. Architectures Supported + +This ubp-based version of Uprobes is implemented on the following +architectures: + +- x86 + +3. Configuring Uprobes + +When configuring the kernel using make menuconfig/xconfig/oldconfig, +ensure that CONFIG_UPROBES is set to "y". Select "Infrastructure for +tracing and debugging user processes" to enable Utrace. Under "General +setup" select "User-space breakpoint assistance" then select +"User-space probes". + +So that you can load and unload Uprobes-based instrumentation modules, +make sure "Loadable module support" (CONFIG_MODULES) and "Module +unloading" (CONFIG_MODULE_UNLOAD) are set to "y". + +4. API Reference + +The Uprobes API includes a "register" function and an "unregister" +function for uprobes. Here are terse, mini-man-page specifications for +these functions and the associated probe handlers that you'll write. +See the latter half of this document for examples. + +4.1 register_uprobe + +#include +int register_uprobe(struct uprobe *u); + +Sets a breakpoint at virtual address u->vaddr in the process whose +pid is u->pid. When the breakpoint is hit, Uprobes calls u->handler. + +register_uprobe() returns 0 on success, -EINPROGRESS if +register_uprobe() was called from a uprobe handler (and therefore +delayed), or a negative errno otherwise. + +Section 4.4, "User's Callback for Delayed Registrations", +explains how to be notified upon completion of a delayed +registration. 
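The three outcomes of register_uprobe() described above (0, -EINPROGRESS, or another negative errno) can be told apart as in the sketch below. This is a user-space illustration only: the enum and classify_register_result() are hypothetical names, not part of the Uprobes API. Only the meaning of the return values is taken from the text.

```c
#include <errno.h>

/* Hypothetical helper type; not part of the Uprobes API. */
enum reg_outcome {
	REG_DONE,	/* returned 0: breakpoint was inserted before return */
	REG_DELAYED,	/* returned -EINPROGRESS: called from a handler, queued */
	REG_FAILED	/* any other negative errno */
};

/* Map the return value of register_uprobe() to one of the three cases. */
static enum reg_outcome classify_register_result(int ret)
{
	if (ret == 0)
		return REG_DONE;
	if (ret == -EINPROGRESS)
		return REG_DELAYED;
	return REG_FAILED;
}
```

A module that registers probes from inside a handler would treat REG_DELAYED as "not yet known" and rely on the registration callback for the final result.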
+
+User's handler (u->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void handler(struct uprobe *u, struct pt_regs *regs);
+
+Called with u pointing to the uprobe associated with the breakpoint,
+and regs pointing to the struct containing the registers saved when
+the breakpoint was hit.
+
+4.3 unregister_uprobe
+
+#include <linux/uprobes.h>
+void unregister_uprobe(struct uprobe *u);
+
+Removes the specified probe. The unregister function can be called
+at any time after the probe has been registered, and can be called
+from a uprobe handler.
+
+4.4 User's Callback for Delayed Registrations
+
+#include <linux/uprobes.h>
+void registration_callback(struct uprobe *u, int reg, int result);
+
+As previously mentioned, the functions described in Section 4 can
+be called from within a uprobe handler. When that happens, the
+[un]registration operation is delayed until all handlers
+associated with that handler's probepoint have been run. Upon
+completion of the [un]registration operation, Uprobes checks the
+registration_callback member of the associated uprobe
+(u->registration_callback). Uprobes calls
+that callback function, if any, passing it the following values:
+
+- u = the address of the uprobe object.
+
+- reg = 1 for register_uprobe() or 0 for unregister_uprobe().
+
+- result = the return value that register_uprobe() would have
+returned if this weren't a delayed operation. This is always 0
+for unregister_uprobe().
+
+NOTE: Uprobes calls the registration_callback ONLY in the case of a
+delayed [un]registration.
+
+5. Uprobes Features and Limitations
+
+The user is expected to assign values to the following members
+of struct uprobe: pid, vaddr, handler, and (as needed)
+registration_callback. Other members are reserved for Uprobes's use.
+Uprobes may produce unexpected results if you:
+- assign non-zero values to reserved members of struct uprobe;
+- change the contents of a uprobe object while it is registered; or
+- attempt to register a uprobe that is already registered.
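The member assignments and the delayed-registration callback described above can be sketched as follows. This is a user-space mock-up under stated assumptions, not the kernel code: the struct uprobe here is a stand-in that models only the four user-assigned members named in the text (their types are guesses), and every example_* name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct pt_regs;	/* opaque here; the kernel supplies the real definition */

/*
 * User-space stand-in for the kernel's struct uprobe. Only the four
 * user-assigned members are modeled; the member types are assumptions,
 * and the real struct has additional reserved members.
 */
struct uprobe {
	int pid;
	unsigned long vaddr;
	void (*handler)(struct uprobe *u, struct pt_regs *regs);
	void (*registration_callback)(struct uprobe *u, int reg, int result);
};

static int last_reg = -1;	/* reg value seen by the last callback */
static int last_result = -1;	/* result value seen by the last callback */

/* Placeholder probe handler; a real one would inspect regs. */
static void example_handler(struct uprobe *u, struct pt_regs *regs)
{
	(void)u;
	(void)regs;
}

/* Records and reports the outcome of a delayed [un]registration. */
static void example_reg_callback(struct uprobe *u, int reg, int result)
{
	last_reg = reg;
	last_result = result;
	printf("%s of probe at %#lx in pid %d completed, result %d\n",
	       reg ? "registration" : "unregistration",
	       u->vaddr, u->pid, result);
}

/* Zero the whole object first so that every reserved member stays zero. */
static void example_init_uprobe(struct uprobe *u, int pid, unsigned long vaddr)
{
	memset(u, 0, sizeof(*u));
	u->pid = pid;
	u->vaddr = vaddr;
	u->handler = example_handler;
	u->registration_callback = example_reg_callback;
}
```

Zeroing the object before filling it in is one simple way to respect the rule about reserved members; a statically allocated uprobe, as in the example module in Section 11, gets the same effect for free.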
+ +Uprobes allows any number of uprobes at a particular address. For +a particular probepoint, handlers are run in the order in which +they were registered. + +Any number of kernel modules may probe a particular process +simultaneously, and a particular module may probe any number of +processes simultaneously. + +Probes are shared by all threads in a process (including newly +created threads). + +If a probed process exits or execs, Uprobes automatically +unregisters all uprobes associated with that process. Subsequent +attempts to unregister these probes will be treated as no-ops. + +On the other hand, if a probed memory area is removed from the +process's virtual memory map (e.g., via dlclose(3) or munmap(2)), +it's currently up to you to unregister the probes first. + +There is no way to specify that probes should be inherited across fork; +Uprobes removes all probepoints in the newly created child process. +See Section 7, "Interoperation with Utrace", for more information on +this topic. + +On at least some architectures, Uprobes makes no attempt to verify +that the probe address you specify actually marks the start of an +instruction. If you get this wrong, chaos may ensue. + +To avoid interfering with interactive debuggers, Uprobes will refuse +to insert a probepoint where a breakpoint instruction already exists, +unless it was Uprobes that put it there. Some architectures may +refuse to insert probes on other types of instructions. + +If you install a probe in an inline-able function, Uprobes makes +no attempt to chase down all inline instances of the function and +install probes there. gcc may inline a function without being asked, +so keep this in mind if you're not seeing the probe hits you expect. + +A probe handler can modify the environment of the probed function +-- e.g., by modifying data structures, or by modifying the +contents of the pt_regs struct (which are restored to the registers +upon return from the breakpoint). 
So Uprobes can be used, for example,
+to install a bug fix or to inject faults for testing. Uprobes, of
+course, has no way to distinguish the deliberately injected faults
+from the accidental ones. Don't drink and probe.
+
+When you register the first probe at a probepoint or unregister the
+last probe at a probepoint, Uprobes asks Utrace to "quiesce"
+the probed process so that Uprobes can insert or remove the breakpoint
+instruction. If the process is not already stopped, Utrace stops it.
+If the process is entering an interruptible system call at that instant,
+this may cause the system call to finish early or fail with EINTR.
+
+When Uprobes establishes a probepoint on a previously unprobed page
+of text, Linux creates a new copy of the page via its copy-on-write
+mechanism. When probepoints are removed, Uprobes makes no attempt
+to consolidate identical copies of the same page. This could affect
+memory availability if you probe many, many pages in many, many
+long-running processes.
+
+6. Interoperation with Kprobes
+
+Uprobes is intended to interoperate usefully with Kprobes (see
+Documentation/kprobes.txt). For example, an instrumentation module
+can make calls to both the Kprobes API and the Uprobes API.
+
+A uprobe handler can register or unregister kprobes,
+jprobes, and kretprobes, as well as uprobes. On the
+other hand, a kprobe, jprobe, or kretprobe handler must not sleep, and
+therefore cannot register or unregister any of these types of probes.
+(Ideas for removing this restriction are welcome.)
+
+Note that the overhead of a uprobe hit is several times that of
+a k[ret]probe hit.
+
+7. Interoperation with Utrace
+
+As mentioned in Section 1.2, Uprobes is a client of Utrace. For each
+probed thread, Uprobes establishes a Utrace engine, and registers
+callbacks for the following types of events: clone/fork, exec, exit,
+and "core-dump" signals (which include breakpoint traps).
Uprobes +establishes this engine when the process is first probed, or when +Uprobes is notified of the thread's creation, whichever comes first. + +An instrumentation module can use both the Utrace and Uprobes APIs (as +well as Kprobes). When you do this, keep the following facts in mind: + +- For a particular event, Utrace callbacks are called in the order in +which the engines are established. Utrace does not currently provide +a mechanism for altering this order. + +- When Uprobes learns that a probed process has forked, it removes +the breakpoints in the child process. + +- When Uprobes learns that a probed process has exec-ed or exited, +it disposes of its data structures for that process (first allowing +any outstanding [un]registration operations to terminate). + +- When a probed thread hits a breakpoint or completes single-stepping +of a probed instruction, engines with the UTRACE_EVENT(SIGNAL_CORE) +flag set are notified. + +If you want to establish probes in a newly forked child, you can use +the following procedure: + +- Register a report_clone callback with Utrace. In this callback, +the CLONE_THREAD flag distinguishes between the creation of a new +thread vs. a new process. + +- In your report_clone callback, call utrace_attach_task() to attach to +the child process, and call utrace_control(..., UTRACE_REPORT) +The child process will quiesce at a point where it is ready to +be probed. + +- In your report_quiesce callback, register the desired probes. +(Note that you cannot use the same probe object for both parent +and child. If you want to duplicate the probepoints, you must +create a new set of uprobe objects.) + +8. Probe Overhead + +On a typical CPU in use in 2007, a uprobe hit takes about 3 +microseconds to process. Specifically, a benchmark that hits the +same probepoint repeatedly, firing a simple handler each time, reports +300,000 to 350,000 hits per second, depending on the architecture. 
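The two figures quoted above are the same measurement seen from two sides: a per-hit cost is the reciprocal of a hit rate. The helper below, a hypothetical name used only for illustration, makes the conversion explicit.

```c
/* Average cost of one probe hit, in microseconds, for a given hit rate. */
static double usec_per_hit(double hits_per_second)
{
	return 1000000.0 / hits_per_second;
}
```

With the quoted range, 300,000 hits per second works out to roughly 3.3 usec per hit and 350,000 hits per second to roughly 2.9 usec per hit, which is where the "about 3 microseconds" figure comes from.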
+
+Here is a sample overhead figure (in usec) for the x86 architecture.
+
+x86: Intel Pentium M, 1495 MHz, 2957.31 bogomips
+uprobe = 2.9 usec;
+
+9. TODO
+
+a. Support for other architectures.
+b. Support return probes.
+
+10. Uprobes Team
+
+The following people have made major contributions to Uprobes:
+Jim Keniston - jkenisto at us.ibm.com
+Srikar Dronamraju - srikar at linux.vnet.ibm.com
+Ananth Mavinakayanahalli - ananth at in.ibm.com
+Prasanna Panchamukhi - prasanna at in.ibm.com
+Dave Wilder - dwilder at us.ibm.com
+
+11. Uprobes Example
+
+Here's a sample kernel module showing the use of Uprobes to count the
+number of times an instruction at a particular address is executed,
+and optionally (unless verbose=0) report each time it's executed.
+----- cut here -----
+/* uprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+
+/*
+ * Usage: insmod uprobe_example.ko pid=<pid> vaddr=<address>
[verbose=0]
+ * where <pid> identifies the probed process and <address>
is the virtual + * address of the probed instruction. + */ + +static int pid = 0; +module_param(pid, int, 0); +MODULE_PARM_DESC(pid, "pid"); + +static int verbose = 1; +module_param(verbose, int, 0); +MODULE_PARM_DESC(verbose, "verbose"); + +static long vaddr = 0; +module_param(vaddr, long, 0); +MODULE_PARM_DESC(vaddr, "vaddr"); + +static int nhits; +static struct uprobe usp; + +static void uprobe_handler(struct uprobe *u, struct pt_regs *regs) +{ + nhits++; + if (verbose) + printk(KERN_INFO "Hit #%d on probepoint at %#lx\n", + nhits, u->vaddr); +} + +int __init init_module(void) +{ + int ret; + usp.pid = pid; + usp.vaddr = vaddr; + usp.handler = uprobe_handler; + printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n", + usp.pid, usp.vaddr); + ret = register_uprobe(&usp); + if (ret != 0) { + printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret); + return ret; + } + return 0; +} + +void __exit cleanup_module(void) +{ + printk(KERN_INFO "Unregistering uprobe on pid %d, vaddr %#lx\n", + usp.pid, usp.vaddr); + printk(KERN_INFO "Probepoint was hit %d times\n", nhits); + unregister_uprobe(&usp); +} +MODULE_LICENSE("GPL"); +----- cut here ----- + +You can build the kernel module, uprobe_example.ko, using the following +Makefile: +----- cut here ----- +obj-m := uprobe_example.o +KDIR := /lib/modules/$(shell uname -r)/build +PWD := $(shell pwd) +default: + $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules +clean: + rm -f *.mod.c *.ko *.o .*.cmd + rm -rf .tmp_versions +----- cut here ----- + +For example, if you want to run myprog and monitor its calls to myfunc(), +you can do the following: + +$ make // Build the uprobe_example module. +... +$ nm -p myprog | awk '$3=="myfunc"' +080484a8 T myfunc +$ ./myprog & +$ ps + PID TTY TIME CMD + 4367 pts/3 00:00:00 bash + 8156 pts/3 00:00:00 myprog + 8157 pts/3 00:00:00 ps +$ su - +... 
+# insmod uprobe_example.ko pid=8156 vaddr=0x080484a8
+
+In /var/log/messages and on the console, you will see a message of the
+form "kernel: Hit #1 on probepoint at 0x80484a8" each time myfunc()
+is called. To turn off probing, remove the module:
+
+# rmmod uprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "Probepoint was hit 5 times".

From srikar at linux.vnet.ibm.com Mon Jan 11 12:26:08 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 11 Jan 2010 17:56:08 +0530
Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com>

This patch implements an ftrace plugin for uprobes.

Description:
The ftrace plugin provides an interface to dump data at a given address,
the top of the stack, and function arguments when a user program calls a
specific function.

To dump the data at a given address, issue:
echo up <pid> <address>
D <data address> <size> >>/sys/kernel/tracing/uprobes_events

To dump the data from the top of the stack, issue:
echo up <pid> <address>
S <size> >>/sys/kernel/tracing/uprobes_events

To dump the function arguments, issue:
echo up <pid> <address>
A <num args> >>/sys/kernel/tracing/uprobes_events

D => Dump the data at a given address.
S => Dump the data from top of stack.
A => Dump probed function arguments. Supported only for x86_64 arch.

For example:
Input:
$ cd /sys/kernel/debug/tracing/
$ echo "up 6424 0x4004d8 S 100" > uprobe_events
$ echo "up 6424 0x4004d8 D 0x7fff6bf587d0 35" >> uprobe_events
$ echo "up 6424 0x4004d8 A 5" >> uprobe_events
$
$ cat uprobe_events
up 6424 0x4004d8 S 100
up 6424 0x4004d8 D 7fff6bf587d0 35
up 6424 0x4004d8 A 5
$
$ echo 1 > tracing_on

Output:
$ cat trace
! tracer: nop
!
! TASK-PID CPU# TIMESTAMP FUNCTION
! | | | | |
<...>-6424 [004] 1156.853343: : 0x4004d8: S 0x7fff6bf587a8: 31 06 40 00 00 00 00 00 1.@.....
<...>-6424 [004] 1156.853348: : 0x4004d8: S 0x7fff6bf587b0: 00 00 00 00 00 00 00 00 ........
<...>-6424 [004] 1156.853350: : 0x4004d8: S 0x7fff6bf587b8: c0 bb c1 4a 3b 00 00 00 ...J;...
<...>-6424 [004] 1156.853352: : 0x4004d8: S 0x7fff6bf587c0: 50 06 40 00 c8 00 00 00 P.@.....
<...>-6424 [004] 1156.853353: : 0x4004d8: S 0x7fff6bf587c8: ed 00 00 ff 00 00 00 00 ........
<...>-6424 [004] 1156.853355: : 0x4004d8: S 0x7fff6bf587d0: 54 68 69 73 20 73 74 72 This str
<...>-6424 [004] 1156.853357: : 0x4004d8: S 0x7fff6bf587d8: 69 6e 67 20 69 73 20 6f ing is o
<...>-6424 [004] 1156.853359: : 0x4004d8: S 0x7fff6bf587e0: 6e 20 74 68 65 20 73 74 n the st
<...>-6424 [004] 1156.853361: : 0x4004d8: S 0x7fff6bf587e8: 61 63 6b 20 69 6e 20 6d ack in m
<...>-6424 [004] 1156.853363: : 0x4004d8: S 0x7fff6bf587f0: 61 69 6e 00 00 00 00 00 ain.....
<...>-6424 [004] 1156.853364: : 0x4004d8: S 0x7fff6bf587f8: 00 00 00 00 04 00 00 00 ........
<...>-6424 [004] 1156.853366: : 0x4004d8: S 0x7fff6bf58800: ff ff ff ff ff ff ff ff ........
<...>-6424 [004] 1156.853367: : 0x4004d8: S 0x7fff6bf58808: 00 00 00 00 ....
<...>-6424 [004] 1156.853388: : 0x4004d8: D 0x7fff6bf587d0: 54 68 69 73 20 73 74 72 This str <...>-6424 [004] 1156.853389: : 0x4004d8: D 0x7fff6bf587d8: 69 6e 67 20 69 73 20 6f ing is o <...>-6424 [004] 1156.853391: : 0x4004d8: D 0x7fff6bf587e0: 6e 20 74 68 65 20 73 74 n the st <...>-6424 [004] 1156.853393: : 0x4004d8: D 0x7fff6bf587e8: 61 63 6b 20 69 6e 20 6d ack in m <...>-6424 [004] 1156.853394: : 0x4004d8: D 0x7fff6bf587f0: 61 69 6e ain <...>-6424 [004] 1156.853398: : 0x4004d8: A ARG 1: 0000000000000004 <...>-6424 [004] 1156.853399: : 0x4004d8: A ARG 2: 00000000000000c8 <...>-6424 [004] 1156.853400: : 0x4004d8: A ARG 3: 00000000ff0000ed <...>-6424 [004] 1156.853401: : 0x4004d8: A ARG 4: ffffffffffffffff <...>-6424 [004] 1156.853402: : 0x4004d8: A ARG 5: 0000000000000048 TODO: - use ringbuffer - Allow user to specify Nick Name for probe addresses. - Dump arguments from floating point registers. - Optimize code to use single probe instead of multiple probes for same probe addresses. -- Signed-off-by: Mahesh Salgaonkar Signed-off-by: Srikar Dronamraju --- Documentation/trace/uprobes_trace.txt | 197 ++++++++++++ kernel/trace/Makefile | 1 + kernel/trace/trace_uprobes.c | 537 +++++++++++++++++++++++++++++++++ 3 files changed, 735 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/uprobes_trace.txt b/Documentation/trace/uprobes_trace.txt new file mode 100644 index 0000000..3c4482b --- /dev/null +++ b/Documentation/trace/uprobes_trace.txt @@ -0,0 +1,197 @@ + Uprobes based Event Tracer + ========================== + + Mahesh J Salgaonkar + +Overview +-------- +This tracer, based on uprobes, enables a user to put a probe anywhere in the +user process and dump values from user specified data address or from the top +of the stack frame when the probe is hit. + +For 64-bit processes on x86_64, the tracer can also report function arguments +when the probe is hit. Currently, this feature is not supported for 32-bit +processes. 
+
+To activate this tracer, set a probe via
+/sys/kernel/debug/tracing/uprobe_events; the traced information can be seen via
+/sys/kernel/debug/tracing/trace.
+
+Users can specify probes for multiple processes concurrently.
+
+Synopsis
+--------
+up <pid> <address> <type> [<data address>] {<size>|<num args>}
+
+up : set a user probe
+<pid> : Process ID.
+<address> : Instruction address to probe in user process.
+<type> : Type of data to dump.
+ D => Dump the data from specified data address
+ S => Dump the data from top of the stack
+ A => Dump the function arguments (x86_64 only).
+[<data address>] : Data address. Applicable only for type 'D'
+<size> : Number of bytes of data to dump.
+<num args> : Number of arguments to dump.
+
+To dump the data at a given address when the probe is hit, run:
+echo up <pid> <address>
D <data address> <size> >>/sys/kernel/tracing/uprobes_events
+
+To dump the data from the top of the stack when the probe is hit, run:
+echo up <pid> <address>
S <size> >>/sys/kernel/tracing/uprobes_events
+
+To extract the function arguments when the probe is hit, run:
+echo up <pid> <address>
A <num args> >>/sys/kernel/tracing/uprobes_events
+
+Usage Examples
+--------------
+Let us consider the following sample C program:
+
+/* SAMPLE C PROGRAM */
+#include <stdio.h>
+#include
+
+char *global_str_p = "Global String pointer";
+char global_str[] = "Global String";
+
+int foo(int a, unsigned int b, unsigned long c, long d, char e)
+{
+	return 0;
+}
+
+int main()
+{
+	char str[] = "This string is on the stack in main";
+	int a = 4;
+	unsigned int b = 200;
+	unsigned long c = 0xff0000ed;
+	long d = -1;
+	char e = 'H';
+
+	while (getchar() != EOF)
+		foo(a, b, c, d, e);
+
+	return 0;
+}
+/* SAMPLE C PROGRAM */
+
+This example puts a probe at function foo() and dumps some data values, the
+top of the stack and all five arguments passed to function foo().
+
+The probe address for function foo can be acquired using the 'nm' utility on
+the executable file as below:
+
+ $ gcc sample.c -o sample
+ $ nm sample | grep foo
+ 0000000000400498 T foo
+
+We will also dump the data from the global variables 'global_str_p' and
+'global_str'. The DATA addresses for these variables can be acquired as below:
+
+ $ nm sample | grep global
+ 0000000000600960 D global_str
+ 0000000000600958 D global_str_p
+
+When setting the probe, you need to specify the process id of the user process
+to trace. The process id can be determined by using the 'ps' command.
+
+ $ ps -a | grep sample
+ 3906 pts/6 00:00:00 sample
+
+Now set a probe at function foo() as a new event that dumps 100 bytes from the
+stack as shown below:
+
+$ echo "up 3906 0x0000000000400498 S 100" > /sys/kernel/tracing/uprobes_events
+
+Set additional probes at function foo() to dump the data from the global
+variables as shown below:
+
+$ echo "up 3906 0x0000000000400498 D 0000000000600960 15" >> /sys/kernel/tracing/uprobes_events
+$ echo "up 3906 0x0000000000400498 D 0000000000600958 8" >> /sys/kernel/tracing/uprobes_events
+
+Set another probe at function foo() to dump all five arguments passed to
+function foo().
(This option is only valid for the x86_64 architecture.)
+
+$ echo "up 3906 0x0000000000400498 A 5" >> /sys/kernel/tracing/uprobes_events
+
+To see all the current uprobe events:
+
+$ cat /sys/kernel/debug/tracing/uprobe_events
+up 3906 0x400498 S 100
+up 3906 0x400498 D 0x600960 15
+up 3906 0x400498 D 0x600958 8
+up 3906 0x400498 A 5
+
+When the function foo() gets called, all the above probes will hit and you can
+see the traced information via /sys/kernel/debug/tracing/trace:
+
+$ cat /sys/kernel/debug/tracing/trace
+# tracer: nop
+#
+# TASK-PID CPU# TIMESTAMP FUNCTION
+# | | | | |
+ <...>-3906 [001] 391.531431: : 0x400498: S 0x7fffd934eba8: 38 05 40 00 00 00 00 00 8.@.....
+ <...>-3906 [001] 391.531436: : 0x400498: S 0x7fffd934ebb0: 54 68 69 73 20 73 74 72 This str
+ <...>-3906 [001] 391.531438: : 0x400498: S 0x7fffd934ebb8: 69 6e 67 20 69 73 20 6f ing is o
+ <...>-3906 [001] 391.531439: : 0x400498: S 0x7fffd934ebc0: 6e 20 74 68 65 20 73 74 n the st
+ <...>-3906 [001] 391.531441: : 0x400498: S 0x7fffd934ebc8: 61 63 6b 20 69 6e 20 6d ack in m
+ <...>-3906 [001] 391.531443: : 0x400498: S 0x7fffd934ebd0: 61 69 6e 00 00 00 00 01 ain.....
+ <...>-3906 [001] 391.531445: : 0x400498: S 0x7fffd934ebd8: c0 bb c1 4a 3b 00 00 00 ...J;...
+ <...>-3906 [001] 391.531446: : 0x400498: S 0x7fffd934ebe0: 04 00 00 00 c8 00 00 00 ........
+ <...>-3906 [001] 391.531448: : 0x400498: S 0x7fffd934ebe8: ed 00 00 ff 00 00 00 00 ........
+ <...>-3906 [001] 391.531450: : 0x400498: S 0x7fffd934ebf0: ff ff ff ff ff ff ff ff ........
+ <...>-3906 [001] 391.531452: : 0x400498: S 0x7fffd934ebf8: 00 00 00 00 00 00 00 48 .......H
+ <...>-3906 [001] 391.531453: : 0x400498: S 0x7fffd934ec00: 00 00 00 00 00 00 00 00 ........
+ <...>-3906 [001] 391.531455: : 0x400498: S 0x7fffd934ec08: 74 d9 e1 4a t..J
+ <...>-3906 [001] 391.531489: : 0x400498: D 0x600960: 47 6c 6f 62 61 6c 20 53 Global S
+ <...>-3906 [001] 391.531491: : 0x400498: D 0x600968: 74 72 69 6e 67 00 00 tring..
+ <...>-3906 [001] 391.531500: : 0x400498: D 0x600958: 48 06 40 00 00 00 00 00 H.@.....
+ <...>-3906 [001] 391.531504: : 0x400498: A ARG 1: 0000000000000004
+ <...>-3906 [001] 391.531505: : 0x400498: A ARG 2: 00000000000000c8
+ <...>-3906 [001] 391.531505: : 0x400498: A ARG 3: 00000000ff0000ed
+ <...>-3906 [001] 391.531506: : 0x400498: A ARG 4: ffffffffffffffff
+ <...>-3906 [001] 391.531507: : 0x400498: A ARG 5: 0000000000000048
+
+Under the FUNCTION column, each line shows the probe address, type, data/stack
+address, and 8 bytes of data in hex followed by the ascii representation of the
+hex values. If the size specified is more than 8 bytes then multiple lines
+will be used to dump data values. In case of type A one argument is shown per
+line.
+
+The lines with type 'S' from tracer output display 100 bytes (8 bytes per
+line) from the top of the stack when the probed function foo() is hit. The lines
+with type 'A' dump all the five arguments passed to the function foo(). The
+first two lines with type 'D' dump 15 bytes of data from the global variable
+'global_str' at data address 0x600960. The 3rd line with type 'D' dumps 8 bytes
+of data from the global string pointer variable 'global_str_p' at 0x600958.
+The output shows that it holds the address 0x0000000000400648. As per the
+sample program this should point to a const string of 21 characters. Let's
+dump the data values at this address.
+
+echo "up 3906 0x0000000000400498 D 0x0000000000400648 24" > /sys/kernel/tracing/uprobes_events
+
+Please note that we have not used the '>>' operator here; as a result, all
+existing probes will be cleared before this new probe is set.
+
+Take a look at the tracer output.
+ +$ cat /sys/kernel/debug/tracing/trace +# tracer: nop +# +# TASK-PID CPU# TIMESTAMP FUNCTION +# | | | | | + <...>-3906 [001] 442.537669: : 0x400498: D 0x400648: 47 6c 6f 62 61 6c 20 53 Global S + <...>-3906 [001] 442.537674: : 0x400498: D 0x400650: 74 72 69 6e 67 20 70 6f tring po + <...>-3906 [001] 442.537676: : 0x400498: D 0x400658: 69 6e 74 65 72 00 00 00 inter... + + +To clear all the probe events, run: + +echo > /sys/kernel/tracing/uprobes_events + +TODO: +- Allow user to attach a name to probe addresses for address translation. +- Support reporting of arguments from 32-bit applications. +- Dump arguments from floating point registers. +- Optimize code to use single probe instead of multiple probes for same probe + addresses. diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index 26f03ac..623541f 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -54,5 +54,6 @@ obj-$(CONFIG_FTRACE_SYSCALLS) += trace_syscalls.o obj-$(CONFIG_EVENT_PROFILE) += trace_event_profile.o obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o obj-$(CONFIG_EVENT_TRACING) += power-traces.o +obj-$(CONFIG_UPROBES) += trace_uprobes.o libftrace-y := ftrace.o diff --git a/kernel/trace/trace_uprobes.c b/kernel/trace/trace_uprobes.c new file mode 100644 index 0000000..c6e3f90 --- /dev/null +++ b/kernel/trace/trace_uprobes.c @@ -0,0 +1,537 @@ +/* + * Ftrace plugin for Userspace Probes (UProbes) + * kernel/trace/trace_uprobes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "trace.h"
+
+struct trace_uprobe {
+	struct list_head list;
+	struct uprobe usp;
+	unsigned long daddr;
+	size_t length;
+
+#ifdef __x86_64__
+#define TYPE_ARG	'A'
+#endif
+#define TYPE_DATA	'D'
+#define TYPE_STACK	'S'
+	char type;
+};
+
+static DEFINE_MUTEX(trace_uprobe_lock);
+static LIST_HEAD(tu_list);
+
+#define NUMVALUES	8 /* Number of data values to print per line*/
+
+/* NUMVALUES*2 for hex values + NUMVALUES for spaces + 1 */
+#define HEXBUFSIZE	((NUMVALUES * 2) + NUMVALUES + 1)
+
+#define CHARBUFSIZE	NUMVALUES /* NUMVALUES characters */
+#define BUFSIZE		(HEXBUFSIZE + CHARBUFSIZE)
+
+/*
+ * uprobe handler to dump data values and the top of the
+ * stack frame through tracer.
+ *
+ * The output is pushed to tracer in following format:
+ *
+ * <probe address>: <type> <data/stack address>: <data>
+ *
+ * The <data> is divided into two parts - the hex area and
+ * the char area. The hex area contains hex data values.
+ * The number of hex data values contained are controlled
+ * by NUMVALUES. The char area is the ascii representation
+ * of hex data values.
+ *
+ *      |<---------- BUFSIZE + 1------------>|
+ *
+ *      +-----------------+---------------+--+
+ * obuf |    HEX Area     |   CHAR Area   |\0|
+ *      +-----------------+---------------+--+
+ *      ^                 ^               ^
+ *      |<--HEXBUFSIZE -->|<-CHARBUFSIZE->|
+ *
+
+ *
+ * 0x400498: S 0x7fffd934eba8: c8 00 00 00 ed 00 00 ff  ........
+ * 0x400498: S 0x7fffd934ebb0: 54 68 69 73 20 73 74 72  This str
+ */
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+	struct trace_uprobe *tu;
+	char *buf;
+	unsigned long ip = instruction_pointer(regs), daddr;
+	int len;
+	char obuf[BUFSIZE + 1];
+
+	tu = container_of(u, struct trace_uprobe, usp);
+	buf = kzalloc(tu->length + 1, GFP_KERNEL);
+	if (!buf)
+		return;
+
+	if (tu->type == TYPE_STACK) {
+		/* Get Stack Pointer. Dump stack memory */
+		daddr = (unsigned long)user_stack_pointer(regs);
+	} else
+		daddr = tu->daddr;
+
+	len = tu->length;
+	if (!copy_from_user(buf, (void *)daddr, tu->length)) {
+		int pos = 0;
+
+		for (pos = 0; pos < len; pos += NUMVALUES) {
+			char *hp = obuf; /* Hex area buf pointer */
+			char *cp = hp + HEXBUFSIZE; /* char area buf pointer */
+			int i = 0, last;
+
+			memset(obuf, ' ', BUFSIZE);
+			obuf[BUFSIZE] = '\0';
+
+			last = pos + (NUMVALUES - 1);
+			if (last >= len)
+				last = len - 1;
+
+			for (i = pos; i <= last; i++) {
+				sprintf(hp, "%02x", (unsigned char)buf[i]);
+
+				/*
+				 * Character representation..
+				 * ignore non-printable chars
+				 */
+				if ((buf[i] >= ' ') && (buf[i] <= '~'))
+					*cp = buf[i];
+				else
+					*cp = '.';
+
+				hp += 2;
+				*hp++ = ' ';
+				cp++;
+			}
+
+			__trace_bprintk(ip, "0x%lx: %c 0x%lx: %s\n",
+					tu->usp.vaddr, tu->type,
+					(daddr + pos), obuf);
+		}
+	} else {
+		__trace_bprintk(ip, "0x%lx: %c 0x%lx: "
+				"Data capture failed. Invalid address\n",
+				tu->usp.vaddr, tu->type, daddr);
+	}
+	kfree(buf);
+}
+
+#ifdef __x86_64__
+
+/*
+ * uprobe handler to dump function arguments through tracer.
+ * Currently, supported for x86_64 architecture.
+ * Argument extraction as per x86_64 ABI (Application Binary
+ * Interface) document Version 0.99.
+ *
+ * The output is pushed to tracer in following format:
+ *
+ * <probe address>: A ARG <#>: <argument value>
+ *
+ * e.g.
+ * 0x400498: A ARG 1: 0000000000000004
+ * 0x400498: A ARG 2: 00000000000000c8
+ */
+static void uprobe_handler_args(struct uprobe *u, struct pt_regs *regs)
+{
+	struct trace_uprobe *tu;
+	unsigned long ip = instruction_pointer(regs);
+	unsigned long args[6];
+	int i;
+
+	tu = container_of(u, struct trace_uprobe, usp);
+
+	/* Function arguments */
+	args[0] = regs->di;
+	args[1] = regs->si;
+	args[2] = regs->dx;
+	args[3] = regs->cx;
+	args[4] = regs->r8;
+	args[5] = regs->r9;
+
+	for (i = 0; i < tu->length; i++) {
+		__trace_bprintk(ip, "0x%lx: %c ARG %d: %016lx\n",
+				u->vaddr, tu->type, i + 1, args[i]);
+	}
+}
+#endif
+
+/*
+ * Updates the size/numargs of existing probe event if found.
+ */
+static struct trace_uprobe *update_trace_probe(pid_t pid,
+		unsigned long taddr, unsigned long daddr, size_t length,
+		char type)
+{
+	struct trace_uprobe *tu, *tmp;
+
+	mutex_lock(&trace_uprobe_lock);
+	list_for_each_entry_safe(tu, tmp, &tu_list, list) {
+		if ((tu->usp.pid == pid) && (tu->usp.vaddr == taddr)
+			&& (tu->type == type) && (tu->daddr == daddr)) {
+			tu->length = length;
+			mutex_unlock(&trace_uprobe_lock);
+			return tu;
+		}
+	}
+	mutex_unlock(&trace_uprobe_lock);
+	return NULL;
+}
+
+/*
+ * Creates a new probe event entry and sets the user probe by calling
+ * register_uprobe()
+ */
+static int trace_register_uprobe(pid_t pid, unsigned long taddr,
+		unsigned long daddr, size_t length, char type)
+{
+	struct trace_uprobe *tu;
+	int ret = 0;
+
+	/* Check for duplication. If probe for same data address
+	 * already exists then just update the length.
+	 */
+	tu = update_trace_probe(pid, taddr, daddr, length, type);
+	if (tu)
+		return 0;
+
+	/* This is a new probe. */
+	tu = kzalloc(sizeof(struct trace_uprobe), GFP_KERNEL);
+	if (!tu)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tu->list);
+	tu->length = length;
+	tu->daddr = daddr;
+	tu->type = type;
+	tu->usp.pid = pid;
+	tu->usp.vaddr = taddr;
+#ifdef __x86_64__
+	tu->usp.handler = (tu->type == TYPE_ARG) ?
+			uprobe_handler_args : uprobe_handler;
+#else
+	tu->usp.handler = uprobe_handler;
+#endif
+	ret = register_uprobe(&tu->usp);
+
+	if (ret) {
+		pr_err("register_uprobe(pid=%d vaddr=%lx) = ret(%d) failed\n",
+				pid, taddr, ret);
+		kfree(tu);
+		return ret;
+	}
+	mutex_lock(&trace_uprobe_lock);
+	list_add_tail(&tu->list, &tu_list);
+	mutex_unlock(&trace_uprobe_lock);
+	return 0;
+}
+
+static void uprobes_clear_all_events(void)
+{
+	struct trace_uprobe *tu, *tmp;
+
+	mutex_lock(&trace_uprobe_lock);
+	list_for_each_entry_safe(tu, tmp, &tu_list, list) {
+		unregister_uprobe(&tu->usp);
+		list_del(&tu->list);
+		kfree(tu);
+	}
+	mutex_unlock(&trace_uprobe_lock);
+}
+
+/* User probes listing interfaces */
+static void *uprobes_seq_start(struct seq_file *m, loff_t *pos)
+{
+	mutex_lock(&trace_uprobe_lock);
+	return seq_list_start(&tu_list, *pos);
+}
+
+static void *uprobes_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &tu_list, pos);
+}
+
+static void uprobes_seq_stop(struct seq_file *m, void *v)
+{
+	mutex_unlock(&trace_uprobe_lock);
+}
+
+static int uprobes_seq_show(struct seq_file *m, void *v)
+{
+	struct trace_uprobe *tu = v;
+
+	if (tu == NULL)
+		return 0;
+
+	if (tu->type == TYPE_DATA)
+		seq_printf(m, "%-3s%d 0x%lx D 0x%lx %zu\n",
+			"up", tu->usp.pid, tu->usp.vaddr, tu->daddr, tu->length);
+	else
+		seq_printf(m, "%-3s%d 0x%lx %c %zu\n",
+			"up", tu->usp.pid, tu->usp.vaddr, tu->type, tu->length);
+
+	return 0;
+}
+
+static const struct seq_operations uprobes_seq_ops = {
+	.start = uprobes_seq_start,
+	.next = uprobes_seq_next,
+	.stop = uprobes_seq_stop,
+	.show = uprobes_seq_show
+};
+
+static int uprobe_events_open(struct inode *inode, struct file *file)
+{
+	if ((file->f_mode & FMODE_WRITE) &&
+	    !(file->f_flags & O_APPEND))
+		uprobes_clear_all_events();
+
+	return seq_open(file, &uprobes_seq_ops);
+}
+
+#ifdef __x86_64__
+static int process_check_64bit(pid_t p)
+{
+	struct pid *pid = NULL;
+	struct task_struct *tsk;
+	int ret = -ESRCH;
+
+	rcu_read_lock();
+	if (current->nsproxy)
+		pid = find_vpid(p);
+
+	if (pid) {
+		tsk = pid_task(pid, PIDTYPE_PID);
+
+		if (tsk) {
+			if (test_tsk_thread_flag(tsk, TIF_IA32)) {
+				pr_err("Option to dump arguments is "
+					"not supported for 32bit process\n");
+				ret = -EPERM;
+			} else
+				ret = 0;
+		}
+	}
+	rcu_read_unlock();
+	return ret;
+}
+#endif
+
+/*
+ * Input syntax:
+ * up <pid> <address> <type> [<data address>] <size>
+ */
+
+static int enable_uprobe_trace(int argc, char **argv)
+{
+	unsigned long taddr, daddr = 0, tmpval;
+	size_t dsize;
+	pid_t pid;
+	int ret = -EINVAL;
+	char type;
+
+	if ((argc < 5) || (argc > 6))
+		return -EINVAL;
+
+	if (strcmp(argv[0], "up"))
+		return -EINVAL;
+
+	/* get the pid */
+	ret = strict_strtoul(argv[1], 10, &tmpval);
+	if (ret)
+		return ret;
+
+	pid = (pid_t) tmpval;
+
+	/* get the address to probe */
+	ret = strict_strtoul(argv[2], 16, &taddr);
+	if (ret)
+		return ret;
+
+	/* See if user asked for Stack or Data address. */
+	if ((strlen(argv[3]) != 1) || (!isalpha(*argv[3])))
+		return -EINVAL;
+
+	switch (*argv[3]) {
+#ifdef __x86_64__
+	/*
+	 * dumping of arguments supported only for x86_64 arch
+	 */
+	case 'A':
+	case 'a':
+		type = TYPE_ARG;
+		if (argc > 5)
+			return -EINVAL;
+		/* Option 'A' is not supported for 32 bit process. */
+		ret = process_check_64bit(pid);
+		if (ret)
+			return ret;
+
+		daddr = 0;
+		break;
+#endif
+	case 'D':
+	case 'd':
+		type = TYPE_DATA;
+		if (argc < 6)
+			return -EINVAL;
+		/* get the data address */
+		ret = strict_strtoul(argv[4], 16, &daddr);
+		if (ret)
+			return ret;
+		break;
+	case 'S':
+	case 's':
+		type = TYPE_STACK;
+		if (argc > 5)
+			return -EINVAL;
+		daddr = 0;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	/*
+	 * In case of TYPE_DATA and TYPE_STACK: get the size of data to dump.
+	 * In case of TYPE_ARG: this is the number of arguments to dump
+	 */
+	ret = strict_strtoul(((type == TYPE_DATA) ?
+				argv[5] : argv[4]), 10, &tmpval);
+	if (ret)
+		return ret;
+
+	dsize = (size_t) tmpval;
+
+#ifdef __x86_64__
+	/* Only upto 6 args supported */
+	if ((type == TYPE_ARG) && (dsize > 6)) {
+		pr_err("Can not dump more than 6 arguments\n");
+		return -EINVAL;
+	}
+#endif
+
+	ret = trace_register_uprobe(pid, taddr, daddr, dsize, type);
+	return ret;
+}
+
+/*
+ * Process commands written to /sys/kernel/debug/tracing/uprobe_events.
+ * Supports multiple lines. It reads the entire ubuf into local buffer
+ * and then breaks the input into lines. Invokes enable_uprobe_trace()
+ * for each line after splitting them into args array.
+ */
+
+static ssize_t
+uprobe_events_write(struct file *file, const char __user *ubuf,
+		size_t count, loff_t *ppos)
+{
+	char *kbuf, *start, *end = NULL, *tmp;
+	char **argv = NULL;
+	int argc = 0;
+	int ret = 0;
+	size_t done = 0;
+	size_t size;
+
+	if (!count)
+		return 0;
+
+	kbuf = kmalloc(count + 1, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	if (copy_from_user(kbuf, ubuf, count)) {
+		ret = -EFAULT;
+		goto err_out;
+	}
+
+	kbuf[count] = '\0';
+	for (start = kbuf; done < count; start = end + 1) {
+		end = strchr(start, '\n');
+		if (!end) {
+			pr_err("Line length is too long");
+			ret = -EINVAL;
+			goto err_out;
+		}
+		*end = '\0';
+		size = end - start + 1;
+		done += size;
+		/* Remove comments */
+		tmp = strchr(start, '#');
+		if (tmp)
+			*tmp = '\0';
+
+		argv = argv_split(GFP_KERNEL, start, &argc);
+		if (!argv) {
+			ret = -ENOMEM;
+			goto err_out;
+		}
+
+		if (argc)
+			ret = enable_uprobe_trace(argc, argv);
+
+		argv_free(argv);
+		if (ret < 0)
+			goto err_out;
+	}
+	ret = done;
+err_out:
+	kfree(kbuf);
+	return ret;
+}
+
+static const struct file_operations uprobes_events_ops = {
+	.open = uprobe_events_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+	.write = uprobe_events_write,
+};
+
+static __init int init_uprobe_trace(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	entry = debugfs_create_file("uprobe_events", 0644, d_tracer,
+				    NULL, &uprobes_events_ops);
+
+	if (!entry)
+		pr_warning("Could not create debugfs 'uprobe_events' entry\n");
+
+	return 0;
+}
+fs_initcall(init_uprobe_trace);

From srikar at linux.vnet.ibm.com Mon Jan 11 12:25:21 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 11 Jan 2010 17:55:21 +0530
Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes
Message-ID: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>

Hi,

This patchset implements Uprobes which enables you to dynamically
break into any routine in a user space application and collect
information non-disruptively. Uprobes is based on utrace and uses
x86 instruction decoder.

When a uprobe is registered, Uprobes makes a copy of the probed
instruction, stops the probed application, replaces the first
byte(s) of the probed instruction with a breakpoint instruction and
allows the probed application to continue. (Uprobes uses the same
copy-on-write mechanism so that the breakpoint affects only that
process.)

When a CPU hits the breakpoint instruction, Uprobes intercepts the
SIGTRAP and finds the associated uprobe. It then executes the
associated handler. Uprobes single-steps its copy of the probed
instruction and resumes execution of the probed process at the
instruction following the probepoint. Instruction copies to be
single-stepped are stored in a per-process "single-step out of line
(XOL) area,"

Uprobes can be used to take advantage of static markers available
in user space applications.

Advantages of uprobes over conventional debugging include:
1. Non-disruptive.
2. Uses Execution out of line(XOL),
3. Much better handling of multithreaded programs because of XOL.
4. No context switch between tracer, tracee.

Here is the list of TODO Items.

- Provide a perf interface to uprobes.
- Return probes.
- Support for Other Architectures.
- Jump optimization.

This patchset provides
Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
Subject: [RFC] [PATCH 2/7] x86 support for UBP
Subject: [RFC] [PATCH 3/7] Execution out of line (XOL)
Subject: [RFC] [PATCH 4/7] Uprobes Implementation
Subject: [RFC] [PATCH 5/7] X86 Support for Uprobes
Subject: [RFC] [PATCH 6/7] Uprobes Documentation
Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes

This patchset is based on
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git

If and when utrace gets accepted into tip tree or Mainline, I will
rebase this patchset.

Please do provide your valuable comments.

--
Thanks and Regards
Srikar

From mhiramat at redhat.com Mon Jan 11 14:35:39 2010
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Mon, 11 Jan 2010 09:35:39 -0500
Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <4B4B373B.5010802@redhat.com>

Srikar Dronamraju wrote:
> Hi,
>
> This patchset implements Uprobes which enables you to dynamically
> break into any routine in a user space application and collect
> information non-disruptively. Uprobes is based on utrace and uses
> x86 instruction decoder.
>
> When a uprobe is registered, Uprobes makes a copy of the probed
> instruction, stops the probed application, replaces the first
> byte(s) of the probed instruction with a breakpoint instruction and
> allows the probed application to continue. (Uprobes uses the same
> copy-on-write mechanism so that the breakpoint affects only that
> process.)
>
> When a CPU hits the breakpoint instruction, Uprobes intercepts the
> SIGTRAP and finds the associated uprobe. It then executes the
> associated handler. Uprobes single-steps its copy of the probed
> instruction and resumes execution of the probed process at the
> instruction following the probepoint. Instruction copies to be
> single-stepped are stored in a per-process "single-step out of line
> (XOL) area,"
>
> Uprobes can be used to take advantage of static markers available
> in user space applications.
>
> Advantages of uprobes over conventional debugging include:
> 1. Non-disruptive.
> 2. Uses Execution out of line(XOL),
> 3. Much better handling of multithreaded programs because of XOL.
> 4. No context switch between tracer, tracee.

Hi Srikar and Jim,

Great work! thanks for releasing it.

>
> Here is the list of TODO Items.
>
> - Provide a perf interface to uprobes.

I think we also need to integrate ftrace-kprobe/uprobe to support
dynamic trace event. it helps perf probe to support uprobe much easier.

> - Return probes.

Hmm, I think we need some symbol information for supporting
return probes in user space. Could you tell me how to work it?
is that requires some user-space helper?

> - Support for Other Architectures.
> - Jump optimization.

I assume that you meant this is "uprobe-booster" to skip
just single stepping after probing, isn't it?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com

From oleg at redhat.com Mon Jan 11 14:37:57 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Mon, 11 Jan 2010 15:37:57 +0100
Subject: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set ->ops = utrace_detached_ops lockless)
In-Reply-To: <253423212.42121263203965061.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References: <20091209181241.GA20475@redhat.com> <253423212.42121263203965061.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <20100111143756.GA4970@redhat.com>

On 01/11, CAI Qian wrote:
>
> Looks like the following patch from Oleg has not been checked in
> ptrace testsuite yet.
> ...
> > --- step-jump-cont.c~	2009-12-09 12:17:04.367733643 -0500
> > +++ step-jump-cont.c	2009-12-09 13:12:50.708535770 -0500
> > @@ -153,12 +153,19 @@ raise_sigusr2 (void)
> >    assert (0);
> >  }
> >
> > +typedef struct {
> > +	unsigned long entry;
> > +	unsigned long toc;
> > +	unsigned long env;
> > +} func_descr_t;
> > +
> >  int main (void)
> >  {
> >    long l;
> >    int status;
> >    pid_t pid;
> >    REGS_TYPE (regs);
> > +  func_descr_t *fp;
> >
> >    setbuf (stdout, NULL);
> >    atexit (cleanup);
> > @@ -214,7 +221,12 @@ int main (void)
> >  #elif defined __x86_64__
> >    REGS_ACCESS (regs, rip) = (unsigned long) raise_sigusr2;
> >  #elif defined __powerpc__
> > -  REGS_ACCESS (regs, nip) = (unsigned long) raise_sigusr2;
> > +
> > +	fp = (void*)raise_sigusr2;
> > +
> > +	REGS_ACCESS(regs, nip) = fp->entry;
> > +	REGS_ACCESS(regs, gpr[2]) = fp->toc;
> > +

My patch was a quick and dirty hack, afaics Jan has committed the right
change, step-jump-cont.c does:

	#elif defined __powerpc64__
	  {
	    /* ppc64 `raise_sigusr2' resolves to the function descriptor.  */
	    union
	      {
		void (*f) (void);
		struct
		  {
		    void *entry;
		    void *toc;
		  }
		*p;
	      }
	    const func_u = { raise_sigusr2 };

	    REGS_ACCESS (regs, nip) = (unsigned long) func_u.p->entry;
	    REGS_ACCESS (regs, gpr[2]) = (unsigned long) func_u.p->toc;
	  }
	#elif defined __powerpc__

Oleg.

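The function-descriptor point in the message above can be shown in
isolation. What follows is only a minimal sketch, not the testsuite's code.
It assumes the ELFv1 ppc64 ABI, where a C function pointer refers to a
descriptor (entry, toc, env) rather than to the first instruction, and it
assumes the compiler's _CALL_ELF macro distinguishes ELFv1 from ELFv2
(an undefined macro is treated as ELFv1 here). The names func_descr,
code_entry and sample_target are made up for the illustration. On every
other target the helper simply returns the pointer value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ELFv1 ppc64 descriptor layout, mirroring func_descr_t above. */
struct func_descr {
	unsigned long entry;	/* address of the first instruction */
	unsigned long toc;	/* TOC base, expected in gpr[2] */
	unsigned long env;	/* environment pointer, unused by C */
};

/* Hypothetical function whose entry address we want to resolve. */
static void sample_target(void)
{
}

/*
 * Hypothetical helper: return the address that would have to be stored
 * in the instruction pointer (nip/rip/eip) to start executing f.
 */
static unsigned long code_entry(void (*f)(void))
{
	assert(f != NULL);
#if defined(__powerpc64__) && (!defined(_CALL_ELF) || _CALL_ELF < 2)
	/* ELFv1: f points at a descriptor, not at code. */
	const struct func_descr *d = (const struct func_descr *)(uintptr_t)f;

	return d->entry;
#else
	/* Elsewhere the pointer value is already the code address. */
	return (unsigned long)(uintptr_t)f;
#endif
}
```

A caller on ELFv1 ppc64 would also have to load the descriptor's toc
field into gpr[2], as both versions of the fix above do; storing only
the entry address leaves the jumped-to function with the wrong TOC.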
From jkenisto at us.ibm.com Mon Jan 11 22:59:32 2010
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Mon, 11 Jan 2010 14:59:32 -0800
Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes
In-Reply-To: <4B4B373B.5010802@redhat.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <4B4B373B.5010802@redhat.com>
Message-ID: <1263250772.5094.41.camel@localhost.localdomain>

On Mon, 2010-01-11 at 09:35 -0500, Masami Hiramatsu wrote:
> Srikar Dronamraju wrote:
> > Hi,
> >
> > This patchset implements Uprobes which enables you to dynamically
> > break into any routine in a user space application and collect
> > information non-disruptively. Uprobes is based on utrace and uses
> > x86 instruction decoder.
...
>
> > - Return probes.
>
> Hmm, I think we need some symbol information for supporting
> return probes in user space. Could you tell me how to work it?
> is that requires some user-space helper?

Return probes are on the TODO list, but we actually already have a
pretty solid implementation of that. It's held out for now because
Srikar's patch set is already big, and we want to get a review of the
basic ubp/xol/uprobes feature.

For the most part, we don't need special symbol information for return
probes. We just do as we did in kretprobes: hijack the return address
and replace it with the address of a trampoline. In user-space return
probes, the trampoline is one of the instruction slots in the XOL vma,
and contains a breakpoint to trap us into the kernel. (Of course, as in
kretprobes, we need to know the address of the function so we can
hijack the return address upon entry to the function.)

One place where symbol info would come in handy is when a function
returns in a weird way. We handle longjmps by noticing that the task's
stack is smaller than expected, and presumably missing stack frames
that were bypassed by the longjmp.
But this heuristic gets dicey when you consider that in a
32-bit x86 app, a struct-returning function pops not only the return
address upon return, but also the address of the returned struct value.
So it'd be nice to know if a function returns a struct.

Does this answer your question, or did I miss something?

>
> > - Support for Other Architectures.
> > - Jump optimization.
>
> I assume that you meant this is "uprobe-booster" to skip
> just single stepping after probing, isn't it?

Yes, I think that's what Srikar meant: avoid single-stepping by adding a
jump instruction after the instruction-copy in the XOL slot -- as you
did in your kprobes-booster work. Your instruction-analysis work makes
this much more feasible.

>
>
> Thank you,

Jim Keniston

From paulmck at linux.vnet.ibm.com Tue Jan 12 02:01:55 2010
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Mon, 11 Jan 2010 18:01:55 -0800
Subject: [RFC] [PATCH 4/7] Uprobes Implementation
In-Reply-To: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100112020155.GC10869@linux.vnet.ibm.com>

On Mon, Jan 11, 2010 at 05:55:53PM +0530, Srikar Dronamraju wrote:
> Uprobes Implementation
>
> Uprobes Infrastructure enables user to dynamically establish
> probepoints in user applications and collect information by executing
> a handler functions when the probepoints are hit.
> Please refer Documentation/uprobes.txt for more details.
>
> This patch provides the core implementation of uprobes.
> This patch builds on utrace infrastructure.
>
> You need to follow this up with the uprobes patch for your
> architecture.

Good to see this!!!

Several questions interspersed below.

							Thanx, Paul

> Signed-off-by: Jim Keniston
> Signed-off-by: Srikar Dronamraju
> ---
>  arch/Kconfig            |   12
>  include/linux/uprobes.h |  292 ++++++
>  kernel/Makefile         |    1
>  kernel/uprobes_core.c   | 2017 ++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 2322 insertions(+)
>
> Index: new_uprobes.git/arch/Kconfig
> ===================================================================
> --- new_uprobes.git.orig/arch/Kconfig
> +++ new_uprobes.git/arch/Kconfig
> @@ -66,6 +66,16 @@ config UBP
>  	  in user applications. This service is used by components
>  	  such as uprobes. If in doubt, say "N".
>
> +config UPROBES
> +	bool "User-space probes (EXPERIMENTAL)"
> +	depends on UTRACE && MODULES && UBP
> +	depends on HAVE_UPROBES
> +	help
> +	  Uprobes enables kernel modules to establish probepoints
> +	  in user applications and execute handler functions when
> +	  the probepoints are hit. For more information, refer to
> +	  Documentation/uprobes.txt. If in doubt, say "N".
> +
>  config HAVE_EFFICIENT_UNALIGNED_ACCESS
>  	bool
>  	help
> @@ -115,6 +125,8 @@ config HAVE_KPROBES
>  config HAVE_KRETPROBES
>  	bool
>
> +config HAVE_UPROBES
> +	def_bool n
>  #
>  # An arch should select this if it provides all these things:
>  #
> Index: new_uprobes.git/include/linux/uprobes.h
> ===================================================================
> --- /dev/null
> +++ new_uprobes.git/include/linux/uprobes.h
> @@ -0,0 +1,292 @@
> +#ifndef _LINUX_UPROBES_H
> +#define _LINUX_UPROBES_H
> +/*
> + * Userspace Probes (UProbes)
> + * include/linux/uprobes.h
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2006, 2009
> + */
> +#include
> +#include
> +
> +struct pt_regs;
> +
> +/* This is what the user supplies us. */
> +struct uprobe {
> +	/*
> +	 * The pid of the probed process. Currently, this can be the
> +	 * thread ID (task->pid) of any active thread in the process.
> + */ > + pid_t pid; > + > + /* Location of the probepoint */ > + unsigned long vaddr; > + > + /* Handler to run when the probepoint is hit */ > + void (*handler)(struct uprobe*, struct pt_regs*); > + > + /* > + * This function, if non-NULL, will be called upon completion of > + * an ASYNCHRONOUS registration (i.e., one initiated by a uprobe > + * handler). reg = 1 for register, 0 for unregister. > + */ > + void (*registration_callback)(struct uprobe *u, int reg, int result); > + > + /* Reserved for use by uprobes */ > + void *kdata; > +}; > + > +#if defined(CONFIG_UPROBES) > +extern int register_uprobe(struct uprobe *u); > +extern void unregister_uprobe(struct uprobe *u); > +#else > +static inline int register_uprobe(struct uprobe *u) > +{ > + return -ENOSYS; > +} > +static inline void unregister_uprobe(struct uprobe *u) > +{ > +} > +#endif /* CONFIG_UPROBES */ > + > +#ifdef UPROBES_IMPLEMENTATION > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct utrace_engine; > +struct task_struct; > +struct pid; > + > +enum uprobe_probept_state { > + UPROBE_INSERTING, /* process quiescing prior to insertion */ > + UPROBE_BP_SET, /* breakpoint in place */ > + UPROBE_REMOVING, /* process quiescing prior to removal */ > + UPROBE_DISABLED /* removal completed */ > +}; > + > +enum uprobe_task_state { > + UPTASK_QUIESCENT, > + UPTASK_SLEEPING, /* See utask_fake_quiesce(). */ > + UPTASK_RUNNING, > + UPTASK_BP_HIT, > + UPTASK_SSTEP > +}; > + > +enum uprobe_ssil_state { > + SSIL_DISABLE, > + SSIL_CLEAR, > + SSIL_SET > +}; > + > +#define UPROBE_HASH_BITS 5 > +#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS) > + > +/* > + * uprobe_process -- not a user-visible struct. > + * A uprobe_process represents a probed process. A process can have > + * multiple probepoints (each represented by a uprobe_probept) and > + * one or more threads (each represented by a uprobe_task). 
> + */ > +struct uprobe_process { > + /* > + * rwsem is write-locked for any change to the uprobe_process's > + * graph (including uprobe_tasks, uprobe_probepts, and uprobe_kimgs) -- > + * e.g., due to probe [un]registration or special events like exit. > + * It's read-locked during the whole time we process a probepoint hit. > + */ > + struct rw_semaphore rwsem; > + > + /* Table of uprobe_probepts registered for this process */ > + /* TODO: Switch to list_head[] per Ingo. */ > + struct hlist_head uprobe_table[UPROBE_TABLE_SIZE]; > + > + /* List of uprobe_probepts awaiting insertion or removal */ > + struct list_head pending_uprobes; > + > + /* List of uprobe_tasks in this task group */ > + struct list_head thread_list; > + int nthreads; > + int n_quiescent_threads; > + > + /* this goes on the uproc_table */ > + struct hlist_node hlist; > + > + /* > + * All threads (tasks) in a process share the same uprobe_process. > + */ > + struct pid *tg_leader; > + pid_t tgid; > + > + /* Threads in UPTASK_SLEEPING state wait here to be roused. */ > + wait_queue_head_t waitq; > + > + /* > + * We won't free the uprobe_process while... > + * - any register/unregister operations on it are in progress; or > + * - any uprobe_report_* callbacks are running; or > + * - uprobe_table[] is not empty; or > + * - any tasks are UPTASK_SLEEPING in the waitq; > + * refcount reflects this. We do NOT ref-count tasks (threads), > + * since once the last thread has exited, the rest is academic. > + */ > + atomic_t refcount; > + > + /* > + * finished = 1 means the process is execing or the last thread > + * is exiting, and we're cleaning up the uproc. If the execed > + * process is probed, a new uproc will be created. > + */ > + bool finished; > + > + /* > + * 1 to single-step out of line; 0 for inline. This can drop to > + * 0 if we can't set up the XOL area, but never goes from 0 to 1. 
> + */ > + bool sstep_out_of_line; > + > + /* > + * Manages slots for instruction-copies to be single-stepped > + * out of line. > + */ > + void *xol_area; > +}; > + > +/* > + * uprobe_kimg -- not a user-visible struct. > + * Holds implementation-only per-uprobe data. > + * uprobe->kdata points to this. > + */ > +struct uprobe_kimg { > + struct uprobe *uprobe; > + struct uprobe_probept *ppt; > + > + /* > + * -EBUSY while we're waiting for all threads to quiesce so the > + * associated breakpoint can be inserted or removed. > + * 0 if the insert/remove operation has succeeded, or -errno > + * otherwise. > + */ > + int status; > + > + /* on ppt's list */ > + struct list_head list; > +}; > + > +/* > + * uprobe_probept -- not a user-visible struct. > + * A probepoint, at which several uprobes can be registered. > + * Guarded by uproc->rwsem. > + */ > +struct uprobe_probept { > + /* breakpoint/XOL details */ > + struct ubp_bkpt ubp; > + > + /* The uprobe_kimg(s) associated with this uprobe_probept */ > + struct list_head uprobe_list; > + > + enum uprobe_probept_state state; > + > + /* The parent uprobe_process */ > + struct uprobe_process *uproc; > + > + /* > + * ppt goes in the uprobe_process->uprobe_table when registered -- > + * even before the breakpoint has been inserted. > + */ > + struct hlist_node ut_node; > + > + /* > + * ppt sits in the uprobe_process->pending_uprobes queue while > + * awaiting insertion or removal of the breakpoint. > + */ > + struct list_head pd_node; > + > + /* [un]register_uprobe() waits 'til bkpt inserted/removed */ > + wait_queue_head_t waitq; > + > + /* > + * ssil_lock, ssilq and ssil_state are used to serialize > + * single-stepping inline, so threads don't clobber each other > + * swapping the breakpoint instruction in and out. This helps > + * prevent crashing the probed app, but it does NOT prevent > + * probe misses while the breakpoint is swapped out. > + * ssilq - threads wait for their chance to single-step inline. 
> + */ > + spinlock_t ssil_lock; > + wait_queue_head_t ssilq; > + enum uprobe_ssil_state ssil_state; > +}; > + > +/* > + * uprobe_task -- not a user-visible struct. > + * Corresponds to a thread in a probed process. > + * Guarded by uproc->rwsem. > + */ > +struct uprobe_task { > + /* Lives in the global utask_table */ > + struct hlist_node hlist; > + > + /* Lives on the thread_list for the uprobe_process */ > + struct list_head list; > + > + struct task_struct *tsk; > + struct pid *pid; > + > + /* The utrace engine for this task */ > + struct utrace_engine *engine; > + > + /* Back pointer to the associated uprobe_process */ > + struct uprobe_process *uproc; > + > + enum uprobe_task_state state; > + > + /* > + * quiescing = 1 means this task has been asked to quiesce. > + * It may not be able to comply immediately if it's hit a bkpt. > + */ > + bool quiescing; > + > + /* Set before running handlers; cleared after single-stepping. */ > + struct uprobe_probept *active_probe; > + > + /* Saved address of copied original instruction */ > + long singlestep_addr; > + > + struct ubp_task_arch_info arch_info; > + > + /* > + * Unexpected error in probepoint handling has left task's > + * text or stack corrupted. Kill task ASAP. > + */ > + bool doomed; > + > + /* [un]registrations initiated by handlers must be asynchronous. */ > + struct list_head deferred_registrations; > + > + /* Delay handler-destined signals 'til after single-step done. 
*/ > + struct list_head delayed_signals; > +}; > + > +#endif /* UPROBES_IMPLEMENTATION */ > + > +#endif /* _LINUX_UPROBES_H */ > Index: new_uprobes.git/kernel/Makefile > =================================================================== > --- new_uprobes.git.orig/kernel/Makefile > +++ new_uprobes.git/kernel/Makefile > @@ -104,6 +104,7 @@ obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_b > obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o > obj-$(CONFIG_UBP) += ubp_core.o > obj-$(CONFIG_UBP_XOL) += ubp_xol.o > +obj-$(CONFIG_UPROBES) += uprobes_core.o > > ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y) > # According to Alan Modra , the -fno-omit-frame-pointer is > Index: new_uprobes.git/kernel/uprobes_core.c > =================================================================== > --- /dev/null > +++ new_uprobes.git/kernel/uprobes_core.c > @@ -0,0 +1,2017 @@ > +/* > + * Userspace Probes (UProbes) > + * kernel/uprobes_core.c > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
> + * > + * Copyright (C) IBM Corporation, 2006, 2009 > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#define UPROBES_IMPLEMENTATION 1 > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define UPROBE_SET_FLAGS 1 > +#define UPROBE_CLEAR_FLAGS 0 > + > +#define MAX_XOL_SLOTS 1024 > + > +static int utask_fake_quiesce(struct uprobe_task *utask); > +static int uprobe_post_ssout(struct uprobe_task *utask, > + struct uprobe_probept *ppt, struct pt_regs *regs); > + > +typedef void (*uprobe_handler_t)(struct uprobe*, struct pt_regs*); > + > +/* > + * Table of currently probed processes, hashed by task-group leader's > + * struct pid. > + */ > +static struct hlist_head uproc_table[UPROBE_TABLE_SIZE]; > + > +/* Protects uproc_table during uprobe (un)registration */ > +static DEFINE_MUTEX(uproc_mutex); > + > +/* Table of uprobe_tasks, hashed by task_struct pointer. */ > +static struct hlist_head utask_table[UPROBE_TABLE_SIZE]; > +static DEFINE_SPINLOCK(utask_table_lock); > + > +/* p_uprobe_utrace_ops = &uprobe_utrace_ops. Fwd refs are a pain w/o this. */ > +static const struct utrace_engine_ops *p_uprobe_utrace_ops; > + > +struct deferred_registration { > + struct list_head list; > + struct uprobe *uprobe; > + int regflag; /* 0 - unregister, 1 - register */ > +}; > + > +/* > + * Calling a signal handler cancels single-stepping, so uprobes delays > + * calling the handler, as necessary, until after single-stepping is completed. 
> + */ > +struct delayed_signal { > + struct list_head list; > + siginfo_t info; > +}; > + > +static u16 ubp_strategies; > + > +static struct uprobe_task *uprobe_find_utask(struct task_struct *tsk) > +{ > + struct hlist_head *head; > + struct hlist_node *node; > + struct uprobe_task *utask; > + unsigned long flags; > + > + head = &utask_table[hash_ptr(tsk, UPROBE_HASH_BITS)]; > + spin_lock_irqsave(&utask_table_lock, flags); > + hlist_for_each_entry(utask, node, head, hlist) { > + if (utask->tsk == tsk) { > + spin_unlock_irqrestore(&utask_table_lock, flags); > + return utask; > + } > + } > + spin_unlock_irqrestore(&utask_table_lock, flags); > + return NULL; > +} > + > +static void uprobe_hash_utask(struct uprobe_task *utask) > +{ > + struct hlist_head *head; > + unsigned long flags; > + > + INIT_HLIST_NODE(&utask->hlist); > + head = &utask_table[hash_ptr(utask->tsk, UPROBE_HASH_BITS)]; > + spin_lock_irqsave(&utask_table_lock, flags); > + hlist_add_head(&utask->hlist, head); > + spin_unlock_irqrestore(&utask_table_lock, flags); > +} > + > +static void uprobe_unhash_utask(struct uprobe_task *utask) > +{ > + unsigned long flags; > + > + spin_lock_irqsave(&utask_table_lock, flags); > + hlist_del(&utask->hlist); > + spin_unlock_irqrestore(&utask_table_lock, flags); > +} > + > +static inline void uprobe_get_process(struct uprobe_process *uproc) > +{ > + atomic_inc(&uproc->refcount); > +} > + > +/* > + * Decrement uproc's refcount in a situation where we "know" it can't > + * reach zero. It's OK to call this with uproc locked. Compare with > + * uprobe_put_process(). > + */ > +static inline void uprobe_decref_process(struct uprobe_process *uproc) > +{ > + if (atomic_dec_and_test(&uproc->refcount)) > + BUG(); > +} > + > +/* > + * Runs with the uproc_mutex held. Returns with uproc ref-counted and > + * write-locked. > + * > + * Around exec time, briefly, it's possible to have one (finished) uproc > + * for the old image and one for the new image. We find the latter. 
> + */ > +static struct uprobe_process *uprobe_find_process(struct pid *tg_leader) > +{ > + struct uprobe_process *uproc; > + struct hlist_head *head; > + struct hlist_node *node; > + > + head = &uproc_table[hash_ptr(tg_leader, UPROBE_HASH_BITS)]; > + hlist_for_each_entry(uproc, node, head, hlist) { > + if (uproc->tg_leader == tg_leader && !uproc->finished) { > + uprobe_get_process(uproc); > + down_write(&uproc->rwsem); > + return uproc; > + } > + } > + return NULL; > +} > + > +/* > + * In the given uproc's hash table of probepoints, find the one with the > + * specified virtual address. Runs with uproc->rwsem locked. > + */ > +static struct uprobe_probept *uprobe_find_probept(struct uprobe_process *uproc, > + unsigned long vaddr) > +{ > + struct uprobe_probept *ppt; > + struct hlist_node *node; > + struct hlist_head *head = &uproc->uprobe_table[hash_long(vaddr, > + UPROBE_HASH_BITS)]; > + > + hlist_for_each_entry(ppt, node, head, ut_node) { > + if (ppt->ubp.vaddr == vaddr && ppt->state != UPROBE_DISABLED) > + return ppt; > + } > + return NULL; > +} > + > +/* > + * Save a copy of the original instruction (so it can be single-stepped > + * out of line), insert the breakpoint instruction, and awake > + * register_uprobe(). > + */ > +static void uprobe_insert_bkpt(struct uprobe_probept *ppt, > + struct task_struct *tsk) > +{ > + struct uprobe_kimg *uk; > + int result; > + > + if (tsk) > + result = ubp_insert_bkpt(tsk, &ppt->ubp); > + else > + /* No surviving tasks associated with ppt->uproc */ > + result = -ESRCH; > + ppt->state = (result ? UPROBE_DISABLED : UPROBE_BP_SET); > + list_for_each_entry(uk, &ppt->uprobe_list, list) > + uk->status = result; > + wake_up_all(&ppt->waitq); > +} > + > +/* > + * Check if task has just stepped on a trap instruction at the > + * indicated address. If it has indeed stepped on that address, > + * then reset Instruction Pointer for the task. > + * > + * tsk should either be current thread or already quiesced thread. 
> + */ > +static inline void reset_thread_ip(struct task_struct *tsk, > + struct pt_regs *regs, unsigned long addr) > +{ > + if ((ubp_get_bkpt_addr(regs) == addr) && > + !test_tsk_thread_flag(tsk, TIF_SINGLESTEP)) > + ubp_set_ip(regs, addr); > +} > + > +/* > + * ppt's breakpoint has been removed. If any threads are in the middle of > + * single-stepping at this probepoint, fix things up so they can proceed. > + * If any threads have just hit breakpoint but are yet to start > + * pre-processing, reset their instruction pointers. > + * > + * Runs with all of ppt->uproc's threads quiesced and ppt->uproc->rwsem > + * write-locked > + */ > +static inline void adjust_trapped_thread_ip(struct uprobe_probept *ppt) > +{ > + struct uprobe_process *uproc = ppt->uproc; > + struct uprobe_task *utask; > + struct pt_regs *regs; > + > + list_for_each_entry(utask, &uproc->thread_list, list) { > + regs = task_pt_regs(utask->tsk); > + if (utask->active_probe != ppt) { > + reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr); > + continue; > + } > + > + /* > + * Current thread cannot have an active breakpoint > + * and still request for a breakpoint removal. The > + * above case is handled by utask_fake_quiesce(). > + */ > + BUG_ON(utask->tsk == current); > + > +#ifdef CONFIG_UBP_XOL > + if (instruction_pointer(regs) == ppt->ubp.xol_vaddr) > + /* adjust the ip to breakpoint addr. */ > + ubp_set_ip(regs, ppt->ubp.vaddr); > + else > + /* adjust the ip to next instruction. */ > + uprobe_post_ssout(utask, ppt, regs); > +#endif > + } > +} > + > +static void uprobe_remove_bkpt(struct uprobe_probept *ppt, > + struct task_struct *tsk) > +{ > + if (tsk) { > + if (ubp_remove_bkpt(tsk, &ppt->ubp) != 0) { > + printk(KERN_ERR > + "Error removing uprobe at pid %d vaddr %#lx:" > + " can't restore original instruction\n", > + tsk->tgid, ppt->ubp.vaddr); > + /* > + * This shouldn't happen, since we were previously > + * able to write the breakpoint at that address. 
> + * There's not much we can do besides let the > + * process die with a SIGTRAP the next time the > + * breakpoint is hit. > + */ > + } > + adjust_trapped_thread_ip(ppt); > + if (ppt->ubp.strategy & UBP_HNT_INLINE) { > + unsigned long flags; > + spin_lock_irqsave(&ppt->ssil_lock, flags); > + ppt->ssil_state = SSIL_DISABLE; > + wake_up_all(&ppt->ssilq); > + spin_unlock_irqrestore(&ppt->ssil_lock, flags); > + } > + } > + /* Wake up unregister_uprobe(). */ > + ppt->state = UPROBE_DISABLED; > + wake_up_all(&ppt->waitq); > +} > + > +/* > + * Runs with all of uproc's threads quiesced and uproc->rwsem write-locked. > + * As specified, insert or remove the breakpoint instruction for each > + * uprobe_probept on uproc's pending list. > + * tsk = one of the tasks associated with uproc -- NULL if there are > + * no surviving threads. > + * It's OK for uproc->pending_uprobes to be empty here. It can happen > + * if a register and an unregister are requested (by different probers) > + * simultaneously for the same pid/vaddr. > + */ > +static void handle_pending_uprobes(struct uprobe_process *uproc, > + struct task_struct *tsk) > +{ > + struct uprobe_probept *ppt, *tmp; > + > + list_for_each_entry_safe(ppt, tmp, &uproc->pending_uprobes, pd_node) { > + switch (ppt->state) { > + case UPROBE_INSERTING: > + uprobe_insert_bkpt(ppt, tsk); > + break; > + case UPROBE_REMOVING: > + uprobe_remove_bkpt(ppt, tsk); > + break; > + default: > + BUG(); > + } > + list_del(&ppt->pd_node); > + } > +} > + > +static void utask_adjust_flags(struct uprobe_task *utask, int set, > + unsigned long flags) > +{ > + unsigned long newflags, oldflags; > + > + oldflags = utask->engine->flags; > + newflags = oldflags; > + if (set) > + newflags |= flags; > + else > + newflags &= ~flags; > + /* > + * utrace_barrier[_pid] is not appropriate here. If we're > + * adjusting current, it's not needed. 
And if we're adjusting > + * some other task, we're holding utask->uproc->rwsem, which > + * could prevent that task from completing the callback we'd > + * be waiting on. > + */ > + if (newflags != oldflags) { > + if (utrace_set_events_pid(utask->pid, utask->engine, > + newflags) != 0) > + /* We don't care. */ > + ; > + } > +} > + > +static inline void clear_utrace_quiesce(struct uprobe_task *utask, bool resume) > +{ > + utask_adjust_flags(utask, UPROBE_CLEAR_FLAGS, UTRACE_EVENT(QUIESCE)); > + if (resume) { > + if (utrace_control_pid(utask->pid, utask->engine, > + UTRACE_RESUME) != 0) > + /* We don't care. */ > + ; > + } > +} > + > +/* Opposite of quiesce_all_threads(). Same locking applies. */ > +static void rouse_all_threads(struct uprobe_process *uproc) > +{ > + struct uprobe_task *utask; > + > + list_for_each_entry(utask, &uproc->thread_list, list) { > + if (utask->quiescing) { > + utask->quiescing = false; > + if (utask->state == UPTASK_QUIESCENT) { > + utask->state = UPTASK_RUNNING; > + uproc->n_quiescent_threads--; > + clear_utrace_quiesce(utask, true); > + } > + } > + } > + /* Wake any threads that decided to sleep rather than quiesce. */ > + wake_up_all(&uproc->waitq); > +} > + > +/* > + * If all of uproc's surviving threads have quiesced, do the necessary > + * breakpoint insertions or removals, un-quiesce everybody, and return 1. > + * tsk is a surviving thread, or NULL if there is none. Runs with > + * uproc->rwsem write-locked. > + */ > +static int check_uproc_quiesced(struct uprobe_process *uproc, > + struct task_struct *tsk) > +{ > + if (uproc->n_quiescent_threads >= uproc->nthreads) { > + handle_pending_uprobes(uproc, tsk); > + rouse_all_threads(uproc); > + return 1; > + } > + return 0; > +} > + > +/* Direct the indicated thread to quiesce. */ > +static void uprobe_stop_thread(struct uprobe_task *utask) > +{ > + int result; > + > + /* > + * As with utask_adjust_flags, calling utrace_barrier_pid below > + * could deadlock. 
> + */ > + BUG_ON(utask->tsk == current); > + result = utrace_control_pid(utask->pid, utask->engine, UTRACE_STOP); > + if (result == 0) { > + /* Already stopped. */ > + utask->state = UPTASK_QUIESCENT; > + utask->uproc->n_quiescent_threads++; > + } else if (result == -EINPROGRESS) { > + if (utask->tsk->state & TASK_INTERRUPTIBLE) { > + /* > + * Task could be in interruptible wait for a long > + * time -- e.g., if stopped for I/O. But we know > + * it's not going to run user code before all > + * threads quiesce, so pretend it's quiesced. > + * This avoids terminating a system call via > + * UTRACE_INTERRUPT. > + */ > + utask->state = UPTASK_QUIESCENT; > + utask->uproc->n_quiescent_threads++; > + } else { > + /* > + * Task will eventually stop, but it may be a long time. > + * Don't wait. > + */ > + result = utrace_control_pid(utask->pid, utask->engine, > + UTRACE_INTERRUPT); > + if (result != 0) > + /* We don't care. */ > + ; > + } > + } > +} > + > +/* > + * Quiesce all threads in the specified process -- e.g., prior to > + * breakpoint insertion. Runs with uproc->rwsem write-locked. > + * Returns false if all threads have died. > + */ > +static bool quiesce_all_threads(struct uprobe_process *uproc, > + struct uprobe_task **cur_utask_quiescing) > +{ > + struct uprobe_task *utask; > + struct task_struct *survivor = NULL; /* any survivor */ > + bool survivors = false; > + > + *cur_utask_quiescing = NULL; > + list_for_each_entry(utask, &uproc->thread_list, list) { > + if (!survivors) { > + survivor = pid_task(utask->pid, PIDTYPE_PID); > + if (survivor) > + survivors = true; > + } > + if (!utask->quiescing) { > + /* > + * If utask is currently handling a probepoint, it'll > + * check utask->quiescing and quiesce when it's done. 
> + */ > + utask->quiescing = true; > + if (utask->tsk == current) > + *cur_utask_quiescing = utask; > + else if (utask->state == UPTASK_RUNNING) { > + utask_adjust_flags(utask, UPROBE_SET_FLAGS, > + UTRACE_EVENT(QUIESCE)); > + uprobe_stop_thread(utask); > + } > + } > + } > + /* > + * If all the (other) threads are already quiesced, it's up to the > + * current thread to do the necessary work. > + */ > + check_uproc_quiesced(uproc, survivor); > + return survivors; > +} > + > +/* Called with utask->uproc write-locked. */ > +static void uprobe_free_task(struct uprobe_task *utask, bool in_callback) > +{ > + struct deferred_registration *dr, *d; > + struct delayed_signal *ds, *ds2; > + int result; > + > + if (utask->engine && (utask->tsk != current || !in_callback)) { > + /* > + * No other tasks in this process should be running > + * uprobe_report_* callbacks. (If they are, utrace_barrier() > + * here could deadlock.) > + */ > + result = utrace_control_pid(utask->pid, utask->engine, > + UTRACE_DETACH); > + BUG_ON(result == -EINPROGRESS); > + } > + put_pid(utask->pid); /* null pid OK */ > + > + uprobe_unhash_utask(utask); > + list_del(&utask->list); > + list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) { > + list_del(&dr->list); > + kfree(dr); > + } > + > + list_for_each_entry_safe(ds, ds2, &utask->delayed_signals, list) { > + list_del(&ds->list); > + kfree(ds); > + } > + > + kfree(utask); > +} > + > +/* > + * Dismantle uproc and all its remaining uprobe_tasks. > + * in_callback = 1 if the caller is a uprobe_report_* callback who will > + * handle the UTRACE_DETACH operation. > + * Runs with uproc_mutex held; called with uproc->rwsem write-locked. 
> + */ > +static void uprobe_free_process(struct uprobe_process *uproc, int in_callback) > +{ > + struct uprobe_task *utask, *tmp; > + > + if (!hlist_unhashed(&uproc->hlist)) > + hlist_del(&uproc->hlist); > + list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) > + uprobe_free_task(utask, in_callback); > + put_pid(uproc->tg_leader); > + if (uproc->xol_area) > + xol_put_area(uproc->xol_area); > + up_write(&uproc->rwsem); /* So kfree doesn't complain */ > + kfree(uproc); > +} > + > +/* > + * Decrement uproc's ref count. If it's zero, free uproc and return > + * 1. Else return 0. If uproc is locked, don't call this; use > + * uprobe_decref_process(). > + */ > +static int uprobe_put_process(struct uprobe_process *uproc, bool in_callback) > +{ > + int freed = 0; > + > + if (atomic_dec_and_test(&uproc->refcount)) { > + mutex_lock(&uproc_mutex); > + down_write(&uproc->rwsem); > + if (unlikely(atomic_read(&uproc->refcount) != 0)) { > + /* > + * This works because uproc_mutex is held any > + * time the ref count can go from 0 to 1 -- e.g., > + * register_uprobe() sneaks in with a new probe. > + */ > + up_write(&uproc->rwsem); > + } else { > + uprobe_free_process(uproc, in_callback); > + freed = 1; > + } > + mutex_unlock(&uproc_mutex); > + } > + return freed; > +} > + > +static struct uprobe_kimg *uprobe_mk_kimg(struct uprobe *u) > +{ > + struct uprobe_kimg *uk = kzalloc(sizeof *uk, > + GFP_USER); > + > + if (unlikely(!uk)) > + return ERR_PTR(-ENOMEM); > + u->kdata = uk; > + uk->uprobe = u; > + uk->ppt = NULL; > + INIT_LIST_HEAD(&uk->list); > + uk->status = -EBUSY; > + return uk; > +} > + > +/* > + * Allocate a uprobe_task object for p and add it to uproc's list. > + * Called with p "got" and uproc->rwsem write-locked. 
Called in one of > + * the following cases: > + * - before setting the first uprobe in p's process > + * - we're in uprobe_report_clone() and p is the newly added thread > + * Returns: > + * - pointer to new uprobe_task on success > + * - NULL if t dies before we can utrace_attach it > + * - negative errno otherwise > + */ > +static struct uprobe_task *uprobe_add_task(struct pid *p, > + struct uprobe_process *uproc) > +{ > + struct uprobe_task *utask; > + struct utrace_engine *engine; > + struct task_struct *t = pid_task(p, PIDTYPE_PID); What keeps the task_struct referenced by "t" from disappearing at this point? > + > + if (!t) > + return NULL; > + utask = kzalloc(sizeof *utask, GFP_USER); > + if (unlikely(utask == NULL)) > + return ERR_PTR(-ENOMEM); > + > + utask->pid = p; > + utask->tsk = t; > + utask->state = UPTASK_RUNNING; > + utask->quiescing = false; > + utask->uproc = uproc; > + utask->active_probe = NULL; > + utask->doomed = false; > + INIT_LIST_HEAD(&utask->deferred_registrations); > + INIT_LIST_HEAD(&utask->delayed_signals); > + INIT_LIST_HEAD(&utask->list); > + list_add_tail(&utask->list, &uproc->thread_list); > + uprobe_hash_utask(utask); > + > + engine = utrace_attach_pid(p, UTRACE_ATTACH_CREATE, > + p_uprobe_utrace_ops, utask); > + if (IS_ERR(engine)) { > + long err = PTR_ERR(engine); > + printk("uprobes: utrace_attach_task failed, returned %ld\n", > + err); > + uprobe_free_task(utask, 0); > + if (err == -ESRCH) > + return NULL; > + return ERR_PTR(err); > + } > + utask->engine = engine; > + /* > + * Always watch for traps, clones, execs and exits. Caller must > + * set any other engine flags. 
> + */ > + utask_adjust_flags(utask, UPROBE_SET_FLAGS, > + UTRACE_EVENT(SIGNAL) | UTRACE_EVENT(SIGNAL_IGN) | > + UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(EXEC) | > + UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT)); > + /* > + * Note that it's OK if t dies just after utrace_attach, because > + * with the engine in place, the appropriate report_* callback > + * should handle it after we release uproc->rwsem. > + */ > + utrace_engine_put(engine); > + return utask; > +} > + > +/* > + * start_pid is the pid for a thread in the probed process. Find the > + * next thread that doesn't have a corresponding uprobe_task yet. Return > + * a ref-counted pid for that task, if any, else NULL. > + */ > +static struct pid *find_next_thread_to_add(struct uprobe_process *uproc, > + struct pid *start_pid) > +{ > + struct task_struct *t, *start; > + struct uprobe_task *utask; > + struct pid *pid = NULL; > + > + rcu_read_lock(); > + start = pid_task(start_pid, PIDTYPE_PID); > + t = start; > + if (t) { > + do { > + if (unlikely(t->flags & PF_EXITING)) > + goto dont_add; > + list_for_each_entry(utask, &uproc->thread_list, list) { Doesn't this need to be list_for_each_entry_rcu()? Or do you have ->thread_list protected elsewise? > + if (utask->tsk == t) > + /* Already added */ > + goto dont_add; > + } > + /* Found thread/task to add. */ > + pid = get_pid(task_pid(t)); > + break; > +dont_add: > + t = next_thread(t); > + } while (t != start); > + } > + rcu_read_unlock(); Now that we are outside of rcu_read_lock()'s protection, the task indicated by "pid" might disappear, and the value of "pid" might well be reused. Is this really OK? > + return pid; > +} > + > +/* Runs with uproc_mutex held; returns with uproc->rwsem write-locked. 
*/ > +static struct uprobe_process *uprobe_mk_process(struct pid *tg_leader) > +{ > + struct uprobe_process *uproc; > + struct uprobe_task *utask; > + struct pid *add_me; > + int i; > + long err; > + > + uproc = kzalloc(sizeof *uproc, GFP_USER); > + if (unlikely(uproc == NULL)) > + return ERR_PTR(-ENOMEM); > + > + /* Initialize fields */ > + atomic_set(&uproc->refcount, 1); > + init_rwsem(&uproc->rwsem); > + down_write(&uproc->rwsem); > + init_waitqueue_head(&uproc->waitq); > + for (i = 0; i < UPROBE_TABLE_SIZE; i++) > + INIT_HLIST_HEAD(&uproc->uprobe_table[i]); > + INIT_LIST_HEAD(&uproc->pending_uprobes); > + INIT_LIST_HEAD(&uproc->thread_list); > + uproc->nthreads = 0; > + uproc->n_quiescent_threads = 0; > + INIT_HLIST_NODE(&uproc->hlist); > + uproc->tg_leader = get_pid(tg_leader); > + uproc->tgid = pid_task(tg_leader, PIDTYPE_PID)->tgid; > + uproc->finished = false; > + > +#ifdef CONFIG_UBP_XOL > + if (!(ubp_strategies & UBP_HNT_INLINE)) > + uproc->sstep_out_of_line = true; > + else > +#endif > + uproc->sstep_out_of_line = false; > + > + /* > + * Create and populate one utask per thread in this process. We > + * can't call uprobe_add_task() while holding RCU lock, so we: > + * 1. rcu_read_lock() > + * 2. Find the next thread, add_me, in this process that's not > + * already on uproc's thread_list. > + * 3. rcu_read_unlock() > + * 4. uprobe_add_task(add_me, uproc) > + * Repeat 1-4 'til we have utasks for all threads. > + */ > + add_me = tg_leader; > + while ((add_me = find_next_thread_to_add(uproc, add_me)) != NULL) { > + utask = uprobe_add_task(add_me, uproc); > + if (IS_ERR(utask)) { > + err = PTR_ERR(utask); > + goto fail; > + } > + if (utask) > + uproc->nthreads++; > + } > + > + if (uproc->nthreads == 0) { > + /* All threads -- even p -- are dead. */ > + err = -ESRCH; > + goto fail; > + } > + return uproc; > + > +fail: > + uprobe_free_process(uproc, 0); > + return ERR_PTR(err); > +} > + > +/* > + * Creates a uprobe_probept and connects it to uk and uproc. 
Runs with > + * uproc->rwsem write-locked. > + */ > +static struct uprobe_probept *uprobe_add_probept(struct uprobe_kimg *uk, > + struct uprobe_process *uproc) > +{ > + struct uprobe_probept *ppt; > + > + ppt = kzalloc(sizeof *ppt, GFP_USER); > + if (unlikely(ppt == NULL)) > + return ERR_PTR(-ENOMEM); > + init_waitqueue_head(&ppt->waitq); > + init_waitqueue_head(&ppt->ssilq); > + spin_lock_init(&ppt->ssil_lock); > + ppt->ssil_state = SSIL_CLEAR; > + > + /* Connect to uk. */ > + INIT_LIST_HEAD(&ppt->uprobe_list); > + list_add_tail(&uk->list, &ppt->uprobe_list); > + uk->ppt = ppt; > + uk->status = -EBUSY; > + ppt->ubp.vaddr = uk->uprobe->vaddr; > + ppt->ubp.xol_vaddr = 0; > + > + /* Connect to uproc. */ > + if (!uproc->sstep_out_of_line) > + ppt->ubp.strategy = UBP_HNT_INLINE; > + else > + ppt->ubp.strategy = ubp_strategies; > + ppt->state = UPROBE_INSERTING; > + ppt->uproc = uproc; > + INIT_LIST_HEAD(&ppt->pd_node); > + list_add_tail(&ppt->pd_node, &uproc->pending_uprobes); > + INIT_HLIST_NODE(&ppt->ut_node); > + hlist_add_head(&ppt->ut_node, > + &uproc->uprobe_table[hash_long(ppt->ubp.vaddr, > + UPROBE_HASH_BITS)]); > + uprobe_get_process(uproc); > + return ppt; > +} > + > +/* > + * Runs with ppt->uproc write-locked. Frees ppt and decrements the ref > + * count on ppt->uproc (but ref count shouldn't hit 0). > + */ > +static void uprobe_free_probept(struct uprobe_probept *ppt) > +{ > + struct uprobe_process *uproc = ppt->uproc; > + > + xol_free_insn_slot(ppt->ubp.xol_vaddr, uproc->xol_area); > + hlist_del(&ppt->ut_node); > + kfree(ppt); > + uprobe_decref_process(uproc); > +} > + > +static void uprobe_free_kimg(struct uprobe_kimg *uk) > +{ > + uk->uprobe->kdata = NULL; > + kfree(uk); > +} > + > +/* > + * Runs with uprobe_process write-locked. > + * Note that we never free uk->uprobe, because the user owns that. 
> + */
> +static void purge_uprobe(struct uprobe_kimg *uk)
> +{
> +	struct uprobe_probept *ppt = uk->ppt;
> +
> +	list_del(&uk->list);
> +	uprobe_free_kimg(uk);
> +	if (list_empty(&ppt->uprobe_list))
> +		uprobe_free_probept(ppt);
> +}
> +
> +/*
> + * Runs with utask->uproc locked.
> + * read lock if called from uprobe handler.
> + * else write lock.
> + * Returns -EINPROGRESS on success.
> + * Returns -EBUSY if a request for defer registration already exists.
> + * Returns 0 if we have deferred request for both register/unregister.
> + *
> + */
> +static int defer_registration(struct uprobe *u, int regflag,
> +				struct uprobe_task *utask)
> +{
> +	struct deferred_registration *dr, *d;
> +
> +	/* Check if we already have such a defer request */
> +	list_for_each_entry_safe(dr, d, &utask->deferred_registrations, list) {
> +		if (dr->uprobe == u) {
> +			if (dr->regflag != regflag) {
> +				/* same as successful register + unregister */
> +				list_del(&dr->list);
> +				kfree(dr);
> +				return 0;
> +			} else
> +				/* we already have identical request */
> +				return -EBUSY;
> +		}
> +	}
> +
> +	/* We have a new unique request */
> +	dr = kmalloc(sizeof(struct deferred_registration), GFP_USER);
> +	if (!dr)
> +		return -ENOMEM;
> +	dr->uprobe = u;
> +	dr->regflag = regflag;
> +	INIT_LIST_HEAD(&dr->list);
> +	list_add_tail(&dr->list, &utask->deferred_registrations);
> +	return -EINPROGRESS;
> +}
> +
> +/*
> + * Given a numeric thread ID, return a ref-counted struct pid for the
> + * task-group-leader thread.
> + */
> +static struct pid *uprobe_get_tg_leader(pid_t p)
> +{
> +	struct pid *pid = NULL;
> +
> +	rcu_read_lock();
> +	if (current->nsproxy)
> +		pid = find_vpid(p);
> +	if (pid) {
> +		struct task_struct *t = pid_task(pid, PIDTYPE_PID);
> +		if (t)
> +			pid = task_tgid(t);
> +		else
> +			pid = NULL;
> +	}
> +	rcu_read_unlock();

What happens if the thread disappears at this point? We are outside of
rcu_read_lock() protection, so all the structures could potentially be
freed up by other CPUs, especially if this CPU takes an interrupt or is
preempted.

> +	return get_pid(pid); /* null pid OK here */
> +}
> +
> +/* See Documentation/uprobes.txt. */
> +int register_uprobe(struct uprobe *u)
> +{
> +	struct uprobe_task *cur_utask, *cur_utask_quiescing = NULL;
> +	struct uprobe_process *uproc;
> +	struct uprobe_probept *ppt;
> +	struct uprobe_kimg *uk;
> +	struct pid *p;
> +	int ret = 0, uproc_is_new = 0;
> +	bool survivors;
> +#ifndef CONFIG_UBP_XOL
> +	struct task_struct *tsk;
> +#endif
> +
> +	if (!u || !u->handler)
> +		return -EINVAL;
> +
> +	p = uprobe_get_tg_leader(u->pid);
> +	if (!p)
> +		return -ESRCH;
> +
> +	cur_utask = uprobe_find_utask(current);
> +	if (cur_utask && cur_utask->active_probe) {
> +		/*
> +		 * Called from handler; cur_utask->uproc is read-locked.
> +		 * Do this registration later.
> +		 */
> +		put_pid(p);
> +		return defer_registration(u, 1, cur_utask);
> +	}
> +
> +	/* Get the uprobe_process for this pid, or make a new one. */
> +	mutex_lock(&uproc_mutex);
> +	uproc = uprobe_find_process(p);
> +
> +	if (uproc) {
> +		struct uprobe_task *utask;
> +
> +		mutex_unlock(&uproc_mutex);
> +		list_for_each_entry(utask, &uproc->thread_list, list) {
> +			if (!utask->active_probe)
> +				continue;
> +			/*
> +			 * utask is at a probepoint, but has dropped
> +			 * uproc->rwsem to single-step. If utask is
> +			 * stopped, then it's probably because some
> +			 * other engine has asserted UTRACE_STOP;
> +			 * that engine may not allow UTRACE_RESUME
> +			 * until register_uprobe() returns. But, for
> +			 * reasons we won't go into here, utask wants
> +			 * to finish with utask->active_probe before
> +			 * allowing handle_pending_uprobes() to run
> +			 * (via utask_fake_quiesce()). So we defer this
> +			 * registration operation; it will be run after
> +			 * utask->active_probe is taken care of.
> +			 */
> +			BUG_ON(utask->state != UPTASK_SSTEP);
> +			if (task_is_stopped_or_traced(utask->tsk)) {
> +				ret = defer_registration(u, 1, utask);
> +				goto fail_uproc;
> +			}
> +		}
> +	} else {
> +		uproc = uprobe_mk_process(p);
> +		if (IS_ERR(uproc)) {
> +			ret = (int) PTR_ERR(uproc);
> +			mutex_unlock(&uproc_mutex);
> +			goto fail_tsk;
> +		}
> +		/* Hold uproc_mutex until we've added uproc to uproc_table. */
> +		uproc_is_new = 1;
> +	}
> +
> +#ifdef CONFIG_UBP_XOL
> +	ret = xol_validate_vaddr(p, u->vaddr, uproc->xol_area);
> +#else
> +	tsk = pid_task(p, PIDTYPE_PID);
> +	ret = ubp_validate_insn_addr(tsk, u->vaddr);
> +#endif
> +	if (ret < 0)
> +		goto fail_uproc;
> +
> +	if (u->kdata) {
> +		/*
> +		 * Probe is already/still registered. This is the only
> +		 * place we return -EBUSY to the user.
> +		 */
> +		ret = -EBUSY;
> +		goto fail_uproc;
> +	}
> +
> +	uk = uprobe_mk_kimg(u);
> +	if (IS_ERR(uk)) {
> +		ret = (int) PTR_ERR(uk);
> +		goto fail_uproc;
> +	}
> +
> +	/* See if we already have a probepoint at the vaddr. */
> +	ppt = (uproc_is_new ? NULL : uprobe_find_probept(uproc, u->vaddr));
> +	if (ppt) {
> +		/* Breakpoint is already in place, or soon will be. */
> +		uk->ppt = ppt;
> +		list_add_tail(&uk->list, &ppt->uprobe_list);
> +		switch (ppt->state) {
> +		case UPROBE_INSERTING:
> +			uk->status = -EBUSY; /* in progress */
> +			if (uproc->tg_leader == task_tgid(current)) {
> +				cur_utask_quiescing = cur_utask;
> +				BUG_ON(!cur_utask_quiescing);
> +			}
> +			break;
> +		case UPROBE_REMOVING:
> +			/* Wait! Don't remove that bkpt after all! */
> +			ppt->state = UPROBE_BP_SET;
> +			/* Remove from pending list. */
> +			list_del(&ppt->pd_node);
> +			/* Wake unregister_uprobe(). */
> +			wake_up_all(&ppt->waitq);
> +			/*FALLTHROUGH*/
> +		case UPROBE_BP_SET:
> +			uk->status = 0;
> +			break;
> +		default:
> +			BUG();
> +		}
> +		up_write(&uproc->rwsem);
> +		put_pid(p);
> +		if (uk->status == 0) {
> +			uprobe_decref_process(uproc);
> +			return 0;
> +		}
> +		goto await_bkpt_insertion;
> +	} else {
> +		ppt = uprobe_add_probept(uk, uproc);
> +		if (IS_ERR(ppt)) {
> +			ret = (int) PTR_ERR(ppt);
> +			goto fail_uk;
> +		}
> +	}
> +
> +	if (uproc_is_new) {
> +		hlist_add_head(&uproc->hlist,
> +				&uproc_table[hash_ptr(uproc->tg_leader,
> +				UPROBE_HASH_BITS)]);
> +		mutex_unlock(&uproc_mutex);
> +	}
> +	put_pid(p);
> +	survivors = quiesce_all_threads(uproc, &cur_utask_quiescing);
> +
> +	if (!survivors) {
> +		purge_uprobe(uk);
> +		up_write(&uproc->rwsem);
> +		uprobe_put_process(uproc, false);
> +		return -ESRCH;
> +	}
> +	up_write(&uproc->rwsem);
> +
> +await_bkpt_insertion:
> +	if (cur_utask_quiescing)
> +		/* Current task is probing its own process. */
> +		(void) utask_fake_quiesce(cur_utask_quiescing);
> +	else
> +		wait_event(ppt->waitq, ppt->state != UPROBE_INSERTING);
> +	ret = uk->status;
> +	if (ret != 0) {
> +		down_write(&uproc->rwsem);
> +		purge_uprobe(uk);
> +		up_write(&uproc->rwsem);
> +	}
> +	uprobe_put_process(uproc, false);
> +	return ret;
> +
> +fail_uk:
> +	uprobe_free_kimg(uk);
> +
> +fail_uproc:
> +	if (uproc_is_new) {
> +		uprobe_free_process(uproc, 0);
> +		mutex_unlock(&uproc_mutex);
> +	} else {
> +		up_write(&uproc->rwsem);
> +		uprobe_put_process(uproc, false);
> +	}
> +
> +fail_tsk:
> +	put_pid(p);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(register_uprobe);
> +
> +/* See Documentation/uprobes.txt. */
> +void unregister_uprobe(struct uprobe *u)
> +{
> +	struct pid *p;
> +	struct uprobe_process *uproc;
> +	struct uprobe_kimg *uk;
> +	struct uprobe_probept *ppt;
> +	struct uprobe_task *cur_utask, *cur_utask_quiescing = NULL;
> +	struct uprobe_task *utask;
> +
> +	if (!u)
> +		return;
> +	p = uprobe_get_tg_leader(u->pid);
> +	if (!p)
> +		return;
> +
> +	cur_utask = uprobe_find_utask(current);
> +	if (cur_utask && cur_utask->active_probe) {
> +		/* Called from handler; uproc is read-locked; do this later */
> +		put_pid(p);
> +		(void) defer_registration(u, 0, cur_utask);
> +		return;
> +	}
> +
> +	/*
> +	 * Lock uproc before walking the graph, in case the process we're
> +	 * probing is exiting.
> +	 */
> +	mutex_lock(&uproc_mutex);
> +	uproc = uprobe_find_process(p);
> +	mutex_unlock(&uproc_mutex);
> +	put_pid(p);
> +	if (!uproc)
> +		return;
> +
> +	list_for_each_entry(utask, &uproc->thread_list, list) {
> +		if (!utask->active_probe)
> +			continue;
> +
> +		/* See comment in register_uprobe(). */
> +		BUG_ON(utask->state != UPTASK_SSTEP);
> +		if (task_is_stopped_or_traced(utask->tsk)) {
> +			(void) defer_registration(u, 0, utask);
> +			goto done;
> +		}
> +	}
> +	uk = (struct uprobe_kimg *)u->kdata;
> +	if (!uk)
> +		/*
> +		 * This probe was never successfully registered, or
> +		 * has already been unregistered.
> +		 */
> +		goto done;
> +	if (uk->status == -EBUSY)
> +		/* Looks like register or unregister is already in progress. */
> +		goto done;
> +	ppt = uk->ppt;
> +
> +	list_del(&uk->list);
> +	uprobe_free_kimg(uk);
> +
> +	if (!list_empty(&ppt->uprobe_list))
> +		goto done;
> +
> +	/*
> +	 * The last uprobe at ppt's probepoint is being unregistered.
> +	 * Queue the breakpoint for removal.
> +	 */
> +	ppt->state = UPROBE_REMOVING;
> +	list_add_tail(&ppt->pd_node, &uproc->pending_uprobes);
> +
> +	(void) quiesce_all_threads(uproc, &cur_utask_quiescing);
> +	up_write(&uproc->rwsem);
> +	if (cur_utask_quiescing)
> +		/* Current task is probing its own process. */
> +		(void) utask_fake_quiesce(cur_utask_quiescing);
> +	else
> +		wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING);
> +
> +	if (likely(ppt->state == UPROBE_DISABLED)) {
> +		down_write(&uproc->rwsem);
> +		uprobe_free_probept(ppt);
> +		/* else somebody else's register_uprobe() resurrected ppt. */
> +		up_write(&uproc->rwsem);
> +	}
> +	uprobe_put_process(uproc, false);
> +	return;
> +
> +done:
> +	up_write(&uproc->rwsem);
> +	uprobe_put_process(uproc, false);
> +}
> +EXPORT_SYMBOL_GPL(unregister_uprobe);
> +
> +/* Find a surviving thread in uproc. Runs with uproc->rwsem locked. */
> +static struct task_struct *find_surviving_thread(struct uprobe_process *uproc)
> +{
> +	struct uprobe_task *utask;
> +
> +	list_for_each_entry(utask, &uproc->thread_list, list) {
> +		if (!(utask->tsk->flags & PF_EXITING))
> +			return utask->tsk;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * Run all the deferred_registrations previously queued by the current utask.
> + * Runs with no locks or mutexes held. The current utask's uprobe_process
> + * is ref-counted, so it won't disappear as the result of unregister_u*probe()
> + * called here.
> + */
> +static void uprobe_run_def_regs(struct list_head *drlist)
> +{
> +	struct deferred_registration *dr, *d;
> +
> +	list_for_each_entry_safe(dr, d, drlist, list) {
> +		int result = 0;
> +		struct uprobe *u = dr->uprobe;
> +
> +		if (dr->regflag)
> +			result = register_uprobe(u);
> +		else
> +			unregister_uprobe(u);
> +		if (u && u->registration_callback)
> +			u->registration_callback(u, dr->regflag, result);
> +		list_del(&dr->list);
> +		kfree(dr);
> +	}
> +}
> +
> +/*
> + * utrace engine report callbacks
> + */
> +
> +/*
> + * We've been asked to quiesce, but aren't in a position to do so.
> + * This could happen in either of the following cases:
> + *
> + * 1) Our own thread is doing a register or unregister operation --
> + * e.g., as called from a uprobe handler or a non-uprobes utrace
> + * callback. We can't wait_event() for ourselves in [un]register_uprobe().
> + *
> + * 2) We've been asked to quiesce, but we hit a probepoint first. Now
> + * we're in the report_signal callback, having handled the probepoint.
> + * We'd like to just turn on UTRACE_EVENT(QUIESCE) and coast into
> + * quiescence. Unfortunately, it's possible to hit a probepoint again
> + * before we quiesce. When processing the SIGTRAP, utrace would call
> + * uprobe_report_quiesce(), which must decline to take any action so
> + * as to avoid removing the uprobe just hit. As a result, we could
> + * keep hitting breakpoints and never quiescing.
> + *
> + * So here we do essentially what we'd prefer to do in uprobe_report_quiesce().
> + * If we're the last thread to quiesce, handle_pending_uprobes() and
> + * rouse_all_threads(). Otherwise, pretend we're quiescent and sleep until
> + * the last quiescent thread handles that stuff and then wakes us.
> + *
> + * Called and returns with no mutexes held. Returns 1 if we free utask->uproc,
> + * else 0.
> + */
> +static int utask_fake_quiesce(struct uprobe_task *utask)
> +{
> +	struct uprobe_process *uproc = utask->uproc;
> +	enum uprobe_task_state prev_state = utask->state;
> +
> +	down_write(&uproc->rwsem);
> +
> +	/* In case we're somehow set to quiesce for real... */
> +	clear_utrace_quiesce(utask, false);
> +
> +	if (uproc->n_quiescent_threads == uproc->nthreads-1) {
> +		/* We're the last thread to "quiesce." */
> +		handle_pending_uprobes(uproc, utask->tsk);
> +		rouse_all_threads(uproc);
> +		up_write(&uproc->rwsem);
> +		return 0;
> +	} else {
> +		utask->state = UPTASK_SLEEPING;
> +		uproc->n_quiescent_threads++;
> +		up_write(&uproc->rwsem);
> +		/* We ref-count sleepers. */
> +		uprobe_get_process(uproc);
> +
> +		wait_event(uproc->waitq, !utask->quiescing);
> +
> +		down_write(&uproc->rwsem);
> +		utask->state = prev_state;
> +		uproc->n_quiescent_threads--;
> +		up_write(&uproc->rwsem);
> +
> +		/*
> +		 * If uproc's last uprobe has been unregistered, and
> +		 * unregister_uprobe() woke up before we did, it's up
> +		 * to us to free uproc.
> +		 */
> +		return uprobe_put_process(uproc, false);
> +	}
> +}
> +
> +/* Prepare to single-step ppt's probed instruction inline. */
> +static void uprobe_pre_ssin(struct uprobe_task *utask,
> +	struct uprobe_probept *ppt, struct pt_regs *regs)
> +{
> +	unsigned long flags;
> +
> +	if (unlikely(ppt->ssil_state == SSIL_DISABLE)) {
> +		reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr);
> +		return;
> +	}
> +	spin_lock_irqsave(&ppt->ssil_lock, flags);
> +	while (ppt->ssil_state == SSIL_SET) {
> +		spin_unlock_irqrestore(&ppt->ssil_lock, flags);
> +		up_read(&utask->uproc->rwsem);
> +		wait_event(ppt->ssilq, ppt->ssil_state != SSIL_SET);
> +		down_read(&utask->uproc->rwsem);
> +		spin_lock_irqsave(&ppt->ssil_lock, flags);
> +	}
> +	if (unlikely(ppt->ssil_state == SSIL_DISABLE)) {
> +		/*
> +		 * While waiting to single step inline, breakpoint has
> +		 * been removed. Thread continues as if nothing happened.
> +		 */
> +		spin_unlock_irqrestore(&ppt->ssil_lock, flags);
> +		reset_thread_ip(utask->tsk, regs, ppt->ubp.vaddr);
> +		return;
> +	}
> +	ppt->ssil_state = SSIL_SET;
> +	spin_unlock_irqrestore(&ppt->ssil_lock, flags);
> +
> +	if (unlikely(ubp_pre_sstep(utask->tsk, &ppt->ubp,
> +					&utask->arch_info, regs) != 0)) {
> +		printk(KERN_ERR "Failed to temporarily restore original "
> +			"instruction for single-stepping: "
> +			"pid/tgid=%d/%d, vaddr=%#lx\n",
> +			utask->tsk->pid, utask->tsk->tgid, ppt->ubp.vaddr);
> +		utask->doomed = true;
> +	}
> +}
> +
> +/* Prepare to continue execution after single-stepping inline. */
> +static void uprobe_post_ssin(struct uprobe_task *utask,
> +	struct uprobe_probept *ppt, struct pt_regs *regs)
> +{
> +	unsigned long flags;
> +
> +	if (unlikely(ubp_post_sstep(utask->tsk, &ppt->ubp,
> +					&utask->arch_info, regs) != 0))
> +		printk("Couldn't restore bp: pid/tgid=%d/%d, addr=%#lx\n",
> +			utask->tsk->pid, utask->tsk->tgid, ppt->ubp.vaddr);
> +	spin_lock_irqsave(&ppt->ssil_lock, flags);
> +	if (likely(ppt->ssil_state == SSIL_SET)) {
> +		ppt->ssil_state = SSIL_CLEAR;
> +		wake_up(&ppt->ssilq);
> +	}
> +	spin_unlock_irqrestore(&ppt->ssil_lock, flags);
> +}
> +
> +#ifdef CONFIG_UBP_XOL
> +/*
> + * This architecture wants to do single-stepping out of line, but now we've
> + * discovered that it can't -- typically because we couldn't set up the XOL
> + * vma. Make all probepoints use inline single-stepping.
> + */
> +static void uproc_cancel_xol(struct uprobe_process *uproc)
> +{
> +	down_write(&uproc->rwsem);
> +	if (likely(uproc->sstep_out_of_line)) {
> +		/* No other task beat us to it. */
> +		int i;
> +		struct uprobe_probept *ppt;
> +		struct hlist_node *node;
> +		struct hlist_head *head;
> +		for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
> +			head = &uproc->uprobe_table[i];
> +			hlist_for_each_entry(ppt, node, head, ut_node) {
> +				if (!(ppt->ubp.strategy & UBP_HNT_INLINE))
> +					ubp_cancel_xol(current, &ppt->ubp);
> +			}
> +		}
> +		/* Do this last, so other tasks don't proceed too soon. */
> +		uproc->sstep_out_of_line = false;
> +	}
> +	up_write(&uproc->rwsem);
> +}
> +
> +/* Prepare to single-step ppt's probed instruction out of line. */
> +static int uprobe_pre_ssout(struct uprobe_task *utask,
> +	struct uprobe_probept *ppt, struct pt_regs *regs)
> +{
> +	if (!ppt->ubp.xol_vaddr)
> +		ppt->ubp.xol_vaddr = xol_get_insn_slot(&ppt->ubp,
> +							ppt->uproc->xol_area);
> +	if (unlikely(!ppt->ubp.xol_vaddr)) {
> +		ubp_cancel_xol(utask->tsk, &ppt->ubp);
> +		return -1;
> +	}
> +	utask->singlestep_addr = ppt->ubp.xol_vaddr;
> +	return ubp_pre_sstep(utask->tsk, &ppt->ubp, &utask->arch_info, regs);
> +}
> +
> +/* Prepare to continue execution after single-stepping out of line. */
> +static int uprobe_post_ssout(struct uprobe_task *utask,
> +	struct uprobe_probept *ppt, struct pt_regs *regs)
> +{
> +	int ret;
> +
> +	ret = ubp_post_sstep(utask->tsk, &ppt->ubp, &utask->arch_info, regs);
> +	return ret;
> +}
> +#endif
> +
> +/*
> + * If this thread is supposed to be quiescing, mark it quiescent; and
> + * if it was the last thread to quiesce, do the work we quiesced for.
> + * Runs with utask->uproc->rwsem write-locked. Returns true if we can
> + * let this thread resume.
> + */
> +static bool utask_quiesce(struct uprobe_task *utask)
> +{
> +	if (utask->quiescing) {
> +		if (utask->state != UPTASK_QUIESCENT) {
> +			utask->state = UPTASK_QUIESCENT;
> +			utask->uproc->n_quiescent_threads++;
> +		}
> +		return check_uproc_quiesced(utask->uproc, current);
> +	} else {
> +		clear_utrace_quiesce(utask, false);
> +		return true;
> +	}
> +}
> +
> +/*
> + * Delay delivery of the indicated signal until after single-step.
> + * Otherwise single-stepping will be cancelled as part of calling
> + * the signal handler.
> + */
> +static void uprobe_delay_signal(struct uprobe_task *utask, siginfo_t *info)
> +{
> +	struct delayed_signal *ds;
> +
> +	ds = kmalloc(sizeof(*ds), GFP_USER);
> +	if (ds) {
> +		ds->info = *info;
> +		INIT_LIST_HEAD(&ds->list);
> +		list_add_tail(&ds->list, &utask->delayed_signals);
> +	}
> +}
> +
> +static void uprobe_inject_delayed_signals(struct list_head *delayed_signals)
> +{
> +	struct delayed_signal *ds, *tmp;
> +
> +	list_for_each_entry_safe(ds, tmp, delayed_signals, list) {
> +		send_sig_info(ds->info.si_signo, &ds->info, current);
> +		list_del(&ds->list);
> +		kfree(ds);
> +	}
> +}
> +
> +/*
> + * Verify from Instruction Pointer if singlestep has indeed occurred.
> + * If Singlestep has occurred, then do post singlestep fix-ups.
> + */
> +static bool validate_and_post_sstep(struct uprobe_task *utask,
> +				struct pt_regs *regs,
> +				struct uprobe_probept *ppt)
> +{
> +	unsigned long vaddr = instruction_pointer(regs);
> +
> +	if (ppt->ubp.strategy & UBP_HNT_INLINE) {
> +		/*
> +		 * If we have singlestepped, Instruction pointer cannot
> +		 * be same as virtual address of probepoint.
> +		 */
> +		if (vaddr == ppt->ubp.vaddr)
> +			return false;
> +		uprobe_post_ssin(utask, ppt, regs);
> +#ifdef CONFIG_UBP_XOL
> +	} else {
> +		/*
> +		 * If we have executed out of line, Instruction pointer
> +		 * cannot be same as virtual address of XOL slot.
> +		 */
> +		if (vaddr == ppt->ubp.xol_vaddr)
> +			return false;
> +		uprobe_post_ssout(utask, ppt, regs);
> +#endif
> +	}
> +	return true;
> +}
> +
> +/*
> + * Helper routine for uprobe_report_signal().
> + * We get called here with:
> + *	state = UPTASK_RUNNING => we are here due to a breakpoint hit
> + *		- Read-lock the process
> + *		- Figure out which probepoint, based on regs->IP
> + *		- Set state = UPTASK_BP_HIT
> + *		- Invoke handler for each uprobe at this probepoint
> + *		- Reset regs->IP to beginning of the insn, if necessary
> + *		- Start watching for quiesce events, in case another
> + *			engine cancels our UTRACE_SINGLESTEP with a
> + *			UTRACE_STOP.
> + *		- Set singlestep in motion (UTRACE_SINGLESTEP),
> + *			with state = UPTASK_SSTEP
> + *		- Read-unlock the process
> + *
> + *	state = UPTASK_SSTEP => here after single-stepping
> + *		- Read-lock the process
> + *		- Validate we are here per the state machine
> + *		- Clean up after single-stepping
> + *		- Set state = UPTASK_RUNNING
> + *		- Read-unlock the process
> + *		- If it's time to quiesce, take appropriate action.
> + *		- If the handler(s) we ran called [un]register_uprobe(),
> + *			complete those via uprobe_run_def_regs().
> + *
> + *	state = ANY OTHER STATE
> + *		- Not our signal, pass it on (UTRACE_RESUME)
> + */
> +static u32 uprobe_handle_signal(u32 action,
> +				struct uprobe_task *utask,
> +				struct pt_regs *regs,
> +				siginfo_t *info,
> +				const struct k_sigaction *orig_ka)
> +{
> +	struct uprobe_probept *ppt;
> +	struct uprobe_process *uproc;
> +	struct uprobe_kimg *uk;
> +	unsigned long probept;
> +	enum utrace_resume_action resume_action;
> +	enum utrace_signal_action signal_action = utrace_signal_action(action);
> +
> +	uproc = utask->uproc;
> +
> +	/*
> +	 * We may need to re-assert UTRACE_SINGLESTEP if this signal
> +	 * is not associated with the breakpoint.
> +	 */
> +	if (utask->state == UPTASK_SSTEP)
> +		resume_action = UTRACE_SINGLESTEP;
> +	else
> +		resume_action = UTRACE_RESUME;
> +	/*
> +	 * This might be UTRACE_SIGNAL_REPORT request but some other
> +	 * engine's callback might have changed the signal action to
> +	 * something other than UTRACE_SIGNAL_REPORT. Use orig_ka to figure
> +	 * out such cases.
> +	 */
> +	if (unlikely(signal_action == UTRACE_SIGNAL_REPORT) || !orig_ka) {
> +		/* This thread was quiesced using UTRACE_INTERRUPT. */
> +		bool done_quiescing;
> +		if (utask->active_probe)
> +			/*
> +			 * We'll fake quiescence after we're done
> +			 * processing the probepoint.
> +			 */
> +			return UTRACE_SIGNAL_IGN | resume_action;
> +
> +		down_write(&uproc->rwsem);
> +		done_quiescing = utask_quiesce(utask);
> +		up_write(&uproc->rwsem);
> +		if (done_quiescing)
> +			resume_action = UTRACE_RESUME;
> +		else
> +			resume_action = UTRACE_STOP;
> +		return UTRACE_SIGNAL_IGN | resume_action;
> +	}
> +
> +	/*
> +	 * info will be null if we're called with action=UTRACE_SIGNAL_HANDLER,
> +	 * which means that single-stepping has been disabled so a signal
> +	 * handler can be called in the probed process. That should never
> +	 * happen because we intercept and delay handled signals (action =
> +	 * UTRACE_RESUME) until after we're done single-stepping.
> +	 */
> +	BUG_ON(!info);
> +	if (signal_action == UTRACE_SIGNAL_DELIVER && utask->active_probe &&
> +					info->si_signo != SSTEP_SIGNAL) {
> +		uprobe_delay_signal(utask, info);
> +		return UTRACE_SIGNAL_IGN | UTRACE_SINGLESTEP;
> +	}
> +
> +	if (info->si_signo != BREAKPOINT_SIGNAL &&
> +					info->si_signo != SSTEP_SIGNAL)
> +		goto no_interest;
> +
> +	switch (utask->state) {
> +	case UPTASK_RUNNING:
> +		if (info->si_signo != BREAKPOINT_SIGNAL)
> +			goto no_interest;
> +
> +#ifdef CONFIG_UBP_XOL
> +		/*
> +		 * Set up the XOL area if it's not already there. We
> +		 * do this here because we have to do it before
> +		 * handling the first probepoint hit, the probed
> +		 * process has to do it, and this may be the first
> +		 * time our probed process runs uprobes code. We need
> +		 * the XOL area for the uretprobe trampoline even if
> +		 * this architectures doesn't single-step out of line.
> +		 */
> +		if (uproc->sstep_out_of_line && !uproc->xol_area) {
> +			uproc->xol_area = xol_get_area(uproc->tg_leader);
> +			if (unlikely(uproc->sstep_out_of_line) &&
> +					unlikely(!uproc->xol_area))
> +				uproc_cancel_xol(uproc);
> +		}
> +#endif
> +
> +		down_read(&uproc->rwsem);
> +		/* Don't quiesce while running handlers. */
> +		clear_utrace_quiesce(utask, false);
> +		probept = ubp_get_bkpt_addr(regs);
> +		ppt = uprobe_find_probept(uproc, probept);
> +		if (!ppt) {
> +			up_read(&uproc->rwsem);
> +			goto no_interest;
> +		}
> +		utask->active_probe = ppt;
> +		utask->state = UPTASK_BP_HIT;
> +
> +		if (likely(ppt->state == UPROBE_BP_SET)) {
> +			list_for_each_entry(uk, &ppt->uprobe_list, list) {
> +				struct uprobe *u = uk->uprobe;
> +				if (u->handler)
> +					u->handler(u, regs);
> +			}
> +		}
> +
> +#ifdef CONFIG_UBP_XOL
> +		if ((ppt->ubp.strategy & UBP_HNT_INLINE) ||
> +				uprobe_pre_ssout(utask, ppt, regs) != 0)
> +#endif
> +			uprobe_pre_ssin(utask, ppt, regs);
> +		if (unlikely(utask->doomed)) {
> +			utask->active_probe = NULL;
> +			utask->state = UPTASK_RUNNING;
> +			up_read(&uproc->rwsem);
> +			goto no_interest;
> +		}
> +		utask->state = UPTASK_SSTEP;
> +		/* In case another engine cancels our UTRACE_SINGLESTEP... */
> +		utask_adjust_flags(utask, UPROBE_SET_FLAGS,
> +							UTRACE_EVENT(QUIESCE));
> +		/* Don't deliver this signal to the process. */
> +		resume_action = UTRACE_SINGLESTEP;
> +		signal_action = UTRACE_SIGNAL_IGN;
> +
> +		up_read(&uproc->rwsem);
> +		break;
> +
> +	case UPTASK_SSTEP:
> +		if (info->si_signo != SSTEP_SIGNAL)
> +			goto no_interest;
> +
> +		down_read(&uproc->rwsem);
> +		ppt = utask->active_probe;
> +		BUG_ON(!ppt);
> +
> +		/*
> +		 * Havent singlestepped yet? then re-assert
> +		 * UTRACE_SINGLESTEP.
> +		 */
> +		if (!validate_and_post_sstep(utask, regs, ppt)) {
> +			up_read(&uproc->rwsem);
> +			goto no_interest;
> +		}
> +
> +		/* No further need to re-assert UTRACE_SINGLESTEP. */
> +		clear_utrace_quiesce(utask, false);
> +
> +		utask->active_probe = NULL;
> +		utask->state = UPTASK_RUNNING;
> +		if (unlikely(utask->doomed)) {
> +			up_read(&uproc->rwsem);
> +			goto no_interest;
> +		}
> +
> +		if (utask->quiescing) {
> +			int uproc_freed;
> +			up_read(&uproc->rwsem);
> +			uproc_freed = utask_fake_quiesce(utask);
> +			BUG_ON(uproc_freed);
> +		} else
> +			up_read(&uproc->rwsem);
> +
> +		/*
> +		 * We hold a ref count on uproc, so this should never
> +		 * make utask or uproc disappear.
> +		 */
> +		uprobe_run_def_regs(&utask->deferred_registrations);
> +
> +		uprobe_inject_delayed_signals(&utask->delayed_signals);
> +
> +		resume_action = UTRACE_RESUME;
> +		signal_action = UTRACE_SIGNAL_IGN;
> +		break;
> +	default:
> +		goto no_interest;
> +	}
> +
> +no_interest:
> +	return signal_action | resume_action;
> +}
> +
> +/*
> + * Signal callback:
> + */
> +static u32 uprobe_report_signal(u32 action,
> +				struct utrace_engine *engine,
> +				struct pt_regs *regs,
> +				siginfo_t *info,
> +				const struct k_sigaction *orig_ka,
> +				struct k_sigaction *return_ka)
> +{
> +	struct uprobe_task *utask;
> +	struct uprobe_process *uproc;
> +	bool doomed;
> +	enum utrace_resume_action report_action;
> +
> +	utask = (struct uprobe_task *)rcu_dereference(engine->data);

Are we really in an RCU read-side critical section here?

> +	BUG_ON(!utask);
> +	uproc = utask->uproc;
> +
> +	/* Keep uproc intact until just before we return. */
> +	uprobe_get_process(uproc);
> +	report_action = uprobe_handle_signal(action, utask, regs, info,
> +								orig_ka);
> +	doomed = utask->doomed;
> +
> +	if (uprobe_put_process(uproc, true))
> +		report_action = utrace_signal_action(report_action) |
> +							UTRACE_DETACH;
> +	if (doomed)
> +		do_exit(SIGSEGV);
> +	return report_action;
> +}
> +
> +/*
> + * Quiesce callback: The associated process has one or more breakpoint
> + * insertions or removals pending. If we're the last thread in this
> + * process to quiesce, do the insertion(s) and/or removal(s).
> + */
> +static u32 uprobe_report_quiesce(u32 action,
> +				struct utrace_engine *engine,
> +				unsigned long event)
> +{
> +	struct uprobe_task *utask;
> +	struct uprobe_process *uproc;
> +	bool done_quiescing = false;
> +
> +	utask = (struct uprobe_task *)rcu_dereference(engine->data);

Are we really in an RCU read-side critical section here?

> +	BUG_ON(!utask);
> +
> +	if (utask->state == UPTASK_SSTEP)
> +		/*
> +		 * We got a breakpoint trap and tried to single-step,
> +		 * but somebody else's report_signal callback overrode
> +		 * our UTRACE_SINGLESTEP with a UTRACE_STOP. Try again.
> +		 */
> +		return UTRACE_SINGLESTEP;
> +
> +	BUG_ON(utask->active_probe);
> +	uproc = utask->uproc;
> +	down_write(&uproc->rwsem);
> +	done_quiescing = utask_quiesce(utask);
> +	up_write(&uproc->rwsem);
> +	return done_quiescing ? UTRACE_RESUME : UTRACE_STOP;
> +}
> +
> +/*
> + * uproc's process is exiting or exec-ing. Runs with uproc->rwsem
> + * write-locked. Caller must ref-count uproc before calling this
> + * function, to ensure that uproc doesn't get freed in the middle of
> + * this.
> + */
> +static void uprobe_cleanup_process(struct uprobe_process *uproc)
> +{
> +	struct hlist_node *pnode1, *pnode2;
> +	struct uprobe_kimg *uk, *unode;
> +	struct uprobe_probept *ppt;
> +	struct hlist_head *head;
> +	int i;
> +
> +	uproc->finished = true;
> +	for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
> +		head = &uproc->uprobe_table[i];
> +		hlist_for_each_entry_safe(ppt, pnode1, pnode2, head, ut_node) {
> +			if (ppt->state == UPROBE_INSERTING ||
> +					ppt->state == UPROBE_REMOVING) {
> +				/*
> +				 * This task is (exec/exit)ing with
> +				 * a [un]register_uprobe pending.
> +				 * [un]register_uprobe will free ppt.
> +				 */
> +				ppt->state = UPROBE_DISABLED;
> +				list_del(&ppt->pd_node);
> +				list_for_each_entry_safe(uk, unode,
> +						&ppt->uprobe_list, list)
> +					uk->status = -ESRCH;
> +				wake_up_all(&ppt->waitq);
> +			} else if (ppt->state == UPROBE_BP_SET) {
> +				list_for_each_entry_safe(uk, unode,
> +						&ppt->uprobe_list, list) {
> +					list_del(&uk->list);
> +					uprobe_free_kimg(uk);
> +				}
> +				uprobe_free_probept(ppt);
> +			/* else */
> +				/*
> +				 * If ppt is UPROBE_DISABLED, assume that
> +				 * [un]register_uprobe() has been notified
> +				 * and will free it soon.
> +				 */
> +			}
> +		}
> +	}
> +}
> +
> +static u32 uprobe_exec_exit(struct utrace_engine *engine,
> +				struct task_struct *tsk, int exit)
> +{
> +	struct uprobe_process *uproc;
> +	struct uprobe_probept *ppt;
> +	struct uprobe_task *utask;
> +	bool utask_quiescing;
> +
> +	utask = (struct uprobe_task *)rcu_dereference(engine->data);

Are we really in an RCU read-side critical section here?

> +	uproc = utask->uproc;
> +	uprobe_get_process(uproc);
> +
> +	ppt = utask->active_probe;
> +	if (ppt) {
> +		printk(KERN_WARNING "Task handler called %s while at uprobe"
> +				" probepoint: pid/tgid = %d/%d, probepoint"
> +				" = %#lx\n", (exit ? "exit" : "exec"),
> +				tsk->pid, tsk->tgid, ppt->ubp.vaddr);
> +		/*
> +		 * Mutex cleanup depends on where do_execve()/do_exit() was
> +		 * called and on ubp strategy (XOL vs. SSIL).
> +		 */
> +		if (ppt->ubp.strategy & UBP_HNT_INLINE) {
> +			switch (utask->state) {
> +				unsigned long flags;
> +			case UPTASK_SSTEP:
> +				spin_lock_irqsave(&ppt->ssil_lock, flags);
> +				ppt->ssil_state = SSIL_CLEAR;
> +				wake_up(&ppt->ssilq);
> +				spin_unlock_irqrestore(&ppt->ssil_lock, flags);
> +				break;
> +			default:
> +				break;
> +			}
> +		}
> +		if (utask->state == UPTASK_BP_HIT) {
> +			/* uprobe handler called do_exit()/do_execve(). */
> +			up_read(&uproc->rwsem);
> +			uprobe_decref_process(uproc);
> +		}
> +	}
> +
> +	down_write(&uproc->rwsem);
> +	utask_quiescing = utask->quiescing;
> +	uproc->nthreads--;
> +	if (utrace_set_events_pid(utask->pid, engine, 0))
> +		/* We don't care. */
> +		;
> +	uprobe_free_task(utask, 1);
> +	if (uproc->nthreads) {
> +		/*
> +		 * In case other threads are waiting for us to quiesce...
> +		 */
> +		if (utask_quiescing)
> +			(void) check_uproc_quiesced(uproc,
> +						find_surviving_thread(uproc));
> +	} else
> +		/*
> +		 * We were the last remaining thread - clean up the uprobe
> +		 * remnants a la unregister_uprobe(). We don't have to
> +		 * remove the breakpoints, though.
> +		 */
> +		uprobe_cleanup_process(uproc);
> +
> +	up_write(&uproc->rwsem);
> +	uprobe_put_process(uproc, true);
> +	return UTRACE_DETACH;
> +}
> +
> +/*
> + * Exit callback: The associated task/thread is exiting.
> + */
> +static u32 uprobe_report_exit(u32 action,
> +				struct utrace_engine *engine,
> +				long orig_code, long *code)
> +{
> +	return uprobe_exec_exit(engine, current, 1);
> +}
> +/*
> + * Clone callback: The current task has spawned a thread/process.
> + * Utrace guarantees that parent and child pointers will be valid
> + * for the duration of this callback.
> + *
> + * NOTE: For now, we don't pass on uprobes from the parent to the
> + * child. We now do the necessary clearing of breakpoints in the
> + * child's address space.
> + *
> + * TODO:
> + *	- Provide option for child to inherit uprobes.
> + */
> +static u32 uprobe_report_clone(u32 action,
> +				struct utrace_engine *engine,
> +				unsigned long clone_flags,
> +				struct task_struct *child)
> +{
> +	struct uprobe_process *uproc;
> +	struct uprobe_task *ptask, *ctask;
> +
> +	ptask = (struct uprobe_task *)rcu_dereference(engine->data);

Are we really in an RCU read-side critical section here?

> +	uproc = ptask->uproc;
> +
> +	/*
> +	 * Lock uproc so no new uprobes can be installed 'til all
> +	 * report_clone activities are completed.
> + */ > + mutex_lock(&uproc_mutex); > + down_write(&uproc->rwsem); > + > + if (clone_flags & CLONE_THREAD) { > + /* New thread in the same process. */ > + ctask = uprobe_find_utask(child); > + if (unlikely(ctask)) { > + /* > + * uprobe_mk_process() ran just as this clone > + * happened, and has already accounted for the > + * new child. > + */ > + } else { > + struct pid *child_pid = get_pid(task_pid(child)); > + BUG_ON(!child_pid); > + ctask = uprobe_add_task(child_pid, uproc); > + BUG_ON(!ctask); > + if (IS_ERR(ctask)) > + goto done; > + uproc->nthreads++; > + /* > + * FIXME: Handle the case where uproc is quiescing > + * (assuming it's possible to clone while quiescing). > + */ > + } > + } else { > + /* > + * New process spawned by parent. Remove the probepoints > + * in the child's text. > + * > + * Its not necessary to quiesce the child as we are assured > + * by utrace that this callback happens *before* the child > + * gets to run userspace. > + * > + * We also hold the uproc->rwsem for the parent - so no > + * new uprobes will be registered 'til we return. > + */ > + int i; > + struct uprobe_probept *ppt; > + struct hlist_node *node; > + struct hlist_head *head; > + > + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { > + head = &uproc->uprobe_table[i]; > + hlist_for_each_entry(ppt, node, head, ut_node) { > + if (ubp_remove_bkpt(child, &ppt->ubp) != 0) { > + /* Ratelimit this? */ > + printk(KERN_ERR "Pid %d forked %d;" > + " failed to remove probepoint" > + " at %#lx in child\n", > + current->pid, child->pid, > + ppt->ubp.vaddr); > + } > + } > + } > + } > + > +done: > + up_write(&uproc->rwsem); > + mutex_unlock(&uproc_mutex); > + return UTRACE_RESUME; > +} > + > +/* > + * Exec callback: The associated process called execve() or friends > + * > + * The new program is about to start running and so there is no > + * possibility of a uprobe from the previous user address space > + * to be hit. 
> + * > + * NOTE: > + * Typically, this process would have passed through the clone > + * callback, where the necessary action *should* have been > + * taken. However, if we still end up at this callback: > + * - We don't have to clear the uprobes - memory image > + * will be overlaid. > + * - We have to free up uprobe resources associated with > + * this process. > + */ > +static u32 uprobe_report_exec(u32 action, > + struct utrace_engine *engine, > + const struct linux_binfmt *fmt, > + const struct linux_binprm *bprm, > + struct pt_regs *regs) > +{ > + return uprobe_exec_exit(engine, current, 0); > +} > + > +static const struct utrace_engine_ops uprobe_utrace_ops = { > + .report_quiesce = uprobe_report_quiesce, > + .report_signal = uprobe_report_signal, > + .report_exit = uprobe_report_exit, > + .report_clone = uprobe_report_clone, > + .report_exec = uprobe_report_exec > +}; > + > +static int __init init_uprobes(void) > +{ > + int ret, i; > + > + ubp_strategies = UBP_HNT_TSKINFO; > + ret = ubp_init(&ubp_strategies); > + if (ret != 0) { > + printk(KERN_ERR "Can't start uprobes: ubp_init() returned %d\n", > + ret); > + return ret; > + } > + for (i = 0; i < UPROBE_TABLE_SIZE; i++) { > + INIT_HLIST_HEAD(&uproc_table[i]); > + INIT_HLIST_HEAD(&utask_table[i]); > + } > + > + p_uprobe_utrace_ops = &uprobe_utrace_ops; > + return 0; > +} > + > +static void __exit exit_uprobes(void) > +{ > +} > + > +module_init(init_uprobes); > +module_exit(exit_uprobes); > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From caiqian at redhat.com Tue Jan 12 04:33:27 2010 From: caiqian at redhat.com (CAI Qian) Date: Mon, 11 Jan 2010 23:33:27 -0500 (EST) Subject: powerpc: step-jump-cont failure (Was: [PATCH] utrace: don't set ->ops = utrace_detached_ops lockless) In-Reply-To: 
<20100111143756.GA4970@redhat.com> Message-ID: <1972914884.138241263270807777.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Thanks for pointing out. Sorry for the false alarm. From fweisbec at gmail.com Tue Jan 12 04:54:56 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Tue, 12 Jan 2010 05:54:56 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> Message-ID: <20100112045454.GJ5243@nowhere> On Mon, Jan 11, 2010 at 05:56:08PM +0530, Srikar Dronamraju wrote: > This patch implements ftrace plugin for uprobes. > > Description: > Ftrace plugin provides an interface to dump data at a given address, top of > the stack and function arguments when a user program calls a specific > function. So, as told before, ftrace plugins tend to be relegated to obsolescence and I first suggested to plug this into kprobe events so that we have a unified interface to control/create u|k|kret probe events. But after digging more into first appearances, uprobe creation can follow the kprobes creation flow. kprobe can be created whenever we want. This is about probing kernel text and it is already there so that we can set the probe, default deactivated, in advance. This is much more tricky in the case of uprobes as I see two ways to work with it: - probing on an already running process - probing on a process we are about to run Now say we create to create a uprobe trace event for an already running process. No problem in the workflow, we just need to set the address and the pid. Fine. Now what if I want to launch ls and want to profile a function inside. What can I do with a trace event. I can't create the probe event based on a pid as I don't know it in advance. I could give it the ls cmdline and it manages to activate on the next ls launched. 
This is racy as another ls can be launched concurrently. So I can only say there that an ftrace plugin or an ftrace trace event would be only a half-useful interface to exploit utrace possibilities because it only lets us trace already running apps. Moreover I bet the most chosen workflow to profile/trace uprobes is by launching an app and profile it from the beginning, not by profiling an already running one, which makes an ftrace interface even less than half useful there. ftrace is cool to trace the kernel, but this kind of tricky userspace tracing workflow is not adapted to it. What do you think? From rostedt at goodmis.org Tue Jan 12 05:08:53 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Tue, 12 Jan 2010 00:08:53 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100112045454.GJ5243@nowhere> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> Message-ID: <1263272933.28171.3804.camel@gandalf.stny.rr.com> On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote: > Now what if I want to launch ls and want to profile a function > inside. What can I do with a trace event. I can't create the > probe event based on a pid as I don't know it in advance. > I could give it the ls cmdline and it manages to activate > on the next ls launched. This is racy as another ls can > be launched concurrently. You make a wrapper script: #!/bin/sh $$ exec $* I do this all the time to limit the function tracer to a specific app. #!/bin/sh echo $$ > /debug/tracing/set_ftrace_pid echo function > /debug/tracing/current_tracer exec $* The exec will cause the ls to have the pid of $$. -- Steve > > So I can only say there that an ftrace plugin or an ftrace trace > event would be only a half-useful interface to exploit utrace > possibilities because it only lets us trace already running > apps. 
Moreover I bet the most chosen workflow to profile/trace > uprobes is by launching an app and profile it from the beginning, > not by profiling an already running one, which makes an ftrace > interface even less than half useful there. > > ftrace is cool to trace the kernel, but this kind of tricky > userspace tracing workflow is not adapted to it. > > What do you think? > From fweisbec at gmail.com Tue Jan 12 05:36:00 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Tue, 12 Jan 2010 06:36:00 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> Message-ID: <20100112053559.GL5243@nowhere> On Mon, Jan 11, 2010 at 05:55:53PM +0530, Srikar Dronamraju wrote: > +static const struct utrace_engine_ops uprobe_utrace_ops = { > + .report_quiesce = uprobe_report_quiesce, > + .report_signal = uprobe_report_signal, > + .report_exit = uprobe_report_exit, > + .report_clone = uprobe_report_clone, > + .report_exec = uprobe_report_exec > +}; So, as stated before, uprobe seems to handle too much standalone policies such as freeing on exec, always inherit on clone and never on fork. Such rules should be decided from uprobe clients not from uprobe itself and that makes it not enough flexible to be usable for now. For example if we want it to be usable by perf, we have two ways: - a trace event. Unfortunately, like I explained in a previous mail, this doesn't seem to be a suitable interface for this particular case. - a performance monitoring unit, with the existing unified interface struct pmu, usable by perf. Typically, to use it with perf toward a pmu, perf tools need to create a uprobe on perf process and activate its hook on the next exec. Thereafter, it's up to perf to decide if we inherit through clone and fork. Here I fear utrace and perf are going to collide. 
See how could be the final struct pmu (we need to extend it to support utrace): struct pmu { enable() -> called we schedule in a context where we want a uprobe to be active. Called very often disable() -> the above opposite /* Not yet existing callbacks */ hook_task() -> called when a process is created which we want to activate our hook would be typically called once on exec if we have set enable_on_exec and also on clone()/fork() if we want to inherit. } The above hook_task (could be divided in more precise callback events like hook_on_exec, hook_on_clone, etc...) would be needed by perf to drive correctly utrace and this is going to collide with utrace callbacks that notify execs and forks. Probably utrace can be kept for all the utrace breakpoint signal handling an so. But I guess the rest can be implemented on top of a struct pmu and driven by perf like we did with hardware breakpoints re-implementation. Just an idea. From fweisbec at gmail.com Tue Jan 12 05:44:23 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Tue, 12 Jan 2010 06:44:23 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263272933.28171.3804.camel@gandalf.stny.rr.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <1263272933.28171.3804.camel@gandalf.stny.rr.com> Message-ID: <20100112054422.GM5243@nowhere> On Tue, Jan 12, 2010 at 12:08:53AM -0500, Steven Rostedt wrote: > On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote: > > > Now what if I want to launch ls and want to profile a function > > inside. What can I do with a trace event. I can't create the > > probe event based on a pid as I don't know it in advance. > > I could give it the ls cmdline and it manages to activate > > on the next ls launched. This is racy as another ls can > > be launched concurrently. 
> > You make a wrapper script: > > #!/bin/sh > $$ > exec $* > > I do this all the time to limit the function tracer to a specific app. > > #!/bin/sh > echo $$ > /debug/tracing/set_ftrace_pid > echo function > /debug/tracing/current_tracer > exec $* > > > The exec will cause the ls to have the pid of $$. Sounds like a good idea. In this case we could indeed think about a trace event. It would typically have the benefit to have the same interface than kprobes. We can use it with perf, the only constraint is that we need to launch the record right after creating the trace event. Or we can pre-create them and set the pid of the target later when we launch perf record. And we need an enable on exec option in the probe definition. That's a nice option. From ananth at in.ibm.com Tue Jan 12 08:14:12 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 12 Jan 2010 13:44:12 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100112053559.GL5243@nowhere> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <20100112053559.GL5243@nowhere> Message-ID: <20100112081412.GD28425@in.ibm.com> On Tue, Jan 12, 2010 at 06:36:00AM +0100, Frederic Weisbecker wrote: > On Mon, Jan 11, 2010 at 05:55:53PM +0530, Srikar Dronamraju wrote: > > +static const struct utrace_engine_ops uprobe_utrace_ops = { > > + .report_quiesce = uprobe_report_quiesce, > > + .report_signal = uprobe_report_signal, > > + .report_exit = uprobe_report_exit, > > + .report_clone = uprobe_report_clone, > > + .report_exec = uprobe_report_exec > > +}; > > > So, as stated before, uprobe seems to handle too much standalone > policies such as freeing on exec, always inherit on clone and never > on fork. Such rules should be decided from uprobe clients not > from uprobe itself and that makes it not enough flexible to > be usable for now. The freeing on exec is only housekeeping of uprobe data structures. 
And probepoints are inherited only on CLONE_THREAD and not otherwise, simply since the existing probes can be hit in the new thread's context. Not sure what other policy you are hinting at. > For example if we want it to be usable by perf, we have two ways: > > - a trace event. Unfortunately, like I explained in a previous > mail, this doesn't seem to be a suitable interface for this > particular case. > > - a performance monitoring unit, with the existing unified interface > struct pmu, usable by perf. > > > Typically, to use it with perf toward a pmu, perf tools need to > create a uprobe on perf process and activate its hook on the next exec. > Thereafter, it's up to perf to decide if we inherit through clone > and fork. As mentioned above, the inheritance is only for threads. It should be fairly easy to inherit probes on fork, and that can be made a perf based policy decision. > Here I fear utrace and perf are going to collide. Utrace does not mandate any of the above concerns you've mentioned. Utrace just provides callbacks at the said events and uprobes can be tweaked to accommodate perf's requirements as possible, as feasible. > See how could be the final struct pmu (we need to extend it > to support utrace): > > struct pmu { > enable() -> called we schedule in a context where we want > a uprobe to be active. Called very often > disable() -> the above opposite > > /* Not yet existing callbacks */ > > hook_task() -> called when a process is created which > we want to activate our hook > would be typically called once on > exec if we have set enable_on_exec > and also on clone()/fork() > if we want to inherit. > } > > > The above hook_task (could be divided in more precise callback events > like hook_on_exec, hook_on_clone, etc...) would be needed by perf > to drive correctly utrace and this is going to collide with utrace > callbacks that notify execs and forks. > > Probably utrace can be kept for all the utrace breakpoint signal > handling an so. 
But I guess the rest can be implemented on top > of a struct pmu and driven by perf like we did with hardware > breakpoints re-implementation. > > Just an idea. Well, I wonder if perf can ride on utrace's callbacks for the hook_task() for the clone/fork cases? Ananth From srikar at linux.vnet.ibm.com Tue Jan 12 08:21:28 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Tue, 12 Jan 2010 13:51:28 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100112020155.GC10869@linux.vnet.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <20100112020155.GC10869@linux.vnet.ibm.com> Message-ID: <20100112082128.GA22420@linux.vnet.ibm.com> Hi Paul, > > + > > +/* > > + * Allocate a uprobe_task object for p and add it to uproc's list. > > + * Called with p "got" and uproc->rwsem write-locked. Called in one of > > + * the following cases: > > + * - before setting the first uprobe in p's process > > + * - we're in uprobe_report_clone() and p is the newly added thread > > + * Returns: > > + * - pointer to new uprobe_task on success > > + * - NULL if t dies before we can utrace_attach it > > + * - negative errno otherwise > > + */ > > +static struct uprobe_task *uprobe_add_task(struct pid *p, > > + struct uprobe_process *uproc) > > +{ > > + struct uprobe_task *utask; > > + struct utrace_engine *engine; > > + struct task_struct *t = pid_task(p, PIDTYPE_PID); > > What keeps the task_struct referenced by "t" from disappearing at this > point? We have a ref-counted pid, which is used to create the utrace engine. If the task_struct disappears, utrace refuses to create an engine and we don't proceed further. We use the task_struct and pid only once we have a successfully created utrace engine. Once the utrace engine is created, utrace guarantees that the task will remain until uprobes is notified of its death/exit.
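As an aside, the ordering described above can be sketched in plain userspace C. This is only a toy model under stated assumptions: model_pid, model_engine, model_new_pid(), model_get_pid(), model_put_pid() and model_attach() are made-up names for illustration, not the uprobes or utrace interfaces. It shows two separate guarantees: the reference keeps the handle from being freed, while the attach step is what refuses to proceed once the task is gone.

```c
/*
 * Userspace model only. Every name below (model_pid, model_attach, ...)
 * is hypothetical and stands in for the kernel objects discussed above;
 * none of this is the uprobes or utrace API.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_pid {
	int refcount;     /* references held on the handle itself */
	bool task_alive;  /* is a live task still attached to the handle? */
};

struct model_engine {
	struct model_pid *pid;  /* handle the engine was attached through */
	void *data;             /* per-task private data */
};

/* Create a handle for a live task, owned by the caller (refcount 1). */
static struct model_pid *model_new_pid(void)
{
	struct model_pid *pid = calloc(1, sizeof(*pid));

	if (pid) {
		pid->refcount = 1;
		pid->task_alive = true;
	}
	return pid;
}

/* Take a reference: the handle cannot be freed while it is held. */
static struct model_pid *model_get_pid(struct model_pid *pid)
{
	if (pid)
		pid->refcount++;
	return pid;
}

/* Drop a reference and free the handle when the last one goes away. */
static void model_put_pid(struct model_pid *pid)
{
	if (pid && --pid->refcount == 0)
		free(pid);
}

/*
 * Attach step: refuses with -ESRCH once the task is gone, even though
 * the handle itself is still valid memory thanks to the reference.
 */
static int model_attach(struct model_pid *pid, void *data,
			struct model_engine *engine)
{
	if (!pid->task_alive)
		return -ESRCH;
	engine->pid = pid;
	engine->data = data;
	return 0;
}
```

In this model a caller would take its reference with model_get_pid() first, call model_attach(), and touch the task only if the attach returned 0. An -ESRCH result simply means "the task died first", which the caller can treat as a non-error.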
> > > + > > + if (!t) > > + return NULL; > > + utask = kzalloc(sizeof *utask, GFP_USER); > > + if (unlikely(utask == NULL)) > > + return ERR_PTR(-ENOMEM); > > + > > + utask->pid = p; > > + utask->tsk = t; > > + utask->state = UPTASK_RUNNING; > > + utask->quiescing = false; > > + utask->uproc = uproc; > > + utask->active_probe = NULL; > > + utask->doomed = false; > > + INIT_LIST_HEAD(&utask->deferred_registrations); > > + INIT_LIST_HEAD(&utask->delayed_signals); > > + INIT_LIST_HEAD(&utask->list); > > + list_add_tail(&utask->list, &uproc->thread_list); > > + uprobe_hash_utask(utask); > > + > > + engine = utrace_attach_pid(p, UTRACE_ATTACH_CREATE, > > + p_uprobe_utrace_ops, utask); > > + if (IS_ERR(engine)) { > > + long err = PTR_ERR(engine); > > + printk("uprobes: utrace_attach_task failed, returned %ld\n", > > + err); > > + uprobe_free_task(utask, 0); > > + if (err == -ESRCH) > > + return NULL; > > + return ERR_PTR(err); > > + } > > + goto dont_add; > > + list_for_each_entry(utask, &uproc->thread_list, list) { > > Doesn't this need to be list_for_each_entry_rcu()? > > Or do you have ->thread_list protected elsewise? thread_list is protected by the write lock on uproc->rwsem. > > > + if (utask->tsk == t) > > + /* Already added */ > > + goto dont_add; > > + } > > + /* Found thread/task to add. */ > > + pid = get_pid(task_pid(t)); > > + break; > > +dont_add: > > + t = next_thread(t); > > + } while (t != start); > > + } > > + rcu_read_unlock(); > > Now that we are outside of rcu_read_lock()'s protection, the task > indicated by "pid" might disappear, and the value of "pid" might well > be reused. Is this really OK? We have a ref-counted pid, so the pid should ideally not disappear. And as I said earlier, once the utrace engine is created, we are sure that the task_struct stays around until the engine is detached. If an engine is not created, we don't use the task_struct or the pid. We piggyback on the guarantee that utrace provides.
> > > + return pid; > > +} > > + > > +/* > > + * Given a numeric thread ID, return a ref-counted struct pid for the > > + * task-group-leader thread. > > + */ > > +static struct pid *uprobe_get_tg_leader(pid_t p) > > +{ > > + struct pid *pid = NULL; > > + > > + rcu_read_lock(); > > + if (current->nsproxy) > > + pid = find_vpid(p); > > + if (pid) { > > + struct task_struct *t = pid_task(pid, PIDTYPE_PID); > > + if (t) > > + pid = task_tgid(t); > > + else > > + pid = NULL; > > + } > > + rcu_read_unlock(); > > What happens if the thread disappears at this point? We are outside of > rcu_read_lock() protection, so all the structures could potentially be > freed up by other CPUs, especially if this CPU takes an interrupt or is > preempted. > > > + return get_pid(pid); /* null pid OK here */ > > +} Same as above. > > +/* > > + * Signal callback: > > + */ > > +static u32 uprobe_report_signal(u32 action, > > + struct utrace_engine *engine, > > + struct pt_regs *regs, > > + siginfo_t *info, > > + const struct k_sigaction *orig_ka, > > + struct k_sigaction *return_ka) > > +{ > > + struct uprobe_task *utask; > > + struct uprobe_process *uproc; > > + bool doomed; > > + enum utrace_resume_action report_action; > > + > > + utask = (struct uprobe_task *)rcu_dereference(engine->data); > > Are we really in an RCU read-side critical section here? Yeah, we don't need the rcu_dereference() here. > > > +static u32 uprobe_report_quiesce(u32 action, > > + struct utrace_engine *engine, > > + unsigned long event) > > +{ > > + struct uprobe_task *utask; > > + struct uprobe_process *uproc; > > + bool done_quiescing = false; > > + > > + utask = (struct uprobe_task *)rcu_dereference(engine->data); > > Are we really in an RCU read-side critical section here? Yeah, we don't need the rcu_dereference() here either.
> > > + > > +static u32 uprobe_exec_exit(struct utrace_engine *engine, > > + struct task_struct *tsk, int exit) > > +{ > > + struct uprobe_process *uproc; > > + struct uprobe_probept *ppt; > > + struct uprobe_task *utask; > > + bool utask_quiescing; > > + > > + utask = (struct uprobe_task *)rcu_dereference(engine->data); > > Are we really in an RCU read-side critical section here? Yeah, we don't need the rcu_dereference() here either. > > > + * - Provide option for child to inherit uprobes. > > + */ > > +static u32 uprobe_report_clone(u32 action, > > + struct utrace_engine *engine, > > + unsigned long clone_flags, > > + struct task_struct *child) > > +{ > > + struct uprobe_process *uproc; > > + struct uprobe_task *ptask, *ctask; > > + > > + ptask = (struct uprobe_task *)rcu_dereference(engine->data); > > Are we really in an RCU read-side critical section here? Yeah, we don't need the rcu_dereference() here either. -- Thanks and Regards Srikar From srikar at linux.vnet.ibm.com Tue Jan 12 08:54:17 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Tue, 12 Jan 2010 14:24:17 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100112053559.GL5243@nowhere> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <20100112053559.GL5243@nowhere> Message-ID: <20100112085417.GA17299@linux.vnet.ibm.com> Hi Frederic, > > > So, as stated before, uprobe seems to handle too much standalone > policies such as freeing on exec, always inherit on clone and never > on fork. Such rules should be decided from uprobe clients not > from uprobe itself and that makes it not enough flexible to > be usable for now. Let's say we were tracing process A and had inserted a few breakpoints. If this process were to perform an exec, we would be loading a new process image. The old breakpoints are actually detrimental now and hence all the breakpoints that we had installed would have to be removed anyway.
And new breakpoints have to be installed at different locations. If this process were to fork, we would have to create all the per-process uprobes bookkeeping, including a one-page per-process instruction store. In most cases fork is followed by exec, which would mean trashing the breakpoints that we inherited. Tracing a newly exec-ed process or a forked process is similar to starting a new uprobes session. Also, uprobes allows more than one kernel module/plugin to trace the same process; i.e. for the same process at the same breakpoint, one client may want follow-on-fork or follow-on-exec while another may not. But I understand your requirement for tracing a session rather than just a process. That's where the utrace-based task-finder or something similar finds its application: this layer (task-finder) would be able to tell uprobes to start tracing a process based on certain criteria. Since uprobes uses a breakpoint instruction, every thread of a traced process takes an exception when passing through a breakpoint. Hence we always have to inherit on clone. If a client wants to trace only certain threads of a process, it can filter them in the uprobe handler. I feel the current uprobes + task-finder combination would be much more flexible, and perf could probably use it. This approach would also avoid unnecessarily creating uprobes bookkeeping for processes where we may never place probes. > > For example if we want it to be usable by perf, we have two ways: > > - a trace event. Unfortunately, like I explained in a previous > mail, this doesn't seem to be a suitable interface for this > particular case. > > - a performance monitoring unit, with the existing unified interface > struct pmu, usable by perf. > > > Typically, to use it with perf toward a pmu, perf tools need to > create a uprobe on perf process and activate its hook on the next exec.
> Thereafter, it's up to perf to decide if we inherit through clone > and fork. > > Here I fear utrace and perf are going to collide. I am not sure why utrace and perf would collide. I think utrace is a layer below uprobes, so perf could use utrace directly (if it implements the task-finder logic) or use utrace through uprobes. > > See how could be the final struct pmu (we need to extend it > to support utrace): > > struct pmu { > enable() -> called we schedule in a context where we want > a uprobe to be active. Called very often > disable() -> the above opposite > > /* Not yet existing callbacks */ > > hook_task() -> called when a process is created which > we want to activate our hook > would be typically called once on > exec if we have set enable_on_exec > and also on clone()/fork() > if we want to inherit. > } > > > The above hook_task (could be divided in more precise callback events > like hook_on_exec, hook_on_clone, etc...) would be needed by perf > to drive correctly utrace and this is going to collide with utrace > callbacks that notify execs and forks. As Ananth pointed out, this hook on exec / hook on fork is exactly what the task-finder/perf would provide using the utrace APIs. If there is any reason why utrace and perf could collide, can you please give more details? In that case, Roland and others may have more ideas on how to work around these issues. Please let me know your thoughts. -- Thanks and Regards Srikar From caiqian at redhat.com Tue Jan 12 10:03:42 2010 From: caiqian at redhat.com (caiqian at redhat.com) Date: Tue, 12 Jan 2010 05:03:42 -0500 (EST) Subject: [PATCH] link detach_sigkill_race with -Wl,-z,now In-Reply-To: <291715653.143791263290486753.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: <676626112.143821263290622331.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> detach_sigkill_race also needs to link with -Wl,-z,now and then minimize the library code you rely on.
Otherwise, it will fail due to the reason mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=542731 . Signed-off-by: CAI Qian -------------- next part -------------- A non-text attachment was scrubbed... Name: detach_sigkill_race.patch Type: text/x-patch Size: 410 bytes Desc: not available URL: From jan.kratochvil at redhat.com Tue Jan 12 11:54:00 2010 From: jan.kratochvil at redhat.com (Jan Kratochvil) Date: Tue, 12 Jan 2010 12:54:00 +0100 Subject: [PATCH] link detach_sigkill_race with -Wl,-z,now In-Reply-To: <676626112.143821263290622331.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> References: <291715653.143791263290486753.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <676626112.143821263290622331.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> Message-ID: <20100112115400.GA32115@host0.dyn.jankratochvil.net> On Tue, 12 Jan 2010 11:03:42 +0100, caiqian at redhat.com wrote: > detach_sigkill_race also need to link with -Wl,-z,now and then minimize the > library code you rely on. Otherwise, it will fail due to the reason > mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=542731 . Good catch, used it also for some other clone()-calling tests. Thanks, Jan --- tests/Makefile.am 14 Dec 2009 09:47:54 -0000 1.61 +++ tests/Makefile.am 12 Jan 2010 11:52:55 -0000 1.62 @@ -112,7 +112,11 @@ erestartsys_trap_LDFLAGS = -lutil erestartsys_trap_debugger_LDFLAGS = -lutil erestartsys_trap_32fails_debugger_LDFLAGS = -lutil # After clone syscall it must call no glibc code (such as _dl_runtime_resolve). 
+clone_get_signal_LDFLAGS = -Wl,-z,now clone_multi_ptrace_LDFLAGS = -Wl,-z,now +clone_ptrace_LDFLAGS = -Wl,-z,now +detach_sigkill_race_LDFLAGS = -Wl,-z,now +ptrace_event_clone_LDFLAGS = -Wl,-z,now check_TESTS = $(SAFE) xcheck_TESTS = $(CRASHERS) From fche at redhat.com Tue Jan 12 18:54:01 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Tue, 12 Jan 2010 13:54:01 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100112045454.GJ5243@nowhere> (Frederic Weisbecker's message of "Tue, 12 Jan 2010 05:54:56 +0100") References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> Message-ID: Frederic Weisbecker writes: > [...] > This is much more tricky in the case of uprobes as I see two > ways to work with it: > - probing on an already running process > - probing on a process we are about to run > [...] As you might expect, in systemtap we've had to figure out this area some time ago. We use another utrace consumer called "task finder" that registers interest in present / future processes, and gives us kernel-space callbacks when these come and go.
You could merge it or something like it. - FChE From tim.bird at am.sony.com Tue Jan 12 19:12:18 2010 From: tim.bird at am.sony.com (Tim Bird) Date: Tue, 12 Jan 2010 11:12:18 -0800 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263272933.28171.3804.camel@gandalf.stny.rr.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <1263272933.28171.3804.camel@gandalf.stny.rr.com> Message-ID: <4B4CC992.3010707@am.sony.com> Steven Rostedt wrote: > I do this all the time to limit the function tracer to a specific app. > > #!/bin/sh > echo $$ > /debug/tracing/set_ftrace_pid > echo function > /debug/tracing/current_tracer > exec $* This seems pretty handy. You should put this in scripts/tracing. :-) Do you have any other helper scripts you use commonly? -- Tim ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Corporation of America ============================= From mhiramat at redhat.com Tue Jan 12 22:00:20 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Tue, 12 Jan 2010 17:00:20 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> Message-ID: <4B4CF0F4.3020706@redhat.com> Frank Ch. Eigler wrote: > > Frederic Weisbecker writes: > >> [...] >> This is much more tricky in the case of uprobes as I see two >> ways to work with it: >> - probing on an already running process >> - probing on a process we are about to run >> [...] > > As you might expect, in systemtap we've had to figure out this area > some time ago. We use another utrace consumer called "task finder" > that registers interest in present / future processes, and gives us > kernel-space callbacks when these come and go. 
You could merge it or > something like it. So, could you tell us how the task-finder works and is implemented? I think we'd better clarify what functions are required for uprobes and pmu, and I think we may be able to re-implement improved pmu on utrace. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From fche at redhat.com Tue Jan 12 22:15:54 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Tue, 12 Jan 2010 17:15:54 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <4B4CF0F4.3020706@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <4B4CF0F4.3020706@redhat.com> Message-ID: <20100112221554.GI4822@redhat.com> Hi - > > As you might expect, in systemtap we've had to figure out this area > > some time ago. We use another utrace consumer called "task finder" [...] > > So, could you tell us how the task-finder works and is implemented? The code may be found at runtime/task_finder* in the systemtap sources. There is a simple interest-registration structure/API that identifies processes / shared libraries of interest, and a set of callbacks to be invoked when said processes/shared libraries are mapped or unmapped. It is implemented in terms of utrace callbacks for process/thread lifetime monitoring, and utrace syscall callbacks for tracking shared library segments being mapped and unmapped. http://sourceware.org/git/?p=systemtap.git;a=tree;f=runtime > I think we'd better clarify what functions are required for uprobes > and pmu, and I think we may be able to re-implement improved pmu on > utrace. I don't see any collision between pmu / perf / utrace, so nothing is really "required" for them or simple usage of uprobes. 
If you wish to track dynamic process/shared-library lifetimes, then you need extra code somewhere to respond to those changes. Layering this dynamic capability seems like the natural way to go, and is easily done with utrace and/or tracepoints. - FChE From mhiramat at redhat.com Tue Jan 12 22:30:27 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Tue, 12 Jan 2010 17:30:27 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100112221554.GI4822@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <4B4CF0F4.3020706@redhat.com> <20100112221554.GI4822@redhat.com> Message-ID: <4B4CF803.1020602@redhat.com> Frank Ch. Eigler wrote: > Hi - > >>> As you might expect, in systemtap we've had to figure out this area >>> some time ago. We use another utrace consumer called "task finder" [...] >> >> So, could you tell us how the task-finder works and is implemented? > > The code may be found at runtime/task_finder* in the systemtap sources. > There is a simple interest-registration structure/API that identifies > processes / shared libraries of interest, and a set of callbacks to be > invoked when said processes/shared libraries are mapped or unmapped. > > It is implemented in terms of utrace callbacks for process/thread > lifetime monitoring, and utrace syscall callbacks for tracking shared > library segments being mapped and unmapped. > > http://sourceware.org/git/?p=systemtap.git;a=tree;f=runtime Nice! so we can set a probe by the relative address in the library segments. >> I think we'd better clarify what functions are required for uprobes >> and pmu, and I think we may be able to re-implement improved pmu on >> utrace. > > I don't see any collision between pmu / perf / utrace, so nothing is > really "required" for them or simple usage of uprobes. 
If you wish to > track dynamic process/shared-library lifetimes, then you need extra > code somewhere to respond to those changes. And that code we can find in runtime/task_finder*, right? :-) > Layering this dynamic > capability seems like the natural way to go, and is easily done with > utrace and/or tracepoints. Sure, and I think that will allow us to use uprobe events as trace-events, because we can set probes before executing programs. :-) Thank you! -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From jkenisto at us.ibm.com Wed Jan 13 00:53:48 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Tue, 12 Jan 2010 16:53:48 -0800 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100112081412.GD28425@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <20100112053559.GL5243@nowhere> <20100112081412.GD28425@in.ibm.com> Message-ID: <1263344028.4983.35.camel@localhost.localdomain> On Tue, 2010-01-12 at 13:44 +0530, Ananth N Mavinakayanahalli wrote: > On Tue, Jan 12, 2010 at 06:36:00AM +0100, Frederic Weisbecker wrote: ... > > So, as stated before, uprobe seems to handle too much standalone > > policies such as freeing on exec, always inherit on clone and never > > on fork. Such rules should be decided from uprobe clients not > > from uprobe itself and that makes it not enough flexible to > > be usable for now. > > The freeing on exec is only housekeeping of uprobe data structures. And > probepoints are inherited only on CLONE_THREAD and not otherwise, simply > since the existing probes can be hit in the new thread's context. Not > sure what other policy you are hinting at. > ... > > > > > > Typically, to use it with perf toward a pmu, perf tools need to > > create a uprobe on perf process and activate its hook on the next exec. 
> > Thereafter, it's up to perf to decide if we inherit through clone > > and fork. > > As mentioned above, the inheritance is only for threads. It should be > fairly easy to inherit probes on fork, and that can be made a perf based > policy decision. > One reason we don't currently support inheritance (or cloning) of uprobes across fork is that a uprobe object is (a) per-process (and I think we want to keep it that way); and (b) owned by the uprobes client. That is, the client creates and populates that uprobe object, and passes a pointer to it to both register_uprobe() and unregister_uprobe(). We could clone this object on fork, but then how would the client refer to the cloned uprobes in the new process -- e.g., to unregister them? I guess each cloned uprobe could remember its "patriarch" uprobe -- its ultimate ancestor, the one created by the client; and we could add an "unregister_uprobe_clone" function that takes both the address of the patriarch uprobe and the pid of the (clone) uprobe to be unregistered. It has also been suggested that it might be more user-friendly to let the client discard (or reuse) the uprobe object as soon as register_uprobe() returns. register_uprobe() would presumably copy everything it needs from the uprobe to the uprobe_kimg, and pass back a handle (e.g., the address of the uprobe_kimg) that the client can later pass to unregister_uprobe() -- or unregister_uprobe_clone(). (In this case, only the uprobe_kimg would be cloned.) It might be good to consider both these enhancement requests together. Anyway, as previously described, the clone-on-fork feature can be (and has been) implemented by a utrace-based task-finder that notices forks, and creates and registers a whole new set of uprobes for the new process. 
Jim
From mhiramat at redhat.com Wed Jan 13 21:58:23 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Wed, 13 Jan 2010 16:58:23 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263272933.28171.3804.camel@gandalf.stny.rr.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <1263272933.28171.3804.camel@gandalf.stny.rr.com> Message-ID: <4B4E41FF.4020800@redhat.com> Steven Rostedt wrote: > On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote: > >> Now what if I want to launch ls and want to profile a function >> inside. What can I do with a trace event. I can't create the >> probe event based on a pid as I don't know it in advance. >> I could give it the ls cmdline and it manages to activate >> on the next ls launched. This is racy as another ls can >> be launched concurrently. > > You make a wrapper script: > > #!/bin/sh > $$ > exec $* > > I do this all the time to limit the function tracer to a specific app.
> > #!/bin/sh > echo $$ > /debug/tracing/set_ftrace_pid > echo function > /debug/tracing/current_tracer > exec $* I recommend you to add below line at the end of the script, from my experience. :) echo nop > /debug/tracing/current_tracer Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From mhiramat at redhat.com Wed Jan 13 22:12:33 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Wed, 13 Jan 2010 17:12:33 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <4B4E41FF.4020800@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <1263272933.28171.3804.camel@gandalf.stny.rr.com> <4B4E41FF.4020800@redhat.com> Message-ID: <4B4E4551.2010302@redhat.com> Masami Hiramatsu wrote: > Steven Rostedt wrote: >> On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote: >> >>> Now what if I want to launch ls and want to profile a function >>> inside. What can I do with a trace event. I can't create the >>> probe event based on a pid as I don't know it in advance. >>> I could give it the ls cmdline and it manages to activate >>> on the next ls launched. This is racy as another ls can >>> be launched concurrently. >> >> You make a wrapper script: >> >> #!/bin/sh >> $$ >> exec $* >> >> I do this all the time to limit the function tracer to a specific app. >> >> #!/bin/sh >> echo $$ > /debug/tracing/set_ftrace_pid >> echo function > /debug/tracing/current_tracer >> exec $* > > I recommend you to add below line at the end of the script, > from my experience. :) > > echo nop > /debug/tracing/current_tracer Oops, my bad, it doesn't work after exec... But, it is very important to disable function tracer after tracing target process. So, perhaps, below script may work. 
#!/bin/sh (echo $BASHPID > /debug/tracing/set_ftrace_pid echo function > /debug/tracing/current_tracer exec $*) echo nop > /debug/tracing/current_tracer Thanks, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From rostedt at goodmis.org Wed Jan 13 23:36:27 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 13 Jan 2010 18:36:27 -0500 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <4B4E4551.2010302@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <20100112045454.GJ5243@nowhere> <1263272933.28171.3804.camel@gandalf.stny.rr.com> <4B4E41FF.4020800@redhat.com> <4B4E4551.2010302@redhat.com> Message-ID: <1263425787.28171.3830.camel@gandalf.stny.rr.com> On Wed, 2010-01-13 at 17:12 -0500, Masami Hiramatsu wrote: > Masami Hiramatsu wrote: > > Steven Rostedt wrote: > >> On Tue, 2010-01-12 at 05:54 +0100, Frederic Weisbecker wrote: > >> > >>> Now what if I want to launch ls and want to profile a function > >>> inside. What can I do with a trace event. I can't create the > >>> probe event based on a pid as I don't know it in advance. > >>> I could give it the ls cmdline and it manages to activate > >>> on the next ls launched. This is racy as another ls can > >>> be launched concurrently. > >> > >> You make a wrapper script: > >> > >> #!/bin/sh > >> $$ > >> exec $* > >> > >> I do this all the time to limit the function tracer to a specific app. > >> > >> #!/bin/sh > >> echo $$ > /debug/tracing/set_ftrace_pid > >> echo function > /debug/tracing/current_tracer > >> exec $* > > > > I recommend you to add below line at the end of the script, > > from my experience. :) > > > > echo nop > /debug/tracing/current_tracer > > Oops, my bad, it doesn't work after exec... > But, it is very important to disable function tracer after > tracing target process. 
> > So, perhaps, below script may work. > > #!/bin/sh > (echo $BASHPID > /debug/tracing/set_ftrace_pid > echo function > /debug/tracing/current_tracer > exec $*) > echo nop > /debug/tracing/current_tracer Unfortunately, that would lose the entire trace you just recorded. So perhaps adding: trace-cmd extract echo nop > /debug/tracing/current_tracer would work better. The extract feature of trace-cmd pulls the data from the kernel buffer and saves it in a file format. -- Steve
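Pulling the wrapper-script suggestions from this thread together, a minimal sketch might look like the following. It is not taken from any of the posters: the function name trace_one and the TRACING_DIR variable are made up for illustration, and /debug/tracing is simply the mount point used in the scripts quoted above. Instead of relying on $BASHPID, which is a bash variable and may be empty under other /bin/sh implementations, it starts an inner shell that writes its own $$ and then execs the target, so the target keeps the PID that was written.

```shell
#!/bin/sh
# Hypothetical helper, assembled from the scripts discussed in the thread.
# TRACING_DIR is an assumption; it defaults to the path used in the messages.
TRACING_DIR=${TRACING_DIR:-/debug/tracing}

trace_one() {
    # Inner shell: record its own PID, enable the function tracer, then
    # exec the target command. exec keeps the PID, so the PID filter
    # written to set_ftrace_pid matches the target process.
    TRACING_DIR="$TRACING_DIR" sh -c '
        echo $$ > "$TRACING_DIR/set_ftrace_pid" &&
        echo function > "$TRACING_DIR/current_tracer" &&
        exec "$@"
    ' trace_one "$@"
    status=$?

    # Back in the outer shell, after the target has exited.
    # As Steven notes, switching to nop discards the recorded trace, so
    # save it first (he suggests "trace-cmd extract") if it is needed.
    echo nop > "$TRACING_DIR/current_tracer"
    return "$status"
}
```

With the function sourced into a shell, something like "trace_one ls -l" would trace only that ls invocation. Quoting "$@" rather than using $* keeps arguments containing spaces intact.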
From peterz at infradead.org Thu Jan 14 11:08:09 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:08:09 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> Message-ID: <1263467289.4244.288.camel@laptop> On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > User Space Breakpoint Assistance Layer (UBP) > > User space breakpointing Infrastructure provides kernel subsystems > with architecture independent interface to establish breakpoints in > user applications. This patch provides core implementation of ubp and > also wrappers for architecture dependent methods. So if this is the basic infrastructure to set userspace breakpoints, then why not call this uprobe? > UBP currently supports both single stepping inline and execution out > of line strategies. Two different probepoints in the same process can > have two different strategies. maybe explain wth these are? From peterz at infradead.org Thu Jan 14 11:08:38 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:08:38 +0100 Subject: [RFC] [PATCH 3/7] Execution out of line (XOL) In-Reply-To: <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> Message-ID: <1263467318.4244.289.camel@laptop> On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > Execution out of line (XOL) > > Slot allocation mechanism for Execution Out of Line strategy in User > space breakpointing Inftrastructure.
(XOL) > > This patch provides slot allocation mechanism for execution out of > line strategy for use with user space breakpoint infrastructure. > This patch requires utrace support in kernel. > > This patch provides five functions xol_get_insn_slot(), > xol_free_insn_slot(), xol_put_area(), xol_get_area() and > xol_validate_vaddr(). > > Current slot allocation mechanism: > 1. Allocate one dedicated slot per user breakpoint. > 2. If the allocated vma is completely used, expand current vma. > 3. If we cant expand the vma, allocate a new vma. Say what? I see the text, but non of it makes any sense at all. From peterz at infradead.org Thu Jan 14 11:09:54 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:09:54 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> Message-ID: <1263467394.4244.291.camel@laptop> On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > Uprobes Infrastructure enables user to dynamically establish > probepoints in user applications and collect information by executing > a handler functions when the probepoints are hit. > Please refer Documentation/uprobes.txt for more details. > > This patch provides the core implementation of uprobes. > This patch builds on utrace infrastructure. > > You need to follow this up with the uprobes patch for your > architecture. So all this is basically some glue between what you call ubp (the real userspace breakpoint stuff) and utrace? Or does it do more? 
From peterz at infradead.org Thu Jan 14 11:12:30 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:12:30 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100112081412.GD28425@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <20100112053559.GL5243@nowhere> <20100112081412.GD28425@in.ibm.com> Message-ID: <1263467550.4244.293.camel@laptop> On Tue, 2010-01-12 at 13:44 +0530, Ananth N Mavinakayanahalli wrote: > > Well, I wonder if perf can ride on utrace's callbacks for the > hook_task() for the clone/fork cases? Well it could, but using all of utrace to simply get simple callbacks from copy_process() is just daft so we're not going to do that. From peterz at infradead.org Thu Jan 14 11:13:51 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:13:51 +0100 Subject: [RFC] [PATCH 5/7] X86 Support for Uprobes In-Reply-To: <20100111122558.22050.431.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122558.22050.431.sendpatchset@srikar.in.ibm.com> Message-ID: <1263467631.4244.294.camel@laptop> On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > [PATCH] x86 support for Uprobes So uhm,.. HAVE_UPROBE is basically HAVE_UBP? 
> Signed-off-by: Jim Keniston > --- > arch/x86/Kconfig | 1 + > arch/x86/include/asm/uprobes.h | 27 +++++++++++++++++++++++++++ > 2 files changed, 28 insertions(+) > > Index: new_uprobes.git/arch/x86/Kconfig > =================================================================== > --- new_uprobes.git.orig/arch/x86/Kconfig > +++ new_uprobes.git/arch/x86/Kconfig > @@ -51,6 +51,7 @@ config X86 > select HAVE_KERNEL_LZMA > select HAVE_HW_BREAKPOINT > select HAVE_UBP > + select HAVE_UPROBES > select HAVE_ARCH_KMEMCHECK > select HAVE_USER_RETURN_NOTIFIER > > Index: new_uprobes.git/arch/x86/include/asm/uprobes.h > =================================================================== > --- /dev/null > +++ new_uprobes.git/arch/x86/include/asm/uprobes.h > @@ -0,0 +1,27 @@ > +#ifndef _ASM_UPROBES_H > +#define _ASM_UPROBES_H > +/* > + * Userspace Probes (UProbes) > + * uprobes.h > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
> + * > + * Copyright (C) IBM Corporation, 2008, 2009 > + */ > +#include > + > +#define BREAKPOINT_SIGNAL SIGTRAP > +#define SSTEP_SIGNAL SIGTRAP > +#endif /* _ASM_UPROBES_H */ From peterz at infradead.org Thu Jan 14 11:23:11 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:23:11 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> Message-ID: <1263468191.4244.300.camel@laptop> On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > This patch implements ftrace plugin for uprobes. Right, like others have said, trace events is a much saner interface. So the easiest way I can see that working is to register uprobes against a file (not a pid). Then on creation it uses rmap to find all current maps of that file and install the probe if there is a consumer for that map. Then for each new mmap() of that file, we also need to check if there's a consumer ready and install the probe. The existence of the uprobe trace-event would keep a ref on the dentry/inode, ensuring it remains around or something. Consumers could be some utrace thing (currently called uprobe -- which is a misnomer imo), or perf, or ftrace like. From peterz at infradead.org Thu Jan 14 11:29:35 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:29:35 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263468191.4244.300.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> Message-ID: <1263468575.4244.306.camel@laptop> On Thu, 2010-01-14 at 12:23 +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > This patch implements ftrace plugin for uprobes. 
> > Right, like others have said, trace events is a much saner interface. > > So the easiest way I can see that working is to register uprobes against > a file (not a pid). Just to clarify, this means you can do things like: p:uprobe_event dso:symbol[+offs] Irrespective of whether there are any current user of that file. From fweisbec at gmail.com Thu Jan 14 11:35:12 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Thu, 14 Jan 2010 12:35:12 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263468191.4244.300.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> Message-ID: <20100114113509.GB5033@nowhere> On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > This patch implements ftrace plugin for uprobes. > > Right, like others have said, trace events is a much saner interface. > > So the easiest way I can see that working is to register uprobes against > a file (not a pid). Then on creation it uses rmap to find all current > maps of that file and install the probe if there is a consumer for that > map. > > Then for each new mmap() of that file, we also need to check if there's > a consumer ready and install the probe. That looks racy. Say you first create a probe on /bin/ls: perf probe p addr_in_ls /bin/ls then something else launches /bin/ls behind you, probe is set on it then you launch: perf record -e "probe:...." /bin/ls Then it goes recording the previous instance. 
From peterz at infradead.org Thu Jan 14 11:43:01 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 12:43:01 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100114113509.GB5033@nowhere> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <20100114113509.GB5033@nowhere> Message-ID: <1263469381.4244.308.camel@laptop> On Thu, 2010-01-14 at 12:35 +0100, Frederic Weisbecker wrote: > On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote: > > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > > This patch implements ftrace plugin for uprobes. > > > > Right, like others have said, trace events is a much saner interface. > > > > So the easiest way I can see that working is to register uprobes against > > a file (not a pid). Then on creation it uses rmap to find all current > > maps of that file and install the probe if there is a consumer for that > > map. > > > > Then for each new mmap() of that file, we also need to check if there's > > a consumer ready and install the probe. > > > > That looks racy. > > Say you first create a probe on /bin/ls: > > perf probe p addr_in_ls /bin/ls > > then something else launches /bin/ls behind you, probe > is set on it > > then you launch: > > perf record -e "probe:...." /bin/ls > > Then it goes recording the previous instance. Uhm, why? Only the perf /bin/ls instance has a consumer and will thus have a probe installed. 
(Or if you want to use ftrace you need to always have all instances probed anyway) From mjw at redhat.com Thu Jan 14 12:16:51 2010 From: mjw at redhat.com (Mark Wielaard) Date: Thu, 14 Jan 2010 13:16:51 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263468575.4244.306.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <1263468575.4244.306.camel@laptop> Message-ID: <1263471411.23962.13.camel@springer.wildebeest.org> On Thu, 2010-01-14 at 12:29 +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 12:23 +0100, Peter Zijlstra wrote: > > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > > This patch implements ftrace plugin for uprobes. > > > > Right, like others have said, trace events is a much saner interface. > > > > So the easiest way I can see that working is to register uprobes against > > a file (not a pid). > > Just to clarify, this means you can do things like: > > p:uprobe_event dso:symbol[+offs] > > Irrespective of whether there are any current user of that file. Yes, that is a good idea, you can then also refine that with a filter on a target pid. That is what systemtap also does, you define files (whether they are executables or shared libraries, etc) plus symbols/offsets/etc as targets and monitor when they get mapped in (either system wide, per executable or pid based). 
Cheers, Mark From peterz at infradead.org Thu Jan 14 12:19:56 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 13:19:56 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263471411.23962.13.camel@springer.wildebeest.org> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <1263468575.4244.306.camel@laptop> <1263471411.23962.13.camel@springer.wildebeest.org> Message-ID: <1263471596.4244.310.camel@laptop> On Thu, 2010-01-14 at 13:16 +0100, Mark Wielaard wrote: > On Thu, 2010-01-14 at 12:29 +0100, Peter Zijlstra wrote: > > On Thu, 2010-01-14 at 12:23 +0100, Peter Zijlstra wrote: > > > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > > > This patch implements ftrace plugin for uprobes. > > > > > > Right, like others have said, trace events is a much saner interface. > > > > > > So the easiest way I can see that working is to register uprobes against > > > a file (not a pid). > > > > Just to clarify, this means you can do things like: > > > > p:uprobe_event dso:symbol[+offs] > > > > Irrespective of whether there are any current user of that file. > > Yes, that is a good idea, you can then also refine that with a filter on > a target pid. That is what systemtap also does, you define files > (whether they are executables or shared libraries, etc) plus > symbols/offsets/etc as targets and monitor when they get mapped in > (either system wide, per executable or pid based). Well, the pid part is more the concern of the consumer of the trace-event. The event itself is task invariant and only cares about the particular code in question getting executed. 
From fweisbec at gmail.com Thu Jan 14 12:23:31 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Thu, 14 Jan 2010 13:23:31 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263469381.4244.308.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <20100114113509.GB5033@nowhere> <1263469381.4244.308.camel@laptop> Message-ID: <20100114122329.GC5033@nowhere> On Thu, Jan 14, 2010 at 12:43:01PM +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 12:35 +0100, Frederic Weisbecker wrote: > > On Thu, Jan 14, 2010 at 12:23:11PM +0100, Peter Zijlstra wrote: > > > On Mon, 2010-01-11 at 17:56 +0530, Srikar Dronamraju wrote: > > > > This patch implements ftrace plugin for uprobes. > > > > > > Right, like others have said, trace events is a much saner interface. > > > > > > So the easiest way I can see that working is to register uprobes against > > > a file (not a pid). Then on creation it uses rmap to find all current > > > maps of that file and install the probe if there is a consumer for that > > > map. > > > > > > Then for each new mmap() of that file, we also need to check if there's > > > a consumer ready and install the probe. > > > > > > > > That looks racy. > > > > Say you first create a probe on /bin/ls: > > > > perf probe p addr_in_ls /bin/ls > > > > then something else launches /bin/ls behind you, probe > > is set on it > > > > then you launch: > > > > perf record -e "probe:...." /bin/ls > > > > Then it goes recording the previous instance. > > Uhm, why? Only the perf /bin/ls instance has a consumer and will thus > have a probe installed. > > (Or if you want to use ftrace you need to always have all instances > probed anyway) I see, so what you suggest is to have the probe set up as generic first. Then the process that activates it becomes a consumer, right? Would work, yeah. 
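The "generic probe first, consumer on activation" model agreed on in this exchange can be sketched as a small userspace model in C. This is only an assumption-laden illustration: struct file_probe, probe_add_consumer() and probe_wants_mapping() are hypothetical names, a real implementation would key on the inode rather than a path string, and the always_on flag stands in for the ftrace-style case where all instances are probed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CONSUMERS 8		/* arbitrary limit for the sketch */

/* A probe registered against a file, not against a pid (hypothetical layout). */
struct file_probe {
	const char *path;		/* probed executable or shared object */
	unsigned long offset;		/* probe location within that file */
	bool always_on;			/* ftrace-style: probe every mapping */
	int consumers[MAX_CONSUMERS];	/* pids of tasks that enabled the event */
	int nr_consumers;
};

/* Called when a task enables the event; returns 0, or -1 if the table is full. */
int probe_add_consumer(struct file_probe *p, int pid)
{
	assert(p);
	if (p->nr_consumers == MAX_CONSUMERS)
		return -1;
	p->consumers[p->nr_consumers++] = pid;
	return 0;
}

/*
 * Decision taken when task 'pid' maps 'mapped_path' (or when existing
 * mappings are walked at probe creation): should a breakpoint be installed?
 */
bool probe_wants_mapping(const struct file_probe *p, const char *mapped_path, int pid)
{
	int i;

	assert(p && mapped_path);
	if (strcmp(p->path, mapped_path) != 0)
		return false;
	if (p->always_on)
		return true;
	for (i = 0; i < p->nr_consumers; i++)
		if (p->consumers[i] == pid)
			return true;
	return false;
}
```

In the /bin/ls scenario from this message, an unrelated ls started before perf record has no consumer and so gets no breakpoint. Only the instance launched by perf does, which is why the earlier instance is not recorded.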
From peterz at infradead.org Thu Jan 14 12:29:09 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Thu, 14 Jan 2010 13:29:09 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <20100114122329.GC5033@nowhere> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <20100114113509.GB5033@nowhere> <1263469381.4244.308.camel@laptop> <20100114122329.GC5033@nowhere> Message-ID: <1263472149.4244.314.camel@laptop> On Thu, 2010-01-14 at 13:23 +0100, Frederic Weisbecker wrote: > > I see, so what you suggest is to have the probe set up > as generic first. Then the process that activates it > becomes a consumer, right? Right, so either we have it always on, for things like ftrace, in which case the creation traverses rmap and installs the probes all existing mmap()s, and a mmap() hook will install it on all new ones. Or they're strictly consumer driver, like perf, in which case the act of enabling the event will install the probe (if its not there yet). From jkenisto at us.ibm.com Thu Jan 14 19:46:06 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Thu, 14 Jan 2010 11:46:06 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263467289.4244.288.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> Message-ID: <1263498366.4875.25.camel@localhost.localdomain> On Thu, 2010-01-14 at 12:08 +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > User Space Breakpoint Assistance Layer (UBP) > > > > User space breakpointing Infrastructure provides kernel subsystems > > with architecture independent interface to establish breakpoints in > > user applications. 
This patch provides core implementation of ubp and > > also wrappers for architecture dependent methods. > > So if this is the basic infrastructure to set userspace breakpoints, > then why not call this uprobe? Ubp is for setting and removing breakpoints, and for supporting the two schemes (inline, out of line) for executing the probed instruction after you hit the breakpoint. Uprobes provides a higher-level API and deals with synchronization issues, process-vs-thread issues, execution of the client's (potentially buggy) probe handler, multiple probe clients, multiple probes at the same location, thread- and process-lifetime events, etc. > > > UBP currently supports both single stepping inline and execution out > > of line strategies. Two different probepoints in the same process can > > have two different strategies. > > maybe explain wth these are? > Here's a partial explanation from patch #6,section 1.1: +When a CPU hits the breakpoint instruction, a trap occurs, the CPU's +user-mode registers are saved, and a SIGTRAP signal is generated. +Uprobes intercepts the SIGTRAP and finds the associated uprobe. +It then executes the handler associated with the uprobe, passing the +handler the addresses of the uprobe struct and the saved registers. +... + +Next, Uprobes single-steps its copy of the probed instruction and +resumes execution of the probed process at the instruction following +the probepoint. (It would be simpler to single-step the actual +instruction in place, but then Uprobes would have to temporarily +remove the breakpoint instruction. This would create problems in a +multithreaded application. For example, it would open a time window +when another thread could sail right past the probepoint.) + +Instruction copies to be single-stepped are stored in a per-process +"single-step out of line (XOL) area," which is a little VM area +created by Uprobes in each probed process's address space. 
This (single-stepping out of line = SSOL) is essentially what kprobes does on most architectures. XOL (execution out of line) is actually a broader category that could include other schemes, discussed elsewhere. Jim From jkenisto at us.ibm.com Thu Jan 14 22:43:17 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Thu, 14 Jan 2010 14:43:17 -0800 Subject: [RFC] [PATCH 3/7] Execution out of line (XOL) In-Reply-To: <1263467318.4244.289.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> <1263467318.4244.289.camel@laptop> Message-ID: <1263508997.4875.32.camel@localhost.localdomain> On Thu, 2010-01-14 at 12:08 +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > Execution out of line (XOL) > > > > Slot allocation mechanism for Execution Out of Line strategy in User > > space breakpointing Inftrastructure. (XOL) > > > > This patch provides slot allocation mechanism for execution out of > > line strategy for use with user space breakpoint infrastructure. > > This patch requires utrace support in kernel. > > > > This patch provides five functions xol_get_insn_slot(), > > xol_free_insn_slot(), xol_put_area(), xol_get_area() and > > xol_validate_vaddr(). > > > > Current slot allocation mechanism: > > 1. Allocate one dedicated slot per user breakpoint. > > 2. If the allocated vma is completely used, expand current vma. > > 3. If we cant expand the vma, allocate a new vma. > > > Say what? > > I see the text, but non of it makes any sense at all. > Yeah, there's not a lot of context there. I hope it will make more sense if you read section 1.1 of Documentation/uprobes.txt (patch #6). Or look at get_insn_slot() in kprobes, and understand that we're trying to do something similar in uprobes, where the instruction copies have to reside in the user address space of the probed process. 
Jim From jkenisto at us.ibm.com Thu Jan 14 22:49:40 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Thu, 14 Jan 2010 14:49:40 -0800 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263467394.4244.291.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> Message-ID: <1263509380.4875.35.camel@localhost.localdomain> On Thu, 2010-01-14 at 12:09 +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > > > Uprobes Infrastructure enables user to dynamically establish > > probepoints in user applications and collect information by executing > > a handler functions when the probepoints are hit. > > Please refer Documentation/uprobes.txt for more details. > > > > This patch provides the core implementation of uprobes. > > This patch builds on utrace infrastructure. > > > > You need to follow this up with the uprobes patch for your > > architecture. > > So all this is basically some glue between what you call ubp (the real > userspace breakpoint stuff) and utrace? Or does it do more? > My reply in http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/02483.html addresses this. Jim From jkenisto at us.ibm.com Thu Jan 14 23:07:35 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Thu, 14 Jan 2010 15:07:35 -0800 Subject: [RFC] [PATCH 5/7] X86 Support for Uprobes In-Reply-To: <1263467631.4244.294.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122558.22050.431.sendpatchset@srikar.in.ibm.com> <1263467631.4244.294.camel@laptop> Message-ID: <1263510455.4875.49.camel@localhost.localdomain> On Thu, 2010-01-14 at 12:13 +0100, Peter Zijlstra wrote: > On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > [PATCH] x86 support for Uprobes > > So uhm,.. HAVE_UPROBE is basically HAVE_UBP? 
Certainly ALMOST all the architecture-specific stuff we've factored out of the old uprobes resides in ubp. Because of how it exploits utrace, uprobes also needs to know what signals to expect from breakpoint traps and single-step traps. (E.g., the "breakpoint signal" in s390 is SIGILL.) I have no objection to moving BREAKPOINT_SIGNAL and SSTEP_SIGNAL to .../asm/ubp.h, even though ubp doesn't actually use them. But until we port ubp to other architectures, we won't know for sure whether we've done a complete job of capturing all the arch-specific stuff there. Jim > > > Signed-off-by: Jim Keniston > > --- > > arch/x86/Kconfig | 1 + > > arch/x86/include/asm/uprobes.h | 27 +++++++++++++++++++++++++++ > > 2 files changed, 28 insertions(+) > > > > Index: new_uprobes.git/arch/x86/Kconfig > > =================================================================== > > --- new_uprobes.git.orig/arch/x86/Kconfig > > +++ new_uprobes.git/arch/x86/Kconfig > > @@ -51,6 +51,7 @@ config X86 > > select HAVE_KERNEL_LZMA > > select HAVE_HW_BREAKPOINT > > select HAVE_UBP > > + select HAVE_UPROBES > > select HAVE_ARCH_KMEMCHECK > > select HAVE_USER_RETURN_NOTIFIER > > > > Index: new_uprobes.git/arch/x86/include/asm/uprobes.h > > =================================================================== > > --- /dev/null > > +++ new_uprobes.git/arch/x86/include/asm/uprobes.h > > @@ -0,0 +1,27 @@ > > +#ifndef _ASM_UPROBES_H > > +#define _ASM_UPROBES_H > > +/* > > + * Userspace Probes (UProbes) > > + * uprobes.h > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2 of the License, or > > + * (at your option) any later version. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the > > + * GNU General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public License > > + * along with this program; if not, write to the Free Software > > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. > > + * > > + * Copyright (C) IBM Corporation, 2008, 2009 > > + */ > > +#include > > + > > +#define BREAKPOINT_SIGNAL SIGTRAP > > +#define SSTEP_SIGNAL SIGTRAP > > +#endif /* _ASM_UPROBES_H */ > > From peterz at infradead.org Fri Jan 15 09:02:55 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:02:55 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263498366.4875.25.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> Message-ID: <1263546175.4244.342.camel@laptop> On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > +Instruction copies to be single-stepped are stored in a per-process > +"single-step out of line (XOL) area," which is a little VM area > +created by Uprobes in each probed process's address space. I think tinkering with the probed process's address space is a no-no. Have you ran this by the linux mm folks? I'd be inclined to NAK this straight out. 
From peterz at infradead.org Fri Jan 15 09:03:48 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:03:48 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263498366.4875.25.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> Message-ID: <1263546228.4244.343.camel@laptop> On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > discussed elsewhere. Thanks for the pointer... From peterz at infradead.org Fri Jan 15 09:07:35 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:07:35 +0100 Subject: [RFC] [PATCH 3/7] Execution out of line (XOL) In-Reply-To: <1263508997.4875.32.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> <1263467318.4244.289.camel@laptop> <1263508997.4875.32.camel@localhost.localdomain> Message-ID: <1263546455.4244.348.camel@laptop> On Thu, 2010-01-14 at 14:43 -0800, Jim Keniston wrote: > > Yeah, there's not a lot of context there. I hope it will make more > sense if you read section 1.1 of Documentation/uprobes.txt (patch #6). > Or look at get_insn_slot() in kprobes, and understand that we're trying > to do something similar in uprobes, where the instruction copies have to > reside in the user address space of the probed process. That's not the point, changelogs should not be this cryptic. They should be standalone and descriptive of what, why and how. If you can't be bothered writing such for something you want reviewed for inclusion then I might not be bothered looking at them at all.
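For readers trying to follow the changelog under discussion, the three-step slot allocation policy it lists (one dedicated slot per breakpoint, grow the current area when it is full, fall back to a new area when it cannot grow) can be sketched as a toy model in C. This is not the code from the patch: struct xol_pool, xol_pool_get_slot() and all the size constants are made up for illustration, plain counters stand in for vmas, and freeing slots is not modelled.

```c
#include <assert.h>

#define SLOTS_PER_STEP     4	/* slots added each time an area grows (made up) */
#define MAX_SLOTS_PER_AREA 8	/* point at which an area can no longer grow (made up) */
#define MAX_AREAS          2	/* how many areas the model allows (made up) */

struct xol_area {		/* stands in for one vma holding instruction copies */
	int nr_slots;
	int nr_used;
};

struct xol_pool {		/* per-process collection of areas */
	struct xol_area areas[MAX_AREAS];
	int nr_areas;
};

struct xol_slot {		/* identifies the slot handed to one breakpoint */
	int area;
	int index;
};

/* Returns 0 and fills *slot, or -1 when every area is full and none can be added. */
int xol_pool_get_slot(struct xol_pool *pool, struct xol_slot *slot)
{
	struct xol_area *area;

	assert(pool && slot);

	if (pool->nr_areas == 0) {		/* first breakpoint: create the area */
		pool->areas[0].nr_slots = SLOTS_PER_STEP;
		pool->areas[0].nr_used = 0;
		pool->nr_areas = 1;
	}

	area = &pool->areas[pool->nr_areas - 1];
	if (area->nr_used == area->nr_slots) {
		if (area->nr_slots + SLOTS_PER_STEP <= MAX_SLOTS_PER_AREA) {
			area->nr_slots += SLOTS_PER_STEP;	/* step 2: expand current area */
		} else if (pool->nr_areas < MAX_AREAS) {
			area = &pool->areas[pool->nr_areas++];	/* step 3: new area */
			area->nr_slots = SLOTS_PER_STEP;
			area->nr_used = 0;
		} else {
			return -1;
		}
	}

	slot->area = pool->nr_areas - 1;	/* step 1: one dedicated slot per breakpoint */
	slot->index = area->nr_used++;
	return 0;
}
```

With these made-up limits, the first eight breakpoints land in area 0 (which grows once), the ninth triggers a second area, and the seventeenth request fails.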
From peterz at infradead.org Fri Jan 15 09:10:32 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:10:32 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263509380.4875.35.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> Message-ID: <1263546632.4244.352.camel@laptop> On Thu, 2010-01-14 at 14:49 -0800, Jim Keniston wrote: > On Thu, 2010-01-14 at 12:09 +0100, Peter Zijlstra wrote: > > On Mon, 2010-01-11 at 17:55 +0530, Srikar Dronamraju wrote: > > > > > > Uprobes Infrastructure enables user to dynamically establish > > > probepoints in user applications and collect information by executing > > > a handler functions when the probepoints are hit. > > > Please refer Documentation/uprobes.txt for more details. > > > > > > This patch provides the core implementation of uprobes. > > > This patch builds on utrace infrastructure. > > > > > > You need to follow this up with the uprobes patch for your > > > architecture. > > > > So all this is basically some glue between what you call ubp (the real > > userspace breakpoint stuff) and utrace? Or does it do more? > > > > My reply in > http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/02483.html > addresses this. Right, so all that need be done is add the multiple probe stuff to UBP and its a sane interface to use on its own, at which point I'd be inclined to call that uprobes (UBP really is an crap name). Then we can ditch the whole utrace muck as I see no reason to want to use that, whereas the ubp (given a sane name) looks interesting.
From fche at redhat.com Fri Jan 15 09:26:33 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 15 Jan 2010 04:26:33 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263546632.4244.352.camel@laptop> (Peter Zijlstra's message of "Fri, 15 Jan 2010 10:10:32 +0100") References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> Message-ID: Peter Zijlstra writes: > [...] > Right, so all that need be done is add the multiple probe stuff to UBP > and its a sane interface to use on its own, at which point I'd be > inclined to call that uprobes (UBP really is an crap name). At one point ubp+uprobes were one piece. They were separated on the suspicion that lkml would like them that way. > Then we can ditch the whole utrace muck as I see no reason to want to > use that, whereas the ubp (given a sane name) looks interesting. Assuming you meant what you write, perhaps you misunderstand the layering relationship of these pieces. utrace underlies uprobes and other process manipulation functionality, present and future. - FChE From peterz at infradead.org Fri Jan 15 09:35:24 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:35:24 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> Message-ID: <1263548124.4244.358.camel@laptop> On Fri, 2010-01-15 at 04:26 -0500, Frank Ch. Eigler wrote: > Peter Zijlstra writes: > > > [...]
> > Right, so all that need be done is add the multiple probe stuff to UBP > > and its a sane interface to use on its own, at which point I'd be > > inclined to call that uprobes (UBP really is an crap name). > > At one point ubp+uprobes were one piece. They were separated on the > suspicion that lkml would like them that way. Right, good thinking, that way we can use ubp without having to use utrace ;-) > > Then we can ditch the whole utrace muck as I see no reason to want to > > use that, whereas the ubp (given a sane name) looks interesting. > > Assuming you meant what you write, perhaps you misunderstand the > layering relationship of these pieces. utrace underlies uprobes and > other process manipulation functionality, present and future. Why, utrace doesn't at all look to bring a fundamental contribution to all that. If there's a proper kernel interface to install probes on userspace code (ubp seems to be mostly that) then I can use perf/ftrace to do the rest of the state management, no need to use utrace there. You can hardly force me to use utrace there, can you? From ananth at in.ibm.com Fri Jan 15 09:38:31 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 15 Jan 2010 15:08:31 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263546228.4244.343.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> Message-ID: <20100115093831.GC26396@in.ibm.com> On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > discussed elsewhere. > > Thanks for the pointer... 
:-) Peter, I think Jim was referring to http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html Ananth From peterz at infradead.org Fri Jan 15 09:50:14 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 10:50:14 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100115093831.GC26396@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> Message-ID: <1263549014.4244.374.camel@laptop> On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote: > On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote: > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > > > discussed elsewhere. > > > > Thanks for the pointer... > > :-) > > Peter, > I think Jim was referring to > http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html That's a 2007 email from some obscure list... that's hardly something that can be referenced to without link. As previously stated, I think poking at a process's address space is an utter no-go. 
From ananth at in.ibm.com Fri Jan 15 10:10:56 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 15 Jan 2010 15:40:56 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263549014.4244.374.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> Message-ID: <20100115101056.GD26396@in.ibm.com> On Fri, Jan 15, 2010 at 10:50:14AM +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote: > > On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote: > > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > > > > > discussed elsewhere. > > > > > > Thanks for the pointer... > > > > :-) > > > > Peter, > > I think Jim was referring to > > http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html > > That's a 2007 email from some obscure list... that's hardly something > that can be referenced to without link. > > As previously stated, I think poking at a process's address space is an > utter no-go. In which case we'll need to find a different solution to it. The gdb style of 'breakpoint hit' -> 'put original instruction back in place' -> single-step -> 'put back the breakpoint' would be a big limiter, especially for multithreaded cases. The design here is to have a small vma sufficiently high enough in memory a-la vDSO that most apps won't reach, though there is still no ironclad guarantee. Ideally, we will need to single-step on a copy of the instruction, in the user address space of the traced process. Ideas? 
Ananth From peterz at infradead.org Fri Jan 15 10:13:32 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 11:13:32 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100115101056.GD26396@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <20100115101056.GD26396@in.ibm.com> Message-ID: <1263550412.4244.375.camel@laptop> On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote: > Ideas? emulate the one instruction? From ananth at in.ibm.com Fri Jan 15 10:22:05 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 15 Jan 2010 15:52:05 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263550412.4244.375.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <20100115101056.GD26396@in.ibm.com> <1263550412.4244.375.camel@laptop> Message-ID: <20100115102205.GE26396@in.ibm.com> On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote: > > > Ideas? > > emulate the one instruction? In kernel? Generically? Don't think its that easy for userspace -- you have the full gamut of instructions to emulate (fp, vector, etc); further, the instruction could itself cause a page fault and the like. 
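To make concrete why the gdb-style sequence mentioned earlier in this exchange (put the original instruction back, single-step, put the breakpoint back) is a limiter for multithreaded processes, here is a toy, single-threaded model in C of the window it opens, next to the out-of-line alternative. Everything here is a simplification for illustration: struct probed_text and the function names are hypothetical, one byte stands in for a whole instruction, and the "second thread" is just a check made between two steps. The only real-world constant is 0xCC, the x86 int3 opcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BREAKPOINT_INSN 0xCC	/* x86 int3 */

struct probed_text {
	unsigned char mem[16];		/* toy text segment, one byte per instruction */
	size_t probe_addr;
	unsigned char orig_insn;	/* saved copy of the probed instruction */
	unsigned char xol_slot;		/* out-of-line slot for the copy */
};

void probe_install(struct probed_text *t, size_t addr)
{
	assert(t && addr < sizeof(t->mem));
	t->probe_addr = addr;
	t->orig_insn = t->mem[addr];
	t->mem[addr] = BREAKPOINT_INSN;
}

/* Would a thread executing at the probe address trap right now? */
bool thread_traps_at_probe(const struct probed_text *t)
{
	return t->mem[t->probe_addr] == BREAKPOINT_INSN;
}

/* In-place scheme: the breakpoint is removed for the duration of the step. */
void inplace_step_begin(struct probed_text *t)
{
	t->mem[t->probe_addr] = t->orig_insn;
}

void inplace_step_end(struct probed_text *t)
{
	t->mem[t->probe_addr] = BREAKPOINT_INSN;
}

/* Out-of-line scheme: step a copy, the probed text is never modified. */
void xol_step_begin(struct probed_text *t)
{
	t->xol_slot = t->orig_insn;
}
```

Between inplace_step_begin() and inplace_step_end() another thread reaching the probe address would not trap, which is the "sail right past the probepoint" window. With the out-of-line copy the breakpoint stays armed throughout.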
From srikar at linux.vnet.ibm.com Fri Jan 15 10:26:45 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Fri, 15 Jan 2010 15:56:45 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263546632.4244.352.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> Message-ID: <20100115102645.GA22640@linux.vnet.ibm.com> Hi Peter, > > > > > > > My reply in > > http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/02483.html > > addresses this. > > Right, so all that need be done is add the multiple probe stuff to UBP > and its a sane interface to use on its own, at which point I'd be > inclined to call that uprobes (UBP really is an crap name). I am fine with renaming ubp to a suggested name. The reason for splitting uprobes to two layers was to allow others (currently none) to reuse the current ubp layer. It was felt that there could be multiple clients for ubp who could co-exist. However ubp alone is not enough to provide the userspace tracing. Currently it wouldn't understand synchronization between different threads of a process, process life time issues, context in which the handler has to be run. As pointed out by Jim earlier, we have segregrated that layer which takes care of the above issues into the uprobes layer. For example, while inserting a breakpoint, one of the threads of a process could be running at the same place where we are trying to place a breakpoint. Or there could be two threads that could be racing to insert/delete a breakpoint. These synchronization issues are all handled by the Uprobes layer. Uprobes layer would need to be notified of process life-time events like fork/clone/exec/exit. It also needs to know - when a breakpoint is hit - stop and resume a thread. 
Uprobes layer uses utrace to be notified of the process life time events and the signal handling part. -- Thanks and Regards Srikar > > Then we can ditch the whole utrace muck as I see no reason to want to > use that, whereas the ubp (given a sane name) looks interesting. > From peterz at infradead.org Fri Jan 15 10:33:27 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 11:33:27 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115102645.GA22640@linux.vnet.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> Message-ID: <1263551607.4244.379.camel@laptop> On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote: > Hi Peter, > > Or there could be two threads that could be racing to > insert/delete a breakpoint. These synchronization issues are all handled > by the Uprobes layer. Shouldn't be hard to put that in the ubp layer, right? > Uprobes layer would need to be notified of process life-time events > like fork/clone/exec/exit. No so much the process lifetimes as the vma life times are interesting, placing a hook in the vm code to track that isn't too hard, > It also needs to know > - when a breakpoint is hit > - stop and resume a thread. A simple hook in the trap code is done quickly enough, and no reason to stop the thread, its not going anywhere when it traps. 
From peterz at infradead.org Fri Jan 15 10:56:43 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 11:56:43 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100115102205.GE26396@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <20100115101056.GD26396@in.ibm.com> <1263550412.4244.375.camel@laptop> <20100115102205.GE26396@in.ibm.com> Message-ID: <1263553003.4244.385.camel@laptop> On Fri, 2010-01-15 at 15:52 +0530, Ananth N Mavinakayanahalli wrote: > On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote: > > On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote: > > > > > Ideas? > > > > emulate the one instruction? > > In kernel? Generically? Don't think its that easy for userspace -- > you have the full gamut of instructions to emulate (fp, vector, etc); > further, Can't you jit a piece of code that wraps the one instruction, save the full cpu state, set the userspace segments, have it load pt_regs (except for the IP) execute the one ins, save the results, restore the full state? Then replace pt_regs with the saved result and advance the stored IP by the length of that one instruction and return to userspace? All you need to take care of are the priv insns, but doesn't something like kvm already have code to deal with that? > the instruction could itself cause a page fault and the like. Faults aren't a problem, we take faults from kernel space all the time. 
From peterz at infradead.org Fri Jan 15 11:02:15 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 12:02:15 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263553003.4244.385.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <20100115101056.GD26396@in.ibm.com> <1263550412.4244.375.camel@laptop> <20100115102205.GE26396@in.ibm.com> <1263553003.4244.385.camel@laptop> Message-ID: <1263553335.4244.387.camel@laptop> On Fri, 2010-01-15 at 11:56 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 15:52 +0530, Ananth N Mavinakayanahalli wrote: > > On Fri, Jan 15, 2010 at 11:13:32AM +0100, Peter Zijlstra wrote: > > > On Fri, 2010-01-15 at 15:40 +0530, Ananth N Mavinakayanahalli wrote: > > > > > > > Ideas? > > > > > > emulate the one instruction? > > > > In kernel? Generically? Don't think its that easy for userspace -- > > you have the full gamut of instructions to emulate (fp, vector, etc); > > further, > > Can't you jit a piece of code that wraps the one instruction, save the > full cpu state, set the userspace segments, have it load pt_regs (except > for the IP) execute the one ins, save the results, restore the full > state? Hmm, normally the problem with FP/Vector state is that we don't save/restore it going in/out the kernel, so kernel-space can't use it because it would change the userspace state, but in this case we can simply execute that one instruction and have it change user state, because that's exactly what we want to do. So we don't need to save restore the full cpu state around that JIT'ed piece of code, but just the regular regs. 
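The bookkeeping in the proposal above (run the one probed instruction against the saved user registers, then advance the stored IP by that instruction's length) can be illustrated with a toy model in C. This is only a sketch of the register handling, not of JIT code generation: struct toy_regs stands in for pt_regs, struct toy_insn is a made-up "add immediate to ax" instruction, and step_over_probe() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct toy_regs {		/* stand-in for the saved user registers */
	unsigned long ip;
	long ax;
};

struct toy_insn {		/* made-up instruction: "add imm to ax" */
	size_t len;		/* encoded length in bytes */
	long add_to_ax;
};

/*
 * Apply the probed instruction's effect to the saved register state, then
 * point the saved IP at the instruction following the probepoint.
 */
void step_over_probe(struct toy_regs *regs, const struct toy_insn *insn,
		     unsigned long probe_addr)
{
	assert(regs && insn);
	regs->ax += insn->add_to_ax;		/* "execute" the one instruction */
	regs->ip = probe_addr + insn->len;	/* resume right after it */
}
```

The fixed "advance by the length" rule only suits instructions that fall through. An instruction that sets the IP itself, such as a jump or call, would need its own handling, which this toy does not attempt.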
From maneesh at in.ibm.com Fri Jan 15 11:05:47 2010 From: maneesh at in.ibm.com (Maneesh Soni) Date: Fri, 15 Jan 2010 16:35:47 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263551607.4244.379.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> Message-ID: <20100115110547.GB3660@in.ibm.com> On Fri, Jan 15, 2010 at 11:33:27AM +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote: > > Hi Peter, > > > > Or there could be two threads that could be racing to > > insert/delete a breakpoint. These synchronization issues are all handled > > by the Uprobes layer. > > Shouldn't be hard to put that in the ubp layer, right? > > > Uprobes layer would need to be notified of process life-time events > > like fork/clone/exec/exit. > > No so much the process lifetimes as the vma life times are interesting, > placing a hook in the vm code to track that isn't too hard, > I think similar hooks were given thumbs down in the previous incarnation of uprobes (which was implemented without utrace). http://lkml.indiana.edu/hypermail/linux/kernel/0603.2/1254.html Thanks Maneesh -- Maneesh Soni Linux Technology Center IBM India Systems and Technology Lab, Bangalore, India. 
From srikar at linux.vnet.ibm.com Fri Jan 15 11:12:27 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Fri, 15 Jan 2010 16:42:27 +0530 Subject: [RFC] [PATCH 3/7] Execution out of line (XOL) In-Reply-To: <1263546455.4244.348.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> <1263467318.4244.289.camel@laptop> <1263508997.4875.32.camel@localhost.localdomain> <1263546455.4244.348.camel@laptop> Message-ID: <20100115111227.GB20658@linux.vnet.ibm.com> * Peter Zijlstra [2010-01-15 10:07:35]: > On Thu, 2010-01-14 at 14:43 -0800, Jim Keniston wrote: > > > > Yeah, there's not a lot of context there. I hope it will make more > > sense if you read section 1.1 of Documentation/uprobes.txt (patch #6). > > Or look at get_insn_slot() in kprobes, and understand that we're trying > > to do something similar in uprobes, where the instruction copies have to > > reside in the user address space of the probed process. > > That's not the point, changelogs shoulnd not be this cryptic. They > should be stand alone and descriptive of what, why and how. > > If you can't be bothered writing such for something you want reviewed > for inclusion then I might not be bothered looking at them at all. > Okay shall add to the Changelog with the information providing the context for this patch. 
From peterz at infradead.org Fri Jan 15 11:12:35 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 12:12:35 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115110547.GB3660@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115110547.GB3660@in.ibm.com> Message-ID: <1263553955.4244.393.camel@laptop> On Fri, 2010-01-15 at 16:35 +0530, Maneesh Soni wrote: > On Fri, Jan 15, 2010 at 11:33:27AM +0100, Peter Zijlstra wrote: > > On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote: > > > Hi Peter, > > > > > > Or there could be two threads that could be racing to > > > insert/delete a breakpoint. These synchronization issues are all handled > > > by the Uprobes layer. > > > > Shouldn't be hard to put that in the ubp layer, right? > > > > > Uprobes layer would need to be notified of process life-time events > > > like fork/clone/exec/exit. > > > > No so much the process lifetimes as the vma life times are interesting, > > placing a hook in the vm code to track that isn't too hard, > > > > I think similar hooks were given thumbs down in the previous incarnation > of uprobes (which was implemented without utrace). > > http://lkml.indiana.edu/hypermail/linux/kernel/0603.2/1254.html I wasn't at all proposing to mess with a_ops, nor do you really need to, I was more thinking of adding a callback like perf_event_mmap() and a corresponding unmap(), that way you can track mapping life-times and add/remove probes accordingly. Adding the probe uses the fact that (most) executable mappings are MAP_PRIVATE and CoWs a private copy of the page with the modified ins, right? 
What does it do for MAP_SHARED|MAP_EXECUTABLE sections -- simply fail to add the probe? From peterz at infradead.org Fri Jan 15 11:18:04 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 12:18:04 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263553955.4244.393.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115110547.GB3660@in.ibm.com> <1263553955.4244.393.camel@laptop> Message-ID: <1263554284.4244.396.camel@laptop> On Fri, 2010-01-15 at 12:12 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 16:35 +0530, Maneesh Soni wrote: > > On Fri, Jan 15, 2010 at 11:33:27AM +0100, Peter Zijlstra wrote: > > > On Fri, 2010-01-15 at 15:56 +0530, Srikar Dronamraju wrote: > > > > Hi Peter, > > > > > > > > Or there could be two threads that could be racing to > > > > insert/delete a breakpoint. These synchronization issues are all handled > > > > by the Uprobes layer. > > > > > > Shouldn't be hard to put that in the ubp layer, right? > > > > > > > Uprobes layer would need to be notified of process life-time events > > > > like fork/clone/exec/exit. > > > > > > No so much the process lifetimes as the vma life times are interesting, > > > placing a hook in the vm code to track that isn't too hard, > > > > > > > I think similar hooks were given thumbs down in the previous incarnation > > of uprobes (which was implemented without utrace). 
> > > > http://lkml.indiana.edu/hypermail/linux/kernel/0603.2/1254.html > > I wasn't at all proposing to mess with a_ops, nor do you really need to, > I was more thinking of adding a callback like perf_event_mmap() and a > corresponding unmap(), that way you can track mapping life-times and > add/remove probes accordingly. > > Adding the probe uses the fact that (most) executable mappings are > MAP_PRIVATE and CoWs a private copy of the page with the modified ins, > right? Does it clean up the CoW'ed page on removing the probe? Does that account for userspace having made other changes in between installing and removing the probe (for PROT_WRITE mappings obviously)? From srikar at linux.vnet.ibm.com Fri Jan 15 13:08:02 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Fri, 15 Jan 2010 18:38:02 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263551607.4244.379.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> Message-ID: <20100115130802.GC20658@linux.vnet.ibm.com> * Peter Zijlstra [2010-01-15 11:33:27]: > > > Uprobes layer would need to be notified of process life-time events > > like fork/clone/exec/exit. > > No so much the process lifetimes as the vma life times are interesting, > placing a hook in the vm code to track that isn't too hard, > > > It also needs to know > > - when a breakpoint is hit > > - stop and resume a thread. > > A simple hook in the trap code is done quickly enough, and no reason to > stop the thread, its not going anywhere when it traps. > > Some of the threads could be executing in the vicinity of the breakpoint when it is getting inserted or deleted. Wont we need to stop/quiesce those threads? 
From fche at redhat.com Fri Jan 15 13:10:37 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 15 Jan 2010 08:10:37 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263548124.4244.358.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> Message-ID: <20100115131037.GP4822@redhat.com> Hi - > > > Then we can ditch the whole utrace muck as I see no reason to want to > > > use that, whereas the ubp (given a sane name) looks interesting. > > > > Assuming you meant what you write, perhaps you misunderstand the > > layering relationship of these pieces. utrace underlies uprobes and > > other process manipulation functionality, present and future. > > Why, utrace doesn't at all look to bring a fundamental contribution to > all that. If there's a proper kernel interface to install probes on > userspace code (ubp seems to be mostly that) then I can use perf/ftrace > to do the rest of the state management, no need to use utrace there. > You can hardly force me to use utrace there, can you? utrace is not a form of punishment inflicted upon the undeserving. It is a service layer that uprobes et alii are built upon. You as a potential uprobes client need not also talk directly to it, if you wish to reimplement task-finder-like services some other way. 
- FChE From peterz at infradead.org Fri Jan 15 13:16:29 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 14:16:29 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115130802.GC20658@linux.vnet.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115130802.GC20658@linux.vnet.ibm.com> Message-ID: <1263561389.4244.410.camel@laptop> On Fri, 2010-01-15 at 18:38 +0530, Srikar Dronamraju wrote: > * Peter Zijlstra [2010-01-15 11:33:27]: > > > > > > Uprobes layer would need to be notified of process life-time events > > > like fork/clone/exec/exit. > > > > No so much the process lifetimes as the vma life times are interesting, > > placing a hook in the vm code to track that isn't too hard, > > > > > It also needs to know > > > - when a breakpoint is hit > > > - stop and resume a thread. > > > > A simple hook in the trap code is done quickly enough, and no reason to > > stop the thread, its not going anywhere when it traps. > > > > > > Some of the threads could be executing in the vicinity of the > breakpoint when it is getting inserted or deleted. Wont we need to > stop/quiesce those threads? 
The easy answer is to use kstopmachine to patch the code, the slightly more complex would be using something like: http://lkml.org/lkml/2010/1/12/300 From peterz at infradead.org Fri Jan 15 13:25:30 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 14:25:30 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115131037.GP4822@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> Message-ID: <1263561930.4244.417.camel@laptop> On Fri, 2010-01-15 at 08:10 -0500, Frank Ch. Eigler wrote: > Hi - > > > > > Then we can ditch the whole utrace muck as I see no reason to want to > > > > use that, whereas the ubp (given a sane name) looks interesting. > > > > > > Assuming you meant what you write, perhaps you misunderstand the > > > layering relationship of these pieces. utrace underlies uprobes and > > > other process manipulation functionality, present and future. > > > > Why, utrace doesn't at all look to bring a fundamental contribution to > > all that. If there's a proper kernel interface to install probes on > > userspace code (ubp seems to be mostly that) then I can use perf/ftrace > > to do the rest of the state management, no need to use utrace there. > > > You can hardly force me to use utrace there, can you? > > utrace is not a form of punishment inflicted upon the undeserving. It > is a service layer that uprobes et alii are built upon. You as a > potential uprobes client need not also talk directly to it, if you > wish to reimplement task-finder-like services some other way. I said I wanted to, I think the whole task orientation of user-space probing is wrong, it's about text mappings. 
But yes, I think that for most purposes utrace is a punishment, it's way too heavy, I mean, trap, generate a signal, catch the signal, that's like an insane amount of code to jump through in order to get that trap. Furthermore it requires stopping and resuming tasks and nonsense like that, that's unwanted in many cases, just run stuff from the trap site and you're done. Yes you can do this with utrace, and I'm not going to stop people from using utrace for this, I'm just saying I'm not going to. From fche at redhat.com Fri Jan 15 13:38:25 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 15 Jan 2010 08:38:25 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263561930.4244.417.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> Message-ID: <20100115133825.GQ4822@redhat.com> Hi - On Fri, Jan 15, 2010 at 02:25:30PM +0100, Peter Zijlstra wrote: > [...] > > utrace is not a form of punishment inflicted upon the undeserving. It > > is a service layer that uprobes et alii are built upon. You as a > > potential uprobes client need not also talk directly to it, if you > > wish to reimplement task-finder-like services some other way. > > [...] > But yes, I think that for most purposes utrace is a punishment, its way > too heavy, I mean, trap, generate a signal, catch the signal, that's > like an insane amount of code to jump through in order to get that trap. At the bottom, there will be an int3 in the userspace text page. There will be a trap taken, no matter what. 
Code must figure out whether this trap came from an in-kernel client such as uprobes, or whether it is to be passed through to a userspace debugger via ptrace (or the gdbstub). This part is unavoidable if you wish to be compatible. I'm not sure, but it sounds like the part you're complaining about is how utrace ultimately reports the trap to uprobes: i.e., utrace_get_signal()? Is that the "insane amount of code"? > Furthermore it requires stopping and resuming tasks and nonsense like > that, that's unwanted in many cases, just run stuff from the trap site > and you're done. I don't know what you mean exactly. A trap already stopped the task. utrace merely allows various clients to inspect/manipulate the state of the task at that moment. It does not add any context switches or spurious stop/resume operations. - FChE From peterz at infradead.org Fri Jan 15 13:38:04 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 14:38:04 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263561389.4244.410.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115130802.GC20658@linux.vnet.ibm.com> <1263561389.4244.410.camel@laptop> Message-ID: <1263562684.4244.419.camel@laptop> On Fri, 2010-01-15 at 14:16 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 18:38 +0530, Srikar Dronamraju wrote: > > * Peter Zijlstra [2010-01-15 11:33:27]: > > > > > > > > > Uprobes layer would need to be notified of process life-time events > > > > like fork/clone/exec/exit. 
> > > > > > No so much the process lifetimes as the vma life times are interesting, > > > placing a hook in the vm code to track that isn't too hard, > > > > > > > It also needs to know > > > > - when a breakpoint is hit > > > > - stop and resume a thread. > > > > > > A simple hook in the trap code is done quickly enough, and no reason to > > > stop the thread, its not going anywhere when it traps. > > > > > > > > > > Some of the threads could be executing in the vicinity of the > > breakpoint when it is getting inserted or deleted. Wont we need to > > stop/quiesce those threads? > > The easy answer it so use kstopmachine to patch the code, the slightly > more complex would be using something like: > > http://lkml.org/lkml/2010/1/12/300 Also, since its userspace, can't you simply play games with the pagetables? CoW a new private copy of the page and flip the pagetables around to the new one, then flush the pagetables on all relevant cpus and bob's your uncle. (You might have to play some games with making the page RO to trap intermediate accesses, but that should work I think) From peterz at infradead.org Fri Jan 15 13:47:56 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 14:47:56 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115133825.GQ4822@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> Message-ID: <1263563276.4244.426.camel@laptop> On Fri, 2010-01-15 at 08:38 -0500, Frank Ch. Eigler wrote: > Hi - > > On Fri, Jan 15, 2010 at 02:25:30PM +0100, Peter Zijlstra wrote: > > [...] > > > utrace is not a form of punishment inflicted upon the undeserving. 
It > > > is a service layer that uprobes et alii are built upon. You as a > > > potential uprobes client need not also talk directly to it, if you > > > wish to reimplement task-finder-like services some other way. > > > > [...] > > But yes, I think that for most purposes utrace is a punishment, its way > > too heavy, I mean, trap, generate a signal, catch the signal, that's > > like an insane amount of code to jump through in order to get that trap. > > At the bottom, there will be an int3 in the userspace text page. > There will be a trap taken, no matter what. Code must figure out > whether this trap came from an in-kernel client such as uprobes, or > whether it is to be passed through to a userspace debugger via ptrace > (or the gdbstub). This part is unavoidable if you wish to be > compatible. Sure, a lookup against existing probe sites on trap is unavoidable, if you find a match, you call a probe specific handler and deal with it there, if you don't you'll eventually generate a SIGTRAP and fall back to userspace. Thing is, utrace doesn't do that (nor should it), its something the uprobe interface should implement just like kprobes does. > I'm not sure, but it sounds like the part you're complaining about is > how utrace ultimately reports the trap to uprobes: i.e., > utrace_get_signal()? Is that the "insane amount of code"? Well when tracing/profiling every instruction is too much. Having to needlessly raise a signal only to catch it again a short bit later sounds like obvious waste to me. > > Furthermore it requires stopping and resuming tasks and nonsense like > > that, that's unwanted in many cases, just run stuff from the trap site > > and you're done. > > I don't know what you mean exactly. A trap already stopped task. > utrace merely allows various clients to inspect/manipulate the state > of the task at that moment. It does not add any context switches or > spurious stop/resumue operations. Srikar seemed to suggest it needed stop/resume. 
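The trap flow Peter outlines in the message above (look the trap address up among the known probe sites, run the probe's handler on a match, otherwise fall back to SIGTRAP) might be sketched in user-space C as follows. This is only an illustration: the fixed-size table, `register_probe()`, `handle_breakpoint_trap()` and `count_hit()` are hypothetical names, not kernel interfaces, and a real implementation would need a faster lookup structure and locking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe record: breakpoint address and its handler. */
struct probe {
    uintptr_t addr;
    void (*handler)(uintptr_t addr, void *ctx);
};

enum trap_result {
    TRAP_HANDLED,      /* a registered probe consumed the trap */
    TRAP_SEND_SIGTRAP  /* no probe here: fall back to userspace */
};

/* Small fixed table standing in for the real lookup structure. */
#define MAX_PROBES 16
static struct probe probes[MAX_PROBES];
static size_t nr_probes;

/* Register a probe; returns 0 on success, -1 if the table is full. */
static int register_probe(uintptr_t addr,
                          void (*handler)(uintptr_t, void *))
{
    if (nr_probes == MAX_PROBES)
        return -1;
    probes[nr_probes].addr = addr;
    probes[nr_probes].handler = handler;
    nr_probes++;
    return 0;
}

/* Called from the (simulated) int3 trap path. */
static enum trap_result handle_breakpoint_trap(uintptr_t trap_addr,
                                               void *ctx)
{
    for (size_t i = 0; i < nr_probes; i++) {
        if (probes[i].addr == trap_addr) {
            probes[i].handler(trap_addr, ctx); /* run from the trap site */
            return TRAP_HANDLED;
        }
    }
    return TRAP_SEND_SIGTRAP; /* not ours: let the debugger path see it */
}

/* Example handler: counts hits through the context pointer. */
static void count_hit(uintptr_t addr, void *ctx)
{
    (void)addr;
    (*(int *)ctx)++;
}
```

The handler runs directly in the trap path, with no signal raised for addresses that belong to a registered probe, which is the property the message argues for.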
From fche at redhat.com Fri Jan 15 14:00:42 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 15 Jan 2010 09:00:42 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263563276.4244.426.camel@laptop> References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> Message-ID: <20100115140042.GR4822@redhat.com> Hi - On Fri, Jan 15, 2010 at 02:47:56PM +0100, Peter Zijlstra wrote: > [...] > > I'm not sure, but it sounds like the part you're complaining about is > > how utrace ultimately reports the trap to uprobes: i.e., > > utrace_get_signal()? Is that the "insane amount of code"? > > Well when tracing/profiling every instruction is too much. Having to > needlessly raise a signal only to catch it again a short bit later > sounds like obvious waste to me. Well, I'm not in a position to argue line by line about the necessity or the cost of utrace low level guts, but this may represent the most practical engineering balance between functionality / modularity / undesirably intrusive modifications. Perhaps there exists a tool with which one can confirm your worry about excess cost of this particular piece. > > > Furthermore it requires stopping and resuming tasks and nonsense like > > > that, that's unwanted in many cases, just run stuff from the trap site > > > and you're done. > > > > I don't know what you mean exactly. A trap already stopped task. > > utrace merely allows various clients to inspect/manipulate the state > > of the task at that moment. It does not add any context switches or > > spurious stop/resumue operations. > > Srikar seemed to suggest it needed stop/resume. 
You may be confusing breakpoint insertion/removal operations versus breakpoint hits. - FChE From peterz at infradead.org Fri Jan 15 14:06:43 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 15:06:43 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115140042.GR4822@redhat.com> References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115140042.GR4822@redhat.com> Message-ID: <1263564403.4244.430.camel@laptop> On Fri, 2010-01-15 at 09:00 -0500, Frank Ch. Eigler wrote: > Well, I'm not in a position to argue line by line about the necessity > or the cost of utrace low level guts, but this may represent the most > practical engineering balance between functionality / modularity / > undesirably intrusive modifications. How intrusive and non-modular is installing a DIE_INT3 notifier? 
From srikar at linux.vnet.ibm.com Fri Jan 15 14:20:07 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Fri, 15 Jan 2010 19:50:07 +0530 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263563276.4244.426.camel@laptop> References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> Message-ID: <20100115142007.GA1628@linux.vnet.ibm.com> > > > > Furthermore it requires stopping and resuming tasks and nonsense like > > > that, that's unwanted in many cases, just run stuff from the trap site > > > and you're done. > > > > I don't know what you mean exactly. A trap already stopped task. > > utrace merely allows various clients to inspect/manipulate the state > > of the task at that moment. It does not add any context switches or > > spurious stop/resumue operations. > > Srikar seemed to suggest it needed stop/resume. > If process traps, We dont need to stop/resume other threads. uprobes needs threads to quiesce when inserting/deleting the breakpoint. From fche at redhat.com Fri Jan 15 14:22:13 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Fri, 15 Jan 2010 09:22:13 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263564403.4244.430.camel@laptop> References: <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115140042.GR4822@redhat.com> <1263564403.4244.430.camel@laptop> Message-ID: <20100115142213.GS4822@redhat.com> Hi - > > Well, I'm not in a position to argue line by line about the necessity > > or the cost of utrace low level guts, but this may represent the most > > practical engineering balance between functionality / modularity / > > undesirably intrusive modifications. > > How intrusive and non-modular is installing a DIE_INT3 notifier? I'm not sure about all the reasons pro/con, but it looks like installing such a systemwide hook would force every userspace breakpoint or kprobe event machine wide to pass through the hypothetical uprobes layer, whether or not applicable to a current task. - FChE From peterz at infradead.org Fri Jan 15 14:25:09 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 15:25:09 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115142007.GA1628@linux.vnet.ibm.com> References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115142007.GA1628@linux.vnet.ibm.com> Message-ID: <1263565509.4244.432.camel@laptop> On Fri, 2010-01-15 at 19:50 +0530, Srikar Dronamraju wrote: > > Srikar seemed to suggest it needed stop/resume. 
> > > > If process traps, We dont need to stop/resume other threads. > uprobes needs threads to quiesce when inserting/deleting the breakpoint. Right, I don't think we need to at all. See the CoW thing from previous emails. From peterz at infradead.org Fri Jan 15 14:40:36 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 15:40:36 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115142213.GS4822@redhat.com> References: <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115140042.GR4822@redhat.com> <1263564403.4244.430.camel@laptop> <20100115142213.GS4822@redhat.com> Message-ID: <1263566436.4244.435.camel@laptop> On Fri, 2010-01-15 at 09:22 -0500, Frank Ch. Eigler wrote: > Hi - > > > > Well, I'm not in a position to argue line by line about the necessity > > > or the cost of utrace low level guts, but this may represent the most > > > practical engineering balance between functionality / modularity / > > > undesirably intrusive modifications. > > > > How intrusive and non-modular is installing a DIE_INT3 notifier? > > I'm not sure about all the reasons pro/con, but it looks like > installing such a systemwide hook would force every userspace > breakpoint or kprobe event machine wide to pass through the > hypothetical uprobes layer, whether or not applicable to a current > task. Well, we'll have to pass through the global die notifier anyway, but a quick per task filter sounds like a good idea, we can do that by keeping a per-task count of the number of uprobes in use. Then the uprobe code can avoid the lookup if there are no task users and no global users. The advantage of this construct is that is easily allows for global users, whereas a utrace based one doesn't. 
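The per-task filter Peter proposes in the message above could look roughly like this in user-space C. The sketch assumes one counter per task plus one global counter; `struct task`, `nr_uprobes`, `nr_global_uprobes` and `uprobe_lookup_needed()` are illustrative names only, not existing kernel fields or functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state. */
struct task {
    unsigned int nr_uprobes; /* probes currently in use by this task */
};

/* Probes registered by global (not task-bound) users. */
static unsigned int nr_global_uprobes;

/*
 * Cheap check for the breakpoint notifier path: the probe-site lookup
 * is only worth doing if this task or some global user has a probe.
 */
static bool uprobe_lookup_needed(const struct task *t)
{
    return t->nr_uprobes != 0 || nr_global_uprobes != 0;
}
```

A task without probes then pays only two counter comparisons per breakpoint trap, and global users remain possible because the global counter forces the lookup for every task.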
From jkenisto at us.ibm.com Fri Jan 15 20:18:51 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 12:18:51 -0800 Subject: [RFC] [PATCH 3/7] Execution out of line (XOL) In-Reply-To: <1263546455.4244.348.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122545.22050.64994.sendpatchset@srikar.in.ibm.com> <1263467318.4244.289.camel@laptop> <1263508997.4875.32.camel@localhost.localdomain> <1263546455.4244.348.camel@laptop> Message-ID: <1263586731.5007.2.camel@localhost.localdomain> On Fri, 2010-01-15 at 10:07 +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 14:43 -0800, Jim Keniston wrote: > > > > Yeah, there's not a lot of context there. I hope it will make more > > sense if you read section 1.1 of Documentation/uprobes.txt (patch #6). > > Or look at get_insn_slot() in kprobes, and understand that we're trying > > to do something similar in uprobes, where the instruction copies have to > > reside in the user address space of the probed process. > > That's not the point, changelogs should not be this cryptic. They > should be stand alone and descriptive of what, why and how. Point taken. > > If you can't be bothered writing such for something you want reviewed > for inclusion then I might not be bothered looking at them at all. > We appreciate your persistence wrt this patch set. :-} Jim
From jkenisto at us.ibm.com Fri Jan 15 21:07:14 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 13:07:14 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263546175.4244.342.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> Message-ID: <1263589634.5007.34.camel@localhost.localdomain> On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > +Instruction copies to be single-stepped are stored in a per-process > > +"single-step out of line (XOL) area," which is a little VM area > > +created by Uprobes in each probed process's address space. > > I think tinkering with the probed process's address space is a no-no. > Have you ran this by the linux mm folks? Sort of. Back in 2007 (!), we were getting ready to post uprobes (which was then essentially uprobes+xol+upb) to LKML, pondering XOL alternatives and waiting for utrace to get pulled back into the -mm tree. (It turned out to be a long wait.) I emailed Andrew Morton, inquiring about the prospects for utrace and giving him a preview of utrace-based uprobes. He expressed openness to the idea of allocating a piece of the user address space for the XOL area, a la the vdso page. With advice and review from Dave Hansen, we implemented an XOL page, set up for every process (probed or not) along the same lines as the vdso page. About that time, Roland McGrath suggested using do_mmap_pgoff() to create a separate vma on demand. This was the seed of the current implementation. It had the advantages of being architecture-independent, affecting only probed processes, and allowing the allocation of more XOL slots.
(Uprobes can make do with a fixed number of XOL slots -- allowing one probepoint to steal another's slot -- but it isn't pretty.) As I recall, Dave preferred the other idea (1 XOL page for every process, probed or not) -- mostly because he didn't like the idea of a new vma popping into existence when the process gets probed -- but was OK with us going ahead with Roland's idea. (I'm not a VM guy; pardon any imprecision in my language.) Jim > I'd be inclined to NAK this > straight out. > From jkenisto at us.ibm.com Fri Jan 15 21:19:31 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 13:19:31 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263549014.4244.374.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> Message-ID: <1263590371.5007.44.camel@localhost.localdomain> On Fri, 2010-01-15 at 10:50 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 15:08 +0530, Ananth N Mavinakayanahalli wrote: > > On Fri, Jan 15, 2010 at 10:03:48AM +0100, Peter Zijlstra wrote: > > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > > > > > discussed elsewhere. > > > > > > Thanks for the pointer... > > > > :-) > > > > Peter, > > I think Jim was referring to > > http://sources.redhat.com/ml/systemtap/2007-q1/msg00571.html > > That's a 2007 email from some obscure list... that's hardly something > that can be referenced to without link. Actually, I was referring to this http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01120.html from earlier (Monday) in this same discussion. But I'll be sure to include pointers in the future. 
For more thoughts on this approach, see http://sourceware.org/bugzilla/show_bug.cgi?id=5509 (And no, I don't expect you to have seen that before. :-)) Most of the troublesome issues mentioned in that enhancement request have since been resolved. Jim From peterz at infradead.org Fri Jan 15 21:49:52 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 15 Jan 2010 22:49:52 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263589634.5007.34.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> Message-ID: <1263592192.4244.488.camel@laptop> On Fri, 2010-01-15 at 13:07 -0800, Jim Keniston wrote: > On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote: > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > > > +Instruction copies to be single-stepped are stored in a per-process > > > +"single-step out of line (XOL) area," which is a little VM area > > > +created by Uprobes in each probed process's address space. > > > > I think tinkering with the probed process's address space is a no-no. > > Have you ran this by the linux mm folks? > > Sort of. > > Back in 2007 (!), we were getting ready to post uprobes (which was then > essentially uprobes+xol+upb) to LKML, pondering XOL alternatives and > waiting for utrace to get pulled back into the -mm tree. (It turned out > to be a long wait.) I emailed Andrew Morton, inquiring about the > prospects for utrace and giving him a preview of utrace-based uprobes. > He expressed openness to the idea of allocating a piece of the user > address space for the XOL area, a la the vdso page. 
> > With advice and review from Dave Hansen, we implemented an XOL page, set > up for every process (probed or not) along the same lines as the vdso > page. > > About that time, Roland McGrath suggested using do_mmap_pgoff() to > create a separate vma on demand. This was the seed of the current > implementation. It had the advantages of being > architecture-independent, affecting only probed processes, and allowing > the allocation of more XOL slots. (Uprobes can make do with a fixed > number of XOL slots -- allowing one probepoint to steal another's slot > -- but it isn't pretty.) > > As I recall, Dave preferred the other idea (1 XOL page for every > process, probed or not) -- mostly because he didn't like the idea of a > new vma popping into existence when the process gets probed -- but was > OK with us going ahead with Roland's idea. Well, I think its all very gross, I would really like people to try and 'emulate' or plain execute those original instructions from kernel space. As to the privileged instructions, I think qemu/kvm like projects should have pretty much all of that covered. Nor do I think we need utrace at all to make user space probes useful. Even stronger, I think the focus on utrace made you get some fundamentals wrong. Its not mainly about task state, but like said, its about text mappings, which is something utrace knows nothing about. That is not to say you cannot build a useful interface from uprobes and utrace, but its not at all required or natural. 
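For readers following the XOL-slot discussion quoted above, here is a minimal user-space sketch in C of a fixed pool of XOL slots in which one probepoint can steal another's slot once the pool is full. It is not taken from the uprobes patches: the names `xol_pool`, `xol_pool_init` and `xol_get_slot`, the pool size `XOL_NSLOTS`, and the round-robin choice of victim are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define XOL_NSLOTS 4   /* hypothetical pool size; the thread gives no number */

/* Hypothetical model of a fixed XOL area: owner[i] is the probepoint
 * address whose instruction copy currently sits in slot i (0 = free). */
struct xol_pool {
    unsigned long owner[XOL_NSLOTS];
    int next_victim;   /* round-robin cursor used when stealing */
};

static void xol_pool_init(struct xol_pool *p)
{
    memset(p, 0, sizeof(*p));
}

/* Return the slot for probepoint vaddr.  Reuse its existing slot if it has
 * one, otherwise take a free slot, otherwise steal one.  *evicted is set to
 * the probepoint that lost its slot, or 0 if nothing was stolen. */
static int xol_get_slot(struct xol_pool *p, unsigned long vaddr,
                        unsigned long *evicted)
{
    int i;

    assert(vaddr != 0);   /* 0 is reserved to mean "free" */
    *evicted = 0;

    /* Case 1: this probepoint already owns a slot. */
    for (i = 0; i < XOL_NSLOTS; i++)
        if (p->owner[i] == vaddr)
            return i;

    /* Case 2: a free slot is available. */
    for (i = 0; i < XOL_NSLOTS; i++)
        if (p->owner[i] == 0) {
            p->owner[i] = vaddr;
            return i;
        }

    /* Case 3: pool is full, so steal the next victim's slot. */
    i = p->next_victim;
    p->next_victim = (i + 1) % XOL_NSLOTS;
    *evicted = p->owner[i];
    p->owner[i] = vaddr;
    return i;
}
```

In this model a stolen slot means the evicted probepoint's instruction copy would have to be written out again the next time that probepoint is hit, which is presumably part of why the fixed-size variant "isn't pretty"; a vma created on demand avoids the fixed limit.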
From jkenisto at us.ibm.com Fri Jan 15 22:27:51 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 14:27:51 -0800 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263554284.4244.396.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115110547.GB3660@in.ibm.com> <1263553955.4244.393.camel@laptop> <1263554284.4244.396.camel@laptop> Message-ID: <1263594471.5007.57.camel@localhost.localdomain> On Fri, 2010-01-15 at 12:18 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 12:12 +0100, Peter Zijlstra wrote: ... > > > > Adding the probe uses the fact that (most) executable mappings are > > MAP_PRIVATE and CoWs a private copy of the page with the modified ins, > > right? We've just used access_process_vm() to insert the breakpoint instruction. (If there are situations where that's not appropriate, please advise.) > > Does it clean up the CoW'ed page on removing the probe? If I understand your question, the answer is no. We make no attempt to reclaim COW'ed pages, even after all the probes have been removed. In fact, once the first probe is hit and the XOL vma is created, the XOL vma hangs around for the life of the process. > Does that > account for userspace having made other changes in between installing > and removing the probe (for PROT_WRITE mappings obviously)? We don't attempt the aforementioned cleanup, so I think the answer is "N/A." 
Jim From jkenisto at us.ibm.com Fri Jan 15 23:11:25 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 15:11:25 -0800 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <20100115142007.GA1628@linux.vnet.ibm.com> References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115142007.GA1628@linux.vnet.ibm.com> Message-ID: <1263597085.5007.82.camel@localhost.localdomain> On Fri, 2010-01-15 at 19:50 +0530, Srikar Dronamraju wrote: > > > > > > Furthermore it requires stopping and resuming tasks and nonsense like > > > > that, that's unwanted in many cases, just run stuff from the trap site > > > > and you're done. > > > > > > I don't know what you mean exactly. A trap already stopped task. > > > utrace merely allows various clients to inspect/manipulate the state > > > of the task at that moment. It does not add any context switches or > > > spurious stop/resumue operations. > > > > Srikar seemed to suggest it needed stop/resume. > > > > If process traps, We dont need to stop/resume other threads. > uprobes needs threads to quiesce when inserting/deleting the breakpoint. > Years ago, we had pre-utrace versions of uprobes where the uprobes breakpoint-handler code was dispatched from the die_notifier, before the int3 turned into a SIGTRAP. I believe that's what Peter is recommending. On my old Pentium M... - a pre-utrace uprobe hit cost about 1 usec; - a utrace-based uprobe hit cost about 3 usec; - and an unboosted kprobe hit cost 0.57 usec. So yeah, learning about the int3 via utrace after the SIGTRAP gets created adds some overhead to uprobes. 
But as previously discussed in this thread -- e.g., http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/02969.html -- there are ways to avoid the 2nd (single-step) trap, which should cut overhead in half. Jim From jkenisto at us.ibm.com Fri Jan 15 23:44:45 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 15:44:45 -0800 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263553955.4244.393.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115110547.GB3660@in.ibm.com> <1263553955.4244.393.camel@laptop> Message-ID: <1263599085.5007.88.camel@localhost.localdomain> On Fri, 2010-01-15 at 12:12 +0100, Peter Zijlstra wrote: ... > > Adding the probe uses the fact that (most) executable mappings are > MAP_PRIVATE and CoWs a private copy of the page with the modified ins, > right? > > What does it do for MAP_SHARED|MAP_EXECUTABLE sections -- simply fail to > add the probe? If the vma containing the instruction to be probed has the VM_EXEC flag set (and it's not in the XOL area) we go ahead and try to probe it. I'm not familiar with the implications of MAP_SHARED|MAP_EXECUTABLE -- how you would get such a combo, or what access_process_vm() would do with it.
Jim From jkenisto at us.ibm.com Sat Jan 16 00:58:23 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 15 Jan 2010 16:58:23 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263592192.4244.488.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> <1263592192.4244.488.camel@laptop> Message-ID: <1263603503.5007.134.camel@localhost.localdomain> On Fri, 2010-01-15 at 22:49 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-15 at 13:07 -0800, Jim Keniston wrote: > > On Fri, 2010-01-15 at 10:02 +0100, Peter Zijlstra wrote: > > > On Thu, 2010-01-14 at 11:46 -0800, Jim Keniston wrote: > > > > > > > > +Instruction copies to be single-stepped are stored in a per-process > > > > +"single-step out of line (XOL) area," which is a little VM area > > > > +created by Uprobes in each probed process's address space. > > > > > > I think tinkering with the probed process's address space is a no-no. > > > Have you ran this by the linux mm folks? > > > > Sort of. > > > > Back in 2007 (!), we were getting ready to post uprobes (which was then > > essentially uprobes+xol+upb) to LKML, pondering XOL alternatives and > > waiting for utrace to get pulled back into the -mm tree. (It turned out > > to be a long wait.) I emailed Andrew Morton, inquiring about the > > prospects for utrace and giving him a preview of utrace-based uprobes. > > He expressed openness to the idea of allocating a piece of the user > > address space for the XOL area, a la the vdso page. > > > > With advice and review from Dave Hansen, we implemented an XOL page, set > > up for every process (probed or not) along the same lines as the vdso > > page. 
> > > > About that time, Roland McGrath suggested using do_mmap_pgoff() to > > create a separate vma on demand. This was the seed of the current > > implementation. It had the advantages of being > > architecture-independent, affecting only probed processes, and allowing > > the allocation of more XOL slots. (Uprobes can make do with a fixed > > number of XOL slots -- allowing one probepoint to steal another's slot > > -- but it isn't pretty.) > > > > As I recall, Dave preferred the other idea (1 XOL page for every > > process, probed or not) -- mostly because he didn't like the idea of a > > new vma popping into existence when the process gets probed -- but was > > OK with us going ahead with Roland's idea. > > Well, I think its all very gross, I would really like people to try and > 'emulate' or plain execute those original instructions from kernel > space. > > As to the privileged instructions, I think qemu/kvm like projects should > have pretty much all of that covered. I hear (er, read) you. Emulation may turn out to be the answer for some architectures. But here are some things to keep in mind about the various approaches: 1. Single-stepping inline is easiest: you need to know very little about the instruction set you're probing. But it's inadequate for multithreaded apps. 2. Single-stepping out of line solves the multithreading issue (as do #3 and #4), but requires more knowledge of the instruction set. (In particular, calls, jumps, and returns need special care; as do rip-relative instructions in x86_64.) I count 9 architectures that support kprobes. I think most of these do SSOL. 3. "Boosted" probes (where an appended jump instruction removes the need for the single-step trap on many instructions) require even more knowledge of the instruction set, and like SSOL, require XOL slots. Right now, as far as I know, x86 is the only architecture with boosted kprobes. 4. 
Emulation removes the need for the XOL area, but requires pretty much total knowledge of the instruction set. It's also a performance win for architectures that can't do #3. I see kvm implemented on 4 architectures (ia64, powerpc, s390, x86). Coincidentally, those are the architectures to which uprobes (old uprobes, with ubp and xol bundled in) has already been ported (though Intel hasn't been maintaining their ia64 port). So it sort of comes down to how objectionable the XOL vma (or page) really is. Regarding your suggestion about executing the probed instruction in the kernel, how widely do you think that can be applied: which architectures? how much of the instruction set? > > Nor do I think we need utrace at all to make user space probes useful. > Even stronger, I think the focus on utrace made you get some > fundamentals wrong. Its not mainly about task state, but like said, its > about text mappings, which is something utrace knows nothing about. I think that's a useful insight. As mentioned, long ago we offered up a version of uprobes where probes were per-executable rather than per-process. The feedback from LKML was, in no uncertain terms, that they should be per-process, and use access_process_vm(). Of course -- as we then argued -- sometimes you want to probe a process from the very start, so the SystemTap folks had to invent the task-finder to allow that. > > That is not to say you cannot build a useful interface from uprobes and > utrace, but its not at all required or natural. > Thanks again for your advice and ideas. Jim
From peterz at infradead.org Sat Jan 16 10:04:09 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Sat, 16 Jan 2010 11:04:09 +0100 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263599085.5007.88.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <20100115102645.GA22640@linux.vnet.ibm.com> <1263551607.4244.379.camel@laptop> <20100115110547.GB3660@in.ibm.com> <1263553955.4244.393.camel@laptop> <1263599085.5007.88.camel@localhost.localdomain> Message-ID: <1263636249.4244.525.camel@laptop> On Fri, 2010-01-15 at 15:44 -0800, Jim Keniston wrote: > On Fri, 2010-01-15 at 12:12 +0100, Peter Zijlstra wrote: > .... > > > > Adding the probe uses the fact that (most) executable mappings are > > MAP_PRIVATE and CoWs a private copy of the page with the modified ins, > > right? > > > > What does it do for MAP_SHARED|MAP_EXECUTABLE sections -- simply fail to > > add the probe? > > If the vma containing the instruction to be probed has the VM_EXEC flag > set (and it's not in the XOL area) we go ahead and try to probe it. I'm > not familiar with the implications of MAP_SHARED|MAP_EXECUTABLE -- how > you would get such a combo, or what access_process_vm() would do with > it. I'm not sure how you'd get one, the user has to explicitly create one I think, regular loaders don't create such things, but maybe JITs do. The problem is that for MAP_SHARED you cannot CoW the page, you have to modify the original page, which might get written back into a file if it's file based, not something you'd want to have happen I guess.
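The difference Peter describes between private and shared file mappings can be seen from user space. The sketch below is only an analogy, not how uprobes inserts breakpoints: it writes a 0xcc byte (the x86 int3 opcode) through a mapping of a scratch file and then reads the file back. The helper name `file_byte_after_write`, the scratch-file path, and the use of a plain store through `mmap()` in place of access_process_vm() are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a one-byte scratch file holding 0x90, map it with map_flags
 * (MAP_PRIVATE or MAP_SHARED), store 0xcc through the mapping, and return
 * the byte the file itself holds afterwards (or -1 on error). */
static int file_byte_after_write(int map_flags)
{
    char path[] = "/tmp/cowdemoXXXXXX";   /* hypothetical scratch path */
    unsigned char orig = 0x90, back = 0;
    unsigned char *p;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                         /* file disappears once fd is closed */

    if (write(fd, &orig, 1) != 1) {
        close(fd);
        return -1;
    }

    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 0xcc;                          /* stand-in for a breakpoint insert */
    msync(p, 1, MS_SYNC);                 /* flush if shared; result ignored */
    munmap(p, 1);

    if (pread(fd, &back, 1, 0) != 1) {    /* what does the file hold now? */
        close(fd);
        return -1;
    }
    close(fd);
    return back;
}
```

With MAP_PRIVATE the file keeps its original byte, because the store lands in a private copy of the page; with MAP_SHARED the 0xcc byte reaches the file itself, which is the write-back hazard raised above.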
From peterz at infradead.org Sat Jan 16 10:33:17 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Sat, 16 Jan 2010 11:33:17 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263603503.5007.134.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> <1263592192.4244.488.camel@laptop> <1263603503.5007.134.camel@localhost.localdomain> Message-ID: <1263637997.4244.555.camel@laptop> On Fri, 2010-01-15 at 16:58 -0800, Jim Keniston wrote: > But here are some things to keep in mind about the > various approaches: > > 1. Single-stepping inline is easiest: you need to know very little about > the instruction set you're probing. But it's inadequate for > multithreaded apps. > 2. Single-stepping out of line solves the multithreading issue (as do #3 > and #4), but requires more knowledge of the instruction set. (In > particular, calls, jumps, and returns need special care; as do > rip-relative instructions in x86_64.) I count 9 architectures that > support kprobes. I think most of these do SSOL. > 3. "Boosted" probes (where an appended jump instruction removes the need > for the single-step trap on many instructions) require even more > knowledge of the instruction set, and like SSOL, require XOL slots. > Right now, as far as I know, x86 is the only architecture with boosted > kprobes. > 4. Emulation removes the need for the XOL area, but requires pretty much > total knowledge of the instruction set. It's also a performance win for > architectures that can't do #3. I see kvm implemented on 4 > architectures (ia64, powerpc, s390, x86). 
Coincidentally, those are the > architectures to which uprobes (old uprobes, with ubp and xol bundled > in) has already been ported (though Intel hasn't been maintaining their > ia64 port). Right, so I was thinking a combination of 4 and execute from kernel space would be feasible. I would think most regular instructions are runnable from kernel space given that we provide the proper pt_regs environment. Although I just realize we need to fully emulate the address computation step for all memory writes, otherwise a wild userspace pointer might end up writing in your kernel image. Also, don't we already need full knowledge of the instruction set in order to decode the instruction stream and find instruction boundaries. > So it sort of comes down to how objectionable the XOL vma (or page) > really is. Well, I really hate touching the address space, and the fact that it permutates the probed application in very obvious ways. FWIW, I think the VDSO is ugly too and would have objected to it were it proposed now -- there's much better solutions for that (/sys/lib/libkernel.so comes to mind). > Regarding your suggestion about executing the probed instruction in the > kernel, how widely do you think that can be applied: which > architectures? how much of the instruction set? I only know some of x86, I really couldn't tell for any other arch. From fche at redhat.com Sat Jan 16 15:50:48 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Sat, 16 Jan 2010 10:50:48 -0500 Subject: [RFC] [PATCH 4/7] Uprobes Implementation In-Reply-To: <1263597085.5007.82.camel@localhost.localdomain> (Jim Keniston's message of "Fri, 15 Jan 2010 15:11:25 -0800") References: <20100111122553.22050.46895.sendpatchset@srikar.in.ibm.com> <1263467394.4244.291.camel@laptop> <1263509380.4875.35.camel@localhost.localdomain> <1263546632.4244.352.camel@laptop> <1263548124.4244.358.camel@laptop> <20100115131037.GP4822@redhat.com> <1263561930.4244.417.camel@laptop> <20100115133825.GQ4822@redhat.com> <1263563276.4244.426.camel@laptop> <20100115142007.GA1628@linux.vnet.ibm.com> <1263597085.5007.82.camel@localhost.localdomain> Message-ID: Jim Keniston writes: > [...] > Years ago, we had pre-utrace versions of uprobes where the uprobes > breakpoint-handler code was dispatched from the die_notifier, before the > int3 turned into a SIGTRAP. I believe that's what Peter is > recommending. On my old Pentium M... > - a pre-utrace uprobe hit cost about 1 usec; > - a utrace-based uprobe hit cost about 3 usec; > [...] > So yeah, learning about the int3 via utrace after the SIGTRAP gets > created adds some overhead to uprobes. [...] Was this test comparing likewise fruit? For example, did it account for factors where other processes were gdb-int3-instrumented or with lots of kprobes active? Differently multithreaded? Demultiplexing probes amongst multiple processes? (It's counterintuitive that the utrace/kernel int3->sigtrap dispatching code alone should cause thousands of extra instructions.) - FChE
From alip at exherbo.org Sat Jan 16 21:51:24 2010 From: alip at exherbo.org (Ali Polatel) Date: Sat, 16 Jan 2010 23:51:24 +0200 Subject: PTRACE_SYSCALL_ENTRY/EXIT Message-ID: <20100116215124.GA963@harikalardiyari> Hello, Do you guys plan to add ptrace requests possibly named PTRACE_SYSCALL_ENTRY and PTRACE_SYSCALL_EXIT at some point akin to FreeBSD's[1] PT_TO_SCE and PT_TO_SCX? [1]: http://www.freebsd.org/cgi/man.cgi?query=ptrace&apropos=0&sektion=0&manpath=FreeBSD+8.0-RELEASE&format=ascii -- Regards, Ali Polatel -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL:
From jkenisto at us.ibm.com Sat Jan 16 23:48:33 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Sat, 16 Jan 2010 18:48:33 -0500 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) Message-ID: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> Quoting Peter Zijlstra: > On Fri, 2010-01-15 at 16:58 -0800, Jim Keniston wrote: >> But here are some things to keep in mind about the >> various approaches: >> >> 1.
Single-stepping inline is easiest: you need to know very little about >> the instruction set you're probing. But it's inadequate for >> multithreaded apps. >> 2. Single-stepping out of line solves the multithreading issue (as do #3 >> and #4), but requires more knowledge of the instruction set. (In >> particular, calls, jumps, and returns need special care; as do >> rip-relative instructions in x86_64.) I count 9 architectures that >> support kprobes. I think most of these do SSOL. >> 3. "Boosted" probes (where an appended jump instruction removes the need >> for the single-step trap on many instructions) require even more >> knowledge of the instruction set, and like SSOL, require XOL slots. >> Right now, as far as I know, x86 is the only architecture with boosted >> kprobes. >> 4. Emulation removes the need for the XOL area, but requires pretty much >> total knowledge of the instruction set. It's also a performance win for >> architectures that can't do #3. I see kvm implemented on 4 >> architectures (ia64, powerpc, s390, x86). Coincidentally, those are the >> architectures to which uprobes (old uprobes, with ubp and xol bundled >> in) has already been ported (though Intel hasn't been maintaining their >> ia64 port). > > Right, so I was thinking a combination of 4 and execute from kernel > space would be feasible. I would think most regular instructions are > runnable from kernel space given that we provide the proper pt_regs > environment. > > Although I just realize we need to fully emulate the address computation > step for all memory writes, otherwise a wild userspace pointer might end > up writing in your kernel image. Correct. > > Also, don't we already need full knowledge of the instruction set in > order to decode the instruction stream and find instruction boundaries. Not really. For #3 (boosting), you need to know everything for #2, plus be able to compute the length of each instruction -- which we can now do for x86. 
To emulate an instruction (#4), you need to replicate what it does, side-effects and all. The x86 instruction set seems to be adding new floating-point instructions all the time, and I bet even Masami doesn't know what they all do, but so far, they all seem to adhere to the instruction-length rules encoded in Masami's instruction decoder. As you may have noted before, I think FP would be a special problem for your approach. I'm not sure how folks would react to the idea of executing FP instructions in kernel space. But emulating them is also tough. There's an IEEE FP emulation package somewhere in one of the Linux arch directories, but I'm not sure how precise it is, and dropping even 1 bit of precision is unacceptable for many applications, since such errors tend to grow in complex computations employing many FP instructions. Jim From bdonlan at gmail.com Sun Jan 17 00:12:28 2010 From: bdonlan at gmail.com (Bryan Donlan) Date: Sat, 16 Jan 2010 19:12:28 -0500 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263603503.5007.134.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> <1263592192.4244.488.camel@laptop> <1263603503.5007.134.camel@localhost.localdomain> Message-ID: <3e8340491001161612x11873abdi4b74e47309e5bdfd@mail.gmail.com> On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston wrote: > 4. Emulation removes the need for the XOL area, but requires pretty much > total knowledge of the instruction set. It's also a performance win for > architectures that can't do #3. I see kvm implemented on 4 > architectures (ia64, powerpc, s390, x86). 
Coincidentally, those are the > architectures to which uprobes (old uprobes, with ubp and xol bundled > in) has already been ported (though Intel hasn't been maintaining their > ia64 port). So it sort of comes down to how objectionable the XOL vma > (or page) really is. On x86 at least, wouldn't one option be to run the instruction to be emulated in CPL ('ring') 2, from a XOL page above the user-kernel split, not accessible to userspace at CPL 3? Linux hasn't traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on Xen), but it would seem to avoid many of the problems here - it's invisible to normal userspace code and so doesn't pollute userspace memory maps with kernel-private stuff, but since it's running at a higher CPL than the kernel, we can still protect kernel memory and protect against privileged instructions. 
From avi at redhat.com Sun Jan 17 14:37:07 2010 From: avi at redhat.com (Avi Kivity) Date: Sun, 17 Jan 2010 16:37:07 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263603503.5007.134.camel@localhost.localdomain> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> <1263592192.4244.488.camel@laptop> <1263603503.5007.134.camel@localhost.localdomain> Message-ID: <4B532093.5080600@redhat.com> On 01/16/2010 02:58 AM, Jim Keniston wrote: > > I hear (er, read) you. Emulation may turn out to be the answer for some > architectures. But here are some things to keep in mind about the > various approaches: > > 1. Single-stepping inline is easiest: you need to know very little about > the instruction set you're probing. But it's inadequate for > multithreaded apps. > 2. Single-stepping out of line solves the multithreading issue (as do #3 > and #4), but requires more knowledge of the instruction set. (In > particular, calls, jumps, and returns need special care; as do > rip-relative instructions in x86_64.) I count 9 architectures that > support kprobes. I think most of these do SSOL. > 3. "Boosted" probes (where an appended jump instruction removes the need > for the single-step trap on many instructions) require even more > knowledge of the instruction set, and like SSOL, require XOL slots. > Right now, as far as I know, x86 is the only architecture with boosted > kprobes. > 4. Emulation removes the need for the XOL area, but requires pretty much > total knowledge of the instruction set. It's also a performance win for > architectures that can't do #3. I see kvm implemented on 4 > architectures (ia64, powerpc, s390, x86). 
Coincidentally, those are the > architectures to which uprobes (old uprobes, with ubp and xol bundled > in) has already been ported (though Intel hasn't been maintaining their > ia64 port). So it sort of comes down to how objectionable the XOL vma > (or page) really is. > The kvm emulator emulates only a subset of the x86 instruction set (basically mmio instructions and commonly-used page-table manipulation instructions, as well as some privileged instructions). It would take a lot of work to expand it to be completely generic; and even then it will fail if userspace uses an instruction set extension the kernel is not aware of. To me, boosted probes with a fallback to single-stepping seems to be the better option by far. -- error compiling committee.c: too many arguments to function From avi at redhat.com Sun Jan 17 14:39:56 2010 From: avi at redhat.com (Avi Kivity) Date: Sun, 17 Jan 2010 16:39:56 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263549014.4244.374.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> Message-ID: <4B53213C.9050303@redhat.com> On 01/15/2010 11:50 AM, Peter Zijlstra wrote: > As previously stated, I think poking at a process's address space is an > utter no-go. > Why not reserve an address space range for this, somewhere near the top of memory? It doesn't have to be populated if it isn't used. 
-- error compiling committee.c: too many arguments to function From peterz at infradead.org Sun Jan 17 14:52:19 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Sun, 17 Jan 2010 15:52:19 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B53213C.9050303@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> Message-ID: <1263739939.557.20938.camel@twins> On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote: > On 01/15/2010 11:50 AM, Peter Zijlstra wrote: > > As previously stated, I think poking at a process's address space is an > > utter no-go. > > > > Why not reserve an address space range for this, somewhere near the top > of memory? It doesn't have to be populated if it isn't used. Because I think poking at a process's address space like that is gross. Also, if its fixed size you're imposing artificial limits on the number of possible probes. 
From avi at redhat.com Sun Jan 17 14:56:08 2010 From: avi at redhat.com (Avi Kivity) Date: Sun, 17 Jan 2010 16:56:08 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263739939.557.20938.camel@twins> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> Message-ID: <4B532508.4000806@redhat.com> On 01/17/2010 04:52 PM, Peter Zijlstra wrote: > On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote: > >> On 01/15/2010 11:50 AM, Peter Zijlstra wrote: >> >>> As previously stated, I think poking at a process's address space is an >>> utter no-go. >>> >>> >> Why not reserve an address space range for this, somewhere near the top >> of memory? It doesn't have to be populated if it isn't used. >> > Because I think poking at a process's address space like that is gross. > If it's reserved, it's no longer the process' address space. > Also, if its fixed size you're imposing artificial limits on the number > of possible probes. > Obviously we'll need a limit, a uprobe will also take kernel memory, we can't allow people to exhaust it. 
-- error compiling committee.c: too many arguments to function From avi at redhat.com Sun Jan 17 14:59:27 2010 From: avi at redhat.com (Avi Kivity) Date: Sun, 17 Jan 2010 16:59:27 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263739939.557.20938.camel@twins> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> Message-ID: <4B5325CF.5000001@redhat.com> On 01/17/2010 04:52 PM, Peter Zijlstra wrote: > On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote: > >> On 01/15/2010 11:50 AM, Peter Zijlstra wrote: >> >>> As previously stated, I think poking at a process's address space is an >>> utter no-go. >>> >>> >> Why not reserve an address space range for this, somewhere near the top >> of memory? It doesn't have to be populated if it isn't used. >> > Because I think poking at a process's address space like that is gross. > Also, if its fixed size you're imposing artificial limits on the number > of possible probes. > btw, an alternative is to require the caller to provide the address space for this. If the caller is in another process, we need to allow it to play with the target's address space (i.e. mmap_process()). I don't think uprobes justifies this by itself, but mmap_process() can be very useful for sandboxing with seccomp. 
-- error compiling committee.c: too many arguments to function From peterz at infradead.org Sun Jan 17 15:01:46 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Sun, 17 Jan 2010 16:01:46 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B532508.4000806@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B532508.4000806@redhat.com> Message-ID: <1263740506.557.20963.camel@twins> On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote: > On 01/17/2010 04:52 PM, Peter Zijlstra wrote: > > Also, if its fixed size you're imposing artificial limits on the number > > of possible probes. > > > > Obviously we'll need a limit, a uprobe will also take kernel memory, we > can't allow people to exhaust it. Only if it's unprivileged; kernel and root should be able to place as many probes as they want, until the machine keels over. 
From peterz at infradead.org Sun Jan 17 15:03:13 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Sun, 17 Jan 2010 16:03:13 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5325CF.5000001@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> Message-ID: <1263740593.557.20967.camel@twins> On Sun, 2010-01-17 at 16:59 +0200, Avi Kivity wrote: > On 01/17/2010 04:52 PM, Peter Zijlstra wrote: > > On Sun, 2010-01-17 at 16:39 +0200, Avi Kivity wrote: > > > >> On 01/15/2010 11:50 AM, Peter Zijlstra wrote: > >> > >>> As previously stated, I think poking at a process's address space is an > >>> utter no-go. > >>> > >>> > >> Why not reserve an address space range for this, somewhere near the top > >> of memory? It doesn't have to be populated if it isn't used. > >> > > Because I think poking at a process's address space like that is gross. > > Also, if its fixed size you're imposing artificial limits on the number > > of possible probes. > > > > btw, an alternative is to require the caller to provide the address > space for this. If the caller is in another process, we need to allow > it to play with the target's address space (i.e. mmap_process()). I > don't think uprobes justifies this by itself, but mmap_process() can be > very useful for sandboxing with seccomp. mmap_process() sounds utterly gross, one process playing with another process's address space.. yuck! 
From avi at redhat.com Sun Jan 17 19:33:46 2010 From: avi at redhat.com (Avi Kivity) Date: Sun, 17 Jan 2010 21:33:46 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263740593.557.20967.camel@twins> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> Message-ID: <4B53661A.9090907@redhat.com> On 01/17/2010 05:03 PM, Peter Zijlstra wrote: > >> btw, an alternative is to require the caller to provide the address >> space for this. If the caller is in another process, we need to allow >> it to play with the target's address space (i.e. mmap_process()). I >> don't think uprobes justifies this by itself, but mmap_process() can be >> very useful for sandboxing with seccomp. >> > mmap_process() sounds utterly gross, one process playing with another > process's address space.. yuck! > This is debugging. We're playing with registers, we're playing with the cpu, we're playing with memory contents. Why not the address space as well? For seccomp, this really should be generalized. Run a system call on behalf of another process, but don't let that process do anything to affect it. I think Google is doing something clever with one thread in seccomp mode and another unconstrained, but that's very hacky - you have to stop the constrained thread so it can't interfere with the live one. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
From peterz at infradead.org Mon Jan 18 07:23:17 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 08:23:17 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> References: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> Message-ID: <1263799397.4283.2.camel@laptop> On Sat, 2010-01-16 at 18:48 -0500, Jim Keniston wrote: > As you may have noted before, I think FP would be a special problem > for your approach. I'm not sure how folks would react to the idea of > executing FP instructions in kernel space. But emulating them is also > tough. 
There's an IEEE FP emulation package somewhere in one of the > Linux arch directories, but I'm not sure how precise it is, and > dropping even 1 bit of precision is unacceptable for many > applications, since such errors tend to grow in complex computations > employing many FP instructions. Well, we have kernel space using FP/MMX/SSE-like things; it's not hard if you really need it, but in this case I think it's easier than normal, because we'll just allow it to change the userspace state, which is exactly what we want it to do. From peterz at infradead.org Mon Jan 18 07:37:15 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 08:37:15 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <3e8340491001161612x11873abdi4b74e47309e5bdfd@mail.gmail.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546175.4244.342.camel@laptop> <1263589634.5007.34.camel@localhost.localdomain> <1263592192.4244.488.camel@laptop> <1263603503.5007.134.camel@localhost.localdomain> <3e8340491001161612x11873abdi4b74e47309e5bdfd@mail.gmail.com> Message-ID: <1263800235.4283.10.camel@laptop> On Sat, 2010-01-16 at 19:12 -0500, Bryan Donlan wrote: > On Fri, Jan 15, 2010 at 7:58 PM, Jim Keniston wrote: > > > 4. Emulation removes the need for the XOL area, but requires pretty much > > total knowledge of the instruction set. It's also a performance win for > > architectures that can't do #3. I see kvm implemented on 4 > > architectures (ia64, powerpc, s390, x86). Coincidentally, those are the > > architectures to which uprobes (old uprobes, with ubp and xol bundled > > in) has already been ported (though Intel hasn't been maintaining their > > ia64 port). So it sort of comes down to how objectionable the XOL vma > > (or page) really is. 
> > On x86 at least, wouldn't one option be to run the instruction to > be emulated in CPL ('ring') 2, from a XOL page above the user-kernel > split, not accessible to userspace at CPL 3? Linux hasn't > traditionally used anything other than CPL 0 and CPL 3 (plus CPL 1 on > Xen), but it would seem to avoid many of the problems here - it's > invisible to normal userspace code and so doesn't pollute userspace > memory maps with kernel-private stuff, but since it's running at a > higher CPL than the kernel, we can still protect kernel memory and > protect against privileged instructions. Another option is to go play games with the RPL of the user data segments when we load them. But yeah, something like this seems to nicely deal with the protection issues. From peterz at infradead.org Mon Jan 18 07:45:52 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 08:45:52 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B53661A.9090907@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> Message-ID: <1263800752.4283.19.camel@laptop> On Sun, 2010-01-17 at 21:33 +0200, Avi Kivity wrote: > On 01/17/2010 05:03 PM, Peter Zijlstra wrote: > > > >> btw, an alternative is to require the caller to provide the address > >> space for this. If the caller is in another process, we need to allow > >> it to play with the target's address space (i.e. mmap_process()). I > >> don't think uprobes justifies this by itself, but mmap_process() can be > >> very useful for sandboxing with seccomp. 
> >> > > mmap_process() sounds utterly gross, one process playing with another > > process's address space.. yuck! > > > > This is debugging. We're playing with registers, we're playing with the > cpu, we're playing with memory contents. Why not the address space as well? Because you want things to be as transparent as possible in order to avoid heisenbugs. Sure we cannot avoid everything, but we should avoid everything we possibly can. Also, aside of the VDSO, we simply do not force map things into address spaces (and like said before, I think the VDSO stinks for doing that) and I think we don't want to create (more) precedents in this case. From avi at redhat.com Mon Jan 18 11:01:39 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 13:01:39 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263800752.4283.19.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> Message-ID: <4B543F93.3060509@redhat.com> On 01/18/2010 09:45 AM, Peter Zijlstra wrote: > >> This is debugging. We're playing with registers, we're playing with the >> cpu, we're playing with memory contents. Why not the address space as well? >> > Because you want things to be as transparent as possible in order to > avoid heisenbugs. Sure we cannot avoid everything, but we should avoid > everything we possibly can. > If we reserve some address space, you don't add any heisenbugs (at least, not any additional ones over emulation). 
Even if we don't, address space layout randomization means we're not keeping the address space layout constant between runs anyway. > Also, aside of the VDSO, we simply do not force map things into address > spaces (and like said before, I think the VDSO stinks for doing that) > and I think we don't want to create (more) precedents in this case. > You've made it clear that you don't like it, but not why. The kernel already manages the user's address space (except for MAP_FIXED which is unreliable unless you've already reserved the address space). I don't see why adding a vma for debugging is so horrible. -- error compiling committee.c: too many arguments to function From peterz at infradead.org Mon Jan 18 11:44:32 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 12:44:32 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B543F93.3060509@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> Message-ID: <1263815072.4283.305.camel@laptop> On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote: > > You've made it clear that you don't like it, but not why. > > The kernel already manages the user's address space (except for > MAP_FIXED which is unreliable unless you've already reserved the address > space). I don't see why adding a vma for debugging is so horrible. Well, the kernel only does what the user (and loader) tell it through mmap(). 
Other than that we never (except this VDSO thing) inject vmas, and I see no reason to start doing that now. From peterz at infradead.org Mon Jan 18 11:45:40 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 12:45:40 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B543F93.3060509@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> Message-ID: <1263815140.4283.309.camel@laptop> On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote: > If we reserve some address space, you don't add any heisenbugs (at > least, not any additional ones over emulation). Even if we don't, > address space layout randomization means we're not keeping the address > space layout constant between runs anyway. Well, it still limits the number of probes to the reserved area. If you want more you need to grow the area.. which then changes the state. 
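The objection above (a reserved area caps the number of probes, and growing it changes the process's state) can be illustrated with a toy model. This is a minimal sketch in Python, not code from the UBP/XOL patch set: the class name XolArea, its methods, and the 32-byte slot size are all hypothetical, chosen only to show the shape of the trade-off.

```python
class XolArea:
    """Toy model of a fixed-size execute-out-of-line (XOL) slot area.

    Hypothetical illustration only: names and sizes are assumptions,
    not taken from the patches discussed in this thread.
    """

    def __init__(self, area_bytes=4096, slot_bytes=32):
        self.slot_bytes = slot_bytes
        self.capacity = area_bytes // slot_bytes   # total number of slots
        self.free = list(range(self.capacity))     # free slot indices
        self.in_use = {}                           # probe id -> slot index

    def alloc(self, probe_id):
        """Return the byte offset of probe_id's slot, or None if the area is full."""
        if probe_id in self.in_use:
            return self.in_use[probe_id] * self.slot_bytes
        if not self.free:
            return None                            # the "artificial limit"
        slot = self.free.pop(0)
        self.in_use[probe_id] = slot
        return slot * self.slot_bytes

    def release(self, probe_id):
        """Give a probe's slot back to the free list."""
        self.free.append(self.in_use.pop(probe_id))

    def grow(self, extra_bytes):
        """Extend the area; in a real process this would change the address-space layout."""
        extra = extra_bytes // self.slot_bytes
        self.free.extend(range(self.capacity, self.capacity + extra))
        self.capacity += extra


area = XolArea(area_bytes=4096, slot_bytes=32)
for n in range(area.capacity):
    area.alloc(n)                      # fill every slot
print(area.alloc("one too many"))      # None: the reserved area is exhausted
area.grow(4096)
print(area.alloc("one too many"))      # 4096: first slot of the added page
```

The second call only succeeds after grow(), which is exactly the step that would alter the probed process's memory map.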
From avi at redhat.com Mon Jan 18 12:01:00 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 14:01:00 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263815072.4283.305.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> Message-ID: <4B544D7C.2060708@redhat.com> On 01/18/2010 01:44 PM, Peter Zijlstra wrote: > On Mon, 2010-01-18 at 13:01 +0200, Avi Kivity wrote: > >> You've made it clear that you don't like it, but not why. >> >> The kernel already manages the user's address space (except for >> MAP_FIXED which is unreliable unless you've already reserved the address >> space). I don't see why adding a vma for debugging is so horrible. >> > Well, the kernel only does what the user (and loader) tell it through > mmap(). What I meant was that the kernel chooses the addresses (unless you go the MAP_FIXED way). From the user's point of view, there is no change in behaviour: the kernel picks an address. If the constraints have changed (because we reserve a range), that doesn't affect the user. > Other than that we never (except this VDSO thing) inject vmas, > and I see no reason to start doing that now. > Maybe you place no value on uprobes. But people who debug userspace likely will see a reason. 
-- error compiling committee.c: too many arguments to function From peterz at infradead.org Mon Jan 18 12:06:36 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 13:06:36 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B544D7C.2060708@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> Message-ID: <1263816396.4283.361.camel@laptop> On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: > > Maybe you place no value on uprobes. But people who debug userspace > likely will see a reason. I do see value in uprobes, I just don't like it mucking about with the address space. Nor does it appear required. 
From avi at redhat.com Mon Jan 18 12:09:50 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 14:09:50 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263816396.4283.361.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> Message-ID: <4B544F8E.1080603@redhat.com> On 01/18/2010 02:06 PM, Peter Zijlstra wrote: > On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: > >> Maybe you place no value on uprobes. But people who debug userspace >> likely will see a reason. >> > I do see value in uprobes, I just don't like it mucking about with the > address space. Nor does it appear required. > Well, the alternatives are very unappealing. Emulation and single-stepping are going to be very slow compared to a couple of jumps. 
-- error compiling committee.c: too many arguments to function From penberg at cs.helsinki.fi Mon Jan 18 12:13:25 2010 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Mon, 18 Jan 2010 14:13:25 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B544F8E.1080603@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> Message-ID: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> Hi Avi, On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: >>> Maybe you place no value on uprobes. But people who debug userspace >>> likely will see a reason. On 01/18/2010 02:06 PM, Peter Zijlstra wrote: >> I do see value in uprobes, I just don't like it mucking about with the >> address space. Nor does it appear required. On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity wrote: > Well, the alternatives are very unappealing. Emulation and single-stepping > are going to be very slow compared to a couple of jumps. So how big chunks of the address space are we talking here for uprobes?
From peterz at infradead.org Mon Jan 18 12:14:17 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 13:14:17 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B544F8E.1080603@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> Message-ID: <1263816857.4283.381.camel@laptop> On Mon, 2010-01-18 at 14:09 +0200, Avi Kivity wrote: > On 01/18/2010 02:06 PM, Peter Zijlstra wrote: > > On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: > > > >> Maybe you place no value on uprobes. But people who debug userspace > >> likely will see a reason. > >> > > I do see value in uprobes, I just don't like it mucking about with the > > address space. Nor does it appear required. > > > > Well, the alternatives are very unappealing. Emulation and > single-stepping are going to be very slow compared to a couple of jumps. With CPL2 or RPL on user segments the protection issue seems to be manageable for running the instructions from kernel space. 
From avi at redhat.com Mon Jan 18 12:17:10 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 14:17:10 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> Message-ID: <4B545146.3080001@redhat.com> On 01/18/2010 02:13 PM, Pekka Enberg wrote: > So how big chunks of the address space are we talking here for uprobes? > That's for the authors to answer, but at a guess, 32 bytes per probe (largest x86 instruction is 15 bytes), so 32 MB will give you a million probes. That's a piece of cake for x86-64, probably harder to justify for i386. -- error compiling committee.c: too many arguments to function From penberg at cs.helsinki.fi Mon Jan 18 12:24:19 2010 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Mon, 18 Jan 2010 14:24:19 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B545146.3080001@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> Message-ID: <84144f021001180424h54ce7970ra7dda2ff8a3be277@mail.gmail.com> On 01/18/2010 02:13 PM, Pekka Enberg wrote: >> So how big chunks of the address space are we talking here for uprobes? 
On Mon, Jan 18, 2010 at 2:17 PM, Avi Kivity wrote: > That's for the authors to answer, but at a guess, 32 bytes per probe > (largest x86 instruction is 15 bytes), so 32 MB will give you a million > probes. That's a piece of cake for x86-64, probably harder to justify for > i386. Yup, it's 32-bit that I worry about. From peterz at infradead.org Mon Jan 18 12:24:05 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 13:24:05 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B545146.3080001@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> Message-ID: <1263817445.4283.408.camel@laptop> On Mon, 2010-01-18 at 14:17 +0200, Avi Kivity wrote: > On 01/18/2010 02:13 PM, Pekka Enberg wrote: > > So how big chunks of the address space are we talking here for uprobes? > > > > That's for the authors to answer, but at a guess, 32 bytes per probe > (largest x86 instruction is 15 bytes), so 32 MB will give you a million > probes. That's a piece of cake for x86-64, probably harder to justify > for i386. Yeah, I'm aware of people turning off address space randomization to gain more virtual space on i386, I'm pretty sure those folks aren't going to be happy if we shrink it. Let alone them trying to probe their app.
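[Editorial sketch, not part of the thread.] The size estimate quoted above is simple arithmetic: slots times bytes per slot, compared against the user address space. The helper names below are hypothetical, "a million" is taken as 2^20, and the 3 GiB figure assumes the common 3G/1G split on i386; neither assumption comes from the patch set itself.

```c
#include <stdint.h>

/* Hypothetical helper: address space needed for nr_probes slots,
 * each slot_bytes long. */
static uint64_t xol_area_bytes(uint64_t nr_probes, uint64_t slot_bytes)
{
    return nr_probes * slot_bytes;
}

/* Hypothetical helper: the share of the user address space that the
 * area occupies, in parts per million (integer division, rounds down). */
static uint64_t xol_area_ppm(uint64_t area_bytes, uint64_t user_space_bytes)
{
    return area_bytes * 1000000ULL / user_space_bytes;
}
```

With 2^20 probes at 32 bytes each, `xol_area_bytes` gives 32 MiB, which is roughly 1% of an assumed 3 GiB i386 user address space. That is small in absolute terms, but it is the kind of reservation the 32-bit concern in this exchange is about.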
From avi at redhat.com Mon Jan 18 12:37:19 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 14:37:19 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263816857.4283.381.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> Message-ID: <4B5455FF.7010409@redhat.com> On 01/18/2010 02:14 PM, Peter Zijlstra wrote: > >> Well, the alternatives are very unappealing. Emulation and >> single-stepping are going to be very slow compared to a couple of jumps. >> > With CPL2 or RPL on user segments the protection issue seems to be > manageable for running the instructions from kernel space. > CPL2 gives unrestricted access to the kernel address space; and RPL does not affect page level protection. Segment limits don't work on x86-64. But perhaps I missed something - these things are tricky. It should be possible to translate the instruction into an address space check, followed by the action, but that's still slower due to privilege level switches. 
-- error compiling committee.c: too many arguments to function From srikar at linux.vnet.ibm.com Mon Jan 18 12:44:19 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Mon, 18 Jan 2010 18:14:19 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B545146.3080001@redhat.com> References: <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> Message-ID: <20100118124419.GC1628@linux.vnet.ibm.com> * Avi Kivity [2010-01-18 14:17:10]: > On 01/18/2010 02:13 PM, Pekka Enberg wrote: > >So how big chunks of the address space are we talking here for uprobes? > > That's for the authors to answer, but at a guess, 32 bytes per probe > (largest x86 instruction is 15 bytes), so 32 MB will give you a > million probes. That's a piece of cake for x86-64, probably harder > to justify for i386. On x86, each probe takes 16 bytes. In the current implementation of XOL, the first hit of a breakpoint requires us to allocate a page. If that page fills up with "active" breakpoints, we expand the area by adding a page. A bitmap tracks whether a previously used slot belongs to a breakpoint that has since been removed, so that the slot can be reused. By active breakpoints, I mean those that have been inserted and trapped at least once but not yet removed. Jim did try a few other allocation techniques, but those that involved slot stealing ended up needing locking. People who looked at that code advised us to reduce the locking and keep the allocation simple (at least for the first cut).
-- Thanks and Regards Srikar > > -- > error compiling committee.c: too many arguments to function > From penberg at cs.helsinki.fi Mon Jan 18 12:51:06 2010 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Mon, 18 Jan 2010 14:51:06 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100118124419.GC1628@linux.vnet.ibm.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> Message-ID: <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> On Mon, Jan 18, 2010 at 2:44 PM, Srikar Dronamraju wrote: > * Avi Kivity [2010-01-18 14:17:10]: > >> On 01/18/2010 02:13 PM, Pekka Enberg wrote: >> >So how big chunks of the address space are we talking here for uprobes? >> >> That's for the authors to answer, but at a guess, 32 bytes per probe >> (largest x86 instruction is 15 bytes), so 32 MB will give you a >> million probes. That's a piece of cake for x86-64, probably harder >> to justify for i386. > > On x86, each probe takes 16 bytes. And how many probes do we expected to be live at the same time in real-world scenarios? I guess Avi's "one million" is more than enough?
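[Editorial sketch, not part of the thread.] The allocation scheme Srikar describes (16-byte slots carved out of a page, with a bitmap recording which slots are in use so that freed ones can be reused) can be modelled in plain C as follows. This is a userspace illustration only, not the code from the patch set: the struct and function names are hypothetical, a 4096-byte page is assumed, and the locking the thread discusses is left out entirely.

```c
#include <limits.h>
#include <stddef.h>

#define XOL_PAGE_SIZE   4096u   /* assumed page size */
#define XOL_SLOT_BYTES  16u     /* per-probe slot size on x86, as stated in the thread */
#define XOL_SLOTS       (XOL_PAGE_SIZE / XOL_SLOT_BYTES)   /* 256 slots per page */
#define XOL_WORD_BITS   (sizeof(unsigned long) * CHAR_BIT)
#define XOL_WORDS       ((XOL_SLOTS + XOL_WORD_BITS - 1) / XOL_WORD_BITS)

/* One page of slots; a set bit means the slot is in use. */
struct xol_page {
    unsigned long used[XOL_WORDS];
};

/* Claim the lowest free slot and return its index, or -1 if the page is
 * full (at which point the real scheme would add another page). */
static int xol_take_slot(struct xol_page *p)
{
    for (unsigned int i = 0; i < XOL_SLOTS; i++) {
        unsigned long *word = &p->used[i / XOL_WORD_BITS];
        unsigned long mask = 1UL << (i % XOL_WORD_BITS);

        if (!(*word & mask)) {
            *word |= mask;
            return (int)i;
        }
    }
    return -1;
}

/* Release a slot when its breakpoint is removed, so it can be reused. */
static void xol_free_slot(struct xol_page *p, int slot)
{
    unsigned int i = (unsigned int)slot;

    p->used[i / XOL_WORD_BITS] &= ~(1UL << (i % XOL_WORD_BITS));
}

/* Byte offset of a slot within its page. */
static size_t xol_slot_offset(int slot)
{
    return (size_t)slot * XOL_SLOT_BYTES;
}
```

Under these assumptions one page holds 256 slots, so a process only grows past a single page once more than 256 probes have been hit and are still installed.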
From avi at redhat.com Mon Jan 18 12:53:30 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 14:53:30 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> Message-ID: <4B5459CA.9060603@redhat.com> On 01/18/2010 02:51 PM, Pekka Enberg wrote: > > And how many probes do we expected to be live at the same time in > real-world scenarios? I guess Avi's "one million" is more than enough? > I don't think a user will ever come close to a million, but we can expect some inflation from inlined functions (I don't know if uprobes replicates such probes, but if it doesn't, it should). 
-- error compiling committee.c: too many arguments to function From penberg at cs.helsinki.fi Mon Jan 18 12:57:51 2010 From: penberg at cs.helsinki.fi (Pekka Enberg) Date: Mon, 18 Jan 2010 14:57:51 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5459CA.9060603@redhat.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> Message-ID: <4B545ACF.40203@cs.helsinki.fi> On 01/18/2010 02:51 PM, Pekka Enberg wrote: >> And how many probes do we expected to be live at the same time in >> real-world scenarios? I guess Avi's "one million" is more than enough? Avi Kivity kirjoitti: > I don't think a user will ever come close to a million, but we can > expect some inflation from inlined functions (I don't know if uprobes > replicates such probes, but if it doesn't, it should). Right. I guess we're looking at few megabytes of the address space for normal scenarios which doesn't seem too excessive. However, as Peter pointed out, the bigger problem is that now we're opening the door for other features to steal chunks of the address space. And I think it's a legitimate worry that it's going to cause problems for 32-bit in the future. I don't like the idea but if the performance benefits are real (are they?), maybe it's a worthwhile trade-off. Dunno. 
Pekka From fweisbec at gmail.com Mon Jan 18 13:00:51 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Mon, 18 Jan 2010 14:00:51 +0100 Subject: [RFC] [PATCH 7/7] Ftrace plugin for Uprobes In-Reply-To: <1263472149.4244.314.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122608.22050.94088.sendpatchset@srikar.in.ibm.com> <1263468191.4244.300.camel@laptop> <20100114113509.GB5033@nowhere> <1263469381.4244.308.camel@laptop> <20100114122329.GC5033@nowhere> <1263472149.4244.314.camel@laptop> Message-ID: <20100118130048.GA10364@nowhere> On Thu, Jan 14, 2010 at 01:29:09PM +0100, Peter Zijlstra wrote: > On Thu, 2010-01-14 at 13:23 +0100, Frederic Weisbecker wrote: > > > > I see, so what you suggest is to have the probe set up > > as generic first. Then the process that activates it > > becomes a consumer, right? > > Right, so either we have it always on, for things like ftrace, > > in which case the creation traverses rmap and installs the probes > all existing mmap()s, and a mmap() hook will install it on all new > ones. > > Or they're strictly consumer driver, like perf, in which case the act of > enabling the event will install the probe (if its not there yet). > Looks like a good plan. 
From peterz at infradead.org Mon Jan 18 13:05:10 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 14:05:10 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5459CA.9060603@redhat.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> Message-ID: <1263819910.4283.478.camel@laptop> On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote: > On 01/18/2010 02:51 PM, Pekka Enberg wrote: > > > > And how many probes do we expected to be live at the same time in > > real-world scenarios? I guess Avi's "one million" is more than enough? > > > > I don't think a user will ever come close to a million, but we can > expect some inflation from inlined functions (I don't know if uprobes > replicates such probes, but if it doesn't, it should). That's up to the userspace creating the probes but yes, agreed. 
From avi at redhat.com Mon Jan 18 13:06:45 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 15:06:45 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B545ACF.40203@cs.helsinki.fi> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> Message-ID: <4B545CE5.6090506@redhat.com> On 01/18/2010 02:57 PM, Pekka Enberg wrote: > On 01/18/2010 02:51 PM, Pekka Enberg wrote: >>> And how many probes do we expected to be live at the same time in >>> real-world scenarios? I guess Avi's "one million" is more than enough? > > Avi Kivity kirjoitti: >> I don't think a user will ever come close to a million, but we can >> expect some inflation from inlined functions (I don't know if uprobes >> replicates such probes, but if it doesn't, it should). > > Right. I guess we're looking at few megabytes of the address space for > normal scenarios which doesn't seem too excessive. > > However, as Peter pointed out, the bigger problem is that now we're > opening the door for other features to steal chunks of the address > space. And I think it's a legitimate worry that it's going to cause > problems for 32-bit in the future. > > I don't like the idea but if the performance benefits are real (are > they?), maybe it's a worthwhile trade-off. Dunno. If uprobes can trace to buffer memory in the process address space, I think the win can be dramatic. Incidentally it will require injecting even more vmas into a process. Basically it means very low cost tracing, like the kernel tracers. 
-- error compiling committee.c: too many arguments to function From peterz at infradead.org Mon Jan 18 13:15:51 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 18 Jan 2010 14:15:51 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5455FF.7010409@redhat.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> <4B5455FF.7010409@redhat.com> Message-ID: <1263820551.4283.499.camel@laptop> On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote: > On 01/18/2010 02:14 PM, Peter Zijlstra wrote: > > > >> Well, the alternatives are very unappealing. Emulation and > >> single-stepping are going to be very slow compared to a couple of jumps. > >> > > With CPL2 or RPL on user segments the protection issue seems to be > > manageable for running the instructions from kernel space. > > > > CPL2 gives unrestricted access to the kernel address space; and RPL does > not affect page level protection. Segment limits don't work on x86-64. > But perhaps I missed something - these things are tricky. So setting RPL to 3 on the user segments allows access to kernel pages just fine? How useful.. :/ > It should be possible to translate the instruction into an address space > check, followed by the action, but that's still slower due to privilege > level switches. 
Well, if you manage to do the address validation you don't need the priv level switch anymore, right? Are the ins encodings sane enough to recognize mem parameters without needing to know the actual ins? How about using a hw-breakpoint to close the gap for the inline single step? You could even re-insert the int3 lazily when you need the hw-breakpoint again. It would consume one hw-breakpoint register for each task/cpu that has probes though.. From avi at redhat.com Mon Jan 18 13:33:21 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 15:33:21 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263820551.4283.499.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> <4B5455FF.7010409@redhat.com> <1263820551.4283.499.camel@laptop> Message-ID: <4B546321.60607@redhat.com> On 01/18/2010 03:15 PM, Peter Zijlstra wrote: > On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote: > >> On 01/18/2010 02:14 PM, Peter Zijlstra wrote: >> >>> >>>> Well, the alternatives are very unappealing. Emulation and >>>> single-stepping are going to be very slow compared to a couple of jumps. >>>> >>>> >>> With CPL2 or RPL on user segments the protection issue seems to be >>> manageable for running the instructions from kernel space. 
>>> >>> >> CPL2 gives unrestricted access to the kernel address space; and RPL does >> not affect page level protection. Segment limits don't work on x86-64. >> But perhaps I missed something - these things are tricky. >> > So setting RPL to 3 on the user segments allows access to kernel pages > just fine? How useful.. :/ > The further we stay away from segmentation, the better. Thankfully AMD removed hardware task switching from x86-64 so we can't even think about that. >> It should be possible to translate the instruction into an address space >> check, followed by the action, but that's still slower due to privilege >> level switches. >> > Well, if you manage to do the address validation you don't need the priv > level switch anymore, right? > Right. > Are the ins encodings sane enough to recognize mem parameters without > needing to know the actual ins? > No. You need to know whether the instruction accesses memory or not. Look at the tables at the beginning of arch/x86/kvm/emulate.c. Opcodes marked with ModRM, BitOp, MemAbs, String, Stack are all different styles of memory instructions. You need to know the operand size for the edge cases. And there are probably a few special cases in the code. > How about using a hw-breakpoint to close the gap for the inline single > step? You could even re-insert the int3 lazily when you need the > hw-breakpoint again. It would consume one hw-breakpoint register for > each task/cpu that has probes though.. > If you have more than four threads, it breaks, no? And you need an IPI each time you hit the breakpoint. Ultimately I'd like to see the breakpoint avoided as well, use a jump to the XOL area and trace in ~20 cycles instead of ~1000. 
-- error compiling committee.c: too many arguments to function From mjw at redhat.com Mon Jan 18 13:34:40 2010 From: mjw at redhat.com (Mark Wielaard) Date: Mon, 18 Jan 2010 14:34:40 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5459CA.9060603@redhat.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> Message-ID: <1263821680.2946.85.camel@springer.wildebeest.org> On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote: > On 01/18/2010 02:51 PM, Pekka Enberg wrote: > > > > And how many probes do we expected to be live at the same time in > > real-world scenarios? I guess Avi's "one million" is more than enough? > > > I don't think a user will ever come close to a million, but we can > expect some inflation from inlined functions (I don't know if uprobes > replicates such probes, but if it doesn't, it should). SystemTap by default places probes on all instances of an inlined function. It is still hard to get to a million probes though. $ stap -v -l 'process("/usr/bin/emacs").function("*")' [...] Pass 2: analyzed script: 4359 probe(s) You can try probing all statements (for every function, in every file, on every line of source code), but even that only adds up to tens of thousands of probes: $ stap -v -l 'process("/usr/bin/emacs").statement("*@*:*")' [...] Pass 2: analyzed script: 39603 probe(s) So a million is pretty far out, even if you add larger programs and all the shared libraries they are using. As Srikar said the current allocation technique is the simplest you can do, one xol slot for each uprobe.
But there are other techniques that you can use. Theoretically you only need a xol slot for each thread of a process that simultaneously hits a uprobe instance. That requires a bit more bookkeeping. The variant of uprobes that systemtap uses at the moment does that. But the locking in that case is pretty tricky, so it seemed easier to first get the code with the simplest xol allocation technique upstream. But if you do that then you can use a very small xol area to support millions of uprobes and only have to expand it when there are hundreds of threads in a process all hitting the probes simultaneously. Cheers, Mark From prasad at linux.vnet.ibm.com Mon Jan 18 13:34:45 2010 From: prasad at linux.vnet.ibm.com (K.Prasad) Date: Mon, 18 Jan 2010 19:04:45 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263820551.4283.499.camel@laptop> References: <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> <4B5455FF.7010409@redhat.com> <1263820551.4283.499.camel@laptop> Message-ID: <20100118133444.GA23680@in.ibm.com> On Mon, Jan 18, 2010 at 02:15:51PM +0100, Peter Zijlstra wrote: > On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote: > > On 01/18/2010 02:14 PM, Peter Zijlstra wrote: > > > > > >> Well, the alternatives are very unappealing. Emulation and > > >> single-stepping are going to be very slow compared to a couple of jumps. > > >> > > > With CPL2 or RPL on user segments the protection issue seems to be > > > manageable for running the instructions from kernel space. > > > > > > > CPL2 gives unrestricted access to the kernel address space; and RPL does > > not affect page level protection. Segment limits don't work on x86-64. > > But perhaps I missed something - these things are tricky.
> > So setting RPL to 3 on the user segments allows access to kernel pages > just fine? How useful.. :/ > > > It should be possible to translate the instruction into an address space > > check, followed by the action, but that's still slower due to privilege > > level switches. > > Well, if you manage to do the address validation you don't need the priv > level switch anymore, right? > > Are the ins encodings sane enough to recognize mem parameters without > needing to know the actual ins? > > How about using a hw-breakpoint to close the gap for the inline single > step? You could even re-insert the int3 lazily when you need the > hw-breakpoint again. It would consume one hw-breakpoint register for > each task/cpu that has probes though.. > That is a very scarce resource; sometimes all we have in the system is a single hw-breakpoint register (like older PPC64 with 1 IABR). If one process/thread consumes it, then all other contenders (from both kernel and user-space) are prevented from acquiring it. Not to mention that some processors have no support for instruction breakpoints at all. Thanks, K.Prasad
From ananth at in.ibm.com Mon Jan 18 15:43:23 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Mon, 18 Jan 2010 21:13:23 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> References: <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> Message-ID: <20100118154323.GA4424@in.ibm.com> On Mon, Jan 18, 2010 at 02:13:25PM +0200, Pekka Enberg wrote: > Hi Avi, > > On Mon, 2010-01-18 at 14:01 +0200, Avi Kivity wrote: > >>> Maybe you place no value on uprobes. But people who debug userspace > >>> likely will see a reason. > > On 01/18/2010 02:06 PM, Peter Zijlstra wrote: > >> I do see value in uprobes, I just don't like it mucking about with the > >> address space. Nor does it appear required. > > On Mon, Jan 18, 2010 at 2:09 PM, Avi Kivity wrote: > > Well, the alternatives are very unappealing. Emulation and single-stepping > > are going to be very slow compared to a couple of jumps. > > So how big chunks of the address space are we talking here for uprobes? As Srikar mentioned, the least we start with is 1 page. Though you can have as many probes as you want, there are certain optimizations we can do, depending on the most common use cases. For example, if you consider the start of a routine to be the most commonly traced location, most routines in a binary would generally start with the same instruction (say push %ebp), and we can refcount a slot with that instruction to be used for all probes of the same instruction.
Ananth From mhiramat at redhat.com Mon Jan 18 15:58:25 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Mon, 18 Jan 2010 10:58:25 -0500 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> References: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> Message-ID: <4B548521.7000704@redhat.com> Jim Keniston wrote: > Not really. For #3 (boosting), you need to know everything for #2, > plus be able to compute the length of each instruction -- which we can > now do for x86. To emulate an instruction (#4), you need to replicate > what it does, side-effects and all. The x86 instruction set seems to > be adding new floating-point instructions all the time, and I bet even > Masami doesn't know what they all do, but so far, they all seem to > adhere to the instruction-length rules encoded in Masami's instruction > decoder. Actually, current x86 decoder doesn't support FP(x87) instructions.(even it already supported AVX) But I think it's not so hard to add it. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. 
Software Solutions Division e-mail: mhiramat at redhat.com From avi at redhat.com Mon Jan 18 16:52:32 2010 From: avi at redhat.com (Avi Kivity) Date: Mon, 18 Jan 2010 18:52:32 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100118154323.GA4424@in.ibm.com> References: <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <20100118154323.GA4424@in.ibm.com> Message-ID: <4B5491D0.20501@redhat.com> On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote: >> >>> Well, the alternatives are very unappealing. Emulation and single-stepping >>> are going to be very slow compared to a couple of jumps. >>> >> So how big chunks of the address space are we talking here for uprobes? >> > As Srikar mentioned, the least we start with is 1 page. Though you can > have as many probes as you want, there are certain optimizations we can > do, depending on the most common usecases. > > For eg., if you'd consider the start of a routine to be the most > commonly traced location, most routines in a binary would generally > start with the same instruction (say push %ebp), and we can refcount a > slot with that instruction to be used for all probes of the same > instruction. > But then you can't follow the instruction with a jump back to the code... 
-- error compiling committee.c: too many arguments to function From ananth at in.ibm.com Mon Jan 18 17:10:38 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Mon, 18 Jan 2010 22:40:38 +0530 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5491D0.20501@redhat.com> References: <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <20100118154323.GA4424@in.ibm.com> <4B5491D0.20501@redhat.com> Message-ID: <20100118171038.GB4424@in.ibm.com> On Mon, Jan 18, 2010 at 06:52:32PM +0200, Avi Kivity wrote: > On 01/18/2010 05:43 PM, Ananth N Mavinakayanahalli wrote: >>> >>>> Well, the alternatives are very unappealing. Emulation and single-stepping >>>> are going to be very slow compared to a couple of jumps. >>>> >>> So how big chunks of the address space are we talking here for uprobes? >>> >> As Srikar mentioned, the least we start with is 1 page. Though you can >> have as many probes as you want, there are certain optimizations we can >> do, depending on the most common usecases. >> >> For eg., if you'd consider the start of a routine to be the most >> commonly traced location, most routines in a binary would generally >> start with the same instruction (say push %ebp), and we can refcount a >> slot with that instruction to be used for all probes of the same >> instruction. >> > > But then you can't follow the instruction with a jump back to the code... Right. This will work only for the non boosted case where single-stepping is mandatory. I guess the tradeoff is vma space and speed. 
Ananth From jkenisto at us.ibm.com Mon Jan 18 19:21:22 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Mon, 18 Jan 2010 11:21:22 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B548521.7000704@redhat.com> References: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> <4B548521.7000704@redhat.com> Message-ID: <1263842482.5059.9.camel@localhost.localdomain> On Mon, 2010-01-18 at 10:58 -0500, Masami Hiramatsu wrote: > Jim Keniston wrote: > > Not really. For #3 (boosting), you need to know everything for #2, > > plus be able to compute the length of each instruction -- which we can > > now do for x86. To emulate an instruction (#4), you need to replicate > > what it does, side-effects and all. The x86 instruction set seems to > > be adding new floating-point instructions all the time, and I bet even > > Masami doesn't know what they all do, but so far, they all seem to > > adhere to the instruction-length rules encoded in Masami's instruction > > decoder. > > Actually, current x86 decoder doesn't support FP(x87) instructions.(even > it already supported AVX) But I think it's not so hard to add it. > At one point I verified that it worked for all the x87 instructions in libm: https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html I'm pretty sure I tested mmx instructions as well. But I guess this was before you rearranged the opcode tables. Yeah, it wouldn't be hard to add back in, at least for purposes of computing instruction lengths. 
Jim From jkenisto at us.ibm.com Mon Jan 18 19:49:52 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Mon, 18 Jan 2010 11:49:52 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263821680.2946.85.camel@springer.wildebeest.org> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <1263821680.2946.85.camel@springer.wildebeest.org> Message-ID: <1263844192.5059.29.camel@localhost.localdomain> On Mon, 2010-01-18 at 14:34 +0100, Mark Wielaard wrote: > On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote: > > On 01/18/2010 02:51 PM, Pekka Enberg wrote: > > > > > > And how many probes do we expected to be live at the same time in > > > real-world scenarios? I guess Avi's "one million" is more than enough? > > > > > I don't think a user will ever come close to a million, but we can > > expect some inflation from inlined functions (I don't know if uprobes > > replicates such probes, but if it doesn't, it should). > > SystemTap by default places probes on all instances of an inlined > function. It is still hard to get to a million probes though. > $ stap -v -l 'process("/usr/bin/emacs").function("*")' > [...] > Pass 2: analyzed script: 4359 probe(s) > > You can try probing all statements (for every function, in every file, > on every line of source code), but even that only adds up to ten > thousands of probes: > $ stap -v -l 'process("/usr/bin/emacs").statement("*@*:*")' > [...] 
> Pass 2: analyzed script: 39603 probe(s) > > So a million is pretty far out, even if you add larger programs and all > the shared libraries they are using. Thanks, Mark. One correction, below. > > As Srikar said the current allocation technique is the simplest you can > do, one xol slot for each uprobe. But there are other techniques that > you can use. Theoretically you only need a xol slot for each thread of a > process that simultaneously hits a uprobe instance. That requires a bit > more bookkeeping. The variant of uprobes that systemtap uses at the > moment does that. Actually, it's per-probepoint, with a fixed number of slots. If the probepoint you just hit doesn't have a slot, and none are free, you steal a slot from another probepoint. Yeah, it's messy. We considered allocating slots per-thread, hoping to make it basically lockless, but that way there's more likely to be constant scribbling on the XOL area, as a thread with n slots cycles through n+m probepoints. And of course, it gets dicey as the process clones more threads. I guess the point is, there are a lot of ways to allocate slots, and we haven't found the perfect algorithm yet, even if you accept the existence of (and need for) the XOL area. Keep the ideas coming. > But the locking in that case is pretty tricky, so it > seemed easier to first get the code with the simplest xol allocation > technique upstream. But if you do that than you can use a very small xol > area to support millions of uprobes and only have to expand it when > there are hundreds of threads in a process all hitting the probes > simultaneously. > > Cheers, > > Mark > Jim
From mhiramat at redhat.com Mon Jan 18 21:20:54 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Mon, 18 Jan 2010 16:20:54 -0500 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263842482.5059.9.camel@localhost.localdomain> References: <20100116184833.2s0zihwbggkgccsk@imap.linux.ibm.com> <4B548521.7000704@redhat.com> <1263842482.5059.9.camel@localhost.localdomain> Message-ID: <4B54D0B6.4030706@redhat.com> Jim Keniston wrote: > On Mon, 2010-01-18 at 10:58 -0500, Masami Hiramatsu wrote: >> Jim Keniston wrote: >>> Not really. For #3 (boosting), you need to know everything for #2, >>> plus be able to compute the length of each instruction -- which we can >>> now do for x86. To emulate an instruction (#4), you need to replicate >>> what it does, side-effects and all. The x86 instruction set seems to >>> be adding new floating-point instructions all the time, and I bet even >>> Masami doesn't know what they all do, but so far, they all seem to >>> adhere to the instruction-length rules encoded in Masami's instruction >>> decoder. >> >> Actually, current x86 decoder doesn't support FP(x87) instructions.(even >> it already supported AVX) But I think it's not so hard to add it. >> > > At one point I verified that it worked for all the x87 instructions in > libm: > https://www.redhat.com/archives/utrace-devel/2009-March/msg00031.html > I'm pretty sure I tested mmx instructions as well. 
But I guess this was > before you rearranged the opcode tables. > > Yeah, it wouldn't be hard to add back in, at least for purposes of > computing instruction lengths. objdump -d /lib/libm.so.6 | awk -f arch/x86/tools/distill.awk | ./test_get_len Succeed: decoded and checked 37198 instructions Hmm, yeah, that's already supported :-D. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From jkenisto at us.ibm.com Mon Jan 18 22:15:57 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Mon, 18 Jan 2010 14:15:57 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B545ACF.40203@cs.helsinki.fi> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> Message-ID: <1263852957.2266.38.camel@localhost.localdomain> On Mon, 2010-01-18 at 14:57 +0200, Pekka Enberg wrote: > On 01/18/2010 02:51 PM, Pekka Enberg wrote: > >> And how many probes do we expected to be live at the same time in > >> real-world scenarios? I guess Avi's "one million" is more than enough? > > Avi Kivity kirjoitti: > > I don't think a user will ever come close to a million, but we can > > expect some inflation from inlined functions (I don't know if uprobes > > replicates such probes, but if it doesn't, it should). > > Right. I guess we're looking at few megabytes of the address space for > normal scenarios which doesn't seem too excessive. 
> > However, as Peter pointed out, the bigger problem is that now we're > opening the door for other features to steal chunks of the address > space. And I think it's a legitimate worry that it's going to cause > problems for 32-bit in the future. > > I don't like the idea but if the performance benefits are real (are > they?), Based on what seems to be the closest thing to an apples-to-apples comparison -- counting the number of calls to a specified function -- uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c. And of course, uprobes provides much, much more flexibility, appears to scale better, and works with multithreaded apps. Likewise, FWIW, utrace is more than 10x faster than strace -c in counting system calls. > maybe it's a worthwhile trade-off. Dunno. > > Pekka Jim From roland at redhat.com Tue Jan 19 02:02:15 2010 From: roland at redhat.com (Roland McGrath) Date: Mon, 18 Jan 2010 18:02:15 -0800 (PST) Subject: PTRACE_SYSCALL_ENTRY/EXIT In-Reply-To: Ali Polatel's message of Saturday, 16 January 2010 23:51:24 +0200 <20100116215124.GA963@harikalardiyari> References: <20100116215124.GA963@harikalardiyari> Message-ID: <20100119020216.01CEB506@magilla.sf.frob.com> We don't have any particular plans to extend the ptrace interface. I strongly doubt we would even try to do anything like that until the utrace-based ptrace interface is merged into Linux and the old ptrace implementation gone. In general, we are not looking for extensions to the ptrace interface. 
It is an ugly hairball already and we are more interested in having the utrace API layer available inside the kernel and then embarking on new and sane userland interfaces instead of shoehorning more into ptrace. That said, some particular kinds of simple enhancements to ptrace are really quite trivial to implement in the new utrace-based implementation. The particular area you suggest is one of these. What I would expect is not new variants of the one-shot interface like PTRACE_SYSCALL. Rather, I would envision new PTRACE_O_* options to enable syscall entry and exit tracing analogous to the PTRACE_EVENT_* events you can now enable. This means that you make one PTRACE_SETOPTIONS call to enable the set of events you want, and then use plain PTRACE_CONT (or whatever). If you really want exactly the one-shot behavior instead, then we could consider that. But, like I said, we are not looking to add much in the way of new wrinkles to the dismal ptrace userland interface. Thanks, Roland From avi at redhat.com Tue Jan 19 08:07:49 2010 From: avi at redhat.com (Avi Kivity) Date: Tue, 19 Jan 2010 10:07:49 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263852957.2266.38.camel@localhost.localdomain> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> Message-ID: <4B556855.6040800@redhat.com> On 01/19/2010 12:15 AM, Jim Keniston wrote: > >> I don't like the idea but if the performance benefits are real (are >> they?), >> > Based on what seems to 
be the closest thing to an apples-to-apples > comparison -- counting the number of calls to a specified function -- > uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c. > And of course, uprobes provides much, much more flexibility, appears to > scale better, and works with multithreaded apps. > > Likewise, FWIW, utrace is more than 10x faster than strace -c in > counting system calls. > > This is still with a kernel entry, yes? Do you have plans for a variant that's completely in userspace? -- error compiling committee.c: too many arguments to function From jkenisto at us.ibm.com Tue Jan 19 17:47:45 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Tue, 19 Jan 2010 09:47:45 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B556855.6040800@redhat.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> Message-ID: <1263923265.4998.28.camel@localhost.localdomain> On Tue, 2010-01-19 at 10:07 +0200, Avi Kivity wrote: > On 01/19/2010 12:15 AM, Jim Keniston wrote: > > > >> I don't like the idea but if the performance benefits are real (are > >> they?), > >> > > Based on what seems to be the closest thing to an apples-to-apples > > comparison -- counting the number of calls to a specified function -- > > uprobes is 6-7 times faster than the ptrace-based equivalent, ltrace -c. 
> > And of course, uprobes provides much, much more flexibility, appears to > > scale better, and works with multithreaded apps. > > > > Likewise, FWIW, utrace is more than 10x faster than strace -c in > > counting system calls. > > > > > > This is still with a kernel entry, yes? Yes, this involves setting a breakpoint and trapping into the kernel when it's hit. The 6-7x figure is with the current 2-trap approach (breakpoint, single-step). Boosting could presumably make that more like 12-14x. > Do you have plans for a variant > that's completely in userspace? I don't know of any such plans, but I'd be interested to read more of your thoughts here. As I understand it, you've suggested replacing the probed instruction with a jump into an instrumentation vma (the XOL area, or something similar). Masami has demonstrated -- through his djprobes enhancement to kprobes -- that this can be done for many x86 instructions. What does the code in the jumped-to vma do? Is the instrumentation code that corresponds to the uprobe handlers encoded in an ad hoc .so? BTW, when some people say "completely in userspace," they mean something like ptrace, where the kernel is still heavily involved but the instrumentation code runs in user space. The ubp layer is intended to support that model as well. In our various implementations of the XOL vma/address area, however, the XOL area is either created on exec or created/expanded only by the probed process. 
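The per-probepoint slot scheme Jim described earlier in this thread (a fixed number of XOL slots; if the probepoint just hit has no slot and none are free, one is stolen from another probepoint) can be modeled roughly as follows. This is a sketch under assumptions, not the systemtap uprobes code: the class name, the round-robin choice of victim, and the absence of any locking are simplifications made here for illustration.

```python
class StealingSlotAllocator:
    """Toy model of per-probepoint XOL slots with stealing (hypothetical)."""

    def __init__(self, nslots):
        self.owner = [None] * nslots    # slot index -> probepoint address or None
        self.slot_of = {}               # probepoint address -> slot index
        self.next_victim = 0            # round-robin victim (policy assumed here)

    def acquire(self, probepoint):
        """Return a slot for `probepoint`, stealing one if none are free."""
        if probepoint in self.slot_of:
            return self.slot_of[probepoint]
        for index, owner in enumerate(self.owner):
            if owner is None:
                return self._assign(index, probepoint)
        victim = self.next_victim
        self.next_victim = (victim + 1) % len(self.owner)
        del self.slot_of[self.owner[victim]]   # previous owner loses its slot
        return self._assign(victim, probepoint)

    def _assign(self, index, probepoint):
        self.owner[index] = probepoint
        self.slot_of[probepoint] = index
        return index


alloc = StealingSlotAllocator(nslots=2)
s1 = alloc.acquire(0x1000)              # takes the first free slot
s2 = alloc.acquire(0x2000)              # takes the second free slot
s3 = alloc.acquire(0x3000)              # no free slot: steals from 0x1000
print(s1, s2, s3, 0x1000 in alloc.slot_of)
```

A real implementation would also have to avoid stealing a slot that some thread is single-stepping through at that moment, which is where the locking concerns come from; the model ignores that entirely.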
Jim From fweisbec at gmail.com Tue Jan 19 18:06:12 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Tue, 19 Jan 2010 19:06:12 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263923265.4998.28.camel@localhost.localdomain> References: <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> Message-ID: <20100119180610.GA11005@nowhere> On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote: > > Do you have plans for a variant > > that's completely in userspace? > > I don't know of any such plans, but I'd be interested to read more of > your thoughts here. As I understand it, you've suggested replacing the > probed instruction with a jump into an instrumentation vma (the XOL > area, or something similar). Masami has demonstrated -- through his > djprobes enhancement to kprobes -- that this can be done for many x86 > instructions. > > What does the code in the jumped-to vma do? Is the instrumentation code > that corresponds to the uprobe handlers encoded in an ad hoc .so? Once the instrumentation is requested by a process that is not the instrumented one, this looks impossible to set a uprobe without a minimal voluntary collaboration from the instrumented process (events sent through IPC or whatever). So that looks too limited, this is not anymore a true dynamic uprobe. From fche at redhat.com Tue Jan 19 21:16:46 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Tue, 19 Jan 2010 16:16:46 -0500 Subject: linux-next: add utrace tree Message-ID: <20100119211646.GF16096@redhat.com> Hi - Having been reviewed a couple of times, and we hope being a good candidate for merging next time, please start pulling git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch master This repo contains frequent merges from Linus' tree. If you'd prefer a cleaner rebase-based branch to pull from, we can make one of those too. Thanks! - FChE From sfr at canb.auug.org.au Wed Jan 20 00:12:20 2010 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Jan 2010 11:12:20 +1100 Subject: linux-next: add utrace tree In-Reply-To: <20100119211646.GF16096@redhat.com> References: <20100119211646.GF16096@redhat.com> Message-ID: <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> Hi Frank, On Tue, 19 Jan 2010 16:16:46 -0500 "Frank Ch. Eigler" wrote: > > Having been reviewed a couple of times, and we hope being a good > candidate for merging next time, please start pulling > > git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch master I have added this from today with you and utrace-devel as the contacts. I have cc'd the wider community on this email so that people are aware that this has been included. > This repo contains frequent merges from Linus' tree. If you'd prefer > a cleaner rebase-based branch to pull from, we can make one of those too. For now it is OK, but you might like to ask Linus if he would like it cleaned up before submission since it seems to have history right back to 2.6.29 and (as you say) lots of merges with his tree. You should also add a commit with an entry in MAINTAINERS. [Standard boilerplate] Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgment of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window. 
You will need to ensure that the patches/commits in your tree/series have been: * submitted under GPL v2 (or later) and include the Contributor's Signed-off-by, * posted to the relevant mailing list, * reviewed by you (or another maintainer of your subsystem tree), * successfully unit tested, and * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au Legal Stuff: By participating in linux-next, your subsystem tree contributions are public and will be included in the linux-next trees. You may be sent e-mail messages indicating errors or other issues when the patches/commits from your subsystem tree are merged and tested in linux-next. These messages may also be cross-posted to the linux-next mailing list, the linux-kernel mailing list, etc. The linux-next tree project and IBM (my employer) make no warranties regarding the linux-next project, the testing procedures, the results, the e-mails, etc. If you don't agree to these ground rules, let me know and I'll remove your tree from participation in linux-next. From sfr at canb.auug.org.au Wed Jan 20 05:49:45 2010 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Jan 2010 16:49:45 +1100 Subject: linux-next: manual merge of the utrace tree with the fsnotify tree Message-ID: <20100120164945.4c6f018e.sfr@canb.auug.org.au> Hi all, Today's linux-next merge of the utrace tree got a conflict in kernel/Makefile between commit 9878914352df8ccfbad1307d51ca05706d50cae4 ("Audit: split audit watch Kconfig") from the fsnotify tree and commit f357a74067bc548772166a4817d5f2c32005a449 ("utrace core") from the utrace tree. Just context changes. 
I fixed it up (see below) and can carry the fix as necessary. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au

diff --cc kernel/Makefile
index 702260a,8a0185e..0000000
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@@ -69,13 -69,13 +69,14 @@@ obj-$(CONFIG_IKCONFIG) += configs.
  obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
  obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
  obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
+ obj-$(CONFIG_UTRACE) += utrace.o
 -obj-$(CONFIG_AUDIT) += audit.o auditfilter.o audit_watch.o
 +obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
  obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
 -obj-$(CONFIG_GCOV_KERNEL) += gcov/
 +obj-$(CONFIG_AUDIT_WATCH) += audit_watch.o
  obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
 +obj-$(CONFIG_GCOV_KERNEL) += gcov/
  obj-$(CONFIG_KPROBES) += kprobes.o
 -obj-$(CONFIG_KGDB) += kgdb.o
 +obj-$(CONFIG_KGDB) += debug/
  obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
  obj-$(CONFIG_GENERIC_HARDIRQS) += irq/

From mingo at elte.hu Wed Jan 20 05:49:50 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 20 Jan 2010 06:49:50 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> Message-ID: <20100120054950.GB27108@elte.hu> * Stephen Rothwell wrote: > Hi Frank, > > On Tue, 19 Jan 2010 16:16:46 -0500 "Frank Ch. Eigler" wrote: > > > > Having been reviewed a couple of times, and we hope being a good > > candidate for merging next time, please start pulling > > > > git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch master > > I have added this from today with you and utrace-devel as the contacts. > I have cc'd the wider community on this email so that people are aware > that this has been included. > > > This repo contains frequent merges from Linus' tree. 
If you'd prefer > > a cleaner rebase-based branch to pull from, we can make one of those too. > > For now it is OK, but you might like to ask Linus if he would like it > cleaned up before submission since it seems to have history right back to > 2.6.29 and (as you say) lots of merges with his tree. > > You should also add a commit with an entry in MAINTAINERS. Note, I'm not yet convinced that this (and the rest: uprobes and systemtap, etc.) can go upstream in its present form. IMHO the far more important thing to address beyond formalities and workflow cleanliness are the (many) technical observations and objections offered by Peter Zijlstra on lkml. Not just the git history but also the abstractions and concepts are messy and should be reworked IMO, and also good and working perf events integration should be achieved, etc. The fact that there's a well-established upstream workflow for instrumentation patches, which is being routed around by the utrace/uprobes/systemtap code here is not a good sign in terms of reaching a good upstream solution. Let's hope it works out well though. Thanks, Ingo From ananth at in.ibm.com Wed Jan 20 06:15:51 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Wed, 20 Jan 2010 11:45:51 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100120054950.GB27108@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> Message-ID: <20100120061551.GB6588@in.ibm.com> On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: Ingo, > Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, > etc.) can go uptream in its present form. Agreed, uprobes is still not upstream ready -- it was an RFC. We are working through the comments there to get it ready for merger. 
> IMHO the far more important thing to address beyond formalities and workflow > cleanliness are the (many) technical observations and objections offered by > Peter Zijstra on lkml. Not just the git history but also the abstractions and > concepts are messy and should be reworked IMO, and also good and working perf > events integration should be achieved, etc. I think Oleg addressed most of Peter's concerns on utrace when the ptrace/utrace patchset was reposted. Perf integration with uprobes will be done and discussions have started with Masami and Frederic. There are a couple of fundamental technical aspects (XOL vma vs. emulation; breakpoint insertion through CoW and not through quiesce) that need resolution. > The fact that there's a well established upstream workflow for instrumentation > patches, which is being routed around by the utrace/uprobes/systemtap code > here is not a good sign in terms of reaching a good upstream solution. Lets > hope it works out well though. Agreed. On the other hand, having ptrace/utrace in the -next tree will give it a lot more testing, while any outstanding technical issues are being addressed. Stephen, To exercise ptrace/utrace, it would be very useful if you pulled in git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git branch utrace-ptrace instead of 'master'. Thanks, Ananth From mingo at elte.hu Wed Jan 20 06:28:34 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 20 Jan 2010 07:28:34 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100120061551.GB6588@in.ibm.com> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> Message-ID: <20100120062834.GB12165@elte.hu> * Ananth N Mavinakayanahalli wrote: > On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: > > Ingo, > > > Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, > > etc.) 
can go uptream in its present form. > > Agreed, uprobes is still not upstream ready -- it was an RFC. We are > working through the comments there to get it ready for merger. > > > IMHO the far more important thing to address beyond formalities and workflow > > cleanliness are the (many) technical observations and objections offered by > > Peter Zijstra on lkml. Not just the git history but also the abstractions and > > concepts are messy and should be reworked IMO, and also good and working perf > > events integration should be achieved, etc. > > I think Oleg addressed most of Peter's concerns on utrace when the > ptrace/utrace patchset was reposted. Peter is Cc:-ed and he might want to chime in. > Perf integration with uprobes will be done and discussions have started with > Masami and Frederic. There are a couple of fundamental technical aspects > (XOL vma vs. emulation; breakpoint insertion through CoW and not through > quiesce) that need resolution. > > > The fact that there's a well established upstream workflow for instrumentation > > patches, which is being routed around by the utrace/uprobes/systemtap code > > here is not a good sign in terms of reaching a good upstream solution. Lets > > hope it works out well though. > > Agreed. > > On the other hand, having ptrace/utrace in the -next tree will give it a > lot more testing, while any outstanding technical issues are being addressed. Including experimental code that is RFC and which is not certain to go upstream is certainly not the purpose of linux-next though. It will cause conflicts with various other trees and increases the overhead all around. It also causes us to trust linux-next bugreports less - as it's not the 'next Linux' anymore. Also, there's virtually no high-level technical review done in linux-next: the trees are implicitly trusted (because they are pushed by maintainers), bugs and conflicts are reported but otherwise it's a neutral tree that includes pretty much any commit indiscriminately. 
If you need review and testing there's a number of trees you can get inclusion
into.

	Ingo

From srikar at linux.vnet.ibm.com Wed Jan 20 06:36:20 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Wed, 20 Jan 2010 12:06:20 +0530
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <20100119180610.GA11005@nowhere>
References: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <20100119180610.GA11005@nowhere>
Message-ID: <20100120063620.GA30109@linux.vnet.ibm.com>

* Frederic Weisbecker [2010-01-19 19:06:12]:

> On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
> >
> > What does the code in the jumped-to vma do? Is the instrumentation code
> > that corresponds to the uprobe handlers encoded in an ad hoc .so?
>
>
> Once the instrumentation is requested by a process that is not the
> instrumented one, this looks impossible to set a uprobe without a
> minimal voluntary collaboration from the instrumented process
> (events sent through IPC or whatever). So that looks too limited,
> this is not anymore a true dynamic uprobe.

I don't see a case where the thread being debugged refuses to place a
probe unless the process is exiting. The traced process doesn't decide
if it wants to be probed or not. There could be a slight delay from the
time the tracer requested to the time the probe is placed. But this
delay only affects the tracer and the tracee. This is in contrast
to, say, stop_machine, where the threads of other applications are also
affected.
From ananth at in.ibm.com Wed Jan 20 06:40:26 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Wed, 20 Jan 2010 12:10:26 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100120062834.GB12165@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> Message-ID: <20100120064026.GC6588@in.ibm.com> On Wed, Jan 20, 2010 at 07:28:34AM +0100, Ingo Molnar wrote: > > * Ananth N Mavinakayanahalli wrote: > > > On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: ... > > On the other hand, having ptrace/utrace in the -next tree will give it a > > lot more testing, while any outstanding technical issues are being addressed. > > Including experimental code that is RFC and which is not certain to go > upstream is certainly not the purpose of linux-next though. OK. > It will cause conflicts with various other trees and increases the overhead > all around. It also causes us to trust linux-next bugreports less - as it's > not the 'next Linux' anymore. Also, there's virtually no high-level technical > review done in linux-next: the trees are implicitly trusted (because they are > pushed by maintainers), bugs and conflicts are reported but otherwise it's a > neutral tree that includes pretty much any commit indiscriminately. > > If you need review and testing there's a number of trees you can get inclusion > into. So would -tip be one of them? If so could you pull the utrace-ptrace branch in? Or did you intend some other tree (random-tracing)? 
(Though I think a ptrace reimplementation isn't 'random'-tracing :-)) Ananth From sfr at canb.auug.org.au Wed Jan 20 06:59:59 2010 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 20 Jan 2010 17:59:59 +1100 Subject: linux-next: add utrace tree In-Reply-To: <20100120062834.GB12165@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> Message-ID: <20100120175959.59daa481.sfr@canb.auug.org.au> Hi Frank, On Wed, 20 Jan 2010 07:28:34 +0100 Ingo Molnar wrote: > > Including experimental code that is RFC and which is not certain to go > upstream is certainly not the purpose of linux-next though. Ingo is correct in what he says here. See the boilerplate: " * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch)." I will remove this tree from linux-next tomorrow and wait until it is more ready for mainline inclusion. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From mingo at elte.hu Wed Jan 20 07:29:25 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 20 Jan 2010 08:29:25 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100120062834.GB12165@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> Message-ID: <20100120072925.GA11395@elte.hu> * Ingo Molnar wrote: > > * Ananth N Mavinakayanahalli wrote: > > > On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: > > > > Ingo, > > > > > Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, > > > etc.) can go uptream in its present form. > > > > Agreed, uprobes is still not upstream ready -- it was an RFC. We are > > working through the comments there to get it ready for merger. > > > > > IMHO the far more important thing to address beyond formalities and workflow > > > cleanliness are the (many) technical observations and objections offered by > > > Peter Zijstra on lkml. Not just the git history but also the abstractions and > > > concepts are messy and should be reworked IMO, and also good and working perf > > > events integration should be achieved, etc. > > > > I think Oleg addressed most of Peter's concerns on utrace when the > > ptrace/utrace patchset was reposted. > > Peter is Cc:-ed and he might want to chime in. > > > Perf integration with uprobes will be done and discussions have started with > > Masami and Frederic. There are a couple of fundamental technical aspects > > (XOL vma vs. emulation; breakpoint insertion through CoW and not through > > quiesce) that need resolution. 
> > > > > The fact that there's a well established upstream workflow for instrumentation > > > patches, which is being routed around by the utrace/uprobes/systemtap code > > > here is not a good sign in terms of reaching a good upstream solution. Lets > > > hope it works out well though. > > > > Agreed. > > > > On the other hand, having ptrace/utrace in the -next tree will give it a > > lot more testing, while any outstanding technical issues are being addressed. > > Including experimental code that is RFC and which is not certain to go > upstream is certainly not the purpose of linux-next though. > > It will cause conflicts with various other trees and increases the overhead > all around. It also causes us to trust linux-next bugreports less - as it's > not the 'next Linux' anymore. Also, there's virtually no high-level > technical review done in linux-next: the trees are implicitly trusted > (because they are pushed by maintainers), bugs and conflicts are reported > but otherwise it's a neutral tree that includes pretty much any commit > indiscriminately. > > If you need review and testing there's a number of trees you can get > inclusion into. Btw., the utrace code has lived in -mm for quite some time - that's an excellent route as Andrew does thorough review and testing. If Andrew agrees with this particular tree as-is and wants these bits to live in linux-next and have it in -mm that way then that's a fair approach obviously and i have no objections ... The point is to have at least one relevant maintainer request and track it and then supervise the completion of it (which includes the resolution of all outstanding objections) and then push it to Linus. 
Ingo From peterz at infradead.org Wed Jan 20 08:52:52 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 20 Jan 2010 09:52:52 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100120062834.GB12165@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> Message-ID: <1263977572.4283.820.camel@laptop> On Wed, 2010-01-20 at 07:28 +0100, Ingo Molnar wrote: > > I think Oleg addressed most of Peter's concerns on utrace when the > > ptrace/utrace patchset was reposted. > > Peter is Cc:-ed and he might want to chime in. Yeah, I'll make some time to go through the latest code again.. if only there was a clone() for humans ;-) From avi at redhat.com Wed Jan 20 09:43:03 2010 From: avi at redhat.com (Avi Kivity) Date: Wed, 20 Jan 2010 11:43:03 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263923265.4998.28.camel@localhost.localdomain> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> Message-ID: <4B56D027.3010808@redhat.com> On 01/19/2010 07:47 PM, Jim Keniston wrote: > >> This is still with a kernel entry, yes? >> > Yes, this involves setting a breakpoint and trapping into the kernel > when it's hit. The 6-7x figure is with the current 2-trap approach > (breakpoint, single-step). 
Boosting could presumably make that more > like 12-14x. > A trap is IIRC ~1000 cycles, we can reduce this to ~50 (totally negligible from the executed code's point of view). >> Do you have plans for a variant >> that's completely in userspace? >> > I don't know of any such plans, but I'd be interested to read more of > your thoughts here. As I understand it, you've suggested replacing the > probed instruction with a jump into an instrumentation vma (the XOL > area, or something similar). Masami has demonstrated -- through his > djprobes enhancement to kprobes -- that this can be done for many x86 > instructions. > > What does the code in the jumped-to vma do? 1. Write a trace entry into shared memory, trap into the kernel on overflow. 2. Trap if a condition is satisfied (fast watchpoint implementation). > Is the instrumentation code > that corresponds to the uprobe handlers encoded in an ad hoc .so? > Looks like a good idea, but it doesn't matter much to me. -- error compiling committee.c: too many arguments to function From peterz at infradead.org Wed Jan 20 09:57:52 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 20 Jan 2010 10:57:52 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B56D027.3010808@redhat.com> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> Message-ID: <1263981472.4283.843.camel@laptop> On 
Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote: > 1. Write a trace entry into shared memory, trap into the kernel on overflow. > 2. Trap if a condition is satisfied (fast watchpoint implementation). So now you want to consume more of a process' address space to store trace data as well? Not to mention that that process could wreck the trace data rendering it utterly unreliable. From fweisbec at gmail.com Wed Jan 20 10:43:24 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Wed, 20 Jan 2010 11:43:24 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100120064026.GC6588@in.ibm.com> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120064026.GC6588@in.ibm.com> Message-ID: <20100120104322.GA5149@nowhere> On Wed, Jan 20, 2010 at 12:10:26PM +0530, Ananth N Mavinakayanahalli wrote: > > It will cause conflicts with various other trees and increases the overhead > > all around. It also causes us to trust linux-next bugreports less - as it's > > not the 'next Linux' anymore. Also, there's virtually no high-level technical > > review done in linux-next: the trees are implicitly trusted (because they are > > pushed by maintainers), bugs and conflicts are reported but otherwise it's a > > neutral tree that includes pretty much any commit indiscriminately. > > > > If you need review and testing there's a number of trees you can get inclusion > > into. > > So would -tip be one of them? If so could you pull the utrace-ptrace > branch in? > > Or did you intend some other tree (random-tracing)? (Though I think a > ptrace reimplementation isn't 'random'-tracing :-)) Heh. No this is a tree I use for, well, random tracing patches indeed, which has extended to random tracing/perf/* patches by the time. 
I sometimes relay others' patches to Ingo toward this tree but this is usually
about small volumes and for short-term storage: patches that have been
reviewed/acked already.

utrace/uprobe is about high volume and longer time debate/review/maintenance
and I won't have the time to carry this.

> Ananth

From srikar at linux.vnet.ibm.com Wed Jan 20 10:45:41 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Wed, 20 Jan 2010 16:15:41 +0530
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <4B56D027.3010808@redhat.com>
References: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com>
Message-ID: <20100120104541.GB30109@linux.vnet.ibm.com>

> >
> >What does the code in the jumped-to vma do?
>
> 1. Write a trace entry into shared memory, trap into the kernel on overflow.
> 2. Trap if a condition is satisfied (fast watchpoint implementation).
>
> >Is the instrumentation code
> >that corresponds to the uprobe handlers encoded in an ad hoc .so?
>
> Looks like a good idea, but it doesn't matter much to me.
>

That looks to be a nice idea. We should certainly look into this
possibility. However, can we look at this option a little later?

Our plan was to do one step at a time, i.e. have the basic uprobes in
first and target the booster next (i.e. jump to the next instruction
without the need for single-stepping).

We could look at this option of using jump instead of int3 after we are
done with the booster. Hope that's okay.
-- 
Thanks and Regards
Srikar

From fweisbec at gmail.com Wed Jan 20 10:51:57 2010
From: fweisbec at gmail.com (Frederic Weisbecker)
Date: Wed, 20 Jan 2010 11:51:57 +0100
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <20100120063620.GA30109@linux.vnet.ibm.com>
References: <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <20100119180610.GA11005@nowhere> <20100120063620.GA30109@linux.vnet.ibm.com>
Message-ID: <20100120105155.GC5149@nowhere>

On Wed, Jan 20, 2010 at 12:06:20PM +0530, Srikar Dronamraju wrote:
> * Frederic Weisbecker [2010-01-19 19:06:12]:
>
> > On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote:
> > >
> > > What does the code in the jumped-to vma do? Is the instrumentation code
> > > that corresponds to the uprobe handlers encoded in an ad hoc .so?
> >
> >
> > Once the instrumentation is requested by a process that is not the
> > instrumented one, this looks impossible to set a uprobe without a
> > minimal voluntary collaboration from the instrumented process
> > (events sent through IPC or whatever). So that looks too limited,
> > this is not anymore a true dynamic uprobe.
>
> I don't see a case where the thread being debugged refuses to place a
> probe unless the process is exiting. The traced process doesn't decide
> if it wants to be probed or not. There could be a slight delay from the
> time the tracer requested to the time the probe is placed. But this
> delay only affects the tracer and the tracee. This is in contrast
> to, say, stop_machine, where the threads of other applications are also
> affected.

I did not think about a kind of trace point inserted in shared memory.
I was just confused :) From avi at redhat.com Wed Jan 20 12:22:32 2010 From: avi at redhat.com (Avi Kivity) Date: Wed, 20 Jan 2010 14:22:32 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263981472.4283.843.camel@laptop> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> Message-ID: <4B56F588.2060109@redhat.com> On 01/20/2010 11:57 AM, Peter Zijlstra wrote: > On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote: > >> 1. Write a trace entry into shared memory, trap into the kernel on overflow. >> 2. Trap if a condition is satisfied (fast watchpoint implementation). >> > So now you want to consume more of a process' address space to store > trace data as well? Yes. I know I'm bad. > Not to mention that that process could wreck the > trace data rendering it utterly unreliable. > It could, but it also might not. Are we going to deny high performance tracing to users just because it doesn't work in all cases? Note this applies to any kind of monitoring or debugging technology. A process can be influenced by the debugger and render any debug info you get out of it unreliable. One non-timing example is a process using a checksum of its text as an input to some algorithm. 
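As a rough illustration of the first item in the quoted proposal (write trace entries into shared memory, trap into the kernel on overflow), a minimal user-space sketch in C might look like the following. Every name in it (trace_area, trace_record, trace_overflow_trap, TRACE_SLOTS) is hypothetical and is not taken from the uprobes patches. The "trap" is modeled as a plain function call, and the buffer is ordinary memory rather than a real shared mapping, so this only shows the shape of the fast path and the slow path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a trace area shared between tracee and tracer. */
#define TRACE_SLOTS 4u   /* tiny on purpose so overflow is easy to hit */

struct trace_entry {
    uint64_t probe_addr;   /* address of the probed instruction */
    uint64_t value;        /* whatever the probe wants to record */
};

struct trace_area {
    uint32_t used;         /* entries written since the last drain */
    uint32_t overflows;    /* how many times the slow path was taken */
    struct trace_entry slot[TRACE_SLOTS];
};

/* Reset the area to an empty state. */
static void trace_init(struct trace_area *ta)
{
    memset(ta, 0, sizeof(*ta));
}

/*
 * Stand-in for "trap into the kernel on overflow". A real implementation
 * would execute a breakpoint or system call here so that the kernel could
 * drain the buffer; this sketch just counts the event and empties the area.
 */
static void trace_overflow_trap(struct trace_area *ta)
{
    ta->overflows++;
    ta->used = 0;
}

/* Fast path: stays in user space unless the area is already full. */
static void trace_record(struct trace_area *ta, uint64_t addr, uint64_t value)
{
    if (ta->used >= TRACE_SLOTS)
        trace_overflow_trap(ta);

    /* Holds after the check above; a consumer must still not trust it. */
    assert(ta->used < TRACE_SLOTS);

    ta->slot[ta->used].probe_addr = addr;
    ta->slot[ta->used].value = value;
    ta->used++;
}
```

Recording five entries into this four-slot area takes the slow path exactly once and leaves one live entry. As raised earlier in the thread, nothing stops the traced process from scribbling over such an area, so a consumer would have to treat its contents as untrusted.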
-- error compiling committee.c: too many arguments to function From avi at redhat.com Wed Jan 20 12:23:58 2010 From: avi at redhat.com (Avi Kivity) Date: Wed, 20 Jan 2010 14:23:58 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100120104541.GB30109@linux.vnet.ibm.com> References: <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <20100120104541.GB30109@linux.vnet.ibm.com> Message-ID: <4B56F5DE.1060405@redhat.com> On 01/20/2010 12:45 PM, Srikar Dronamraju wrote: >>> What does the code in the jumped-to vma do? >>> >> 1. Write a trace entry into shared memory, trap into the kernel on overflow. >> 2. Trap if a condition is satisfied (fast watchpoint implementation). >> > That looks to be a nice idea. We should certainly look into this > possibility. However can we look at this option probably a little later? > > Our plan was to do one step at a time i.e have the basic uprobes in > first and target the booster (i.e jump to the next instruction without > the need for single-stepping next). > > We could look at this option of using jump instead of int3 after we are > done with the booster. Hope that's okay. > I'm all for incremental development and merging, as long as we keep the interfaces flexible enough for the future. -- error compiling committee.c: too many arguments to function From fche at redhat.com Wed Jan 20 13:01:29 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Wed, 20 Jan 2010 08:01:29 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100120062834.GB12165@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> Message-ID: <20100120130129.GH16096@redhat.com> Hi - On Wed, Jan 20, 2010 at 07:28:34AM +0100, Ingo Molnar wrote: > [...] > > On the other hand, having ptrace/utrace in the -next tree will give it a > > lot more testing, while any outstanding technical issues are being addressed. > > Including experimental code that is RFC and which is not certain to go > upstream is certainly not the purpose of linux-next though. > [...] Ingo, you are mistaken. The utrace core and utrace/ptrace code were submitted and reviewed together on lkml, and are not considered experimental. I know the names may be confusing, but it is unnecessary to bring up uprobes and other RFC/experimental items. None of these are included in the branch I pointed sfr to. - FChE From fche at redhat.com Wed Jan 20 13:24:48 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Wed, 20 Jan 2010 08:24:48 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100120175959.59daa481.sfr@canb.auug.org.au> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120175959.59daa481.sfr@canb.auug.org.au> Message-ID: <20100120132448.GI16096@redhat.com> Hi - On Wed, Jan 20, 2010 at 05:59:59PM +1100, Stephen Rothwell wrote: > [...] > > Including experimental code that is RFC and which is not certain to go > > upstream is certainly not the purpose of linux-next though. > > Ingo is correct in what he says here. See the boilerplate: > [...] > Basically, this should be just what you would send to Linus (or ask him > to fetch)." 
> I will remove this tree from linux-next tomorrow and wait until it is > more ready for mainline inclusion. Please reconsider. Ingo mistook what was being proposed. We request merge/integration testing for just the set of patches posted , which was in response to peterz's earlier review comments, and none of which is labeled or considered RFC or experimental. Ananth was right that the utrace-ptrace git branch represents this rather than master. - FChE -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From sfr at canb.auug.org.au Wed Jan 20 14:38:22 2010 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Thu, 21 Jan 2010 01:38:22 +1100 Subject: linux-next: add utrace tree In-Reply-To: <20100120072925.GA11395@elte.hu> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> Message-ID: <20100121013822.28781960.sfr@canb.auug.org.au> Hi Ingo, Andrew, On Wed, 20 Jan 2010 08:29:25 +0100 Ingo Molnar wrote: > > > * Ingo Molnar wrote: > > > > > * Ananth N Mavinakayanahalli wrote: > > > > > On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: > > > > > > Ingo, > > > > > > > Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, > > > > etc.) can go uptream in its present form. > > > > > > Agreed, uprobes is still not upstream ready -- it was an RFC. We are > > > working through the comments there to get it ready for merger. > > > > > > > IMHO the far more important thing to address beyond formalities and workflow > > > > cleanliness are the (many) technical observations and objections offered by > > > > Peter Zijstra on lkml. 
Not just the git history but also the abstractions and > > > > concepts are messy and should be reworked IMO, and also good and working perf > > > > events integration should be achieved, etc. > > > > > > I think Oleg addressed most of Peter's concerns on utrace when the > > > ptrace/utrace patchset was reposted. > > > > Peter is Cc:-ed and he might want to chime in. > > > > > Perf integration with uprobes will be done and discussions have started with > > > Masami and Frederic. There are a couple of fundamental technical aspects > > > (XOL vma vs. emulation; breakpoint insertion through CoW and not through > > > quiesce) that need resolution. > > > > > > > The fact that there's a well established upstream workflow for instrumentation > > > > patches, which is being routed around by the utrace/uprobes/systemtap code > > > > here is not a good sign in terms of reaching a good upstream solution. Lets > > > > hope it works out well though. > > > > > > Agreed. > > > > > > On the other hand, having ptrace/utrace in the -next tree will give it a > > > lot more testing, while any outstanding technical issues are being addressed. > > > > Including experimental code that is RFC and which is not certain to go > > upstream is certainly not the purpose of linux-next though. > > > > It will cause conflicts with various other trees and increases the overhead > > all around. It also causes us to trust linux-next bugreports less - as it's > > not the 'next Linux' anymore. Also, there's virtually no high-level > > technical review done in linux-next: the trees are implicitly trusted > > (because they are pushed by maintainers), bugs and conflicts are reported > > but otherwise it's a neutral tree that includes pretty much any commit > > indiscriminately. > > > > If you need review and testing there's a number of trees you can get > > inclusion into. 
> > Btw., the utrace code has lived in -mm for quite some time - that's an > excellent route as Andrew does thorough review and testing. > > If Andrew agrees with this particular tree as-is and wants these bits to live > in linux-next and have it in -mm that way then that's a fair approach > obviously and i have no objections ... So, what is it to be? In or out? Frank, please be clear as to which branch you want included (master or utrace-ptrace). Also note that neither of those branches matches what was posted in the sense that they both have lots of history and merges not represented in the patches. (I assume that they do produce the same final source tree, though). > The point is to have at least one relevant maintainer request and track it and > then supervise the completion of it (which includes the resolution of all > outstanding objections) and then push it to Linus. If we do include it, it is still possible for people to decide (when the next merge window opens) that it is still not ready. It adds a bit of maybe unneeded complication to linux-next, but we had the same problem in this merge window and we have all survived. :-) In the end, Linus is the final arbitrator of course. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From mel at csn.ul.ie Wed Jan 20 15:57:53 2010 From: mel at csn.ul.ie (Mel Gorman) Date: Wed, 20 Jan 2010 15:57:53 +0000 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263820551.4283.499.camel@laptop> References: <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> <4B5455FF.7010409@redhat.com> <1263820551.4283.499.camel@laptop> Message-ID: <20100120155753.GF5154@csn.ul.ie> On Mon, Jan 18, 2010 at 02:15:51PM +0100, Peter Zijlstra wrote: > On Mon, 2010-01-18 at 14:37 +0200, Avi Kivity wrote: > > On 01/18/2010 02:14 PM, Peter Zijlstra wrote: > > > > > >> Well, the alternatives are very unappealing. Emulation and > > >> single-stepping are going to be very slow compared to a couple of jumps. > > >> > > > With CPL2 or RPL on user segments the protection issue seems to be > > > manageable for running the instructions from kernel space. > > > > > > > CPL2 gives unrestricted access to the kernel address space; and RPL does > > not affect page level protection. Segment limits don't work on x86-64. > > But perhaps I missed something - these things are tricky. > > So setting RPL to 3 on the user segments allows access to kernel pages > just fine? How useful.. :/ > > > It should be possible to translate the instruction into an address space > > check, followed by the action, but that's still slower due to privilege > > level switches. > > Well, if you manage to do the address validation you don't need the priv > level switch anymore, right? > It also starts becoming very x86-centric though, doesn't it? It might kick other ports later. What is there at the moment is storing the copied instructions in a VMA. 
The most unpalatable part of that to me is that it's visible to userspace,
probably via /proc/. I didn't check, but I hope an munmap() from userspace
cannot delete it. What the VMA has going for it is that it *appears* to be
easier to port to other architectures than the alternatives, and certainly
easier to handle than instruction emulation.

> Are the ins encodings sane enough to recognize mem parameters without
> needing to know the actual ins?
>
> How about using a hw-breakpoint to close the gap for the inline single
> step? You could even re-insert the int3 lazily when you need the
> hw-breakpoint again. It would consume one hw-breakpoint register for
> each task/cpu that has probes though..
>

This feels very racy. Along with that, making this sort of change was
considered a risky venture on x86 and needed strong verification from
elsewhere (http://lkml.org/lkml/2010/1/12/300). There are probably similar
concerns on other architectures that would make a reliable port difficult.

Right now the approach is with VMAs. The alternatives are

1. reserved XOL page (similar disadvantages to the VMA)

2. emulated instructions
   This is an emulation bug waiting to happen in my opinion and makes
   porting uprobes a significantly more difficult undertaking than
   either the XOL-VMA or XOL-page approach

3. XOL page in kernel space available at a different CPL
   This assumes all target architectures have a usable privilege ring,
   which may be the case. However, I would guess that it is going to
   perform worse than the current approach because of the change in
   privilege level. No idea what the cost of a privilege level change
   is, but I doubt it's free

4. Boosted probes (arch-specific, apparently only x86 does this for
   kprobes)

As unpalatable as the VMA is, I am failing to see why it's not a
reasonable starting point, with an understanding that 2 or 3 would be
implemented in the future after the other architecture ports are in place
and the reliability of the options as well as the performance can be
measured.

There would appear to be two classes of application that might suffer
from the VMA. The first needs absolutely every single ounce of address
space. The second introspects itself via /proc/self/maps and makes
decisions based on that. The first is unfortunate but should be a limited
number of use cases. The second could be fudged by simply not exporting
the information via /proc.

I'm of the opinion it would be reasonable to let the VMA go ahead, look
at the ports for the other architectures and revisit options 2 and 3
above to see if the VMA can really be removed without a performance or
reliability penalty.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

From andi at firstfloor.org Wed Jan 20 18:31:43 2010
From: andi at firstfloor.org (Andi Kleen)
Date: Wed, 20 Jan 2010 19:31:43 +0100
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <1263923265.4998.28.camel@localhost.localdomain> (Jim Keniston's message of "Tue, 19 Jan 2010 09:47:45 -0800")
References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com>
<1263923265.4998.28.camel@localhost.localdomain>
Message-ID: <87wrzc39ww.fsf@basil.nowhere.org>

Jim Keniston writes:
>
> I don't know of any such plans, but I'd be interested to read more of
> your thoughts here. As I understand it, you've suggested replacing the
> probed instruction with a jump into an instrumentation vma (the XOL
> area, or something similar). Masami has demonstrated -- through his
> djprobes enhancement to kprobes -- that this can be done for many x86
> instructions.

The big problem when doing this in user space is that for 64bit
it has to be within 2GB of the probed code, otherwise you would
need to rewrite the instruction to not use any rip relative addressing,
which can be rather complicated (needs registers, but the instruction
might already use them, so you would need a register allocator/spilling etc.)

And that 2GB can be anywhere in the address space for shared
libraries, which might well be already used. A lot of programs
need large VM areas without holes.

Also I personally would be uncomfortable to let the instruction
decoder be used by unprivileged code. Who knows how
many buffer overflows it has?

In general the trend has been also to make traps faster in the CPU, make
sure you're not optimizing for some old CPU here.

-Andi

-- 
ak at linux.intel.com -- Speaking for myself only.
From andi at firstfloor.org Wed Jan 20 18:32:57 2010 From: andi at firstfloor.org (Andi Kleen) Date: Wed, 20 Jan 2010 19:32:57 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263816857.4283.381.camel@laptop> (Peter Zijlstra's message of "Mon, 18 Jan 2010 13:14:17 +0100") References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B5325CF.5000001@redhat.com> <1263740593.557.20967.camel@twins> <4B53661A.9090907@redhat.com> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <1263816857.4283.381.camel@laptop> Message-ID: <87ska039uu.fsf@basil.nowhere.org> Peter Zijlstra writes: > > With CPL2 or RPL on user segments the protection issue seems to be > manageable for running the instructions from kernel space. Nope -- it doesn't work on 64bit and even on 32bit can have large costs on some CPUs. Also designing 32bit only features in 2010 would seem rather .... unfortunate. -Andi -- ak at linux.intel.com -- Speaking for myself only. 
From mhiramat at redhat.com Wed Jan 20 19:31:40 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Wed, 20 Jan 2010 14:31:40 -0500 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100119180610.GA11005@nowhere> References: <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <20100119180610.GA11005@nowhere> Message-ID: <4B575A1C.6010206@redhat.com> Frederic Weisbecker wrote: > On Tue, Jan 19, 2010 at 09:47:45AM -0800, Jim Keniston wrote: >>> Do you have plans for a variant >>> that's completely in userspace? >> >> I don't know of any such plans, but I'd be interested to read more of >> your thoughts here. As I understand it, you've suggested replacing the >> probed instruction with a jump into an instrumentation vma (the XOL >> area, or something similar). Masami has demonstrated -- through his >> djprobes enhancement to kprobes -- that this can be done for many x86 >> instructions. >> >> What does the code in the jumped-to vma do? Is the instrumentation code >> that corresponds to the uprobe handlers encoded in an ad hoc .so? > > > Once the instrumentation is requested by a process that is not the > instrumented one, this looks impossible to set a uprobe without a > minimal voluntary collaboration from the instrumented process > (events sent through IPC or whatever). So that looks too limited, > this is not anymore a true dynamic uprobe. Agreed. Since uprobe's handler must be running in kernel, we need to jump into kernel space anyway. "Booster" (just skips a single-stepping(trap) exception) may be useful for improving uprobe performance. 
And also as Andi said, using jump instead of int3 in userspace has 2GB address space limitation. It's not a problem for kernel inside, but a big problem in userspace. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From jkenisto at us.ibm.com Wed Jan 20 19:34:12 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Wed, 20 Jan 2010 11:34:12 -0800 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <87wrzc39ww.fsf@basil.nowhere.org> References: <1263740593.557.20967.camel@twins> <1263800752.4283.19.camel@laptop> <4B543F93.3060509@redhat.com> <1263815072.4283.305.camel@laptop> <4B544D7C.2060708@redhat.com> <1263816396.4283.361.camel@laptop> <4B544F8E.1080603@redhat.com> <84144f021001180413w76a8ca2axb0b9f07ee4dea67e@mail.gmail.com> <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <87wrzc39ww.fsf@basil.nowhere.org> Message-ID: <1264016052.5122.40.camel@localhost.localdomain> On Wed, 2010-01-20 at 19:31 +0100, Andi Kleen wrote: > Jim Keniston writes: > > > > I don't know of any such plans, but I'd be interested to read more of > > your thoughts here. As I understand it, you've suggested replacing the > > probed instruction with a jump into an instrumentation vma (the XOL > > area, or something similar). Masami has demonstrated -- through his > > djprobes enhancement to kprobes -- that this can be done for many x86 > > instructions. 
>
> The big problem when doing this in user space is that for 64bit
> it has to be within 2GB of the probed code, otherwise you would
> need to rewrite the instruction to not use any rip relative addressing,
> which can be rather complicated (needs registers, but the instruction
> might already use them, so you would need a register allocator/spilling etc.)

I'm probably telling you stuff you already know, but...

Re: jumps longer than 2GB: The following 14-byte sequence seems to work:
	jmpq *(%rip)
	.quad next_insn
where next_insn is the address of the instruction to which we want to
jump. We'd need this for boosting, anyway -- to jump from the XOL area
back to the probed instruction stream. I think djprobes inserts a 5-byte
jump at the probepoint; I don't know whether a 14-byte jump would
introduce new difficulties.

Re: rewriting instructions that use rip-relative addressing. We do that
now. See handle_riprel_insn() in patch #2. (As far as we can tell, it
works, but we'd appreciate your review of it.)

>
> And that 2GB can be anywhere in the address space for shared
> libraries, which might well be already used. A lot of programs
> need large VM areas without holes.
>
> Also I personally would be uncomfortable to let the instruction
> decoder be used by unprivileged code. Who knows how
> many buffer overflows it has?

The instruction decoder is used only during instruction analysis, while
registering the probe -- i.e., in kernel space.

>
> In general the trend has been also to make traps faster in the CPU, make
> sure you're not optimizing for some old CPU here.

I won't argue with that. What Avi seems to be proposing buys us a
speedup, but at the cost of increased complexity -- among other things,
splitting the instrumentation code between user space (in the "XOL" area
-- which would then be used for much more than XOL instruction slots)
and kernel space. The splitting would presumably be handled by
higher-level code -- SystemTap, perf, or whatever.
It's a neat idea, but it seems like a v2 kind of feature. > > -Andi Jim From andi at firstfloor.org Wed Jan 20 19:58:26 2010 From: andi at firstfloor.org (Andi Kleen) Date: Wed, 20 Jan 2010 20:58:26 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1264016052.5122.40.camel@localhost.localdomain> References: <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <87wrzc39ww.fsf@basil.nowhere.org> <1264016052.5122.40.camel@localhost.localdomain> Message-ID: <20100120195826.GB24355@basil.fritz.box> > Re: rewriting instructions that use rip-relative addressing. We do that > now. See handle_riprel_insn() in patch #2. (As far as we can tell, it > works, but we'd appreciate your review of it.) Yes, but how do you get within 2GB of it? Add lots of holes in the address space? > The instruction decoder is used only during instruction analysis, while > registering the probe -- i.e., in kernel space. Registering the user probe? That means if there's a buffer overflow in there it would be exploitable. > > > > In general the trend has been also to make traps faster in the CPU, make > > sure you're not optimizing for some old CPU here. > > I won't argue with that. What Avi seems to be proposing buys us a > speedup, but at the cost of increased complexity -- among other things, > splitting the instrumentation code between user space (in the "XOL" area > -- which would then be used for much more than XOL instruction slots) You can't have a single XOL area, at least not if you want to support shared libraries on 64bit & rip relative. > and kernel space. The splitting would presumably be handled by > higher-level code -- SystemTap, perf, or whatever. 
> It's a neat idea, but it seems like a v2 kind of feature.

I'm not sure it can even work, unless you severely limited the allowed
instructions.

-Andi

-- 
ak at linux.intel.com -- Speaking for myself only.

From jkenisto at us.ibm.com Wed Jan 20 20:28:45 2010
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 20 Jan 2010 12:28:45 -0800
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <20100120195826.GB24355@basil.fritz.box>
References: <4B545146.3080001@redhat.com> <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <87wrzc39ww.fsf@basil.nowhere.org> <1264016052.5122.40.camel@localhost.localdomain> <20100120195826.GB24355@basil.fritz.box>
Message-ID: <1264019325.5122.62.camel@localhost.localdomain>

On Wed, 2010-01-20 at 20:58 +0100, Andi Kleen wrote:
> > Re: rewriting instructions that use rip-relative addressing. We do that
> > now. See handle_riprel_insn() in patch #2. (As far as we can tell, it
> > works, but we'd appreciate your review of it.)
>
> Yes, but how do you get within 2GB of it?

I'm not sure what you're asking. To jump between the probed instruction
stream and the XOL area, I've proposed
	jmpq *(%rip)
	.quad next_insn
next_insn is a 64-bit address, which presumably allows you to jump to
anywhere in the address space.

To read/write the memory addressed by a rip-relative instruction, we
convert the rip-relative addressing to indirect addressing through a
64-bit scratch register (whose saved value we restore before returning
to the probed instruction stream).

> Add lots of holes
> in the address space?

No.

>
> > The instruction decoder is used only during instruction analysis, while
> > registering the probe -- i.e., in kernel space.
>
> Registering the user probe? That means if there's a buffer overflow
> in there it would be exploitable.

Certainly a poorly written probe handler would be a problem. Could you
explain further what you mean? Are you talking about a buffer overflow
in the probed program? in the probe handler? in uprobes?

> > >
> > > In general the trend has been also to make traps faster in the CPU, make
> > > sure you're not optimizing for some old CPU here.
> >
> > I won't argue with that. What Avi seems to be proposing buys us a
> > speedup, but at the cost of increased complexity -- among other things,
> > splitting the instrumentation code between user space (in the "XOL" area
> > -- which would then be used for much more than XOL instruction slots)
>
> You can't have a single XOL area, at least not if you want to support
> shared libraries on 64bit & rip relative.

I disagree. See above.

>
> > and kernel space. The splitting would presumably be handled by
> > higher-level code -- SystemTap, perf, or whatever. It's a neat idea,
> > but it seems like a v2 kind of feature.
>
> I'm not sure it can even work, unless you severely limited the allowed
> instructions.

I'm not sure it can work, either. But I still believe that we've
addressed the known issues wrt the big x86_64 address space.

>
> -Andi
>

Thanks.
Jim From roland at redhat.com Thu Jan 21 01:22:38 2010 From: roland at redhat.com (Roland McGrath) Date: Wed, 20 Jan 2010 17:22:38 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: Stephen Rothwell's message of Thursday, 21 January 2010 01:38:22 +1100 <20100121013822.28781960.sfr@canb.auug.org.au> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> Message-ID: <20100121012238.4172F1566@magilla.sf.frob.com> > Frank, please be clear as to which branch you want included (master or > utrace-ptrace). Also note that neither of those branches matches what > was posted in the sense that they both have lots of history and merges > not represented in the patches. (I assume that they do produce the same > final source tree, though). Yes, the trees do match. I certainly never expected our ancient git history to get merged in directly upstream. I've made a new branch on: git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git called: next/master (Actually it's on master.kernel.org and the public mirror is being a little slow as I write this.) This starts from v2.6.33-rc4 and then has commits for the 7 patches that Oleg posted in December. Beyond that, we've added one follow-on patch to fix a bug Oleg just tracked down (Oleg will post that patch soon). And I've added one more commit with a MAINTAINERS update, shown below. You can also find the same stuff from the series file and patch files in: http://people.redhat.com/utrace/2.6-next/ If it makes things easier for linux-next to have this git branch either rebased or merged from a different fork point, please let me know. Thanks, Roland --- [PATCH] MAINTAINERS: add utrace This updates the ptrace entry to cover utrace too. They are part of the same maintenance effort. 
Also add the utrace mailing list. Signed-off-by: Roland McGrath --- MAINTAINERS | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index c8f47bf..8da2a0a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4375,15 +4375,18 @@ M: Jim Paris L: cbe-oss-dev at ozlabs.org S: Maintained -PTRACE SUPPORT +PTRACE AND UTRACE SUPPORT M: Roland McGrath M: Oleg Nesterov +L: utrace-devel at redhat.com S: Maintained F: include/asm-generic/syscall.h F: include/linux/ptrace.h F: include/linux/regset.h F: include/linux/tracehook.h -F: kernel/ptrace.c +F: include/linux/utrace.h +F: kernel/ptrace* +F: kernel/utrace* PVRUSB2 VIDEO4LINUX DRIVER M: Mike Isely From oleg at redhat.com Thu Jan 21 16:39:21 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 21 Jan 2010 17:39:21 +0100 Subject: [PATCH -mm] utrace: fix utrace_maybe_reap() vs find_matching_engine() race Message-ID: <20100121163921.GA10096@redhat.com> (on top of utrace-core.patch) The comment in utrace_maybe_reap() correctly explains why utrace_attach_task/utrace_control/etc can't modify or use attaching/attached lists. But find_matching_engine() can scan ->attached under utrace->lock without any checks, it can race with utrace_maybe_reap() destroying list nodes. Change utrace_maybe_reap() to empty ->attached before it drops utrace->lock, update the comments a bit.
Reported-by: CAI Qian Signed-off-by: Oleg Nesterov Signed-off-by: Roland McGrath --- kernel/utrace.c | 23 ++++++++++++++++------- 1 files changed, 16 insertions(+), 7 deletions(-) --- V1/kernel/utrace.c~8_REAP_FIND_RACE 2009-12-18 01:58:37.000000000 +0100 +++ V1/kernel/utrace.c 2010-01-21 17:31:18.000000000 +0100 @@ -1,7 +1,7 @@ /* * utrace infrastructure interface for debugging user processes * - * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved. + * Copyright (C) 2006-2010 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -859,6 +859,7 @@ void utrace_maybe_reap(struct task_struc bool reap) { struct utrace_engine *engine, *next; + struct list_head attached; spin_lock(&utrace->lock); @@ -897,16 +898,24 @@ void utrace_maybe_reap(struct task_struc } /* - * utrace_add_engine() checks ->utrace_flags != 0. - * Since @utrace->reap is set, nobody can set or clear - * UTRACE_EVENT(REAP) in @engine->flags or change - * @engine->ops, and nobody can change @utrace->attached. + * utrace_add_engine() checks ->utrace_flags != 0. Since + * @utrace->reap is set, nobody can set or clear UTRACE_EVENT(REAP) + * in @engine->flags or change @engine->ops and nobody can change + * @utrace->attached after we drop the lock. */ target->utrace_flags = 0; - splice_attaching(utrace); + + /* + * We clear out @utrace->attached before we drop the lock so + * that find_matching_engine() can't come across any old engine + * while we are busy tearing it down. 
+ */ + list_replace_init(&utrace->attached, &attached); + list_splice_tail_init(&utrace->attaching, &attached); + spin_unlock(&utrace->lock); - list_for_each_entry_safe(engine, next, &utrace->attached, entry) { + list_for_each_entry_safe(engine, next, &attached, entry) { if (engine->flags & UTRACE_EVENT(REAP)) engine->ops->report_reap(engine, target); From oleg at redhat.com Thu Jan 21 20:32:07 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Thu, 21 Jan 2010 21:32:07 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100107100050.31724463@mschwide.boeblingen.de.ibm.com> References: <1503844142.2061111261478093776.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1257887498.2061171261478252049.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <20100104155225.GA16650@redhat.com> <20100104171626.22ea2d9c@mschwide.boeblingen.de.ibm.com> <20100104181412.GA21146@redhat.com> <20100104211147.4CC94D532@magilla.sf.frob.com> <20100105105030.66bb8a0a@mschwide.boeblingen.de.ibm.com> <20100106205633.700CC134D@magilla.sf.frob.com> <20100107100050.31724463@mschwide.boeblingen.de.ibm.com> Message-ID: <20100121203207.GA20050@redhat.com> On 01/07, Martin Schwidefsky wrote: > > On Wed, 6 Jan 2010 12:56:33 -0800 (PST) > Roland McGrath wrote: > > > In other circumstances with utrace, it is very possible to wind up with > > user_disable_single_step being called superfluously when there was no > > stop (and so not necessarily any context switch
> > or other high overhead).
> > On other machines, user_disable_single_step is pretty cheap even where
> > user_enable_single_step is quite costly. Given how simple and cheap it
> > is to short-circuit the excess work on s390, I think it is worthwhile.
>
> We could use the same compare of the control registers as the code in
> __switch_to. See below.

FYI, I tested your c3311c13adc1021e986fef12609ceb395ffc5014 commit which
does this optimization (compared to the patch you sent previously), it
works fine.

But please see another email I am going to send...

Oleg.

From oleg at redhat.com Thu Jan 21 20:51:13 2010
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 21 Jan 2010 21:51:13 +0100
Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x)
In-Reply-To: <20100107214821.94FF97300@magilla.sf.frob.com>
References: <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com> <20100107175446.GA13300@redhat.com> <20100107214821.94FF97300@magilla.sf.frob.com>
Message-ID: <20100121205113.GB20050@redhat.com>

On 01/07, Roland McGrath wrote:
>
> > I am confused as well. Yes, I thought about regs->psw.mask change too,
> > but I don't understand why it helps..
> [...]
> > But. According to the testing I did (unless I did something wrong
> > again) this patch doesn't make any difference in this particular
> > case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does.
>
> Those results are quite mysterious to me.
> I think we'll have to get Martin to sort it out definitively.

I did the testing again with 2.6.32-5.el6 + Martin's

	c3311c13adc1021e986fef12609ceb395ffc5014
	f8d5faf718c9ff2c04eb8484585d4963c4111cd7

patches.
the same test-case:

	/* header names were lost in the archive; these are the ones the code needs */
	#include <assert.h>
	#include <signal.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>

	int main(void)
	{
		int pid, status;

		if (!(pid = fork())) {
			assert(ptrace(PTRACE_TRACEME) == 0);
			kill(getpid(), SIGSTOP);

			if (!fork())
				return 43;

			wait(&status);
			return WEXITSTATUS(status);
		}

		for (;;) {
			assert(pid == wait(&status));
			if (WIFEXITED(status))
				break;
			assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
		}

		assert(WEXITSTATUS(status) == 43);
		return 0;
	}

with the simple debugging patch below I did

	# perl -e 'syscall 172, 666, 0,0'; ./xxx
	# perl -e 'syscall 172, 666, 0,1'; ./xxx
	# perl -e 'syscall 172, 666, 1,0'; ./xxx
	# perl -e 'syscall 172, 666, 1,1'; ./xxx

and dmesg reports:

	XXX disable_step=0, clear_flag=0
	XXX: xxx/1868 0
	[... 799 times ...]
	XXX disable_step=0, clear_flag=1
	XXX: xxx/1905 0
	[... 799 times ...]
	XXX disable_step=1, clear_flag=0
	XXX disable_step=1, clear_flag=1

Just in case, I did the testing with and without CONFIG_UTRACE, result
is the same.

IOW, copy_thread()->clear_tsk_thread_flag(TIF_SINGLE_STEP) doesn't make
any difference, copy_process()->user_disable_single_step() does.

Although I need to re-read Martin's explanations about psw magic, perhaps
this was already explained...

Oleg.
--- K/kernel/sys.c~ 2010-01-21 14:16:15.366639654 -0500 +++ K/kernel/sys.c 2010-01-21 14:30:35.131591879 -0500 @@ -1453,6 +1453,8 @@ SYSCALL_DEFINE1(umask, int, mask) return mask; } +int xxx_disable_step, xxx_clear_flag; + SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { @@ -1466,6 +1468,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsi error = 0; switch (option) { + case 666: + xxx_disable_step = arg2; + xxx_clear_flag = arg3; + printk(KERN_INFO "XXX disable_step=%d, clear_flag=%d\n", + xxx_disable_step, xxx_clear_flag); + break; + case PR_SET_PDEATHSIG: if (!valid_signal(arg2)) { error = -EINVAL; --- K/kernel/fork.c~ 2010-01-18 09:35:16.823811008 -0500 +++ K/kernel/fork.c 2010-01-21 14:29:39.131624971 -0500 @@ -964,6 +964,8 @@ static void posix_cpu_timers_init(struct INIT_LIST_HEAD(&tsk->cpu_timers[2]); } +extern int xxx_disable_step; + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -1207,7 +1209,8 @@ static struct task_struct *copy_process( * Syscall tracing and stepping should be turned off in the * child regardless of CLONE_PTRACE. 
*/ - user_disable_single_step(p); + if (xxx_disable_step) + user_disable_single_step(p); clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE); #ifdef TIF_SYSCALL_EMU clear_tsk_thread_flag(p, TIF_SYSCALL_EMU); --- K/arch/s390/kernel/process.c~ 2010-01-21 14:32:38.541609793 -0500 +++ K/arch/s390/kernel/process.c 2010-01-21 14:34:10.461584130 -0500 @@ -161,6 +161,8 @@ void release_thread(struct task_struct * { } +extern int xxx_clear_flag; + int copy_thread(unsigned long clone_flags, unsigned long new_stackp, unsigned long unused, struct task_struct *p, struct pt_regs *regs) @@ -217,7 +219,8 @@ int copy_thread(unsigned long clone_flag p->thread.mm_segment = get_fs(); /* Don't copy debug registers */ memset(&p->thread.per_info, 0, sizeof(p->thread.per_info)); - clear_tsk_thread_flag(p, TIF_SINGLE_STEP); + if (xxx_clear_flag) + clear_tsk_thread_flag(p, TIF_SINGLE_STEP); /* Initialize per thread user and system timer values */ ti = task_thread_info(p); ti->user_timer = 0; From sfr at canb.auug.org.au Fri Jan 22 00:17:47 2010 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 22 Jan 2010 11:17:47 +1100 Subject: linux-next: add utrace tree In-Reply-To: <20100121013822.28781960.sfr@canb.auug.org.au> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> Message-ID: <20100122111747.3c224dfd.sfr@canb.auug.org.au> Hi Ingo, Andrew, Any thoughts? On Thu, 21 Jan 2010 01:38:22 +1100 Stephen Rothwell wrote: > > On Wed, 20 Jan 2010 08:29:25 +0100 Ingo Molnar wrote: > > > > > > * Ingo Molnar wrote: > > > > > > > > * Ananth N Mavinakayanahalli wrote: > > > > > > > On Wed, Jan 20, 2010 at 06:49:50AM +0100, Ingo Molnar wrote: > > > > > > > > Ingo, > > > > > > > > > Note, i'm not yet convinced that this (and the rest: uprobes and systemtap, > > > > > etc.) 
can go uptream in its present form. > > > > > > > > Agreed, uprobes is still not upstream ready -- it was an RFC. We are > > > > working through the comments there to get it ready for merger. > > > > > > > > > IMHO the far more important thing to address beyond formalities and workflow > > > > > cleanliness are the (many) technical observations and objections offered by > > > > > Peter Zijstra on lkml. Not just the git history but also the abstractions and > > > > > concepts are messy and should be reworked IMO, and also good and working perf > > > > > events integration should be achieved, etc. > > > > > > > > I think Oleg addressed most of Peter's concerns on utrace when the > > > > ptrace/utrace patchset was reposted. > > > > > > Peter is Cc:-ed and he might want to chime in. > > > > > > > Perf integration with uprobes will be done and discussions have started with > > > > Masami and Frederic. There are a couple of fundamental technical aspects > > > > (XOL vma vs. emulation; breakpoint insertion through CoW and not through > > > > quiesce) that need resolution. > > > > > > > > > The fact that there's a well established upstream workflow for instrumentation > > > > > patches, which is being routed around by the utrace/uprobes/systemtap code > > > > > here is not a good sign in terms of reaching a good upstream solution. Lets > > > > > hope it works out well though. > > > > > > > > Agreed. > > > > > > > > On the other hand, having ptrace/utrace in the -next tree will give it a > > > > lot more testing, while any outstanding technical issues are being addressed. > > > > > > Including experimental code that is RFC and which is not certain to go > > > upstream is certainly not the purpose of linux-next though. > > > > > > It will cause conflicts with various other trees and increases the overhead > > > all around. It also causes us to trust linux-next bugreports less - as it's > > > not the 'next Linux' anymore. 
Also, there's virtually no high-level > > > technical review done in linux-next: the trees are implicitly trusted > > > (because they are pushed by maintainers), bugs and conflicts are reported > > > but otherwise it's a neutral tree that includes pretty much any commit > > > indiscriminately. > > > > > > If you need review and testing there's a number of trees you can get > > > inclusion into. > > > > Btw., the utrace code has lived in -mm for quite some time - that's an > > excellent route as Andrew does thorough review and testing. > > > > If Andrew agrees with this particular tree as-is and wants these bits to live > > in linux-next and have it in -mm that way then that's a fair approach > > obviously and i have no objections ... > > So, what is it to be? In or out? > > Frank, please be clear as to which branch you want included (master or > utrace-ptrace). Also note that neither of those branches matches what > was posted in the sense that they both have lots of history and merges > not represented in the patches. (I assume that they do produce the same > final source tree, though). > > > The point is to have at least one relevant maintainer request and track it and > > then supervise the completion of it (which includes the resolution of all > > outstanding objections) and then push it to Linus. > > If we do include it, it is still possible for people to decide (when the > next merge window opens) that it is still not ready. It adds a bit of > maybe unneeded complication to linux-next, but we had the same problem in > this merge window and we have all survived. :-) > > In the end, Linus is the final arbitrator of course. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From akpm at linux-foundation.org Fri Jan 22 00:30:04 2010 From: akpm at linux-foundation.org (Andrew Morton) Date: Thu, 21 Jan 2010 16:30:04 -0800 Subject: linux-next: add utrace tree In-Reply-To: <20100122111747.3c224dfd.sfr@canb.auug.org.au> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> Message-ID: <20100121163004.8779bd69.akpm@linux-foundation.org> On Fri, 22 Jan 2010 11:17:47 +1100 Stephen Rothwell wrote: > Any thoughts? I'm nearly a week behind again and am trying to avoid thinking. I've had a (n old) version of utrace in -mm for ages and it didn't break anything. I still don't think I've seen a really compelling reason for merging it. At least, I wouldn't be able to explain why we did it. But presumably there _are_ such reasons, because it was a lot of development work. Someone please sell this to us. From akpm at linux-foundation.org Fri Jan 22 00:31:45 2010 From: akpm at linux-foundation.org (Andrew Morton) Date: Thu, 21 Jan 2010 16:31:45 -0800 Subject: linux-next: add utrace tree In-Reply-To: <20100121163004.8779bd69.akpm@linux-foundation.org> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> Message-ID: <20100121163145.7e958c3f.akpm@linux-foundation.org> On Thu, 21 Jan 2010 16:30:04 -0800 Andrew Morton wrote: > Someone please sell this to us. 
Here's what Oleg said last time I asked this: First of all, utrace makes other things possible. gdbstub, nondestructive core dump, uprobes, kmview, hopefully more. I didn't look at these projects closely, perhaps other people can tell more. As for their merge status, until utrace itself is merged it is very hard to develop them out of tree. To me, even seccomp is the good example why utrace is useful. seccomp is simple, but it needs hooks in arch/ hot pathes. Contrary, utrace-based implementation is more flexible, simple, and it is completely "hidden" behind utrace. In my opinion, ptrace-utrace is another example. Once CONFIG_UTRACE goes away, we can remove almost all ptrace-related code from core kernel (and kill task_struct->ptrace/etc members). ftrace/etc is excellent in many ways, but even if we need the simple "passive" tracing it is not enough sometimes. And we have nothing else except ptrace currently. But ptrace is so horrible and unfixeable, and it has so many limitations. In fact, even the simple things like stop/ continue this thread/process are not trivial using ptrace, gdb/strace have to do a lot of hacks to overcome ptrace's limitations, and some of these hacks falls into "mostly works, but that is all" category. Of course, I can't promise we will have the new gdb which explores utrace facilities soon, but I think at least utrace gives a chance. From fche at redhat.com Fri Jan 22 00:51:47 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Thu, 21 Jan 2010 19:51:47 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100121163145.7e958c3f.akpm@linux-foundation.org> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> Message-ID: <20100122005147.GD22003@redhat.com> Hi - On Thu, Jan 21, 2010 at 04:31:45PM -0800, Andrew Morton wrote: > [...] > > Someone please sell this to us. > Here's what Oleg said last time I asked this: [...] I wonder if Roland/Oleg are being too modest in their current role as ptrace maintainers. Considering that *they* think of utrace as a means toward proper refactoring of ptrace, how much further burden of proof should they shoulder? To what extent are other subsystem maintainers required to "sell" reworkings of their areas, when there appear to be no drawbacks and at least arguable benefits? - FChE From akpm at linux-foundation.org Fri Jan 22 01:05:41 2010 From: akpm at linux-foundation.org (Andrew Morton) Date: Thu, 21 Jan 2010 17:05:41 -0800 Subject: linux-next: add utrace tree In-Reply-To: <20100122005147.GD22003@redhat.com> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> Message-ID: <20100121170541.7425ff10.akpm@linux-foundation.org> On Thu, 21 Jan 2010 19:51:47 -0500 "Frank Ch. 
Eigler" wrote: > Hi - > > On Thu, Jan 21, 2010 at 04:31:45PM -0800, Andrew Morton wrote: > > [...] > > > Someone please sell this to us. > > Here's what Oleg said last time I asked this: [...] > > I wonder if Roland/Oleg are being too modest in their current role as > ptrace maintainers. Considering that *they* think of utrace as a > means toward proper refactoring of ptrace, how much further burden of > proof should they shoulder? To what extent are other subsystem > maintainers required to "sell" reworkings of their areas, when there > appear to be no drawbacks and at least arguable benefits? > ptrace is a nasty, complex part of the kernel which has a long history of problems, but it's all been pretty quiet in there for the the past few years. This leads one to expect that a rip-out-n-rewrite is a high-risk prospect. So, quite reasonably, one looks for a good reason for taking such risk. It's not really appropriate to generalise from other subsystem maintainer's reworkings onto ptrace. It's very rare that we'd make a change this radical to a tricky part of core kernel. From fche at redhat.com Fri Jan 22 01:25:16 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Thu, 21 Jan 2010 20:25:16 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100121170541.7425ff10.akpm@linux-foundation.org> References: <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> Message-ID: <20100122012516.GE22003@redhat.com> Hi - On Thu, Jan 21, 2010 at 05:05:41PM -0800, Andrew Morton wrote: > [...] 
ptrace is a nasty, complex part of the kernel which has a > long history of problems, but it's all been pretty quiet in there > for the the past few years. This leads one to expect that a > rip-out-n-rewrite is a high-risk prospect. So, quite reasonably, > one looks for a good reason for taking such risk. [...] To the extent the discussion is colored by risk avoidance, then the answer to that would consist of code reviews, and of course a look at the actual historical reliability of this code. While some might enjoy reminding us about the brief kerneloops incident in 2008, let's keep in mind that versions of this code has been deployed in fedora and rhel for several *years*, with millions of users. It's not some rickety experiment. To the extent the discussion is colored by the new features enabled from this refactoring, well, there is Oleg's list which may or may not have mentioned enabling systemtap's user-space probing. More details can be furnished on demand. Several of the use examples were constructed in good faith upon request from the kernel community asking for more and more. But what's enough? Who knows, really? 
- FChE From torvalds at linux-foundation.org Fri Jan 22 01:28:42 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Thu, 21 Jan 2010 17:28:42 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100121170541.7425ff10.akpm@linux-foundation.org> References: <20100119211646.GF16096@redhat.com> <20100120111220.e7fb4e2c.sfr@canb.auug.org.au> <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> Message-ID: On Thu, 21 Jan 2010, Andrew Morton wrote: > > ptrace is a nasty, complex part of the kernel which has a long history > of problems, but it's all been pretty quiet in there for the the past few > years. More importantly, we're not ever going to get rid of it. Quite frankly, judging my all past history we have ever seen in kernel interfaces, new an non-portable interfaces simply are never used. The whole question whether they are nicer or not is entirely immaterial. I'm personally very dubious that there are any merits to utrace that outweigh the very clear disadvantages: just another layer that adds a new level of abstraction to the only interface that people actually _use_, namely ptrace. But I haven't followed utrace. I doubt _anybody_ has, except for the utrace people themselves. 
Linus From torvalds at linux-foundation.org Fri Jan 22 01:32:47 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Thu, 21 Jan 2010 17:32:47 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100122012516.GE22003@redhat.com> References: <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> Message-ID: On Thu, 21 Jan 2010, Frank Ch. Eigler wrote: > > To the extent the discussion is colored by the new features enabled > from this refactoring, well, there is Oleg's list which may or may not > have mentioned enabling systemtap's user-space probing. Let's face it, system tap isn't going to be merged, so why even bring it up? Every kernel developer I have _ever_ seen agrees that all the new tracing is a million times superior. I'm sure there are system tap people who disagree, but quite frankly, I don't see it being merged considering how little the system tap people ever did for the kernel. So if things like system tap and "security models that go behind the kernel by tying into utrace" are the reasons for utrace, color me utterly uninterested. In fact, color me actively hostile. I think that's the worst possible situation that we'd ever be in as kernel people (namely exactly the "do things in kernel space by hiding behind utrace without having kernel people involved") Linus From fche at redhat.com Fri Jan 22 02:22:55 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Thu, 21 Jan 2010 21:22:55 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> Message-ID: <20100122022255.GF22003@redhat.com> Hi - On Thu, Jan 21, 2010 at 05:32:47PM -0800, Linus Torvalds wrote: > [...] > > To the extent the discussion is colored by the new features enabled > > from this refactoring, well, there is Oleg's list which may or may not > > have mentioned enabling systemtap's user-space probing. > > Let's face it, system tap isn't going to be merged, so why even bring it > up? It was certainly not meant to derail the discussion about the merits of utrace as a useful cleanup API in its own right, but rather to be an example of what kinds of things become straightforward in its presence. You may be aware of nascent efforts to bring the same uprobes infrastructure to perf. > Every kernel developer I have _ever_ seen agrees that all the new > tracing is a million times superior. [...] And that is fine. We believe there is plenty of space in the problem domain for different approaches. > ... considering how little the system tap people ever did for the kernel. Less passionate analysis would identify a long history of contribution by the the greater affiliated team, including via merged code and by and passing on requirements and experiences. We have been trying to share as much as you have been willing to take. While systemtap's current codebase may not (and need not) have a future inside the kernel, chances are good that improvements in common infrastructure will allow systemtap to shrink and change enough that the question becomes moot. 
- FChE From torvalds at linux-foundation.org Fri Jan 22 02:35:26 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Thu, 21 Jan 2010 18:35:26 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100122022255.GF22003@redhat.com> References: <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> <20100122022255.GF22003@redhat.com> Message-ID: On Thu, 21 Jan 2010, Frank Ch. Eigler wrote: > > Less passionate analysis would identify a long history of contribution > by the the greater affiliated team, including via merged code and by > and passing on requirements and experiences. The reason I'm so passionate is that I dislike the turn the discussion was taking, as if "utrace" was somehow _good_ because it allowed various other interfaces to hide behind it. And I'm not at all convinced that is true. And I really didn't want to single out system tap, I very much feel the same way abotu some seccomp-replacement "security model that the kernel doesn't even need to know about" thing. So don't take the systemtap part to be the important part, it's the bigger issue of "I'd much rather have explicit interfaces than have generic hooks that people can then use in any random way". I realize that my argument is very anti-thetical to the normal CS teaching of "general-purpose is good". I often feel that very specific code with very clearly defined (and limited) applicability is a good thing - I'd rather have just a very specific ptrace layer that does nothing but ptrace, than a "generic plugin layer that can be layered under ptrace and other things". 
In one case, you know exactly what the users are, and what the semantics are going to be. In the other, you don't. So I really want to see a very big and immediate upside from utrace. Because to me, the "it's a generic layer with any application you want to throw at it" is a _downside_. Linus
From ananth at in.ibm.com Fri Jan 22 05:21:39 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 22 Jan 2010 10:51:39 +0530 Subject: linux-next: add utrace tree In-Reply-To: References: <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> Message-ID: <20100122052139.GA20532@in.ibm.com> On Thu, Jan 21, 2010 at 05:28:42PM -0800, Linus Torvalds wrote: > > > On Thu, 21 Jan 2010, Andrew Morton wrote: > > > > ptrace is a nasty, complex part of the kernel which has a long history > > of problems, but it's all been pretty quiet in there for the the past few > > years. > > More importantly, we're not ever going to get rid of it. FWIW, Oleg's implementation of ptrace over utrace is 100% compatible with legacy ptrace; gdb testsuite indicates that (http://lkml.org/lkml/2009/12/21/98). Ananth From srikar at linux.vnet.ibm.com Fri Jan 22 05:27:57 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Fri, 22 Jan 2010 10:57:57 +0530 Subject: Fw: Re: linux-next: add utrace tree Message-ID: <20100122052757.GA23114@linux.vnet.ibm.com> Hi Roland, Oleg, Would it be a good idea to probably start looking at user space api for utrace? By doing that we would get usecases that maintainers in LKML are looking for and start looking at its usefulness. Currently its probably a egg and chicken case where they look at what end customers are getting that additional benefit from utrace and we are looking at providing the user interface after the bits go in. -- Thanks and Regards Srikar -------------- next part -------------- An embedded message was scrubbed...
From: Andrew Morton Subject: Re: linux-next: add utrace tree Date: Thu, 21 Jan 2010 16:30:04 -0800 Size: 6716 URL:

From srikar at linux.vnet.ibm.com Fri Jan 22 07:02:32 2010
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Fri, 22 Jan 2010 12:32:32 +0530
Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ]
In-Reply-To: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com>
Message-ID: <20100122070232.GA2975@linux.vnet.ibm.com>

Here is a summary of the comments and actions that need to be taken for
the current uprobes patchset. Please let me know if I missed or
misunderstood any of your comments.

1. Uprobes depends on trap signal.
    Uprobes depends on the trap signal rather than hooking to the global
    die notifier. It was suggested that we hook to the global die notifier.

    In the next version of patches, Uprobes will use the global die
    notifier and look at the per-task count of the probes in use to
    see if it has to be consumed.

    However this would reduce the ability of uprobe handlers to
    sleep. Since we are dealing with userspace, sleeping in handlers
    would have been a good feature. We are looking at ways to get
    around this limitation.

2. XOL vma vs Emulation vs Single Stepping Inline vs using Protection
   Rings.
    XOL VMA is an additional process address vma. There is
    opposition to adding an additional vma without the user actually
    requesting it.

    XOL vma and single stepping inline are the two architecture
    independent implementations, while the other implementations are
    more architecture specific. Single stepping inline wouldn't go
    well with multithreaded processes.

    Even though XOL vma has its own issues, we will go with it since
    other implementations seem to have more complications.

    We would look forward to implementing boosters later. Later on, if
    we come across another technique with fewer side effects than the
    XOL vma, we would switch to using it.

3. Current Uprobes looks at process lifetimes and not vma lifetimes.
   Also it needs threads to quiesce when inserting and removing
   breakpoints.
    Current uprobes was quiescing threads of a process before insertion
    and deletion. This resulted in uprobes having to track process
    lifetimes. An alternative method to track vma lifetimes was
    suggested.

    Next version would update the copy of the page and flip the
    pagetables as suggested by Peter. Hence it would no longer depend
    on threads quiescing.

    However this would need hooks in munmap/rmap so that uprobes can
    remove breakpoints that are placed in that vma. This would also
    mean removing the rcu_dereference we were using.

4. Move the ftrace plugin to use trace events.
    Since ftrace plugins are relegated to obsolescence, it was
    suggested we use trace events, which would have much wider scope.
    Next version will use trace events.

5. Rename UBP to user_bkpt.

6. Update the authors for all files that are getting added.

I shall work towards v2 of uprobes and send across the patches at the
earliest.
-- Thanks and Regards Srikar From ananth at in.ibm.com Fri Jan 22 07:24:02 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 22 Jan 2010 12:54:02 +0530 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <20100122070232.GA2975@linux.vnet.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> Message-ID: <20100122072402.GA7440@in.ibm.com> On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote: > Here is a summary of the Comments and actions that need to be taken for > the current uprobes patchset. Please let me know if I missed or > misunderstood any of your comments. > > 1. Uprobes depends on trap signal. > Uprobes depends on trap signal rather than hooking to the global > die notifier. It was suggested that we hook to the global die notifier. > > In the next version of patches, Uprobes will use the global die > notifier and look at the per-task count of the probes in use to > see if it has to be consumed. > > However this would reduce the ability of uprobe handlers to > sleep. Since we are dealing with userspace, sleeping in handlers > would have been a good feature. We are looking at ways to get > around this limitation. We could set a TIF_ flag in the notifier to indicate a breakpoint hit and process it in task context before the task heads into userspace. Ananth
From peterz at infradead.org Fri Jan 22 10:47:37 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 22 Jan 2010 11:47:37 +0100 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <20100122072402.GA7440@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> <20100122072402.GA7440@in.ibm.com> Message-ID: <1264157257.4283.1529.camel@laptop> On Fri, 2010-01-22 at 12:54 +0530, Ananth N Mavinakayanahalli wrote: > On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote: > > Here is a summary of the Comments and actions that need to be taken for > > the current uprobes patchset. Please let me know if I missed or > > misunderstood any of your comments. > > > > 1. Uprobes depends on trap signal. > > Uprobes depends on trap signal rather than hooking to the global > > die notifier. It was suggested that we hook to the global die notifier. > > > > In the next version of patches, Uprobes will use the global die > > notifier and look at the per-task count of the probes in use to > > see if it has to be consumed. > > > > However this would reduce the ability of uprobe handlers to > > sleep. Since we are dealing with userspace, sleeping in handlers > > would have been a good feature. We are looking at ways to get > > around this limitation. > > We could set a TIF_ flag in the notifier to indicate a breakpoint hit > and process it in task context before the task heads into userspace. Make that optional, not everybody might want that. Either provide a simple trampoline or use a flag to indicate the callback be called from process context on registration.
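
The deferral discussed in the two messages above can be modelled in plain user-space C. This is only a sketch of the idea, not kernel code: every name in it (struct probe, struct task, notifier_hit, return_to_user, the bp_pending field standing in for a TIF_ flag) is hypothetical and does not come from the uprobes patches. It assumes the registration-time flag variant Peter describes, where each probe says whether its handler runs directly in the notifier or later in task context.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none exist in the kernel. */
enum probe_ctx {
    PROBE_CTX_ATOMIC,   /* handler runs inside the notifier, must not sleep */
    PROBE_CTX_TASK      /* handler is deferred to task context, may sleep */
};

struct probe {
    enum probe_ctx ctx;                     /* chosen at registration */
    void (*handler)(struct probe *p, void *data);
    void *data;
};

struct task {
    bool bp_pending;        /* stands in for a per-task TIF_ flag */
    struct probe *pending;  /* probe whose handler is still to run */
};

/* Models the breakpoint notifier: either run now or mark the task. */
static void notifier_hit(struct task *t, struct probe *p)
{
    if (p->ctx == PROBE_CTX_ATOMIC) {
        p->handler(p, p->data);
        return;
    }
    t->pending = p;
    t->bp_pending = true;
}

/* Models the return-to-user path: run any deferred handler once. */
static void return_to_user(struct task *t)
{
    struct probe *p;

    if (!t->bp_pending)
        return;
    t->bp_pending = false;
    p = t->pending;
    t->pending = NULL;
    p->handler(p, p->data);
}

/* Example handler: counts hits through the opaque data pointer. */
static void count_hit(struct probe *p, void *data)
{
    (void)p;
    ++*(int *)data;
}
```

A probe registered with PROBE_CTX_ATOMIC pays no deferral cost, while one registered with PROBE_CTX_TASK only has its handler run from return_to_user(), which is the point where sleeping would be acceptable in the real design.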
From Valdis.Kletnieks at vt.edu Fri Jan 22 13:43:18 2010 From: Valdis.Kletnieks at vt.edu (Valdis.Kletnieks at vt.edu) Date: Fri, 22 Jan 2010 08:43:18 -0500 Subject: linux-next: add utrace tree In-Reply-To: Your message of "Fri, 22 Jan 2010 10:51:39 +0530." <20100122052139.GA20532@in.ibm.com> References: <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122052139.GA20532@in.ibm.com> Message-ID: <18867.1264167798@localhost> On Fri, 22 Jan 2010 10:51:39 +0530, Ananth N Mavinakayanahalli said: > FWIW, Oleg's implementation of ptrace over utrace is 100% compatible > with legacy ptrace; gdb testsuite indicates that > (http://lkml.org/lkml/2009/12/21/98). No, that only proves it's compatible enough for gdb to not care. The problem is all those *other* packages that abuse ptrace in totally crackhead ways. (No, I can't name them - but ptrace is the sort of interface that almost encourages its use for things somewhere between crackhead and mad-scientist, so they're almost certainly out there.. WAY out there.. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 227 bytes Desc: not available URL: From alip at exherbo.org Fri Jan 22 14:49:28 2010 From: alip at exherbo.org (Ali Polatel) Date: Fri, 22 Jan 2010 16:49:28 +0200 Subject: PTRACE_SYSCALL_ENTRY/EXIT In-Reply-To: <20100119020216.01CEB506@magilla.sf.frob.com> References: <20100116215124.GA963@harikalardiyari> <20100119020216.01CEB506@magilla.sf.frob.com> Message-ID: <1264171061-sup-496@harikalardiyari> Roland McGrath yazm??: > We don't have any particular plans to extend the ptrace interface. 
> I strongly doubt we would even try to do anything like that until the > utrace-based ptrace interface is merged into Linux and the old ptrace > implementation gone. > > In general, we are not looking for extensions to the ptrace interface. > It is an ugly hairball already and we are more interested in having > the utrace API layer available inside the kernel and then embarking on > new and sane userland interfaces instead of shoehorning more into ptrace. > I respect that. > That said, some particular kinds of simple enhancements to ptrace are > really quite trivial to implement in the new utrace-based implementation. > The particular area you suggest is one of these. > > What I would expect is not new variants of the one-shot interface like > PTRACE_SYSCALL. Rather, I would envision new PTRACE_O_* options to enable > syscall entry and exit tracing analogous to the PTRACE_EVENT_* events you > can now enable. This means that you make one PTRACE_SETOPTIONS call to > enable the set of events you want, and then use plain PTRACE_CONT (or > whatever). > > If you really want exactly the one-shot behavior instead, then we could > consider that. But, like I said, we are not looking to add much in the > way of new wrinkles to the dismal ptrace userland interface. The one-shot behaviour is what I want because adding a PTRACE_O_* option won't solve my problem if I understood correctly. I'm writing a tool that audits system calls and *only* denied system calls need to be stopped at the exit of the system call to set return value and errno. System calls are checked at entry, if they're safe another PTRACE_SYSCALL_ENTRY will be issued to continue to the next system call. If, however, the system call needs to be denied, PTRACE_SYSCALL_EXIT will be issued after changing system call no to something invalid so that return value and errno can be set. I think this will be useful for every program that audits system calls. 
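
The audit flow described above can be written down as a small decision function. This is a sketch under stated assumptions: the request names mirror the PTRACE_SYSCALL_ENTRY/PTRACE_SYSCALL_EXIT requests proposed in this thread, which are not part of the existing ptrace interface, and the types and the audit_decide() helper are hypothetical. No real ptrace call is made; the function only says what a tracer would do at each stop.

```c
#include <stdbool.h>

/* Hypothetical request names, modelled on the proposal in this thread. */
enum audit_request {
    REQ_SYSCALL_ENTRY,  /* resume, stop at the next syscall entry */
    REQ_SYSCALL_EXIT    /* resume, stop at the exit of this syscall */
};

enum audit_stop { STOP_AT_ENTRY, STOP_AT_EXIT };

struct audit_action {
    enum audit_request next;  /* request used to resume the tracee */
    bool invalidate_syscall;  /* overwrite the syscall number first */
    bool set_errno;           /* overwrite the return value with -errno */
};

/*
 * Decide what to do at a stop. Allowed calls never stop at exit;
 * denied calls are neutralised at entry and patched up at exit.
 */
static struct audit_action audit_decide(enum audit_stop stop, bool denied)
{
    struct audit_action a = { REQ_SYSCALL_ENTRY, false, false };

    if (stop == STOP_AT_ENTRY && denied) {
        a.invalidate_syscall = true;
        a.next = REQ_SYSCALL_EXIT;
    } else if (stop == STOP_AT_EXIT) {
        /* Only reached for calls that were denied at entry. */
        a.set_errno = true;
    }
    return a;
}
```

With the existing interface a tracer would get the same effect from plain PTRACE_SYSCALL, at the cost of also stopping at the exit of every allowed call, which is the overhead the proposal aims to avoid.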
> > Thanks, > Roland -- Regards, Ali Polatel From oleg at redhat.com Fri Jan 22 17:45:41 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Fri, 22 Jan 2010 18:45:41 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100121170541.7425ff10.akpm@linux-foundation.org> References: <20100120054950.GB27108@elte.hu> <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> Message-ID: <20100122174541.GA8945@redhat.com> On 01/21, Andrew Morton wrote: > > ptrace is a nasty, complex part of the kernel which has a long history > of problems, but it's all been pretty quiet in there for the the past few > years. Well, yes, I'd say ptrace is "frozen". Nobody adds new features or improvements, only bugfixes. > This leads one to expect that a rip-out-n-rewrite is a > high-risk prospect. So, quite reasonably, one looks for a good reason > for taking such risk. As has already been said, utrace was not created just to replace the current ptrace. However, speaking of ptrace, imho even ptrace-utrace is more flexible and allows this api to be improved easily. Just for example, even attach and detach are not trivial to use from user-space when it comes to multithreaded tracees. A one-liner patch for ptrace-utrace can implement a PTRACE_DETACH which doesn't need TASK_TRACED, and it is easy to avoid the initial SIGSTOP from attach (which doesn't always work but strace/gdb rely on it). Of course, I do not claim this is impossible with the current implementation.
But this will need more changes, and these changes will touch the code outside of ptrace.c. And in fact I think that any enhancements in this area will lead to a rewrite of the current ptrace code. I must admit that personally I think the current ptrace api is unfixable, we need a new one in the long term. It would be nice to just kill ptrace, but this is not possible and that is why ptrace-utrace exists. And, if nothing more, utrace allows us to have both the old and the new ones without any changes outside of ptrace/utrace code. Oleg. From peterz at infradead.org Fri Jan 22 18:06:14 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 22 Jan 2010 19:06:14 +0100 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <20100122070232.GA2975@linux.vnet.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> Message-ID: <1264183574.4283.1558.camel@laptop> On Fri, 2010-01-22 at 12:32 +0530, Srikar Dronamraju wrote: > 2. XOL vma vs Emulation vs Single Stepping Inline vs using Protection > Rings. > XOL VMA is an additional process address vma. This is > opposition to add an additional vma without user actually > requesting for the same. > > XOL vma and single stepping inline are the two architecture > independent implementations. While other implementations are > more architecture specific. Single stepping inline wouldnt go > well with multithreaded process.
> > Even though XOL vma has its own issues, we will go with it since > other implementations seem to have more complications. > > we would look forward to implementing boosters later. > Later on, if we come across another techniques with lesser > side-effects than the XOL vma, we would switch to using them. How about modifying glibc to reserve like 64 bytes on the TLS structure it has and storing the ins and possible boost jmp there? Since each thread can only have a single trap at any one time that should be enough. From oleg at redhat.com Fri Jan 22 18:28:27 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Fri, 22 Jan 2010 19:28:27 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100120061551.GB6588@in.ibm.com> <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> Message-ID: <20100122182827.GA13185@redhat.com> On 01/21, Linus Torvalds wrote: > > On Thu, 21 Jan 2010, Andrew Morton wrote: > > > > ptrace is a nasty, complex part of the kernel which has a long history > > of problems, but it's all been pretty quiet in there for the the past few > > years. > > More importantly, we're not ever going to get rid of it. Unfortunately, you are right. The current ptrace (as it is visible from user-space) should stay forever. > Quite frankly, judging my all past history we have ever seen in kernel > interfaces, new an non-portable interfaces simply are never used. The > whole question whether they are nicer or not is entirely immaterial. I have to admit this point looks very reasonable to me. Except, can't resist, ptrace itself is hardly portable. 
> I'm personally very dubious that there are any merits to utrace that > outweigh the very clear disadvantages: just another layer that adds a new > level of abstraction to the only interface that people actually _use_, > namely ptrace. Of course they can't use other interfaces, we don't have them. And without the new abstraction layer we will never have, I think. Oleg. From mhiramat at redhat.com Fri Jan 22 18:36:34 2010 From: mhiramat at redhat.com (Masami Hiramatsu) Date: Fri, 22 Jan 2010 13:36:34 -0500 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <1264183574.4283.1558.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> <1264183574.4283.1558.camel@laptop> Message-ID: <4B59F032.2060801@redhat.com> Peter Zijlstra wrote: > On Fri, 2010-01-22 at 12:32 +0530, Srikar Dronamraju wrote: > >> 2. XOL vma vs Emulation vs Single Stepping Inline vs using Protection >> Rings. >> XOL VMA is an additional process address vma. This is >> opposition to add an additional vma without user actually >> requesting for the same. >> >> XOL vma and single stepping inline are the two architecture >> independent implementations. While other implementations are >> more architecture specific. Single stepping inline wouldnt go >> well with multithreaded process. >> >> Even though XOL vma has its own issues, we will go with it since >> other implementations seem to have more complications. >> >> we would look forward to implementing boosters later. >> Later on, if we come across another techniques with lesser >> side-effects than the XOL vma, we would switch to using them. > > How about modifying glibc to reserve like 64 bytes on the TLS structure > it has and storing the ins and possible boost jmp there? Since each > thread can only have a single trap at any one time that should be > enough. Hmm, it is a good idea. 
Well, we'll have a copy of original insn in kernel, but it could be simpler than managing XOL vma. :-) Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhiramat at redhat.com From oleg at redhat.com Fri Jan 22 19:39:23 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Fri, 22 Jan 2010 20:39:23 +0100 Subject: linux-next: add utrace tree In-Reply-To: <18867.1264167798@localhost> References: <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122052139.GA20532@in.ibm.com> <18867.1264167798@localhost> Message-ID: <20100122193923.GB13185@redhat.com> On 01/22, Valdis.Kletnieks at vt.edu wrote: > > No, that only proves it's compatible enough for gdb to not care. The problem > is all those *other* packages that abuse ptrace in totally crackhead ways. > > (No, I can't name them - but ptrace is the sort of interface that almost > encourages its use for things somewhere between crackhead and mad-scientist, > so they're almost certainly out there.. WAY out there.. :) Yes, this is true. We are trying to test it as much as possible. Oleg. From fche at redhat.com Fri Jan 22 20:01:29 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Fri, 22 Jan 2010 15:01:29 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100122182827.GA13185@redhat.com> References: <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> Message-ID: <20100122200129.GG22003@redhat.com> Hi - oleg wrote: > [...] >> I'm personally very dubious that there are any merits to utrace that >> outweigh the very clear disadvantages: just another layer that adds a new >> level of abstraction to the only interface that people actually _use_, >> namely ptrace. > > Of course they can't use other interfaces, we don't have them. And > without the new abstraction layer we will never have, I think. This is one of the reasons we built, up on request of lkml people, the utrace-gdbstub prototype (http://lkml.org/lkml/2009/11/30/173). It presents a standard userspace debugging interface -- actually, more standard than ptrace! It has the potential to be more powerful feature-wise and perhaps even perform faster than ptrace. And yet that RFC didn't receive any on-topic review, only wishes for unspecified blue-sky integration with kernel debugging. So then there's uprobes, which is another potential utrace "killer app", if it weren't so tainted by some peoples' disdain for its current user, when other users are already being seriously discussed. So a working prototype, which demonstrates both the utility of utrace itself and the end-user value of user-space probing, is disregarded. And there are several smaller utrace clients in the works, each of them merge candidates in the future. 
Yes, most of them may be rewritten with special-purpose hook after hook as people reinvent the utrace wheel piece by piece, but how long will that take? How is the opportunity cost of missing features valued? Finally, I don't know how to address the logic of "if a feature requires utrace, that's a bad argument for utrace" and at the same time "you need to show a killer app for utrace". What could possibly satisfy both of those constraints? Please advise. - FChE From peterz at infradead.org Fri Jan 22 20:16:16 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Fri, 22 Jan 2010 21:16:16 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100122200129.GG22003@redhat.com> References: <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> Message-ID: <1264191376.4283.1590.camel@laptop> On Fri, 2010-01-22 at 15:01 -0500, Frank Ch. Eigler wrote: > So then there's uprobes, which is another potential utrace "killer > app" That's bollocks, uprobes is an utter and total mis-match for utrace. Probing userspace is primarily about DSOs which is files and vma's, not tasks. You might maybe want a utrace interface to that, but that is largely non-interesting. IOW, we don't need utrace to make sensible use of uprobes. 
(And when I speak of uprobes I mean the thing formerly called UBP) From oleg at redhat.com Fri Jan 22 20:51:12 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Fri, 22 Jan 2010 21:51:12 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> <20100122022255.GF22003@redhat.com> Message-ID: <20100122205112.GA20716@redhat.com> On 01/21, Linus Torvalds wrote: > > I realize that my argument is very anti-thetical to the normal CS teaching > of "general-purpose is good". I often feel that very specific code with > very clearly defined (and limited) applicability is a good thing - I'd > rather have just a very specific ptrace layer that does nothing but > ptrace, than a "generic plugin layer that can be layered under ptrace and > other things". I am repeating the same (and probably poor) arguments, but we don't have a clearly defined ptrace layer. The current code is just the set of precedents, I mean, "this code does this because we always did this for unknown reason". And we can't fix it without breaking things. Even the obvious bugs which could be fixed by the very simple patch should be preserved sometimes. In fact, afaics the current state is: if it can't crash the kernel - it is not the bug. Otoh, ptrace is very limited, yes. Imho - too limited. And, as a user-space api, it is just horrible. However: "we're not ever going to get rid of it". Yes, sure. But I am afraid this all is almost off-topic. Afaik, utrace was not created to solve the problems with ptrace, at least I am sure this wasn't the only goal. Unfortunately, I didn't participate in other projects which use utrace. 
Even if I did, I don't know how I could prove they are "important enough" to have a generic layer to make other things possible. Oleg. From fche at redhat.com Fri Jan 22 21:44:24 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Fri, 22 Jan 2010 16:44:24 -0500 Subject: linux-next: add utrace tree In-Reply-To: <1264191376.4283.1590.camel@laptop> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <1264191376.4283.1590.camel@laptop> Message-ID: <20100122214424.GH22003@redhat.com> Hi - On Fri, Jan 22, 2010 at 09:16:16PM +0100, Peter Zijlstra wrote: > [...] > > So then there's uprobes, which is another potential utrace "killer > > app" > That's bollocks, uprobes is an utter and total mis-match for utrace. > Probing userspace is primarily about DSOs which is files and vma's, > not tasks. [...] Your experience with user-space probing apparently differs from ours. In fact there exists plenty of interest and utility in probing given processes only, if for no other reason than to avoid disrupting others running on the machine. Nearly always, it is better to build a multiprocess probing widget from multiply-applied single-process ones, rather than to build single-process probing from grossly-filtered systemwide/VMA ones. (If the lower level infrastructure provides both options, groovy.)
- FChE From torvalds at linux-foundation.org Fri Jan 22 21:59:11 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Fri, 22 Jan 2010 13:59:11 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100122200129.GG22003@redhat.com> References: <20100120062834.GB12165@elte.hu> <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> Message-ID: On Fri, 22 Jan 2010, Frank Ch. Eigler wrote: > > Finally, I don't know how to address the logic of "if a feature > requires utrace, that's a bad argument for utrace" and at the same > time "you need to show a killer app for utrace". What could possibly > satisfy both of those constraints? Please advise. The point is, the feature needs to be a killer feature. And I have yet to hear _any_ such killer feature, especially from a kernel maintenance standpoint. The "better ptrace than ptrace" is irrelevant. Sure, we all know ptrace isn't a wonderful feature. But it's there, and a debugger is going to have support for it anyway, so what's the _advantage_ of a "better ptrace interface"? There is absolutely _zero_ advantage, there's just "yet another interface". We can't get rid of the old one _anyway_. And the seccomp replacement just sounds horrible. Using some tracing interface to implement security models sounds like the worst idea ever. And like it or not, over the last almost-decade, _not_ having to have to work with system tap has been a feature, not a problem, for the kernel community. So what's the killer feature? Linus From fche at redhat.com Fri Jan 22 22:13:48 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Fri, 22 Jan 2010 17:13:48 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> Message-ID: <20100122221348.GA4263@redhat.com> Hi - On Fri, Jan 22, 2010 at 01:59:11PM -0800, Linus Torvalds wrote: > [...] > > Finally, I don't know how to address the logic of "if a feature > > requires utrace, that's a bad argument for utrace" and at the same > > time "you need to show a killer app for utrace". What could possibly > > satisfy both of those constraints? Please advise. > > The point is, the feature needs to be a killer feature. And I have yet to > hear _any_ such killer feature, especially from a kernel maintenance > standpoint. > The "better ptrace than ptrace" is irrelevant. Sure, we all know ptrace > isn't a wonderful feature. But it's there, and a debugger is going to have > support for it anyway, so what's the _advantage_ of a "better ptrace > interface"? There is absolutely _zero_ advantage, there's just "yet > another interface". We can't get rid of the old one _anyway_. The point is that the intermediate api will allow (and, as the part you clipped out about utrace-gdbstub said, *already has allowed*) alternative plausible interfaces that coexist just fine. > And the seccomp replacement just sounds horrible. Using some tracing > interface to implement security models sounds like the worst idea ever. So all this is about *naming* utrace? It was never built "for tracing", but for (efficient/multiplexed) *control*. That wasn't even its original name -- one of your lieutenants asked roland to change it to utrace. 
> And like it or not, over the last almost-decade, _not_ having to > have to work with system tap has been a feature, not a problem, for > the kernel community. I don't have a problem with that. We have apprx. never imposed anything on developers who didn't want to use it. There are plenty who have and will. - FChE From jkenisto at us.ibm.com Fri Jan 22 23:55:11 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Fri, 22 Jan 2010 15:55:11 -0800 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <1264183574.4283.1558.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> <1264183574.4283.1558.camel@laptop> Message-ID: <1264204511.5150.6.camel@localhost.localdomain> On Fri, 2010-01-22 at 19:06 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-22 at 12:32 +0530, Srikar Dronamraju wrote: > > > 2. XOL vma vs Emulation vs Single Stepping Inline vs using Protection > > Rings. > > XOL VMA is an additional process address vma. This is > > opposition to add an additional vma without user actually > > requesting for the same. > > > > XOL vma and single stepping inline are the two architecture > > independent implementations. While other implementations are > > more architecture specific. Single stepping inline wouldnt go > > well with multithreaded process. > > > > Even though XOL vma has its own issues, we will go with it since > > other implementations seem to have more complications. > > > > we would look forward to implementing boosters later. > > Later on, if we come across another techniques with lesser > > side-effects than the XOL vma, we would switch to using them. > > How about modifying glibc to reserve like 64 bytes on the TLS structure > it has and storing the ins and possible boost jmp there? Since each > thread can only have a single trap at any one time that should be > enough. 
We once implemented something similar, but using an area just beyond the top of the stack instead of TLS. We figured it would never pass muster because we have to temporarily map the page executable (and undo it after the single-step), and this felt like a big security hole. I'd think we'd have the same concern with TLS. Jim From torvalds at linux-foundation.org Sat Jan 23 00:11:03 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Fri, 22 Jan 2010 16:11:03 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100122221348.GA4263@redhat.com> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Fri, 22 Jan 2010, Frank Ch. Eigler wrote: > > The point is that the intermediate api will allow (and, as the part > you clipped out about utrace-gdbstub said, *already has allowed*) > alternative plausible interfaces that coexist just fine. And my point is that multiple interfaces are BAD. There is one interface we _have_ to have: the traditional ptrace one. That one we can't get away from. "Multiple interfaces" on its own is just confusion with no upside. You need a _reason_ to have other interfaces. They need to have that killer feature. Just being "different" is not a feature at all. > So all this is about *naming* utrace? It was never built "for > tracing", but for (efficient/multiplexed) *control*. That wasn't even > its original name -- one of your lieutenants asked roland to change it > to utrace. No. It's not about naming. It's about the downside of having amorphous interfaces that apparently don't even have rules, and are then used to implement random crap. 
Yes, the SNL skit about "It's a dessert topping _and_ a floor wax" was funny, but it was funny exactly because it was crazy. The fact that you can do crazy things is not a good thing. You need to find the "goodness" somewhere else, and that's what I'm trying to tell you. You just seem to have trouble listening. Linus From torvalds at linux-foundation.org Sat Jan 23 00:22:22 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Fri, 22 Jan 2010 16:22:22 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Fri, 22 Jan 2010, Linus Torvalds wrote: > > No. It's not about naming. It's about the downside of having amorphous > interfaces that apparently don't even have rules, and are then used to > implement random crap. > > Yes, the SNL skit about "It's a dessert topping _and_ a floor wax" was > funny, but it was funny exactly because it was crazy. Put yet another way: I'd _much_ rather have two totally separate pieces that don't depend on each other, and do different things. So to take a very practical example: I'd much rather have 'seccomp' and 'ptrace' that have _nothing_ what-so-ever to do with each other, than have some intermediate layer that then needs to make both of those happy, and that both have to interact with. There are cases where we really _want_ to have common code. We want to have a common VFS interface because we want to show _one_ interface to user space across a gazillion different filesystems. 
We want to have a common driver layer (as far as possible) because - again - we expose a metric shitload of drivers, and we want to have one unified interface to them. But going the other way: trying to share code when the interfaces are fundamentally _different_ is generally not at all such a great idea. It ends up tying two conceptually totally separate things together, and suddenly people who work on feature X need to modify infrastructure that affects feature Y, and it turns out that details A, B and C are all totally different for the two features and the middle layer has two conflicting things it needs to work with. This is why when somebody brought up "you could do a seccomp-like thing on top of utrace" that my reaction was and is just totally negative. It shows all the wrong kinds of tying things together. Linus From mingo at elte.hu Sat Jan 23 06:04:01 2010 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 23 Jan 2010 07:04:01 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> <20100122022255.GF22003@redhat.com> Message-ID: <20100123060401.GB19399@elte.hu> * Linus Torvalds wrote: > On Thu, 21 Jan 2010, Frank Ch. Eigler wrote: > > > Less passionate analysis would identify a long history of contribution by > > the the greater affiliated team, including via merged code and by and > > passing on requirements and experiences.
> > The reason I'm so passionate is that I dislike the turn the discussion was > taking, as if "utrace" was somehow _good_ because it allowed various other > interfaces to hide behind it. And I'm not at all convinced that is true. > > And I really didn't want to single out system tap, I very much feel the same > way about some seccomp-replacement "security model that the kernel doesn't > even need to know about" thing. > > So don't take the systemtap part to be the important part, it's the bigger > issue of "I'd much rather have explicit interfaces than have generic hooks > that people can then use in any random way". > > I realize that my argument is very anti-thetical to the normal CS teaching > of "general-purpose is good". I often feel that very specific code with very > clearly defined (and limited) applicability is a good thing - I'd rather > have just a very specific ptrace layer that does nothing but ptrace, than a > "generic plugin layer that can be layered under ptrace and other things". ( I think to a certain degree it mirrors the STREAMS hooks situation from a decade ago - and while there were big flamewars back then we never regretted not taking the STREAMS opaque hooks upstream. ) > In one case, you know exactly what the users are, and what the semantics are > going to be. In the other, you don't. > > So I really want to see a very big and immediate upside from utrace. Because > to me, the "it's a generic layer with any application you want to throw at > it" is a _downside_. One component of the whole utrace/systemtap codebase that i think would make sense upstreaming in the near term is the concept of user-space probes. We are actively looking into it from a 'perf probe' angle, and PeterZ suggested a few ideas already. Allowing apps to transparently improve the standard set of events is a plus. (From a pure Linux point of view it's probably more important than any kernel-only instrumentation.)
Also, if any systemtap person is interested in helping us create a more generic filter engine out of the current ftrace filter engine (which is really a precursor of a safe, sandboxed in-kernel script engine), that would be excellent as well. Right now we support simple C-syntax expressions like: perf record -R -f -e irq:irq_handler_entry --filter 'irq==18 || irq==19' More could be done - a simple C-like set of functions perhaps - some minimal per probe local variable state, etc. (perhaps even looping as well, with a limit on number of predicate executions per filter invocation.) ( _Such_ a facility, could then perhaps be used to allow applications access to safe syscall sandboxing techniques: i.e. a programmable seccomp concept in essence, controlled via ASCII space filter expressions [broken down into predicates for fast execution], syscall driven and inherited by child tasks so that security restrictions percolate down automatically. IMHO that would be a superior concept for security modules too: there's no reason why all the current somewhat opaque security hooks couldn't be expressed in terms of more generic filter expressions, via a facility that can be used both for security and for instrumentation. That's all SELinux boils down to in the end: user-space injected policy rules. ) The opaque hookery all around the core kernel just to push everything outside of mainline is one of the biggest downsides of utrace/systemtap - and neither uprobes nor the concept of user-defined scripting around existing events is affected much by that. So lots of work is left and all that work is going to be rather utilitarian with little downside: specific functionality with an immediately visible upside, with no need for opaque hooks.
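[Editor's sketch, not part of the original message.] To make the idea of a filter expression broken down into predicates with a bounded execution count concrete, here is a minimal sketch. It assumes a filter that is a plain OR of comparisons, as in 'irq==18 || irq==19'; every type and function name below is hypothetical and is not taken from the ftrace filter code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Comparison operators a single predicate may apply (hypothetical set). */
enum pred_op { PRED_EQ, PRED_NE, PRED_LT, PRED_GT };

/* One pre-parsed predicate: <field> <op> <value>. */
struct predicate {
    size_t field;     /* index into the event's field array */
    enum pred_op op;
    long value;
};

/* Evaluate one predicate; an out-of-range field never matches. */
static bool pred_match(const struct predicate *p,
                       const long *fields, size_t nr_fields)
{
    if (p->field >= nr_fields)
        return false;

    long v = fields[p->field];
    switch (p->op) {
    case PRED_EQ: return v == p->value;
    case PRED_NE: return v != p->value;
    case PRED_LT: return v < p->value;
    case PRED_GT: return v > p->value;
    }
    return false;
}

/* OR the predicates together, evaluating at most `budget` of them.
 * Running out of budget counts as "no match", so a runaway filter
 * fails closed instead of looping. */
static bool filter_match_any(const struct predicate *preds, size_t nr_preds,
                             const long *fields, size_t nr_fields,
                             size_t budget)
{
    assert(preds != NULL && fields != NULL);
    for (size_t i = 0; i < nr_preds && i < budget; i++)
        if (pred_match(&preds[i], fields, nr_fields))
            return true;
    return false;
}
```

With field 0 standing for irq, the expression from the perf command line above becomes the two-element array { {0, PRED_EQ, 18}, {0, PRED_EQ, 19} }. Treating an exhausted budget as a non-match is a design choice of this sketch, not something the message specifies.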
Ingo From kyle at moffetthome.net Sat Jan 23 06:20:53 2010 From: kyle at moffetthome.net (Kyle Moffett) Date: Sat, 23 Jan 2010 01:20:53 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Fri, Jan 22, 2010 at 19:22, Linus Torvalds wrote: > There are cases where we really _want_ to have common code. We want to > have a common VFS interface because we want to show _one_ interface to > user space across a gazillion different filesystems. We want to have a > common driver layer (as far as possible) because - again - we expose a > metric shitload of drivers, and we want to have one unified interface to > them. So... Everybody agrees that ptrace() is horrible and a royal pain to use, let alone use correctly and without bugs. Everybody also agrees that ptrace() needs to stay around for a long time to avoid breaking all the existing users. Now how do we get from here to a moderately portable API for interrogating, controlling, and intercepting process state? Essentially it would need to support all of the things that a powerful debugger would want to do, including modifying registers and memory, substituting syscall return values, etc. I believe that "utrace" is the kernel side of that API. The killer app for this will be the ability to delete thousands of lines of code from GDB, strace, and all the various other tools that have to painfully work around the major interface gotchas of ptrace(), while at the same time making their handling of complex processes much more robust. The *second* killer app for this is to make it much easier for people to write new userspace debugging tools. 
I love the various crash-catching tools that different distributions or applications provide, but they all basically have to trap the SIGSEGV and hope they're still sensible enough to fork() and exec() a gdb process. Furthermore, I would love to be able to write debugging tools for scripting languages that allow me to step across Perl, C, PHP, assembly code, etc, all within the same process. In theory that's all possible today, but given how much of a *pain* ptrace() is to use correctly, nobody bothers. Now, with all that said, "utrace" does not provide any of the userspace side APIs today... but I think it is a necessary refactoring if we want to provide a new ideal process-introspection interface without breaking all the ptrace() users. Think of the "utrace" interface as very much like the LSM interface. Just like with LSMs, there is a lot of active research in debugging and tracing tools, and nobody can even remotely agree what the hell they want out of the hooks. In theory you could add one hook for every place each security module needs one... but then your fast-path is littered with always-false test-and-jump statements. What "utrace" provides is the one single test in each fast path that then searches for and executes the appropriate slow path(s) for that process. I personally would be very happy to see "utrace" merged. 
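The "one single test in each fast path" idea can be sketched roughly as follows. This is an illustration under assumed names (struct task, struct engine, attach(), report_event()), not the utrace data structures or API: an untraced task pays for one pointer test, and only a traced task walks the list of attached engines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not the utrace structures. */
struct task;
typedef void (*event_cb)(struct task *t, int event, void *data);

struct engine {
    event_cb cb;            /* callback run on the slow path */
    void *data;             /* private state of the attached tool */
    struct engine *next;
};

struct task {
    struct engine *engines; /* NULL in the common, untraced case */
};

/* Attach one more tool to a task; several tools may be stacked. */
static void attach(struct task *t, struct engine *e)
{
    assert(t != NULL && e != NULL && e->cb != NULL);
    e->next = t->engines;
    t->engines = e;
}

/* Slow path: run every attached engine, return how many were called. */
static int report_slow(struct task *t, int event)
{
    int called = 0;

    for (struct engine *e = t->engines; e != NULL; e = e->next) {
        e->cb(t, event, e->data);
        called++;
    }
    return called;
}

/* Fast-path hook: a single test, then out for untraced tasks. */
static inline int report_event(struct task *t, int event)
{
    if (t->engines == NULL)
        return 0;
    return report_slow(t, event);
}

/* Example callback: counts the events it sees in an int supplied via data. */
static void count_event(struct task *t, int event, void *data)
{
    (void)t;
    (void)event;
    ++*(int *)data;
}
```

Attaching two engines that both use count_event() to the same task and calling report_event() once bumps both counters, which is the stacking behaviour discussed in this thread (for example a debugger and a syscall tracer on one process).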
Cheers, Kyle Moffett From adobriyan at gmail.com Sat Jan 23 08:05:07 2010 From: adobriyan at gmail.com (Alexey Dobriyan) Date: Sat, 23 Jan 2010 10:05:07 +0200 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Sat, Jan 23, 2010 at 2:22 AM, Linus Torvalds wrote: > This is why when somebody brought up "you could do a seccomp-like thing on > top of utrace" that my reaction was and is just totally negative. It shows > all the wrong kinds of tying things together. seccomp-via-utrace should be just removed to be honest before its users. It entered the tree because it was very small and simple. If rewritten, it no longer is small and simple because of whole kernel/utrace.c. From alan at lxorguk.ukuu.org.uk Sat Jan 23 11:01:21 2010 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Sat, 23 Jan 2010 11:01:21 +0000 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <20100123110121.1ce89bb9@lxorguk.ukuu.org.uk> > The killer app for this will be the ability to delete thousands of > lines of code from GDB, strace, and all the various other tools that > have to painfully work around the major interface gotchas of ptrace(), > while at the same time making their handling of complex processes much > more robust. Years ago (and it really must be years ago because this was about the time I started hacking on Linux stuff !) 
there was a proposal to extract and sanitize the arch specific stuff in binutils and in gdb etc into sensible libraries that could be used by other apps. What I don't understand is why that doesn't solve 99% of your problem. ptrace is not perfect but most of the real ptrace limitations actually come about because either the CPU can't do something or because the supporting logic would be too expensive - things like having extra private debugger pages. Yes ptrace needs a lot of icky support code, but it's already been written... Alan From mingo at elte.hu Sat Jan 23 11:23:33 2010 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 23 Jan 2010 12:23:33 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <20100123112333.GA15455@elte.hu> * Kyle Moffett wrote: > On Fri, Jan 22, 2010 at 19:22, Linus Torvalds > wrote: > > There are cases where we really _want_ to have common code. We want to > > have a common VFS interface because we want to show _one_ interface to > > user space across a gazillion different filesystems. We want to have a > > common driver layer (as far as possible) because - again - we expose a > > metric shitload of drivers, and we want to have one unified interface to > > them. > > So... Everybody agrees that ptrace() is horrible and a royal pain to use, > let alone use correctly and without bugs. Everybody also agrees that > ptrace() needs to stay around for a long time to avoid breaking all the > existing users. > > Now how do we get from here to a moderately portable API for interrogating, > controlling, and intercepting process state? Essentially it would need to > support all of the things that a powerful debugger would want to do, > including modifying registers and memory, substituting syscall return > values, etc. 
I believe that "utrace" is the kernel side of that API. The problem is, utrace does not do that really. What utrace does is that it provides an opaque set of APIs for unspecified and out of tree _kernel_ modules (such as systemtap). It doesnt support any 'application' per se. It basically removes the kernel's freedom at shaping its own interaction with debug application. If utrace was a 'better ptrace' syscall, where the syscall itself is the goal of the hookery, it would all be rather different. People could argue about _that_ interface (and the hooks would be a pure kernel internal implementational detail - not an interface specification), and once people agree about that ABI and there's enough application momentum behind it, the hooks are really not that opaque anymore - they are for that ABI and not more. Note that it's still a _big_ hurdle: it's hard to agree on a new syscall and it's hard to get 'application momentum' behind it. Special Linux system calls have a checkered past, they tend to not be used by much anything, and thus they tend to be a breeding ground of both bugs, maintenance complexity and security problems. Lack of attention is never good. In that sense it might be better to fix/enhance ptrace, if there's interest. I've written a handful of ptrace extensions in the past (none of them went upstream tho), it can be done in a useful manner and the code is pretty hackable. There are basic problems left to be solved: for example why is there still no 'memory block copy' call, why are we _still_ limited to one word per system call PTRACE_PEEK* memory copies? It's ridiculous. SparcLinux has PTRACE_WRITE*/READ* support that implements this, but none of the other architectures have it so it's essentially unused. Or another possible direction would be to extend the perf events syscall with interception capabilities. 
It's far more performant at extracting application state without scheduling than any ptrace method - and interception/injection would be a natural next step - if there's interest. Thanks, Ingo From fche at redhat.com Sat Jan 23 11:47:29 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 23 Jan 2010 06:47:29 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123112333.GA15455@elte.hu> References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> Message-ID: <20100123114729.GA7828@redhat.com> Hi - mingo wrote: > [...] > > Now how do we get from here to a moderately portable API for interrogating, > > controlling, and intercepting process state? Essentially it would need to > > support all of the things that a powerful debugger would want to do, > > including modifying registers and memory, substituting syscall return > > values, etc. I believe that "utrace" is the kernel side of that API. > > The problem is, utrace does not do that really. In fact, it is exactly designed for that. > What utrace does is that it provides an opaque set of APIs for > unspecified and out of tree _kernel_ modules (such as systemtap). It > doesnt support any 'application' per se. It basically removes the > kernel's freedom at shaping its own interaction with debug > application. This claim is hard to take any more seriously than emoting that the blockio layer is "opaque" because device drivers "remove freedom" for the kernel to "shape its interaction" with hardware. If you have any *real evidence* about how any present user of utrace misuses that capability, or interferes with the "kernel's freedom", show us please. - FChE From fche at redhat.com Sat Jan 23 11:51:49 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Sat, 23 Jan 2010 06:51:49 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123110121.1ce89bb9@lxorguk.ukuu.org.uk> References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123110121.1ce89bb9@lxorguk.ukuu.org.uk> Message-ID: <20100123115149.GB7828@redhat.com> Hi - On Sat, Jan 23, 2010 at 11:01:21AM +0000, Alan Cox wrote: > [...] > What I don't understand is why [libgdb?] doesn't solve 99% of your problem. > ptrace is not perfect but most of the real ptrace limitations actually > come about because either the CPU can't do something or because the > supporting logic would be too expensive - things like having extra > private debugger pages. At least one reason is that ptrace is single-usage-only, so for example you cannot concurrently debug & strace the same program. OTOH, utrace is designed to permit clean nesting/sharing semantics for concurrent debugger-type tools operating on the same processes. - FChE From fche at redhat.com Sat Jan 23 12:03:01 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Sat, 23 Jan 2010 07:03:01 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123060401.GB19399@elte.hu> References: <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> <20100122022255.GF22003@redhat.com> <20100123060401.GB19399@elte.hu> Message-ID: <20100123120301.GD7828@redhat.com> Hi - On Sat, Jan 23, 2010 at 07:04:01AM +0100, Ingo Molnar wrote: > [...] 
Also, if any systemtap person is interested in helping us > create a more generic filter engine out of the current ftrace filter > engine (which is really a precursor of a safe, sandboxed in-kernel > script engine), that would be excellent as well. [...] Thank you for the invitation. > More could be done - a simple C-like set of function perhaps - some minimal > per probe local variable state, etc. (perhaps even looping as well, with a > limit on number of predicament executions per filter invocation.) Yes, at some point when such bytecode intepreter gets rich enough, one may not need the translated-to-C means of running scripts. > ( _Such_ a facility, could then perhaps be used to allow applications access > to safe syscall sandboxing techniques: i.e. a programmable seccomp concept > in essence, controlled via ASCII space filter expressions [...] > IMHO that would be a superior concept for security modules too [...] > > [...] specific functionality with an immediately visible upside, > with no need for opaque hooks. This OTOH seem like rather a stretch. If one claims that "opaque hooks" are bad, so instead have hooks that jump not to auditable C code but an bytecode interpreter? And have the bytecodes be uploaded from userspace? How is this supposed to produce "transparency" from the kernel/hook point of view? 
- FChE From acme at infradead.org Sat Jan 23 15:57:49 2010 From: acme at infradead.org (Arnaldo Carvalho de Melo) Date: Sat, 23 Jan 2010 13:57:49 -0200 Subject: linux-next: add utrace tree In-Reply-To: <20100123110121.1ce89bb9@lxorguk.ukuu.org.uk> References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123110121.1ce89bb9@lxorguk.ukuu.org.uk> Message-ID: <20100123155749.GC2689@ghostprotocols.net> Em Sat, Jan 23, 2010 at 11:01:21AM +0000, Alan Cox escreveu: > Years ago (and it really must be years ago because this was about the > time I started hacking on Linux stuff !) there was a proposal to extract > and sanitize the arch specific stuff in binutils and in gdb etc into > sensible libraries that could be used by other apps. Aleluiah if it had happened at that time, but sadly... :-( - Arnaldo From tytso at mit.edu Sat Jan 23 19:48:20 2010 From: tytso at mit.edu (tytso at mit.edu) Date: Sat, 23 Jan 2010 14:48:20 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123114729.GA7828@redhat.com> References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> Message-ID: <20100123194820.GM21263@thunk.org> On Sat, Jan 23, 2010 at 06:47:29AM -0500, Frank Ch. Eigler wrote: > > What utrace does is that it provides an opaque set of APIs for > > unspecified and out of tree _kernel_ modules (such as systemtap). It > > doesnt support any 'application' per se. It basically removes the > > kernel's freedom at shaping its own interaction with debug > > application. > > This claim is hard to take any more seriously than emoting that the > blockio layer is "opaque" because device drivers "remove freedom" for > the kernel to "shape its interaction" with hardware. 
If you have any > *real evidence* about how any present user of utrace misuses that > capability, or interferes with the "kernel's freedom", show us please. The fundamental issue which Ingo is trying to say (and which you apparently don't seem to be understanding) is that utrace doesn't export a syscall (which is an ABI that we are willing to promise will be stable), but rather a set of kernel API's (which we never promise to be stable), and the fact that there will be out-of-tree programs that are going to be trying to depend on that interface (much like Systemtap does today when it creates kernel modules) is something that is considered on par with Nvidia trying to ship proprietary video drivers. (OK, maybe not *quite* as evil as Nvidia because at least SystemTap is open source, but the bottom line is that enabling out-of-tree modules isn't considered a good thing, and if we know in advance that there are out-of-tree modules, there is a strong tendency to want to nip those in the bud.) The reason why I avoid Nvidia hardware like the plague is because I work on bleeding-edge kernels, and even though companies like Nvidia and Broadcom try very hard to keep up with released upstream kernels, #1, there is always the concern of what happens if they decide to change that policy, and #2, invariably something will break during the -rc1 or -rc2 stage, and then my laptop is useless for running bleeding edge kernels. It's one of the reasons why many kernel developers gave up on SystemTap, because it's not something that can be trusted to be there, and the fault is not on our changing the API's, it's on SystemTap depending on API's that were never guaranteed to be stable in the first place. If you want to try to slide utrace in, such that we're able to ignore the fact that there will be this external house that will be built on quicksand, pointing at how nice the external house will be isn't going to be helpful. 
Nor is pointing at the ability that other people will be able to build other really nice houses on the aforementioned quicksand (i.e., out-of-tree kernel modules that depend on kernel API's). A simple "code cleanup" argument is not carrying the day (Look! We can cleanup the ptree code!). It's going to have to be a **really** cool in-tree kernel funtionality that provides a killer feature (in Linus's words), enough so that people are willing to overlook the fact that there's this monster external out-of-tree project that wants to be depend on API's that may not be stable, and which, even if the developers don't grump at us, users will grump at us when we change API's that we had never guaranteed will be stable, and then Systemtap breaks. This is probably why Ingo invited you to think about ways of doing some kind of safe in-kernel bytecode approach. That has the advantage of doing away with external kernel modules, with all of their many downsides: its dependency on unstable kernel API's, the fact that many financial customers have security policies that prohibit C compilers on production machines, the inherent security risk of allowing external random kernel modules to be delivered and loaded into a system, etc. - Ted From torvalds at linux-foundation.org Sun Jan 24 05:04:56 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Sat, 23 Jan 2010 21:04:56 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Sat, 23 Jan 2010, Kyle Moffett wrote: > > Now how do we get from here to a moderately portable API for > interrogating, controlling, and intercepting process state? Umm? ptrace? It's not _pretty_, but it's a hell of a lot more portable than utrace is ever going to be. 
Yes, the details differ between OS's (and between architectures), but let's face it, things like register state probing is _never_ going to be portable across different architectures simply because the register state isn't the same. > The killer app for this will be the ability to delete thousands of > lines of code from GDB, strace, and all the various other tools that > have to painfully work around the major interface gotchas of ptrace(), > while at the same time making their handling of complex processes much > more robust. No. There is absolutely _no_ reason to believe that gdb et al would ever delete the ptrace interfaces anyway. That really is my point. Adding a new interface, when an old and crufty (but working) interface is inevitably going to be around anyway - and is inevitably always going to have portability issues - is STUPID. Let's take strace, for example. Yes, ptrace() is crufty, but have you actually looked at strace source code? The problem isn't really a crufty interface to read registers etc, the bigger problem for strace is that different architectures and OS's have different system call argument rules, different ways to read/write system call numbers yadda yadda yadda. Take a look at strace sources some day. Moving away from ptrace on Linux (even if you decided that you don't care about old versions of the kernel that don't know anything else) would simplify ABSOLUTELY NOTHING. Really. Quite the reverse, I suspect. The Solaris and FreeBSD support uses ptrace too, afaik, so you'd just be confusing the issue. And the fact is, strace would still end up supporting ptrace anyway, just so that you could run it on old kernels. So the whole "making a new utrace interface would simplify things" is simply a total lie. The fact that ptrace is a bit of an odd interface IN NO WAY means that any other interface would end up being appreciably simpler. It would just result in _more_ code in strace, and more confusion. 
Linus From tytso at mit.edu Sun Jan 24 10:25:13 2010 From: tytso at mit.edu (tytso at mit.edu) Date: Sun, 24 Jan 2010 05:25:13 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <20100124102513.GB4382@thunk.org> On Sat, Jan 23, 2010 at 09:04:56PM -0800, Linus Torvalds wrote: > > The killer app for this will be the ability to delete thousands of > > lines of code from GDB, strace, and all the various other tools that > > have to painfully work around the major interface gotchas of ptrace(), > > while at the same time making their handling of complex processes much > > more robust. > > No. There is absolutely _no_ reason to believe that gdb et al would ever > delete the ptrace interfaces anyway. More to the point, gdb *couldn't* use utrace, because utrace only exports a kernel API; not a syscall interface. And if the Red Hat Toolchain folks are thinking about encouraging gdb to start creating out-of-tree kernel modules, so that (a) gdb requires root privs, and (b) gdb is as (un)stable as SystemTap with respect to development kernels by making it dependent on internal kernel API's, the Red Hat Toolchain group needs to be smacked upside the head... - Ted From fche at redhat.com Sun Jan 24 13:20:09 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Sun, 24 Jan 2010 08:20:09 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100124102513.GB4382@thunk.org> References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100124102513.GB4382@thunk.org> Message-ID: <20100124132009.GB4263@redhat.com> Hi - On Sun, Jan 24, 2010 at 05:25:13AM -0500, tytso at mit.edu wrote: > [...] 
> > > The killer app for this will be the ability to delete thousands of > > > lines of code from GDB, strace, and all the various other tools that > > > have to painfully work around the major interface gotchas of ptrace(), > > > while at the same time making their handling of complex processes much > > > more robust. > > > > No. There is absolutely _no_ reason to believe that gdb et al would ever > > delete the ptrace interfaces anyway. > > More to the point, gdb *couldn't* use utrace, because utrace only > exports a kernel API; not a syscall interface. Yes, this might explain why Kyle wrote: > > > [...] I believe that "utrace" is the kernel side of that > > > API. [...] > And if the Red Hat Toolchain folks are thinking about encouraging > gdb to start creating out-of-tree kernel modules [...] the Red Hat > Toolchain group needs to be smacked upside the head... Those keeping up will note that an ordinary in-tree, non-modular, non-root-only, already-works-with-standard-gdb, potentially-better-than-ptrace debugger interface has already been prototyped & posted on lkml as an RFC. - FChE From tglx at linutronix.de Sun Jan 24 16:36:21 2010 From: tglx at linutronix.de (Thomas Gleixner) Date: Sun, 24 Jan 2010 17:36:21 +0100 (CET) Subject: linux-next: add utrace tree In-Reply-To: <20100123120301.GD7828@redhat.com> References: <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122012516.GE22003@redhat.com> <20100122022255.GF22003@redhat.com> <20100123060401.GB19399@elte.hu> <20100123120301.GD7828@redhat.com> Message-ID: On Sat, 23 Jan 2010, Frank Ch. Eigler wrote: > On Sat, Jan 23, 2010 at 07:04:01AM +0100, Ingo Molnar wrote: > > > [...] 
Also, if any systemtap person is interested in helping us > > create a more generic filter engine out of the current ftrace filter > > engine (which is really a precursor of a safe, sandboxed in-kernel > > script engine), that would be excellent as well. [...] > > Thank you for the invitation. > > > More could be done - a simple C-like set of function perhaps - some minimal > > per probe local variable state, etc. (perhaps even looping as well, with a > > limit on number of predicament executions per filter invocation.) > > Yes, at some point when such bytecode intepreter gets rich enough, one > may not need the translated-to-C means of running scripts. > > > > ( _Such_ a facility, could then perhaps be used to allow applications access > > to safe syscall sandboxing techniques: i.e. a programmable seccomp concept > > in essence, controlled via ASCII space filter expressions [...] > > IMHO that would be a superior concept for security modules too [...] > > > > [...] specific functionality with an immediately visible upside, > > with no need for opaque hooks. > > This OTOH seem like rather a stretch. If one claims that "opaque > hooks" are bad, so instead have hooks that jump not to auditable C > code but an bytecode interpreter? And have the bytecodes be uploaded > from userspace? How is this supposed to produce "transparency" from > the kernel/hook point of view? Simply because the kernel controls which byte code is executed and has control over the functionality behind it. That makes the hooks well defined and transparent. Thanks, tglx From fche at redhat.com Sun Jan 24 18:01:21 2010 From: fche at redhat.com (Frank Ch. 
Eigler) Date: Sun, 24 Jan 2010 13:01:21 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123194820.GM21263@thunk.org> References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: <20100124180121.GA12744@redhat.com> Hi - tytso wrote: > [...] Let me see if I can paraphrase those of your concerns that were substantive: 1) That if utrace is merged, and systemtap keeps on using it, there may be some sort of chilling effect on kernel developers that would impede utrace's future development. This might sound plausible to an outsider, but luckily we're not stuck with having to speculate: one can examine history. Systemtap has been around, working roughly the same way, for about *five years*. Systemtap modules use more than a handful of mainstream module-accessible kernel services. During all this time, how many examples have there been when when systemtap developers have pleaded with lkml to avoid changing some prior interface? How many of those successfully? (That last one is a trick question, since both numbers are really close to *zero*.) How much real impediment to change has our mere existence caused? 2) That systemtap is not portable to all kernel versions. Problems do periodically occur. However, one can again refer to historical facts to assess whether in fact they warrant long term grudges. In every release note, we list the range of kernel versions we test against. We may have one of the broadest ranges of support, 2.6.9 through to many current -rc*s and non-linus trees. We have several mechanisms which let us easily adapt to most changes. It may interest readers to find out that the number of systemtap changes we have had to add on account of kernel changes is on the order of a *few per year*. The usual turnaround, once reported, is on the order of a *few days*. 
3) That systemtap users will complain to kernel developers if systemtap becomes incompatible. Let's go to the historical record again. How many such complaints have actually been seen in inappropriate fora such as lkml? How difficult were they to diagnose / redirect to the proper venue? Have they constituted a "loss of face" for kernel developers? 4) That systemtap is almost but not quite as evil as nvidia. It seems factors like ... - always being completely open source project - keeping in regular contact with lkml and other constituencies - not being related to essential hardware enablement, so users not wanting it don't have to touch it - the compile-to-C approach being technologically necessary since there was no alternative plausible way at the time (and still now) - repeatedly offering infrastructure code with non-stap uses ... all add up to a mere nudge away from entirely "evil". If so, I wonder if your sort of grossly bimodal view of ethical virtue is going to foster the right sorts of change in the linux kernel community. - FChE From cmoller at redhat.com Sun Jan 24 20:04:22 2010 From: cmoller at redhat.com (Chris Moller) Date: Sun, 24 Jan 2010 15:04:22 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100124180121.GA12744@redhat.com> References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100124180121.GA12744@redhat.com> Message-ID: <4B5CA7C6.2040502@redhat.com> On 01/24/10 13:01, Frank Ch. Eigler wrote: > > ... all add up to a mere nudge away from entirely "evil". If so, I > wonder if your sort of grossly bimodal view of ethical virtue is going > to foster the right sorts of change in the linux kernel community. > Nothing like a good religious debate to liven up your Sunday... 
> > - FChE > > From pavel at ucw.cz Wed Jan 20 12:55:46 2010 From: pavel at ucw.cz (Pavel Machek) Date: Wed, 20 Jan 2010 13:55:46 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <1263740506.557.20963.camel@twins> References: <20100111122529.22050.32596.sendpatchset@srikar.in.ibm.com> <1263467289.4244.288.camel@laptop> <1263498366.4875.25.camel@localhost.localdomain> <1263546228.4244.343.camel@laptop> <20100115093831.GC26396@in.ibm.com> <1263549014.4244.374.camel@laptop> <4B53213C.9050303@redhat.com> <1263739939.557.20938.camel@twins> <4B532508.4000806@redhat.com> <1263740506.557.20963.camel@twins> Message-ID: <20100120125546.GC1420@ucw.cz> On Sun 2010-01-17 16:01:46, Peter Zijlstra wrote: > On Sun, 2010-01-17 at 16:56 +0200, Avi Kivity wrote: > > On 01/17/2010 04:52 PM, Peter Zijlstra wrote: > > > > Also, if its fixed size you're imposing artificial limits on the number > > > of possible probes. > > > > > > > Obviously we'll need a limit, a uprobe will also take kernel memory, we > > can't allow people to exhaust it. > > Only if its unprivilidged, kernel and root should be able to place as > many probes until the machine keels over. Well, it is address space that limits you in both cases... 
-- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html From kyle at moffetthome.net Mon Jan 25 01:42:13 2010 From: kyle at moffetthome.net (Kyle Moffett) Date: Sun, 24 Jan 2010 20:42:13 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100123194820.GM21263@thunk.org> References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: On Sat, Jan 23, 2010 at 14:48, wrote: > The fundamental issue which Ingo is trying to say (and which you > apparently don't seem to be understanding) is that utrace doesn't > export a syscall (which is an ABI that we are willing to promise will > be stable), but rather a set of kernel API's (which we never promise > to be stable), The point that's being missed is that there is a chicken-and-egg problem here. The "chicken" is a replacement or extension to the debugger interface that would make it possible for me to do things like GDB a process while it's being strace'd or vice versa. The "egg" is the "utrace" bits, an unstable but somewhat arch-generic ABI that abstracts out ptrace() to make it possible to stack both in-kernel and userspace debuggers/tracers/etc and have multiple simultaneous users. > and the fact that there will be out-of-tree programs > that are going to be trying to depend on that interface (much like > Systemtap does today when it creates kernel modules) is something that > is considered on par with Nvidia trying to ship proprietary video > drivers. Ugh... perhaps we should derive a variation of Godwin's law for this: "As an LKML discussion grows longer, the probability of an unfavorable comparison involving nVidia or Microsoft approaches 1." 
> If you want to try to slide utrace in, such that we're able to ignore > the fact that there will be this external house that will be built on > quicksand, pointing at how nice the external house will be isn't going > to be helpful. Nor is pointing at the ability that other people will > be able to build other really nice houses on the aforementioned > quicksand (i.e., out-of-tree kernel modules that depend on kernel > API's). Personally I don't give a flying **** about SystemTap; I'm interested in things like the ability to stack gdb with strace, the RFC gdb-stub posted a week ago, etc. None of those abilities would be out-of-tree modules at all, and therefore the "quicksand" analogy is specious. > A simple "code cleanup" argument is not carrying the day (Look! We > can cleanup the ptree code!). It's going to have to be a **really** > cool in-tree kernel funtionality that provides a killer feature (in > Linus's words), enough so that people are willing to overlook the fact > that there's this monster external out-of-tree project that wants to > be depend on API's that may not be stable, and which, even if the > developers don't grump at us, users will grump at us when we change > API's that we had never guaranteed will be stable, and then Systemtap > breaks. I would be willing to guess that something like 95% of the people using SystemTap or other tools are doing so on Red Hat Enterprise Linux or other enterprise supported platforms, and so when something breaks they go whinge at Red Hat, etc. If I recall correctly Red Hat and many of the other vendors already heavily fiddle with kernel patches they apply to provide some amount of binary module compatibility. > This is probably why Ingo invited you to think about ways of doing > some kind of safe in-kernel bytecode approach.
That has the advantage > of doing away with external kernel modules, with all of their many > downsides: its dependency on unstable kernel API's, the fact that many > financial customers have security policies that prohibit C compilers > on production machines, the inherent security risk of allowing > external random kernel modules to be delivered and loaded into a > system, etc. There are substantial non-SystemTap uses for utrace that would *not* be satisfied by an "in-kernel bytecode approach", starting with stacking debuggers and tracers. Furthermore, let's say they did go off and build the in-kernel bytecode interpreter. I can pretty much guarantee that people would say the hooks into the rest of the kernel are too invasive and they should be abstracted out into an API. *This is that API!* Cheers, Kyle Moffett From ananth at in.ibm.com Mon Jan 25 04:59:08 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Mon, 25 Jan 2010 10:29:08 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100123112333.GA15455@elte.hu> References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> Message-ID: <20100125045908.GB4895@in.ibm.com> On Sat, Jan 23, 2010 at 12:23:33PM +0100, Ingo Molnar wrote: > > * Kyle Moffett wrote: > > > On Fri, Jan 22, 2010 at 19:22, Linus Torvalds > > wrote: ... > In that sense it might be better to fix/enhance ptrace, if there's interest. > I've written a handful of ptrace extensions in the past (none of them went > upstream tho), it can be done in a useful manner and the code is pretty > hackable. There are basic problems left to be solved: for example why is there > still no 'memory block copy' call, why are we _still_ limited to one word per > system call PTRACE_PEEK* memory copies? It's ridiculous.
SparcLinux has > PTRACE_WRITE*/READ* support that implements this, but none of the other > architectures have it so it's essentially unused. > > Or another possible direction would be to extend the perf events syscall with > interception capabilities. It's far more performant at extracting application > state without scheduling than any ptrace method - and interception/injection > would be a natural next step - if there's interest. This certainly is now a chicken and egg problem. Everybody agrees that Linux needs something better than ptrace; legacy ptrace will continue to live, so will utilities written to it (strace, etc). But should that limit what Linux can offer? What's the way out? - Enhance ptrace: At least one ptrace maintainer (Roland) has publicly stated he doesn't prefer enhancing legacy ptrace -- that it's already a beast to maintain, and adding more complexity to it does it no good. - Extend perf; would perf then use utrace underneath? Or would one have to redo some of what utrace already does for thread level control? - Give utrace a syscall and make it the primary way for users to interact with the layer. There are benefits to this if there is agreement on the utrace layer itself, maybe with less flexibility than what it currently offers? If yes, what should it look like? Any new debug facility will have to incorporate some or most learnings from what utrace tried to address. It would be sad to just dump utrace and redo everything from scratch or band-aid existing interfaces.
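To put a rough number on the one-word-per-call PTRACE_PEEK* limitation quoted above, here is a small simulation. It is only a sketch and does not call ptrace at all: `FakeTracee` and `peek_word` are hypothetical stand-ins for a traced process and a one-word PTRACE_PEEKDATA-style request, and `WORD = 8` assumes a 64-bit target. It simply counts how many such calls a block read would need.

```python
WORD = 8  # assumed word size in bytes (64-bit target); not taken from the thread


class FakeTracee:
    """Hypothetical stand-in for a traced process: memory is a bytes object."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.calls = 0  # number of simulated one-word "system calls"

    def peek_word(self, addr: int) -> bytes:
        """Simulated analogue of a one-word PTRACE_PEEKDATA request."""
        self.calls += 1
        chunk = self.memory[addr:addr + WORD]
        return chunk.ljust(WORD, b"\0")  # pad reads that run off the end


def read_block(tracee: FakeTracee, addr: int, length: int) -> bytes:
    """Copy `length` bytes starting at `addr`, one aligned word per call."""
    start = addr - (addr % WORD)  # round down to a word boundary
    out = bytearray()
    for word_addr in range(start, addr + length, WORD):
        out += tracee.peek_word(word_addr)
    offset = addr - start
    return bytes(out[offset:offset + length])


if __name__ == "__main__":
    tracee = FakeTracee(bytes(range(256)) * 16)  # one 4096-byte "page"
    data = read_block(tracee, 0, 4096)
    print(len(data), "bytes copied in", tracee.calls, "calls")
```

Under these assumptions a single 4096-byte page costs 512 round trips, where a block-copy request of the kind asked for above would need one.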
Ananth From tytso at mit.edu Mon Jan 25 04:55:01 2010 From: tytso at mit.edu (tytso at mit.edu) Date: Sun, 24 Jan 2010 23:55:01 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: <20100125045501.GC4372@thunk.org> On Sun, Jan 24, 2010 at 08:42:13PM -0500, Kyle Moffett wrote: > > Personally I don't give a flying **** about SystemTap; I'm interested > in things like the ability to stack gdb with strace, the RFC gdb-stub > posted a week ago, etc. None of those abilities would be out-of-tree > modules at all, and therefore the "quicksand" analogy is specious. Great. So what should be reviewed is utrace *plus* these other userland interfaces, which may get critiqued and improved, and utrace patches can be reviewed in light of these new features. But be warned.... if it turns out that only 30% of utrace is only needed to support gdb stacking with strace, etc., the other 70% will likely get ejected and the utrace patches streamlined to support these in-tree users. But since you don't give a flying **** about SystemTap, presumably you won't mind, right? > I would be willing to guess that something like 95% of the people > using SystemTap or other tools are doing so on Red Hat Enterprise > Linux or other enterprise supported platforms, and so when something > breaks they go whinge at Red Hat, etc. If I recall correctly Red Hat > and many of the other vendors already heavily fiddle with kernel > patches they apply to provide some amount of binary module > compatibility.
Sure, but as out-of-tree modules, the best they can expect is that most kernel developers will pretend that they don't exist. Which is OK, when I tried using SystemTap most of the concerns which I expressed as being critical for kernel developers were largely ignored (as near as I could tell) because the target market was RHEL corporate customers, and they prioritized their resourcing accordingly --- so they shouldn't mind if kernel developers return the favor. But that means that we should only merge those portions of utrace that are needed for these alleged "killer new features", and only if these new features are cool enough that they justify the new code on their own merits. At least, IMNSHO. - Ted From peterz at infradead.org Mon Jan 25 10:13:46 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Mon, 25 Jan 2010 11:13:46 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100125045908.GB4895@in.ibm.com> References: <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100125045908.GB4895@in.ibm.com> Message-ID: <1264414426.4283.1644.camel@laptop> On Mon, 2010-01-25 at 10:29 +0530, Ananth N Mavinakayanahalli wrote: > - Extend perf; would perf then use utrace underneath? Or would one have > to redo some of what utrace already does for thread level control? No, perf is about monitoring/tracing not modifying. It's about minimal interference, the very opposite of what ptrace/utrace is about. From a perf POV if you need to stop a task (changing its scheduling state) you've lost. Furthermore, despite the name utrace isn't about tracing at all, it's a full-blown debugging infrastructure which completely multiplexes the task state, not something perf is interested in at all.
From torvalds at linux-foundation.org Mon Jan 25 16:52:41 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 08:52:41 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: On Sun, 24 Jan 2010, Kyle Moffett wrote: > > The point that's being missed is that there is a chicken-and-egg > problem here. The "chicken" is a replacement or extension to the > debugger interface that would make it possible for me to do things > like GDB a process while it's being strace'd or vice versa. The "egg" > is the "utrace" bits, an unstable but somewhat arch-generic ABI that > abstracts out ptrace() to make it possible to stack both in-kernel and > userspace debuggers/tracers/etc and have multiple simultaneous users. Quite frankly, as far as I'm concerned, I'd be a whole lot more interested in utrace if it's _only_ stated (and implied) goal was to do exactly this. The thing I object to is the whole "dessert topping _and_ floor wax" thing, with kernel interfaces for random other users. If somebody extended ptrace in good ways, that's a totally different thing. But I think utrace has been over-designed, possibly as a result of others coming in and saying "hey, I'd like to use that too for xyz". "Do one thing, and do it well". I'd not mind somebody improving ptrace (including extending its semantics - I do agree that the whole SIGSTOP thing makes it hard to have multiple debuggers). That said, I also suspect that people should still look seriously at simply just improving ptrace. For example, I suspect that the biggest problem with ptrace is really just the signalling, and that creating a new extension for JUST THAT, and then having a model where you can choose - at PTRACE_ATTACH time - how to wait for events would be a good thing. 
But as long as it is "I want to solve all problems", I'm not very impressed. Maybe somebody would be interested in trying to take the utrace improvements, and scaling down what they promise, and ignoring all input except for "I want to strace and gdb at the same time". So stop the crazy "new kernel interfaces" crap. Stop the crazy "maybe we can use it for ftrace and generic user event tracing too". Stop the crazy. Linus From fche at redhat.com Mon Jan 25 17:02:54 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Mon, 25 Jan 2010 12:02:54 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: <20100125170254.GB22862@redhat.com> Hi - On Mon, Jan 25, 2010 at 08:52:41AM -0800, Linus Torvalds wrote: > [...] If somebody extended ptrace in good ways, that's a totally > different thing. But I think utrace has been over-designed, possibly > as a result of others coming in and saying "hey, I'd like to use > that too for xyz". [...] Earlier, you said that you haven't followed utrace "at all". Upon what real information do you infer that it has been over-designed? - FChE From torvalds at linux-foundation.org Mon Jan 25 17:36:13 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 09:36:13 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100125170254.GB22862@redhat.com> References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> Message-ID: On Mon, 25 Jan 2010, Frank Ch. Eigler wrote: > > Earlier, you said that you haven't followed utrace "at all". Upon > what real information do you infer that it has been over-designed? Upon the information that people are talking about magic new kernel interfaces to do fancy things. 
And talking about doing things with it that are simply not relevant for ptrace/strace. In fact, in this very thread I've been informed that there are no user interfaces to utrace at all, which to me says that it's been TOTALLY MISDESIGNED FROM THE VERY START, and has nothing to do with making ptrace work for strace/gdb at the same time. In other words, I may not have followed utrace development, but I sure as hell can read. And everything I read about it just makes me less inclined to want to merge it. The people who argue "for" it are actually screwing themselves by arguing for all the wrong things, and making me convinced I don't want to touch it with a ten-foot pole. If somebody were to argue that "this is a simple series of patches to clean up ptrace and make it possible to strace a debugged process", then that would have been different. That's not what you or others have been doing. You've been pushing exactly the _reverse_ of that, namely how great it is for some random totally new features that I'm convinced aren't even used by a lot of people. So give me a populist argument that makes sense for tons of actual users, not some f*cking "here's a cool infrastructure that developers can do random crazy out-of-tree crap with". Because I'm not interested in crazy developers. Linus From torvalds at linux-foundation.org Mon Jan 25 17:45:57 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 09:45:57 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> Message-ID: On Mon, 25 Jan 2010, Linus Torvalds wrote: > > So give me a populist argument that makes sense for tons of actual users, > not some f*cking "here's a cool infrastructure that developers can do > random crazy out-of-tree crap with". Because I'm not interested in crazy > developers. 
In other words, give me the "killer feature". The thing I've asked for all the time. The thing that you seem to continually NOT EVEN UNDERSTAND. Linus From rostedt at goodmis.org Mon Jan 25 17:54:30 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 25 Jan 2010 12:54:30 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> Message-ID: <1264442070.31321.422.camel@gandalf.stny.rr.com> On Mon, 2010-01-25 at 09:36 -0800, Linus Torvalds wrote: > Because I'm not interested in crazy > developers. > > Linus Uh oh, that's not good for us real-time folks. http://lwn.net/Articles/357800/ "And, according to Linus, the realtime people are crazy, so they can be left to deal with the weird stuff." -- Steve (Sorry, I just couldn't resist) From alan at lxorguk.ukuu.org.uk Mon Jan 25 18:03:15 2010 From: alan at lxorguk.ukuu.org.uk (Alan Cox) Date: Mon, 25 Jan 2010 18:03:15 +0000 Subject: linux-next: add utrace tree In-Reply-To: <1264442070.31321.422.camel@gandalf.stny.rr.com> References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264442070.31321.422.camel@gandalf.stny.rr.com> Message-ID: <20100125180315.6bcb347f@lxorguk.ukuu.org.uk> > Uh oh, that's not good for us real-time folks. > > http://lwn.net/Articles/357800/ > > "And, according to Linus, the realtime people are crazy, so they can be > left to deal with the weird stuff." "I'd prefer the trees to be separate for testing purposes: it doesn't make much sense to have SMP support as a normal kernel feature when most people won't have SMP anyway" -- Linus Torvalds Use cases got that into the tree pretty easily, I am sure RT ones will do the same.
From torvalds at linux-foundation.org Mon Jan 25 18:12:28 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 10:12:28 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <1264442070.31321.422.camel@gandalf.stny.rr.com> References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264442070.31321.422.camel@gandalf.stny.rr.com> Message-ID: On Mon, 25 Jan 2010, Steven Rostedt wrote: > > Uh oh, that's not good for us real-time folks. > > http://lwn.net/Articles/357800/ > > "And, according to Linus, the realtime people are crazy, so they can be > left to deal with the weird stuff." The RT people have actually been pretty good at slipping their stuff in, in small increments, and always with good reasons for why they aren't crazy. Yeah, it's taken them years, and they still have out-of-tree stuff. And yeah, they had to change some things to make them more palatable to the mainline kernel - the whole fundamental raw spinlock change is just the most recent example of that. But on the whole, I think it's actually worked out pretty well for them. I think the mainline kernel has improved in the process, but I also suspect that _their_ RT patches have also improved thanks to having to make the work more palatable to people like me who don't care all that deeply about their particular flavor of crazy. And yeah, I still think the hard-RT people are mostly crazy. So I can work with crazy people, that's not the problem. They just need to _sell_ their crazy stuff to me using non-crazy arguments, and in small and well-defined pieces. When I ask for killer features, I want them to lull me into a safe and cozy world where the stuff they are pushing is actually useful to mainline people _first_. 
In other words, every new crazy feature should be hidden in a nice solid "Trojan Horse" gift: something that looks _obviously_ good at first sight. The fact that it may contain the germs for future features should be hidden so well that not only is it not used as an argument ("Hey, look at all those soldiers in that horse, imagine what you could do with them"), it should also not be obvious from the source code ("Look at all those hooks I sprinkled around, which aren't actually used by anything, but just imagine what you could do with them"). Linus From rostedt at goodmis.org Mon Jan 25 18:30:37 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 25 Jan 2010 13:30:37 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264442070.31321.422.camel@gandalf.stny.rr.com> Message-ID: <1264444237.31321.431.camel@gandalf.stny.rr.com> On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: > But on the whole, I think it's actually worked out pretty well for them. I > think the mainline kernel has improved in the process, but I also suspect > that _their_ RT patches have also improved thanks to having to make the > work more palatable to people like me who don't care all that deeply about > their particular flavor of crazy. Actually this is an understatement. Every feature (and I do mean _every_) that went from -rt into mainline, undertook 3 or more rewrites before it was acceptable for mainline. And every time, the end result made the -rt patch set better as a whole. Not to mention, that a lot of the early stuff also cleaned up mainline. You can't have Real-Time without having a clean kernel. And as you stated, a lot of those patches to clean up the kernel, no one even knew that the real reason was to help the -rt patch set. They were well disguised Trojan horses. 
Darn, it looks like you are onto our scheme. -- Steve From tglx at linutronix.de Mon Jan 25 18:45:53 2010 From: tglx at linutronix.de (Thomas Gleixner) Date: Mon, 25 Jan 2010 19:45:53 +0100 (CET) Subject: linux-next: add utrace tree In-Reply-To: <1264444237.31321.431.camel@gandalf.stny.rr.com> References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264442070.31321.422.camel@gandalf.stny.rr.com> <1264444237.31321.431.camel@gandalf.stny.rr.com> Message-ID: On Mon, 25 Jan 2010, Steven Rostedt wrote: > On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: > > > But on the whole, I think it's actually worked out pretty well for them. I > > think the mainline kernel has improved in the process, but I also suspect > > that _their_ RT patches have also improved thanks to having to make the > > work more palatable to people like me who don't care all that deeply about > > their particular flavor of crazy. > > Actually this is an understatement. Every feature (and I do mean > _every_) that went from -rt into mainline, undertook 3 or more rewrites > before it was acceptable for mainline. And every time, the end result > made the -rt patch set better as a whole. > > Not to mention, that a lot of the early stuff also cleaned up mainline. > You can't have Real-Time without having a clean kernel. And as you > stated, a lot of those patches to clean up the kernel, no one even knew > that the real reason was to help the -rt patch set. They were well > disguised Trojan horses. Tsss. Never admit such things. > Darn, it looks like you are onto our scheme. Which scheme ? 
The only Trojan horses in the kernel tree are in drivers/char/drivers/char/tty_io.c which put Linus himself into Linux-0.98.2 :) tglx From mjw at redhat.com Mon Jan 25 20:30:53 2010 From: mjw at redhat.com (Mark Wielaard) Date: Mon, 25 Jan 2010 21:30:53 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> Message-ID: <1264451453.3028.59.camel@springer.wildebeest.org> On Mon, 2010-01-25 at 09:36 -0800, Linus Torvalds wrote: > Upon the information that people are talking about magic new kernel > interfaces to do fancy things. And talking about doing things with it that > are simply not relevant for ptrace/strace. Unfortunately ptrace does all that magic already (badly). People don't just use it for (s)tracing syscalls, but also for tracing signals, for single step debugging and poking at memory, register state, for process jailing and virtualization (uml) through syscall emulation. So when they are talking about these fancy things that is because that is what ptrace gives them currently. And they hate it, because the ptrace interface is such a pain to work with. And all these things don't really work together. You cannot trace, emulate, debug, jail at the same time. And all these users have wishes to extend the current ptrace interface mess. But nobody dares to extend ptrace in any direction because fixing/cleaning up one of these use cases might break the others in subtle and not so subtle ways. Which is why the utrace series of patches is cleaning up all this stuff first. 
Cheers, Mark From mingo at elte.hu Mon Jan 25 20:34:30 2010 From: mingo at elte.hu (Ingo Molnar) Date: Mon, 25 Jan 2010 21:34:30 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264442070.31321.422.camel@gandalf.stny.rr.com> <1264444237.31321.431.camel@gandalf.stny.rr.com> Message-ID: <20100125203430.GA31937@elte.hu> * Thomas Gleixner wrote: > On Mon, 25 Jan 2010, Steven Rostedt wrote: > > > On Mon, 2010-01-25 at 10:12 -0800, Linus Torvalds wrote: > > > > > But on the whole, I think it's actually worked out pretty well for them. > > > I think the mainline kernel has improved in the process, but I also > > > suspect that _their_ RT patches have also improved thanks to having to > > > make the work more palatable to people like me who don't care all that > > > deeply about their particular flavor of crazy. > > > > Actually this is an understatement. Every feature (and I do mean _every_) > > that went from -rt into mainline, undertook 3 or more rewrites before it > > was acceptable for mainline. And every time, the end result made the -rt > > patch set better as a whole. > > > > Not to mention, that a lot of the early stuff also cleaned up mainline. > > You can't have Real-Time without having a clean kernel. And as you stated, > > a lot of those patches to clean up the kernel, no one even knew that the > > real reason was to help the -rt patch set. They were well disguised Trojan > > horses. > > Tsss. Never admit such things. 
Here's four examples of recent kernel features: - lockdep [1] - ftrace [2] - new-style generic mutexes and spin-mutexes [3] - the new arch/x86 tree [4] I suspect few would guess that all of these features were motivated by the -rt kernel originally: [1] lockdep started out as the 'track irqs-off sections' patches in -rt [2] ftrace started out as -rt's latency tracer and logdev [3] mutex.c was motivated by rtmutex.c [4] arch-x86 was motivated by annoyance with needless porting of -rt features from 32-bit to 64-bit x86 and back. [ Nor would you normally guess that Linux itself was motivated by a guy wanting to toy around with 32-bit x86 assembly ;-) ] Various forms of craziness that motivate us dont really hurt, as long as the process is rooted in reality. We can 'wish' for the crazier future stuff and can help it indirectly, and sometimes it might even happen down the road - but reality and common-sense utility is what controls. And note that there's nothing dishonest about doing multi-purpose patches, as long as the mainstream purpose isnt really just a decoy. When we decouple a feature from -rt we usually forget its -rt purpose and the intermediate for-mainstream forms arent even useful for -rt - back-integration into -rt comes at a later stage. This makes it doubly sure that it's all formed by mainstream's need, not -rt's needs. In the few cases where the -rt role is prominent for some weird reason we declare it as such. It's the exception to the rule really - few useful kernel features are single purpose. ( When they are then we are likely doing something wrong. -rt _is_ a special case. 
) Ingo From torvalds at linux-foundation.org Mon Jan 25 20:42:22 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 12:42:22 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <1264451453.3028.59.camel@springer.wildebeest.org> References: <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> Message-ID: On Mon, 25 Jan 2010, Mark Wielaard wrote: > > And all these users have wishes to extend the current ptrace interface > mess. But nobody dares to extend ptrace in any direction because > fixing/cleaning up one of these use cases might break the others in > subtle and not so subtle ways. Which is why the utrace series of patches > is cleaning up all this stuff first. I call bullshit. You can clean up ptrace without introducing odd new interfaces and trying to sell it as some revolutionary new kernel interface that can do anything. I also call bullshit on the "ptrace() is so horribly nasty" argument. Yes, I've seen the code that uses ptrace in user space, and yes, it's nasty, but it's invariably _not_ nasty so much because ptrace itself is nasty, but because it's full of #ifdef so-and-so-os/so-and-so-arch, and the code is never cleaned up. There are a couple of obvious cases of ptrace being uglier-than-it-needs- to-be. Like the traditional ptrace read/write interface being purely "word at a time", and that clearly is not pretty. Several architectures already do "copy range" kind of versions on it, though, so that's just a detail, and if anybody wanted to clean it up, they could have. The more fundamental problem is the use of signals (while at the same time wanting to _trap_ non-ptrace signals), without any model for a "connection state", which is why you can have only one tracer. 
But again, that's largely a user interface issue, and apparently utrace does _nothing_ for that problem at all. So I do agree that ptrace is not a great interface. However: repeating that statement over and over in _no_ way excuses some totally unrelated code that doesn't have anything what-so-ever to do with the actual problems of ptrace. Linus From tromey at redhat.com Mon Jan 25 21:05:54 2010 From: tromey at redhat.com (Tom Tromey) Date: Mon, 25 Jan 2010 14:05:54 -0700 Subject: linux-next: add utrace tree In-Reply-To: (Linus Torvalds's message of "Sat, 23 Jan 2010 21:04:56 -0800 (PST)") References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: >>>>> "Linus" == Linus Torvalds writes: Linus> No. There is absolutely _no_ reason to believe that gdb et al would ever Linus> delete the ptrace interfaces anyway. Yes, in GDB we approximately never delete anything. Nevertheless, if the Linux kernel were to present a new user-space API, and if it had an advantage over ptrace, then we would port GDB to use it. There are other platforms where, IIRC, we now use some /proc thing instead of ptrace. There are definitely things we would like from such an API. Here's a few I can think of immediately, there are probably others. * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Internally we're already using a self-pipe to integrate this into gdb's main loop. Relatedly, don't mess with the inferior's parentage. * Support "displaced stepping" in the kernel; I think this would improve performance when debugging in non-stop mode. * Support some kind of breakpoint expression in the kernel; this would improve performance of conditional breakpoints. Perhaps the existing gdb agent expressions could be used. 
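For readers unfamiliar with the self-pipe workaround mentioned in the first item above, a minimal sketch follows. It is not gdb's code; the function name `wait_for_child_via_pipe` and its parameters are made up for illustration. The idea is that the SIGCHLD handler only writes a byte to a pipe, so a child's state change shows up as a readable fd in an ordinary select() loop. A kernel interface that reported state changes on an fd directly would make this detour unnecessary.

```python
import os
import select
import signal


def wait_for_child_via_pipe(child_main, timeout=5.0):
    """Fork a child and learn of its exit through a pipe fd instead of
    blocking in wait(). Returns the exit code, or None on timeout or if
    the child did not exit normally. Illustrative helper, not a real API."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # the signal handler must never block

    def on_sigchld(signum, frame):
        try:
            os.write(write_fd, b"\0")  # wake up the event loop
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:  # child: run the payload, then exit without cleanup
            code = 1
            try:
                code = int(child_main())
            finally:
                os._exit(code)
        # Parent: the pipe is just one more fd in the select() set.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            return None
        os.read(read_fd, 1)  # drain the wake-up byte
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    finally:
        restore = old_handler if old_handler is not None else signal.SIG_DFL
        signal.signal(signal.SIGCHLD, restore)
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("child exit code:", wait_for_child_via_pipe(lambda: 7))
```

The sketch must run in the main thread of a POSIX process, since that is where Python allows signal handlers to be installed. It also leaves the child unreaped on timeout, which a real event loop would have to handle.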
Tom From torvalds at linux-foundation.org Mon Jan 25 21:41:57 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 13:41:57 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Mon, 25 Jan 2010, Tom Tromey wrote: > > There are definitely things we would like from such an API. Here's a > few I can think of immediately, there are probably others. > > * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. > Internally we're already using a self-pipe to integrate this into > gdb's main loop. Relatedly, don't mess with the inferior's parentage. As I kind of alluded to elsewhere, I heartily agree with this. The really major design mistake of ptrace (as opposed to just various ugly corners) is how it has no connection information, and that ends up being one of the main reasons why you can't have two ptracers working on the same thing. (There are other things that complicate that too, of course, like simply just trying to manage various per-thread state like debug registers etc, but that's a separate class of complications). > * Support "displaced stepping" in the kernel; I think this would improve > performance when debugging in non-stop mode. Don't we already do that at least on x86? Just doing a single-step should work on an instruction even if it has a breakpoint on it, because we set the TF bit. Or maybe I'm not understanding what displaced stepping means to you. > * Support some kind of breakpoint expression in the kernel; this would > improve performance of conditional breakpoints. Perhaps the existing > gdb agent expressions could be used. 
I suspect it might be reasonable to do simple expressions on breakpoints, but not the kind of things gdb exports to users. IOW, maybe you could have a single conditional on a single value (register or memory) associated with an expression. Regardless, internally to the kernel your two later issues are "details". The "how to connect to the debuggee" is a much more fundamental issue, and has the biggest design/interface impact. The other would likely just be new ptrace command extensions that somebody would have to just implement the grotty details on. Linus From renzo at cs.unibo.it Tue Jan 26 00:02:37 2010 From: renzo at cs.unibo.it (Renzo Davoli) Date: Tue, 26 Jan 2010 01:02:37 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> Message-ID: <20100126000237.GA15936@cs.unibo.it> Let me add my two euro-cents to this discussion. Mark Wielaard : > Unfortunately ptrace does all that magic already (badly). People don't > just use it for (s)tracing syscalls, but also for tracing signals, for > single step debugging and poking at memory, register state, for process > jailing and virtualization (uml) through syscall emulation. > So when they are talking about these fancy things that is because that > is what ptrace gives them currently. And they hate it, because the > ptrace interface is such a pain to work with. And all these things don't > really work together. You cannot trace, emulate, debug, jail at the same > time. I support Mark's words. I don't use ptrace for debugging/tracing and I have experienced severe limitations of ptrace interface. (I have tried to post some extensions for ptrace to overcome some constraints.... see my posts on ptrace_vm or ptrace_multi on LKML). 
Oleg Nesterov, writing to Andrew Morton said: > First of all, utrace makes other things possible. gdbstub, > nondestructive core dump, uprobes, kmview, hopefully more. I didn't > look at these projects closely, perhaps other people can tell more. As > for their merge status, until utrace itself is merged it is very hard to > develop them out of tree. In the list above there is also kmview, which is a creature of mines. umview and kmview are partial virtual machines, processes running in a [uk]mview machine can have their own view for the file system, networking support, user-id, system-name, etc. A [uk]mview machine virtualizes just what the user need: the filesystem or just a subtree/some subtrees or networking or define one/some virtual devices, etc. The "view" provided by a [uk]mview machine can be a composition of real resources (provided by the Linux kernel) and virtual resources. Each system call request gets hijacked to a module of [uk]mview when it refers to a virtual resource. The request is forwarded to the kernel otherwise. umview is based on ptrace, kmview uses a kernel module based on utrace. (umview is included in debian lenny (to sid), tutorial and manuals in wiki.virtualsquare.org) IMHO utrace is better than ptrace (or an optimized version of it): 1 - "Frank Ch. Eigler" wrote: > At least one reason is that ptrace is single-usage-only, so for > example you cannot concurrently debug & strace the same program. - exactly. utrace allows multiple tracing engines, this means that kmview machines can be nested (in a natural way, no extra code is needed for this feature). In the same way strace/gdb can run on virtualized processes, too. 2 - kmview kernel module implements several optimizations to minimize the number of requests forwarded to the kmview process (the virtual machine monitor). kmview is just a module using the utrace interface, prior attempts of optimized umview required kernel patches. 
Like kmview, any other service requiring process tracing can include specific optimizations in its own kernel module. On the other hand, all these services could use the standardized utrace interface for their optimizations, instead of asking for messy patches to change code all around the kernel source. 3 - ptrace takes SIGSTOP/SIGCONT for its own management. Strace/gdb and umview cannot be transparent for programs using these signals. Oleg Nesterov talking about ptrace said: > Of course they can't use other interfaces, we don't have them. And > without the new abstraction layer we will never have, I think. I agree. The following list includes the execution times I got in a recent test (make vde-2, see http://www.cs.unibo.it/~renzo/view-os-lk2009.pdf) plain kernel 22.7s, kmview (no modules) 23.9s (+5.5%), full kmview (modules loaded, all syscalls virtualized) 38.5s (+70%), optimized umview 51.0s (+124%), umview on vanilla kernel 75.7s (+233%). utrace can be used to speed up virtualization (at least in my case it worked in this way). Performance can be useful for debugging but it is a main issue for virtualization. Kmview module provides optimizations to select the system call requests depending on the syscall number, the pathnames or the file descriptors. http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications Trying to add all the optimizations needed by different projects to ptrace is a never-ending nightmare: the LKML will continue to receive patch proposals for ptrace... The solution is that everybody can code his/her optimized kernel/user interface for tracing in his/her kernel module, i.e. utrace.
renzo From torvalds at linux-foundation.org Tue Jan 26 00:07:21 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 25 Jan 2010 16:07:21 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100126000237.GA15936@cs.unibo.it> References: <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> <20100126000237.GA15936@cs.unibo.it> Message-ID: On Tue, 26 Jan 2010, Renzo Davoli wrote: > > The solution is that everybody can code his/her optimized kernel/user > interface for tracing in his/her kernel module, i.e. utrace. I don't think people understand. That is simply not a "solution". That is a PROBLEM. The thing you describe is an absolute disaster. Which is exactly why I rant against it. The last thing we want to have is "here, take this, and make your own kernel module mess around it optimized for your particular crazy scenario". But every SINGLE post in this thread that has argued for utrace has argued exactly this way. Linus
From schwidefsky at de.ibm.com Tue Jan 26 13:13:06 2010 From: schwidefsky at de.ibm.com (Martin Schwidefsky) Date: Tue, 26 Jan 2010 14:13:06 +0100 Subject: s390 && user_enable_single_step() (Was: odd utrace testing results on s390x) In-Reply-To: <20100121205113.GB20050@redhat.com> References: <20100105153633.GA9376@redhat.com> <20100105164610.388effd3@mschwide.boeblingen.de.ibm.com> <20100105155913.GA10652@redhat.com> <20100105170301.GA13641@redhat.com> <20100105195818.GA20358@redhat.com> <20100106201722.GB26204@redhat.com> <20100106211329.DB4F5134D@magilla.sf.frob.com> <20100107101855.13248dc2@mschwide.boeblingen.de.ibm.com> <20100107175446.GA13300@redhat.com> <20100107214821.94FF97300@magilla.sf.frob.com> <20100121205113.GB20050@redhat.com> Message-ID: <20100126141306.3cb60b14@mschwide.boeblingen.de.ibm.com> On Thu, 21 Jan 2010 21:51:13 +0100 Oleg Nesterov wrote: > On 01/07, Roland McGrath wrote: > > > > > I am confused as well. Yes, I thought about regs->psw.mask change too, > > > but I don't understand why it helps.. > > [...] > > > But.
According to the testing I did (unless I did something wrong > > > again) this patch doesn't make any difference in this particular > > > case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does. > > > > Those results are quite mysterious to me. > > I think we'll have to get Martin to sort it out definitively. Finally nailed that one. Grrmpf.. the special case in the program check handler for single stepped svcs clobbers the argument registers. With our test case this affects the clone() system call. Funny things happen when the clone_flags argument is more or less random .. The following patch fixes the problem for me. -- Subject: [PATCH] fix single stepped svcs with TRACE_IRQFLAGS=y From: Martin Schwidefsky If irq flags tracing is enabled the TRACE_IRQS_ON macro expands to a function call which clobbers registers %r0-%r5. The macro is used in the code path for single stepped system calls. The argument registers %r2-%r6 need to be restored from the stack before the system call function is called.
Cc: stable at kernel.org
Signed-off-by: Martin Schwidefsky
---
 arch/s390/kernel/entry.S   | 1 +
 arch/s390/kernel/entry64.S | 1 +
 2 files changed, 2 insertions(+)

diff -urpN linux-2.6/arch/s390/kernel/entry64.S linux-2.6-patched/arch/s390/kernel/entry64.S
--- linux-2.6/arch/s390/kernel/entry64.S 2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6-patched/arch/s390/kernel/entry64.S 2010-01-26 14:04:58.000000000 +0100
@@ -549,6 +549,7 @@ pgm_svcper:
 	mvc	__THREAD_per+__PER_access_id(1,%r8),__LC_PER_ACCESS_ID
 	oi	__TI_flags+7(%r9),_TIF_SINGLE_STEP # set TIF_SINGLE_STEP
 	TRACE_IRQS_ON
+	lmg	%r2,%r6,SP_R2(%r15)	# load svc arguments
 	stosm	__SF_EMPTY(%r15),0x03	# reenable interrupts
 	j	sysc_do_svc
diff -urpN linux-2.6/arch/s390/kernel/entry.S linux-2.6-patched/arch/s390/kernel/entry.S
--- linux-2.6/arch/s390/kernel/entry.S 2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6-patched/arch/s390/kernel/entry.S 2010-01-26 14:04:58.000000000 +0100
@@ -571,6 +571,7 @@ pgm_svcper:
 	mvc	__THREAD_per+__PER_access_id(1,%r8),__LC_PER_ACCESS_ID
 	oi	__TI_flags+3(%r9),_TIF_SINGLE_STEP # set TIF_SINGLE_STEP
 	TRACE_IRQS_ON
+	lm	%r2,%r6,SP_R2(%r15)	# load svc arguments
 	stosm	__SF_EMPTY(%r15),0x03	# reenable interrupts
 	b	BASED(sysc_do_svc)
--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
From pavel at ucw.cz Tue Jan 26 13:58:50 2010 From: pavel at ucw.cz (Pavel Machek) Date: Tue, 26 Jan 2010 14:58:50 +0100 Subject: linux-next: add utrace tree In-Reply-To: <18867.1264167798@localhost> References: <20100120072925.GA11395@elte.hu> <20100121013822.28781960.sfr@canb.auug.org.au> <20100122111747.3c224dfd.sfr@canb.auug.org.au> <20100121163004.8779bd69.akpm@linux-foundation.org> <20100121163145.7e958c3f.akpm@linux-foundation.org> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122052139.GA20532@in.ibm.com> <18867.1264167798@localhost> Message-ID: <20100126135849.GC1764@ucw.cz> On Fri 2010-01-22 08:43:18, Valdis.Kletnieks at vt.edu wrote: > On Fri, 22 Jan 2010 10:51:39 +0530, Ananth N Mavinakayanahalli said: > > > FWIW, Oleg's implementation of ptrace over utrace is 100% compatible > > with legacy ptrace; gdb testsuite indicates that > > (http://lkml.org/lkml/2009/12/21/98). > > No, that only proves it's compatible enough for gdb to not care. The problem > is all those *other* packages that abuse ptrace in totally crackhead ways. > > (No, I can't name them - but ptrace is the sort of interface that almost > encourages its use for things somewhere between crackhead and mad-scientist, > so they're almost certainly out there.. WAY out there.. :) strace, subterfugue, ltrace, ...? Plus various homegrown sandboxing tools... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html From ananth at in.ibm.com Tue Jan 26 14:21:54 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Tue, 26 Jan 2010 19:51:54 +0530 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <20100126142154.GA30571@in.ibm.com> On Mon, Jan 25, 2010 at 01:41:57PM -0800, Linus Torvalds wrote: > > > On Mon, 25 Jan 2010, Tom Tromey wrote: ... > > * Support "displaced stepping" in the kernel; I think this would improve > > performance when debugging in non-stop mode. > > Don't we already do that at least on x86? Just doing a single-step should > work on an instruction even if it has a breakpoint on it, because we set > the TF bit. > > Or maybe I'm not understanding what displaced stepping means to you. If Tom is referring to supporting single-stepping out of line, ie., not putting back the original instruction at the bp location, yes, we already support it on various architectures for kernel breakpoints, through the kprobes infrastructure. For userspace, there are more complications to take care of. We are reworking a prototype based on community comments (see the long UBP/XOL thread on lkml from a few days ago). Hopefully the userspace breakpoint assistance layer will be generic enough for gdb to also take advantage of, though the interface details need to be hashed out. Ananth From fche at redhat.com Tue Jan 26 15:00:15 2010 From: fche at redhat.com (Frank Ch. Eigler) Date: Tue, 26 Jan 2010 10:00:15 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <20100126150015.GA24292@redhat.com> Hi - On Mon, Jan 25, 2010 at 02:05:54PM -0700, Tom Tromey wrote: > [...] 
> Nevertheless, if the Linux kernel were to present a new user-space API, > and if it had an advantage over ptrace, then we would port GDB to use > it. There are other platforms where, IIRC, we now use some /proc thing > instead of ptrace. > > There are definitely things we would like from such an API. Here's a > few I can think of immediately, there are probably others. > > * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. > [...] Relatedly, don't mess with the inferior's parentage. This is satisfied by the gdbstub prototype. > * Support "displaced stepping" in the kernel [...] I believe this is tantamount to hardware breakpoint support, which is already present (via optional uprobes). > * Support some kind of breakpoint expression in the kernel; this would > improve performance of conditional breakpoints. Perhaps the existing > gdb agent expressions could be used. This is in the todo list. And that "KILLER FEATURE" of running strace plus gdb on the same process? It *already works* with the gdbstub, and unmodified strace + gdb, thanks to utrace multiplexing process control. It is still artificially restricted in many ways, but this sort of thing is ready for testing: % process & [1] 9999 % strace -o FILE -p 9999 & % gdb process (gdb) target remote /proc/9999/gdb (gdb) backtrace (gdb) cont (gdb) ^D % [process continues] % cat FILE [...] 
% kill 9999 - FChE From js at sig21.net Tue Jan 26 16:08:11 2010 From: js at sig21.net (Johannes Stezenbach) Date: Tue, 26 Jan 2010 17:08:11 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> <20100126000237.GA15936@cs.unibo.it> Message-ID: <20100126160811.GA3242@sig21.net> On Mon, Jan 25, 2010 at 04:07:21PM -0800, Linus Torvalds wrote: > On Tue, 26 Jan 2010, Renzo Davoli wrote: > > > > The solution is that everybody can code his/her optimized kernel/user > > interface for tracing in his/her kernel module, i.e. utrace. > > I don't think people understand. That is simply not a "solution". That is > a PROBLEM. The thing you describe is an absolute disaster. Which is > exactly why I rant against it. > > The last thing we want to have is "here, take this, and make your own > kernel module mess around it optimized for your particular crazy > scenario". > > But every SINGLE post in this thread that has argued for utrace has argued > exactly this way. I haven't followed much of the utrace discussions, but my impression was that utrace primarily is a cleanup effort, replacing "don't change it, you might break it" code with a clean, well defined (and even documented) implementation. To make it easier for people not familiar with the low-level architecture details to experiment with debugging stuff. Two points to consider: 1. If you'd merge utrace + ptrace-on-utrace, but never anything else which uses the utrace API, wouldn't it still be an improvement? 2. A well defined utrace API makes debugging code more hackable, thus more likely that someone might come up with a brilliant killer debug feature in the future. (This might sound lame, but there are already a few people doing crazy things with utrace while I'm not aware that people have done such experiments based on the current ptrace impl.) 
BTW, the ptrace improvements discussed elsewhere in this thread (like using an fd instead of signals/wait) are orthogonal to utrace, no? IMHO it's a separate discussion. Johannes From torvalds at linux-foundation.org Tue Jan 26 16:28:15 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Tue, 26 Jan 2010 08:28:15 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <20100126160811.GA3242@sig21.net> References: <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> <20100126000237.GA15936@cs.unibo.it> <20100126160811.GA3242@sig21.net> Message-ID: On Tue, 26 Jan 2010, Johannes Stezenbach wrote: > > 1. If you'd merge utrace + ptrace-on-utrace, but never anything else > which uses the utrace API, wouldn't it still be an improvement? I already said earlier that I'd be perfectly happy to merge utrace code, as long as it was clear that I'm not merging a platform for crazy work. IOW, the end result might be merging 99% of the code, but I want to set people's _expectations_ right. I'm not at all interested in merging stuff that has various exported helper functions for people doing random things, but I could happily merge stuff that cleans up internal implementation. > 2. A well defined utrace API makes debugging code more hackable, thus more > likely that someone might come up with a brilliant killer debug > feature in the future. I don't really agree. Clean code makes things easier to improve, and maybe utrace cleans things up. But defining new APIs makes me very worried, and quite frankly, the last thing I ever want to see is a new interface that out-of-tree modules start using for random hacking. So I'd be much happier without the whole utrace kernel interface and callbacks, and very much would want to avoid the whole issue of plugins. I'd like to see ptrace improvements - not something else.
In other words, I'd much much rather keep the utrace thing _internal_ to ptrace. If people have performance complaints about ptrace, let's look at fixing those _as_such_, rather than look at new modules etc. > BTW, the ptrace improvements discussed elsewhere in this thread > (like using an fd instead of signals/wait) are orthogonal > to utrace, no? IMHO it's a separate discussion. Largely, yes. Tied together to some degree of course, but the whole issue of code cleanup can be seen as a reasonably independent first step (while moving to a fd-based interface should probably not be done without some cleanup first, so they _are_ somewhat tied together). Linus From hch at infradead.org Tue Jan 26 16:34:31 2010 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 26 Jan 2010 11:34:31 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100125170254.GB22862@redhat.com> <1264451453.3028.59.camel@springer.wildebeest.org> <20100126000237.GA15936@cs.unibo.it> <20100126160811.GA3242@sig21.net> Message-ID: <20100126163431.GA1658@infradead.org> On Tue, Jan 26, 2010 at 08:28:15AM -0800, Linus Torvalds wrote: > I already said earlier that I'd be perfectly happy to merge utrace code, > as long as it was clear that I'm not merging a platform for crazy work. > IOW, the end result might be merging 99% of the code, but I want to set > people's _expectations_ right. I'm not at all interested in merging stuff > that has various exported helper functions for people doing random things, > but I could happily merge stuff that cleans up internal implementation. > Clean code makes things easier to improve, and maybe utrace cleans things > up. But defining new APIs makes me very worried, and quite frankly, the > last thing I ever want to see is a new interface that out-of-tree modules > start using for random hacking. To be fair Roland and Oleg did a lot of work on improving ptrace support that was an offshoot of utrace.
It would be great if the remaining architectures would catch up on being converted to it and getting rid of the existing hairy arch ptrace code as much as possible. I'm still not really set on utrace either, but the in-kernel gdbstub Frank has started could be a real killer if it ever gets done up to a fully usable state. If it really requires all the utrace abstractions that seem a bit overdone I'm not sure. Might be a better idea to try to get uprobes and the gdbstub in without it and see how much of the abstraction will be needed anyway as a fallout, just without exporting them to modules and thus actually making them published APIs. From andi at firstfloor.org Tue Jan 26 17:33:26 2010 From: andi at firstfloor.org (Andi Kleen) Date: Tue, 26 Jan 2010 18:33:26 +0100 Subject: linux-next: add utrace tree In-Reply-To: (Tom Tromey's message of "Mon, 25 Jan 2010 14:05:54 -0700") References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <877hr4g49l.fsf@basil.nowhere.org> Tom Tromey writes: > * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. > Internally we're already using a self-pipe to integrate this into > gdb's main loop. Relatedly, don't mess with the inferior's parentage. How would having a kernel based solution be better over your user space simulation? BTW there's the new signalfd() system call that might do it (haven't checked if it works for SIGCHLD) > * Support "displaced stepping" in the kernel; I think this would improve > performance when debugging in non-stop mode. Not sure what "displaced stepping" is exactly, but it sounds like the branch tracing extensions that got added a few releases ago? On modern Intel chips they give you a branch buffer in memory. -Andi -- ak at linux.intel.com -- Speaking for myself only.
From torvalds at linux-foundation.org Tue Jan 26 18:46:00 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Tue, 26 Jan 2010 10:46:00 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <877hr4g49l.fsf@basil.nowhere.org> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> Message-ID: On Tue, 26 Jan 2010, Andi Kleen wrote: > Tom Tromey writes: > > > * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. > > Internally we're already using a self-pipe to integrate this into > > gdb's main loop. Relatedly, don't mess with the inferior's parentage. > > How would having a kernel based solution be better over your > user space simulation? Oh, the reason we should do something in the kernel is that you really can't do certain things with the ptrace() interface. For example, think about how Wine and UML use ptrace - and then realize that that makes it impossible to attach a debugger from the outside. That's a real deficiency in ptrace - much more so than the fact that there are some odd details (ie the whole "read/write a word at a time" is just a quirky detail in comparison - not a fundamental problem). > BTW there's the new signalfd() system call that might do it > (haven't checked if it works for SIGCHLD) No, you miss the point. The problem isn't that you want to turn signals into a file descriptor just because you like file descriptors.
The problem is that anything that is based on reparenting and signals is fundamentally a "one parent only" kind of interface. See? So the reason I think using an fd is a good idea is _not_ because gdb already uses an fd internally, but because it gives you a "connection" between the debugger and debuggee that is not fundamentally limited to a single controller. (It doesn't have to be a file descriptor, of course, but could be any kind of other model that allows multiple connections. It's just that in unix terms, using a file descriptor as the "cookie" for the connection is a very natural model. So the important part isn't the file descriptor itself, it's the model you could build). Linus From andi at firstfloor.org Tue Jan 26 21:02:32 2010 From: andi at firstfloor.org (Andi Kleen) Date: Tue, 26 Jan 2010 22:02:32 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> Message-ID: <20100126210232.GF6567@basil.fritz.box> > The problem is that anything that is based on reparenting and signals is > fundamentally a "one parent only" kind of interface. See? I was actually thinking about that before I wrote the email. But when I did that i couldn't come up with a good scenario where multiple debuggers actually make sense. In a sense being a debugger is really a very "intimate" thing for process. Do you really want to have multiple of them messing with each other? If yes how would they know what to touch and what not? The only thing I could think of was "user space virtualization" (like old UML) together with a real debugger, but frankly these solutions all seemed like big race conditions to me anyways and should be better done in the kernel or below it, so I have a hard time taking them seriously. Can you think of any scenario where multiple debuggers on a process make sense? 
-Andi From oleg at redhat.com Tue Jan 26 21:30:10 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 26 Jan 2010 22:30:10 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> Message-ID: <20100126213010.GA19146@redhat.com> On 01/26, Linus Torvalds wrote: > > The problem is that anything that is based on reparenting and signals is > fundamentally a "one parent only" kind of interface. See? Indeed. signals + do_wait() is the horrible model. > So the reason I think using an fd is a good idea is _not_ because gdb > already uses an fd internally, but because it gives you a "connection" > between the debugger and debuggee that is not fundamentally limited to a > single controller. > > (It doesn't have to be a file descriptor, of course, but could be any kind > of other model that allows multiple connections. Yes. But then we need something which represents this connection in kernel: utrace_engine. Then we need something which allows multiple tracers to cooperate. Just for example, one tracer wants to resume the tracee, another tracer wants the tracee to be stopped. Utrace does this. And, since we should preserve the current ptrace, the tracers should cooperate with ptrace too. IOW, this quickly leads to the new abstraction layer, I think. And of course it is possible to implement this new model on top of utrace. Yes, utrace itself comes with utrace_engine_ops vector to implement "whatever you like", perhaps you dislike this part. Oleg. 
From oleg at redhat.com Tue Jan 26 21:53:49 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Tue, 26 Jan 2010 22:53:49 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100126210232.GF6567@basil.fritz.box> References: <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> <20100126210232.GF6567@basil.fritz.box> Message-ID: <20100126215349.GB19146@redhat.com> On 01/26, Andi Kleen wrote: > > But when I did that i couldn't come up with a good scenario > where multiple debuggers actually make sense. In a sense > being a debugger is really a very "intimate" thing for process. Do you > really want to have multiple of them messing with each other? > > If yes how would they know what to touch and what not? Yes, multiple debuggers can confuse each other if they change the state of debuggee simultaneously. The user should do this ;) > Can you think of any scenario where multiple debuggers > on a process make sense? Simple example. Try to debug/strace strace or gdb itself. Not trivial, you can't attach to strace's tracees. Recently I spent 2 days trying to understand why strace -f hangs. I was able to attach to strace, but I wasn't able to see what its tracees do. And, it was not possible to even trace strace until it hangs, with ptrace the tracee (strace) must stop to report the event and this shadowed the race. Oleg. From andi at firstfloor.org Tue Jan 26 22:03:01 2010 From: andi at firstfloor.org (Andi Kleen) Date: Tue, 26 Jan 2010 23:03:01 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100126215349.GB19146@redhat.com> References: <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> <20100126210232.GF6567@basil.fritz.box> <20100126215349.GB19146@redhat.com> Message-ID: <20100126220301.GG6567@basil.fritz.box> > Simple example. Try to debug/strace strace or gdb itself. Not trivial, > you can't attach to strace's tracees. Recently I spent 2 days trying to > understand why strace -f hangs.
I was able to attach to strace, but > I wasn't able to see what its tracees do. But what would the semantics be inside the tracees even if you could? > And, it was not possible to even trace strace until it hangs, with > ptrace the tracee (strace) must stop to report the event and this > shadowed the race. "Shadowing the race" was the second surname of strace I thought anyways @) Basically if you care about races never use strace in the first place. -Andi -- ak at linux.intel.com -- Speaking for myself only. From tromey at redhat.com Tue Jan 26 23:20:22 2010 From: tromey at redhat.com (Tom Tromey) Date: Tue, 26 Jan 2010 16:20:22 -0700 Subject: linux-next: add utrace tree In-Reply-To: (Linus Torvalds's message of "Mon, 25 Jan 2010 13:41:57 -0800 (PST)") References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: >>>>> "Linus" == Linus Torvalds writes: Tom> * Support "displaced stepping" in the kernel; I think this would improve Tom> performance when debugging in non-stop mode. Linus> Don't we already do that at least on x86? I don't know. If it does, and gdb does not yet use that, then that would be worth changing. Linus> Or maybe I'm not understanding what displaced stepping means to you. In non-stop mode (where you can stop one thread but leave the others running), gdb wants to have the breakpoints always inserted. So, something must emulate the displaced instruction. 
Tom From tromey at redhat.com Tue Jan 26 23:27:06 2010 From: tromey at redhat.com (Tom Tromey) Date: Tue, 26 Jan 2010 16:27:06 -0700 Subject: linux-next: add utrace tree In-Reply-To: <877hr4g49l.fsf@basil.nowhere.org> (Andi Kleen's message of "Tue, 26 Jan 2010 18:33:26 +0100") References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <877hr4g49l.fsf@basil.nowhere.org> Message-ID: Tom> * Use an fd, not SIGCHLD+wait, to report inferior state changes to gdb. Tom> Internally we're already using a self-pipe to integrate this into Tom> gdb's main loop. Relatedly, don't mess with the inferior's parentage. Andi> How would having a kernel based solution be better over your Andi> user space simulation? Signals and wait are a pain because if we want to use some random library in gdb, there might be conflicts. This is true even if we use signalfd. An fd-for-debugging does not have this problem. This matters more now that we're letting people script gdb in python. Tom From oleg at redhat.com Tue Jan 26 23:32:42 2010 From: oleg at redhat.com (Oleg Nesterov) Date: Wed, 27 Jan 2010 00:32:42 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100126220301.GG6567@basil.fritz.box> References: <877hr4g49l.fsf@basil.nowhere.org> <20100126210232.GF6567@basil.fritz.box> <20100126215349.GB19146@redhat.com> <20100126220301.GG6567@basil.fritz.box> Message-ID: <20100126233242.GA25575@redhat.com> On 01/26, Andi Kleen wrote: > > > Simple example. Try to debug/strace strace ot gdb itself. Not trivial, > > you can't attach to strace's tracees. Recently I spent 2 days trying to > > understand why strace -f hangs. I was able to attach to strace, but > > I wasn't able to see what its tracees do. > > But what would the semantics be inside the tracees even if you could? 
In this particular case, all I needed was something like "gdb -p" to attach to the tracee, see the backtrace and detach. > > And, it was not possible to even trace strace until it hangs, with > > ptrace the tracee (strace) must stop to report the event and this > > shadowed the race. > > "Shadowing the race" was the second surname of strace I thought anyways @) > Basically if you care about races never use strace in the first place. Yes. And utrace doesn't require the tracee to be stopped to report the event ;) Yes, yes, utrace can't "fix" strace in this sense automatically, but still. Oleg. From torvalds at linux-foundation.org Tue Jan 26 23:37:49 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Tue, 26 Jan 2010 15:37:49 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: On Tue, 26 Jan 2010, Tom Tromey wrote: > > In non-stop mode (where you can stop one thread but leave the others > running), gdb wants to have the breakpoints always inserted. So, > something must emulate the displaced instruction. I'm almost totally uninterested in breakpoints that actually re-write instructions. It's impossible to do that efficiently and well, especially in threaded environments. So if you do instruction rewriting, I can only say "that's your problem". But using the hardware breakpoints should automatically DTRT, both wrt threads _and_ wrt restarting. Sure, there's only a limited number of them, so if somebody wants more than that they are kind of screwed, but that's just how life is. Linus From fche at redhat.com Wed Jan 27 00:38:45 2010 From: fche at redhat.com (Frank Ch.
Eigler) Date: Tue, 26 Jan 2010 19:38:45 -0500 Subject: linux-next: add utrace tree In-Reply-To: (Tom Tromey's message of "Tue, 26 Jan 2010 16:20:22 -0700") References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: tromey wrote: > [...] > In non-stop mode (where you can stop one thread but leave the others > running), gdb wants to have the breakpoints always inserted. So, > something must emulate the displaced instruction. This sounds like the sort of thing that kernel kprobes do, which the uprobes patch does for userspace. The gdbstub prototype can use uprobes for such "displaced" breakpoints, and single-step-out-of-line to execute them on a few platforms like x86-*. This is already prototyped / working. (gdbstub currently restricts itself to single-threaded programs only, but that's another todo.) - FChE From peterz at infradead.org Wed Jan 27 06:52:14 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 07:52:14 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> Message-ID: <1264575134.4283.1983.camel@laptop> On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote: > > On Tue, 26 Jan 2010, Tom Tromey wrote: > > > > In non-stop mode (where you can stop one thread but leave the others > > running), gdb wants to have the breakpoints always inserted. So, > > something must emulate the displaced instruction. > > I'm almost totally uninterested in breakpoints that actually re-write > instructions. It's impossible to do that efficiently and well, especially > in threaded environments. 
> > So if you do instruction rewriting, I can only say "that's your problem". Right, so you're going to love uprobes, which does exactly that. The current proposal is overwriting the target instruction with an INT3 and injecting an extra vma into the target process's address space containing the original instruction(s) and possible jumps back to the old code stream. I'm all in favor of not doing that extra vma and instead use stack or TLS space, but then people complain about having to make that executable (which is something I don't really mind, x86 had executable everything for very long, and also, its only so when debugging the thing anyway). From peterz at infradead.org Wed Jan 27 06:53:45 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 07:53:45 +0100 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <20100122072402.GA7440@in.ibm.com> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> <20100122072402.GA7440@in.ibm.com> Message-ID: <1264575225.4283.1985.camel@laptop> On Fri, 2010-01-22 at 12:54 +0530, Ananth N Mavinakayanahalli wrote: > On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote: > > Here is a summary of the Comments and actions that need to be taken for > > the current uprobes patchset. Please let me know if I missed or > > misunderstood any of your comments. > > > > 1. Uprobes depends on trap signal. > > Uprobes depends on trap signal rather than hooking to the global > > die notifier. It was suggested that we hook to the global die notifier. > > > > In the next version of patches, Uprobes will use the global die > > notifier and look at the per-task count of the probes in use to > > see if it has to be consumed. > > > > However this would reduce the ability of uprobe handlers to > > sleep. Since we are dealing with userspace, sleeping in handlers > > would have been a good feature. 
We are looking at ways to get > > around this limitation. > > We could set a TIF_ flag in the notifier to indicate a breakpoint hit > and process it in task context before the task heads into userspace. OK, so we can go play stack games in the INT3 interrupt handler by moving to a non IST stack when it comes from userspace, or move kprobes over to INT1 or something. From mingo at elte.hu Wed Jan 27 08:24:40 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 27 Jan 2010 09:24:40 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B56F588.2060109@redhat.com> References: <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> <4B56F588.2060109@redhat.com> Message-ID: <20100127082440.GA16640@elte.hu> * Avi Kivity wrote: > On 01/20/2010 11:57 AM, Peter Zijlstra wrote: > >On Wed, 2010-01-20 at 11:43 +0200, Avi Kivity wrote: > >> 1. Write a trace entry into shared memory, trap into the kernel on > >> overflow. > >> 2. Trap if a condition is satisfied (fast watchpoint implementation). > > > > So now you want to consume more of a process' address space to store trace > > data as well? > > Yes. I know I'm bad. No, you are just wrong. > > Not to mention that that process could wreck the trace data rendering it > > utterly unreliable. > > It could, but it also might not. Are we going to deny high performance > tracing to users just because it doesn't work in all cases? Tracing and monitoring is foremost about being able to trust the instrument, then about performance and usability. That's one of the big things about ftrace and perf. 
By proposing 'user space tracing' you are missing two big aspects: - That self-contained, kernel-driven tracing can be replicated in user-space. It cannot. Sharing and global state is much harder to maintain reliably, but the bigger problem is that user-space can stomp on its own tracing state and can make it unreliable. Tracing is often used to figure out bugs, and tracers will be trusted less if they can stomp on themselves. - That somehow it's much faster and that this edge matters. It isnt and it doesnt matter. The few places that need very very fast tracing wont use any of these facilities - it will use something specialized. So you are creating a solution for special cases that dont need it, and you are also ignoring prime qualities of a good tracing framework. Ingo From peterz at infradead.org Wed Jan 27 08:24:14 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 09:24:14 +0100 Subject: [RFC] [PATCH 0/7] UBP, XOL and Uprobes [ Summary of Comments and actions to be taken ] In-Reply-To: <1264575225.4283.1985.camel@laptop> References: <20100111122521.22050.3654.sendpatchset@srikar.in.ibm.com> <20100122070232.GA2975@linux.vnet.ibm.com> <20100122072402.GA7440@in.ibm.com> <1264575225.4283.1985.camel@laptop> Message-ID: <1264580654.4283.1986.camel@laptop> On Wed, 2010-01-27 at 07:53 +0100, Peter Zijlstra wrote: > On Fri, 2010-01-22 at 12:54 +0530, Ananth N Mavinakayanahalli wrote: > > On Fri, Jan 22, 2010 at 12:32:32PM +0530, Srikar Dronamraju wrote: > > > Here is a summary of the Comments and actions that need to be taken for > > > the current uprobes patchset. Please let me know if I missed or > > > misunderstood any of your comments. > > > > > > 1. Uprobes depends on trap signal. > > > Uprobes depends on trap signal rather than hooking to the global > > > die notifier. It was suggested that we hook to the global die notifier. 
> > > > > > In the next version of patches, Uprobes will use the global die > > > notifier and look at the per-task count of the probes in use to > > > see if it has to be consumed. > > > > > > However this would reduce the ability of uprobe handlers to > > > sleep. Since we are dealing with userspace, sleeping in handlers > > > would have been a good feature. We are looking at ways to get > > > around this limitation. > > > > We could set a TIF_ flag in the notifier to indicate a breakpoint hit > > and process it in task context before the task heads into userspace. > > OK, so we can go play stack games in the INT3 interrupt handler by > moving to a non IST stack when it comes from userspace, or move kprobes > over to INT1 or something. Right, it just got pointed out that INT1 doesn't have a single byte encoding, only INT0 and INT3 :/ From avi at redhat.com Wed Jan 27 08:35:39 2010 From: avi at redhat.com (Avi Kivity) Date: Wed, 27 Jan 2010 10:35:39 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100127082440.GA16640@elte.hu> References: <20100118124419.GC1628@linux.vnet.ibm.com> <84144f021001180451k2a84f17x3dc24796fea986c9@mail.gmail.com> <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> <4B56F588.2060109@redhat.com> <20100127082440.GA16640@elte.hu> Message-ID: <4B5FFADB.5090209@redhat.com> On 01/27/2010 10:24 AM, Ingo Molnar wrote: > > >>> Not to mention that that process could wreck the trace data rendering it >>> utterly unreliable. >>> >> It could, but it also might not. Are we going to deny high performance >> tracing to users just because it doesn't work in all cases? >> > Tracing and monitoring is foremost about being able to trust the instrument, > then about performance and usability. 
That's one of the big things about > ftrace and perf. > > By proposing 'user space tracing' you are missing two big aspects: > > - That self-contained, kernel-driven tracing can be replicated in user-space. > It cannot. Sharing and global state is much harder to maintain reliably, > but the bigger problem is that user-space can stomp on its own tracing > state and can make it unreliable. Tracing is often used to figure out bugs, > and tracers will be trusted less if they can stomp on themselves. > > - That somehow it's much faster and that this edge matters. It isnt and it > doesnt matter. The few places that need very very fast tracing wont use any > of these facilities - it will use something specialized. > > So you are creating a solution for special cases that dont need it, and you > are also ignoring prime qualities of a good tracing framework. > I see it exactly the opposite. Only a very small minority of cases will have such severe memory corruption that tracing will fall apart because of random writes to memory; especially on 64-bit where the address space is sparse. On the other hand, knowing that the cost is a few dozen cycles rather than a thousand or so means that you can trace production servers running full loads without worrying about whether tracing will affect whatever it is you're trying to observe. I'm not against slow reliable tracing, but we shouldn't ignore the need for speed. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
From mingo at elte.hu Wed Jan 27 08:54:42 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 27 Jan 2010 09:54:42 +0100 Subject: linux-next: add utrace tree In-Reply-To: <1264575134.4283.1983.camel@laptop> References: <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> Message-ID: <20100127085442.GA28422@elte.hu> * Peter Zijlstra wrote: > On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote: > > > > On Tue, 26 Jan 2010, Tom Tromey wrote: > > > > > > In non-stop mode (where you can stop one thread but leave the others > > > running), gdb wants to have the breakpoints always inserted. So, > > > something must emulate the displaced instruction. > > > > I'm almost totally uninterested in breakpoints that actually re-write > > instructions. It's impossible to do that efficiently and well, especially > > in threaded environments. > > > > So if you do instruction rewriting, I can only say "that's your problem". > > Right, so you're going to love uprobes, which does exactly that. The current > proposal is overwriting the target instruction with an INT3 and injecting an > extra vma into the target process's address space containing the original > instruction(s) and possible jumps back to the old code stream. > > I'm all in favor of not doing that extra vma and instead use stack or TLS > space, but then people complain about having to make that executable (which > is something I don't really mind, x86 had executable everything for very > long, and also, its only so when debugging the thing anyway). I think the best solution for user probes (by far) is to use a simplified in-kernel instruction emulator for the few common probes instruction. (Kprobes already partially decodes x86 instructions to make it safe to apply accelerated probes and there's other decoding logic in the kernel too.) The design and practical advantages are numerous: - People want to probe their function prologues most of the time ... 
a single INT3 there will in most cases just hit the initial stack allocation and that's it. We could get quite good coverage (and very fast emulation) for the common case in not too much code - and much of that code we already have available. No re-trapping, no extra instruction patching and complex maintenance of trampolines. - It's as transparent as it gets - no user-space trampoline or other visible state that modifies behavior or can be stomped upon by user-space bugs. - Lightweight and simple probe insertion: no weird setup sequence needing the stopping of all tasks to install the trampoline. We just add the INT3 and off you go. - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on task local state. - The points we can probe are never truly limited as it's all freely upscalable: if you cannot probe an instruction you want to probe today, extend the emulator. Deny the rest. _All_ versions of uprobes code i've seen so far already restrict the probe-compatible instruction set: RIP-relative instructions are excluded on 64-bit for example. - Emulation has the _least_ semantical side effects as we really execute 'that' instruction - not some other instruction put elsewhere into a special vma or into the process/thread stack, or some special in-kernel trampoline, etc. - Emulation can be very fast for the common case as well. Nobody will probe weird, complex instructions. They will use 'perf probe' to insert probes into their functions 90% of the time ... - FPU and complex ops and pagefault emulation is not really what i'd expect to be necessary for simple probing - but it _can_ be added by people who care about it, if they so wish. Such a scheme would be _far_ more preferable from a maintenance POV as well, as the initial code will be small, and we can extend it gradually. All the other proposals are complex 'all or nothing' schemes with no flexibility for complexity at all.
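[Illustration, not part of the original message.] The "emulate the common case, deny the rest" idea can be sketched as a toy model. This is only a user-space sketch under loud assumptions: the function name, the register dictionary and the address-to-bytes memory map are all hypothetical, only the one-byte `push %rbp` prologue instruction (opcode 0x55) is handled, and `regs["rip"]` is taken to be the address of the probed instruction rather than the post-trap value a real INT3 handler would see.

```python
import struct

PUSH_RBP = 0x55  # one-byte x86-64 opcode for `push %rbp`


class Unsupported(Exception):
    """Raised for any instruction the toy emulator does not know about."""


def emulate_probed_insn(orig_bytes, regs, mem):
    """Emulate the original instruction that a probe replaced.

    orig_bytes: saved original instruction bytes at the probe site
    regs:       hypothetical register file, a dict with rip/rsp/rbp
    mem:        hypothetical memory model, a dict of address -> 8 bytes
    """
    opcode = orig_bytes[0]
    if opcode != PUSH_RBP:
        # "Deny the rest": refuse anything outside the supported set.
        raise Unsupported(hex(opcode))
    regs["rsp"] -= 8                                    # grow the stack
    mem[regs["rsp"]] = struct.pack("<Q", regs["rbp"])   # store old rbp
    regs["rip"] += 1                                    # skip the 1-byte insn
    return regs


regs = {"rip": 0x400000, "rsp": 0x7000, "rbp": 0x7100}
mem = {}
emulate_probed_insn(b"\x55\x48\x89\xe5", regs, mem)
print(hex(regs["rip"]), hex(regs["rsp"]))
```

Only task-local state (the register file and the task's own stack) is touched, which is the property the thread-safety argument above relies on; extending coverage means adding more opcode cases.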
Thanks, Ingo From mingo at elte.hu Wed Jan 27 09:08:24 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 27 Jan 2010 10:08:24 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B5FFADB.5090209@redhat.com> References: <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> <4B56F588.2060109@redhat.com> <20100127082440.GA16640@elte.hu> <4B5FFADB.5090209@redhat.com> Message-ID: <20100127090824.GA23570@elte.hu> * Avi Kivity wrote: > On 01/27/2010 10:24 AM, Ingo Molnar wrote: > > > > > >>>Not to mention that that process could wreck the trace data rendering it > >>>utterly unreliable. > >>It could, but it also might not. Are we going to deny high performance > >>tracing to users just because it doesn't work in all cases? > >Tracing and monitoring is foremost about being able to trust the instrument, > >then about performance and usability. That's one of the big things about > >ftrace and perf. > > > >By proposing 'user space tracing' you are missing two big aspects: > > > > - That self-contained, kernel-driven tracing can be replicated in user-space. > > It cannot. Sharing and global state is much harder to maintain reliably, > > but the bigger problem is that user-space can stomp on its own tracing > > state and can make it unreliable. Tracing is often used to figure out bugs, > > and tracers will be trusted less if they can stomp on themselves. > > > > - That somehow it's much faster and that this edge matters. It isnt and it > > doesnt matter. The few places that need very very fast tracing wont use any > > of these facilities - it will use something specialized. > > > >So you are creating a solution for special cases that dont need it, and you > >are also ignoring prime qualities of a good tracing framework. 
> > I see it exactly the opposite. Only a very small minority of cases will > have such severe memory corruption that tracing will fall apart because of > random writes to memory; especially on 64-bit where the address space is > sparse. On the other hand, knowing that the cost is a few dozen cycles > rather than a thousand or so means that you can trace production servers > running full loads without worrying about whether tracing will affect > whatever it is you're trying to observe. > > I'm not against slow reliable tracing, but we shouldn't ignore the need for > speed. I havent seen a concise summary of your points in this thread, so let me summarize it as i've understood them (hopefully not putting words into your mouth): AFAICS you are arguing for some crazy fragile architecture-specific solution that traps INT3 into ring3 just to shave off a few cycles, and then use user-space state to trace into. If so then you ignore the obvious solution to _that_ problem: dont use INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks. It's _MUCH_ faster than _any_ breakpoint based solution - literally just the cost of a function call (or not even that - i've written very fast inlined tracers - they do rock when it comes to performance). Problem solved and none of the INT3 details matters at all. INT3 only matters to _transparent_ probing, and for that, the cost of INT3 is almost _by definition_ less important than the fact that we can do transparent tracing. If performance were the overriding issue they'd use dedicated callbacks - and the INT3 technique wouldnt matter at all. ( Also, just like we were able to extend the kprobes code with more and more optimizations, the same can be done with any user-space probing as well, to make it faster. But at the core of it has to be a sane design that is transparent and controlled by the kernel, so that it has the option to apply more and more optimizations - yours isnt such and its limitations are designed-in.
Which is neither smart nor useful. ) Ingo From avi at redhat.com Wed Jan 27 09:25:15 2010 From: avi at redhat.com (Avi Kivity) Date: Wed, 27 Jan 2010 11:25:15 +0200 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <20100127090824.GA23570@elte.hu> References: <4B5459CA.9060603@redhat.com> <4B545ACF.40203@cs.helsinki.fi> <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> <4B56F588.2060109@redhat.com> <20100127082440.GA16640@elte.hu> <4B5FFADB.5090209@redhat.com> <20100127090824.GA23570@elte.hu> Message-ID: <4B60067B.4060708@redhat.com> On 01/27/2010 11:08 AM, Ingo Molnar wrote: > >> I see it exactly the opposite. Only a very small minority of cases will >> have such severe memory corruption that tracing will fall apart because of >> random writes to memory; especially on 64-bit where the address space is >> sparse. On the other hand, knowing that the cost is a few dozen cycles >> rather than a thousand or so means that you can trace production servers >> running full loads without worrying about whether tracing will affect >> whatever it is you're trying to observe. >> >> I'm not against slow reliable tracing, but we shouldn't ignore the need for >> speed. >> > I havent seen a conscise summary of your points in this thread, so let me > summarize it as i've understood them (hopefully not putting words into your > mouth): AFAICS you are arguing for some crazy fragile architecture-specific > solution that traps INT3 into ring3 just to shave off a few cycles, and then > use user-space state to trace into. > That's a good summary, except for the words "crazy fragile", "trap INT3 into ring3" and "a few cycles". Instead of using int 3, put a jump instruction in the program. This shaves a lot more than a few cycles. 
> If so then you ignore the obvious solution to _that_ problem: dont use INT3 at > all, but rebuild (or re-JIT) your program with explicit callbacks. It's _MUCH_ > faster than _any_ breakpoint based solution - literally just the cost of a > function call (or not even that - i've written very fast inlined tracers - > they do rock when it comes to performance). Problem solved and none of the > INT3 details matters at all. > However did I not think of that? Yes, and let's rip off kprobes tracing from the kernel, we can always rebuild it. Well, I'm observing an issue in a production system now. I may not want to take it down, or if I take it down I may not be able to observe it again as the problem takes a couple of days to show up, or I may not have the full source, or it takes 10 minutes to build and so an iterative edit/build/run cycle can stretch for hours. Adding a vma to a running program is very unlikely to affect it. If the program makes random accesses to memory, it will likely segfault very quickly before we ever get to trace it. > INT3 only matters to _transparent_ probing, and for that, the cost of INT3 is > almost _by definition_ less important than the fact that we can do transparent > tracing. If performance were the overriding issue they'd use dedicated > callbacks - and the INT3 technique wouldnt matter at all. > INT3 isn't transparent. The only thing that comes close to full transparency is hardware breakpoints. So we have a tradeoff between transparency and speed, and except for the weirdest bugs, this level of transparency won't be needed. > ( Also, just like we were able to extend the kprobes code with more and more > optimizations, the same can be done with any user-space probing as well, to > make it faster. But at the core of it has to be a sane design that is > transparent and controlled by the kernel, so that it has the option to apply > more and more optimizations - yours isnt such and its limitations are > designed-in.
No design is fully transparent, and I don't see why my design can't be controlled by the kernel? > Which is neither smart nor useful. ) > This style of arguing is neither smart or useful as well. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. From mingo at elte.hu Wed Jan 27 10:23:11 2010 From: mingo at elte.hu (Ingo Molnar) Date: Wed, 27 Jan 2010 11:23:11 +0100 Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP) In-Reply-To: <4B60067B.4060708@redhat.com> References: <1263852957.2266.38.camel@localhost.localdomain> <4B556855.6040800@redhat.com> <1263923265.4998.28.camel@localhost.localdomain> <4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop> <4B56F588.2060109@redhat.com> <20100127082440.GA16640@elte.hu> <4B5FFADB.5090209@redhat.com> <20100127090824.GA23570@elte.hu> <4B60067B.4060708@redhat.com> Message-ID: <20100127102311.GA973@elte.hu> * Avi Kivity wrote: > > If so then you ignore the obvious solution to _that_ problem: dont use > > INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks. > > It's _MUCH_ faster than _any_ breakpoint based solution - literally just > > the cost of a function call (or not even that - i've written very fast > > inlined tracers - they do rock when it comes to performance). Problem > > solved and none of the INT3 details matters at all. > > However did I not think of that? Yes, and let's rip off kprobes tracing > from the kernel, we can always rebuild it. > > Well, I'm observing an issue in a production system now. I may not want to > take it down, or if I take it down I may not be able to observe it again as > the problem takes a couple of days to show up, or I may not have the full > source, or it takes 10 minutes to build and so an iterative edit/build/run > cycle can stretch for hours. You have somewhat misconstrued my argument. 
What i said above is that _if_ you need extreme levels of performance you always have the option to go even faster via specialized tracing solutions. I did not promote it as a replacement solution. Specialization obviously brings in a new set of problems: inflexibility and non-transparency, an example of which you gave above. Your proposed solution brings in precisely such kinds of issues, on a different level, just to improve performance at the cost of transparency and at the cost of features and robustness. It's btw rather ironic as your arguments are somewhat similar to the Xen vs. KVM argument just turned around: KVM started out slower by relying on hardware implementation for virtualization while Xen relied on a clever but limiting hack. With each CPU generation the hardware got faster, while the various design limitations of Xen are hurting it and KVM is winning that race. A (partially) similar situation exists here: INT3 into ring 0 and handling it there in a protected environment might be more expensive, but _if_ it matters to performance it sure could be made faster in hardware (and in fact it will become faster with every new generation of hardware). Both Peter and me are telling you that we are considering your solution too specialized, at the cost of flexibility, features and robustness. Thanks, Ingo From torvalds at linux-foundation.org Wed Jan 27 10:43:39 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Wed, 27 Jan 2010 02:43:39 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <1264575134.4283.1983.camel@laptop> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> Message-ID: On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > Right, so you're going to love uprobes, which does exactly that.
The > current proposal is overwriting the target instruction with an INT3 and > injecting an extra vma into the target process's address space > containing the original instruction(s) and possible jumps back to the > old code stream. Just out of interest, how does it handle the threading issue? Last I saw, at least some CPU people were _very_ nervous about overwriting instructions if another CPU might be just about to execute them. Even the "overwrite only the first byte with 'int3'" made them go "umm, I need to talk to some core CPU people to see if that's ok". They mumble about possible CPU errata, I$ coherency, instruction retry etc. I realize kprobes does this very thing, but kprobes is esoteric stuff and doesn't have much choice. In user space, you _could_ do the modification on a different physical page and then just switch the page table entry instead, and not get into the whole D$/I$ coherency thing at all. Linus From peterz at infradead.org Wed Jan 27 10:55:16 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 11:55:16 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> Message-ID: <1264589716.4283.2006.camel@laptop> On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > > > Right, so you're going to love uprobes, which does exactly that. The > > current proposal is overwriting the target instruction with an INT3 and > > injecting an extra vma into the target process's address space > > containing the original instruction(s) and possible jumps back to the > > old code stream. > > Just out of interest, how does it handle the threading issue? 
> > Last I saw, at least some CPU people were _very_ nervous about overwriting > instructions if another CPU might be just about to execute them. > > Even the "overwrite only the first byte with 'int3'" made them go "umm, I > need to talk to some core CPU people to see if that's ok". They mumble > about possible CPU errata, I$ coherency, instruction retry etc. > > I realize kprobes does this very thing, but kprobes is esoteric stuff and > doesn't have much choice. In user space, you _could_ do the modification > on a different physical page and then just switch the page table entry > instead, and not get into the whole D$/I$ coherency thing at all. Right, so there's two aspects: 1) concurrency when inserting the probe 2) concurrency when hitting the probe 1) used to be dealt with by using utrace to stop all threads in the process and then writing the instruction. I suggested to CoW the page, modify the instruction, set the pagetable and flush tlbs at full speed -- the very thing you suggest here. 2) so traditionally (and the intel arch manual describes this) is to replace the instruction, single step it, and write the probe back. This is racy for multi-threading. The current uprobes stuff solves this by doing single-step-out-of-line (XOL). XOL injects a new vma into the target process and puts the old instruction there, then it single steps on the new location, leaving the original site with INT3. This doesn't work for things like RIP relative instructions, so uprobes considers them un-probable. Also, I myself really object to inserting a vma in a running process, its like a land-lord, sure he has the key but he won't come in an poke through your things. The alternative is to place the instruction in TLS or stack space, since each thread can only have a single trap at a time, you only need space for 1 instruction (plus a possible jump out to the original site). There is the 'problem' of marking the TLS/stack executable when being probed. 
Then there is the whole emulation angle, the uprobes people basically say its too much effort to write a x86 emulator. From peterz at infradead.org Wed Jan 27 10:58:06 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 11:58:06 +0100 Subject: linux-next: add utrace tree In-Reply-To: <1264589716.4283.2006.camel@laptop> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> Message-ID: <1264589886.4283.2008.camel@laptop> On Wed, 2010-01-27 at 11:55 +0100, Peter Zijlstra wrote: > Right, so there's two aspects: > > 1) concurrency when inserting the probe > 2) concurrency when hitting the probe > > 1) used to be dealt with by using utrace to stop all threads in the > process and then writing the instruction. I suggested to CoW the page, > modify the instruction, set the pagetable and flush tlbs at full speed > -- the very thing you suggest here. Also, since executable maps are typically MAP_PRIVATE, you have to CoW anyway in order to modify it and I would exclude MAP_SHARED from being probable because then the modification could seep through into whatever was backing that thing. 
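[Editor's note] Peter's point about MAP_PRIVATE can be checked from user space. The following is a minimal sketch, not uprobes code: it assumes a Unix system, and the helper name private_patch_demo and the sample byte string are made up for illustration. It writes an INT3 opcode (0xCC) through a private, writable mapping of a file and then re-reads the file to show that the change stayed in the copy-on-write page and did not seep into the backing file.

```python
import mmap
import os
import tempfile

INT3 = 0xCC  # x86 breakpoint opcode


def private_patch_demo(original: bytes, offset: int):
    """Patch one byte through a MAP_PRIVATE mapping.

    Returns (bytes seen through the mapping, bytes in the backing file).
    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, original)
        # A private mapping: the first write to a page makes a private copy.
        with mmap.mmap(fd, len(original),
                       flags=mmap.MAP_PRIVATE,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[offset] = INT3
            patched = m[:]
        # Re-read the file itself; the private write is not visible here.
        with open(path, "rb") as f:
            on_disk = f.read()
        return patched, on_disk
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # push %rbp; mov %rsp,%rbp; ret (sample bytes, chosen for illustration)
    code = b"\x55\x48\x89\xe5\xc3"
    patched, on_disk = private_patch_demo(code, 0)
    print(patched.hex(), on_disk.hex())
```

With MAP_SHARED in place of MAP_PRIVATE the same write would reach the file, which is the case Peter proposes to exclude from probing.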
From torvalds at linux-foundation.org Wed Jan 27 11:04:58 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Wed, 27 Jan 2010 03:04:58 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <1264589716.4283.2006.camel@laptop> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> Message-ID: On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > Right, so there's two aspects: > > 1) concurrency when inserting the probe That's the one I worried about. Stopping all threads will fix it, obviously at a disastrous performance cost, but what do I care? As noted, there are ways to do it safely with TLB switching, so it's fixable. > 2) concurrency when hitting the probe Yeah, I didn't worry about this part, since the only solution is the out-of-line one, and I don't much care how the memory gets allocated for it. Inserting a whole new vma seems pretty drastic, but compared to stopping all threads, it's a small thing. Linus From ananth at in.ibm.com Wed Jan 27 11:05:55 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Wed, 27 Jan 2010 16:35:55 +0530 Subject: linux-next: add utrace tree In-Reply-To: <1264589716.4283.2006.camel@laptop> References: <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> Message-ID: <20100127110555.GB1842@in.ibm.com> On Wed, Jan 27, 2010 at 11:55:16AM +0100, Peter Zijlstra wrote: > On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: > > > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > > > > > Right, so you're going to love uprobes, which does exactly that. 
The > > > current proposal is overwriting the target instruction with an INT3 and > > > injecting an extra vma into the target process's address space > > > containing the original instruction(s) and possible jumps back to the > > > old code stream. > > > > Just out of interest, how does it handle the threading issue? > > > > Last I saw, at least some CPU people were _very_ nervous about overwriting > > instructions if another CPU might be just about to execute them. > > > > Even the "overwrite only the first byte with 'int3'" made them go "umm, I > > need to talk to some core CPU people to see if that's ok". They mumble > > about possible CPU errata, I$ coherency, instruction retry etc. > > > > I realize kprobes does this very thing, but kprobes is esoteric stuff and > > doesn't have much choice. In user space, you _could_ do the modification > > on a different physical page and then just switch the page table entry > > instead, and not get into the whole D$/I$ coherency thing at all. > > Right, so there's two aspects: > > 1) concurrency when inserting the probe > 2) concurrency when hitting the probe > > 1) used to be dealt with by using utrace to stop all threads in the > process and then writing the instruction. I suggested to CoW the page, > modify the instruction, set the pagetable and flush tlbs at full speed > -- the very thing you suggest here. > > 2) so traditionally (and the intel arch manual describes this) is to > replace the instruction, single step it, and write the probe back. This > is racy for multi-threading. The current uprobes stuff solves this by > doing single-step-out-of-line (XOL). > > XOL injects a new vma into the target process and puts the old > instruction there, then it single steps on the new location, leaving the > original site with INT3. > > This doesn't work for things like RIP relative instructions, so uprobes > considers them un-probable. Probing RIP-relative instructions work just fine; there are fixups that take care of it. 
> Also, I myself really object to inserting a vma in a running process, > its like a land-lord, sure he has the key but he won't come in an poke > through your things. > > The alternative is to place the instruction in TLS or stack space, since > each thread can only have a single trap at a time, you only need space > for 1 instruction (plus a possible jump out to the original site). There > is the 'problem' of marking the TLS/stack executable when being probed. > > Then there is the whole emulation angle, the uprobes people basically > say its too much effort to write a x86 emulator. We don't need to write one. I don't know how easy it is to make the kvm emulator less kvm-centric (vcpus, kvm_context, etc). Avi? Ananth From srikar at linux.vnet.ibm.com Wed Jan 27 11:07:22 2010 From: srikar at linux.vnet.ibm.com (Srikar Dronamraju) Date: Wed, 27 Jan 2010 16:37:22 +0530 Subject: linux-next: add utrace tree In-Reply-To: References: <1264575134.4283.1983.camel@laptop> Message-ID: <20100127110722.GA28678@linux.vnet.ibm.com> * Linus Torvalds [2010-01-27 02:43:39]: > > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > > > Right, so you're going to love uprobes, which does exactly that. The > > current proposal is overwriting the target instruction with an INT3 and > > injecting an extra vma into the target process's address space > > containing the original instruction(s) and possible jumps back to the > > old code stream. > > Just out of interest, how does it handle the threading issue? I am not sure why threading would be an issue with XOL. Since all threads of a process would have access to the XOL VMA. i.e This XOL VMA is a per-process VMA that gets attached to the process address space only when we hit the first breakpoint. We reserve a slot for each breakpoint in the XOL VMA, whenever the trap is hit, we jump to the corresponding slot, single step and jump back after necessary fix-ups. We have been able to use this approach in multithreaded applications. 
However if you see any issues, can you please let us know? > > Last I saw, at least some CPU people were _very_ nervous about overwriting > instructions if another CPU might be just about to execute them. > > Even the "overwrite only the first byte with 'int3'" made them go "umm, I > need to talk to some core CPU people to see if that's ok". They mumble > about possible CPU errata, I$ coherency, instruction retry etc. That's exactly why we waited for threads to quiesce before inserting and deleting the breakpoints. However we were advised by lkml that there are better ways to insert/delete breakpoints without quiescing by adjusting the page table entries similar to what you said just below. And we are working on switching to the page table entry solution. > > I realize kprobes does this very thing, but kprobes is esoteric stuff and > doesn't have much choice. In user space, you _could_ do the modification > on a different physical page and then just switch the page table entry > instead, and not get into the whole D$/I$ coherency thing at all. > > Linus > -- Thanks and Regards Srikar From peterz at infradead.org Wed Jan 27 11:08:31 2010 From: peterz at infradead.org (Peter Zijlstra) Date: Wed, 27 Jan 2010 12:08:31 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100127110555.GB1842@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> <20100127110555.GB1842@in.ibm.com> Message-ID: <1264590511.4283.2009.camel@laptop> On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote: > Probing RIP-relative instructions work just fine; there are fixups that > take care of it. Ah my bad then, it was my understanding you simply bailed on those. Just for my information, how large are the replacement sequences?
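[Editor's note] Srikar's description above (a per-process XOL area attached on the first breakpoint hit, with one slot reserved per breakpoint) can be pictured with a toy model. This is only a sketch under stated assumptions: XolArea, on_breakpoint, the dict standing in for a process, and the slot count of 4 are all invented for illustration, and nothing here reflects the actual uprobes data structures or locking.

```python
class XolArea:
    """Toy model of a per-process XOL area: one slot per probed address."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.slot_of = {}    # probe address -> slot index
        self.contents = {}   # slot index -> copy of the original instruction

    def slot_for(self, addr, insn):
        """Return the slot for addr, reserving one the first time it is hit."""
        if addr not in self.slot_of:
            if len(self.slot_of) >= self.nslots:
                # A real implementation would extend the area instead.
                raise MemoryError("toy XOL area is full")
            slot = len(self.slot_of)
            self.slot_of[addr] = slot
            self.contents[slot] = bytes(insn)
        return self.slot_of[addr]


def on_breakpoint(process, addr, insn):
    """Create the area lazily on the first trap, then pick the slot to step in."""
    if process.get("xol") is None:
        process["xol"] = XolArea(nslots=4)
    return process["xol"].slot_for(addr, insn)


if __name__ == "__main__":
    proc = {}  # stands in for per-process state shared by all threads
    print(on_breakpoint(proc, 0x400500, b"\x55"))
    print(on_breakpoint(proc, 0x400600, b"\xc3"))
    print(on_breakpoint(proc, 0x400500, b"\x55"))
```

Because the slot belongs to the probed address rather than to a thread, every thread that hits the same breakpoint steps over the same copied instruction while the INT3 stays in place at the original site.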
From ananth at in.ibm.com Wed Jan 27 11:20:55 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Wed, 27 Jan 2010 16:50:55 +0530 Subject: linux-next: add utrace tree In-Reply-To: <1264590511.4283.2009.camel@laptop> References: <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> <20100127110555.GB1842@in.ibm.com> <1264590511.4283.2009.camel@laptop> Message-ID: <20100127112055.GA14289@in.ibm.com> On Wed, Jan 27, 2010 at 12:08:31PM +0100, Peter Zijlstra wrote: > On Wed, 2010-01-27 at 16:35 +0530, Ananth N Mavinakayanahalli wrote: > > Probing RIP-relative instructions work just fine; there are fixups that > > take care of it. > > Ah my bad then, it was my understanding you simply bailed on those. > > Just for my information, how large are the replacement sequences? The RIP relative instruction is transformed into indirect addressing mode using a scratch register. For details http://marc.info/?l=linux-kernel&m=126401936114639&w=2. Ananth From rostedt at goodmis.org Wed Jan 27 13:59:52 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 27 Jan 2010 08:59:52 -0500 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> Message-ID: <1264600792.31321.464.camel@gandalf.stny.rr.com> [ Added Arjan ] On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > > > Right, so you're going to love uprobes, which does exactly that. The > > current proposal is overwriting the target instruction with an INT3 and > > injecting an extra vma into the target process's address space > > containing the original instruction(s) and possible jumps back to the > > old code stream. 
> > Just out of interest, how does it handle the threading issue? > > Last I saw, at least some CPU people were _very_ nervous about overwriting > instructions if another CPU might be just about to execute them. I think the issue was that ring 0 was never meant to do that, where as, ring 3 does it all the time. Doesn't the dynamic library modify its text? -- Steve > > Even the "overwrite only the first byte with 'int3'" made them go "umm, I > need to talk to some core CPU people to see if that's ok". They mumble > about possible CPU errata, I$ coherency, instruction retry etc. > > I realize kprobes does this very thing, but kprobes is esoteric stuff and > doesn't have much choice. In user space, you _could_ do the modification > on a different physical page and then just switch the page table entry > instead, and not get into the whole D$/I$ coherency thing at all. > > Linus From fweisbec at gmail.com Wed Jan 27 16:01:06 2010 From: fweisbec at gmail.com (Frederic Weisbecker) Date: Wed, 27 Jan 2010 17:01:06 +0100 Subject: linux-next: add utrace tree In-Reply-To: References: <1264575134.4283.1983.camel@laptop> <1264589716.4283.2006.camel@laptop> Message-ID: <20100127160103.GB22447@nowhere> On Wed, Jan 27, 2010 at 03:04:58AM -0800, Linus Torvalds wrote: > > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: > > > > Right, so there's two aspects: > > > > 1) concurrency when inserting the probe > > That's the one I worried about. Stopping all threads will fix it, > obviously at a disastrous performance cost, but what do I care? As noted, > there are ways to do it safely with TLB switching, so it's fixable. That said, inserting a probe is supposed to be a pretty rare operation, stopping all threads in a process shouldn't be painful for this aspect. From hpa at zytor.com Wed Jan 27 17:42:50 2010 From: hpa at zytor.com (H. 
Peter Anvin) Date: Wed, 27 Jan 2010 09:42:50 -0800 Subject: linux-next: add utrace tree In-Reply-To: <1264600792.31321.464.camel@gandalf.stny.rr.com> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> <1264600792.31321.464.camel@gandalf.stny.rr.com> Message-ID: <4B607B1A.3080007@zytor.com> On 01/27/2010 05:59 AM, Steven Rostedt wrote: > [ Added Arjan ] > > On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote: >> >> On Wed, 27 Jan 2010, Peter Zijlstra wrote: >>> >>> Right, so you're going to love uprobes, which does exactly that. The >>> current proposal is overwriting the target instruction with an INT3 and >>> injecting an extra vma into the target process's address space >>> containing the original instruction(s) and possible jumps back to the >>> old code stream. >> >> Just out of interest, how does it handle the threading issue? >> >> Last I saw, at least some CPU people were _very_ nervous about overwriting >> instructions if another CPU might be just about to execute them. > > I think the issue was that ring 0 was never meant to do that, where as, > ring 3 does it all the time. Doesn't the dynamic library modify its > text? > No, it has nothing to do with ring. It has to do with modifying code that another CPU could be executing at the same time, and with modifying code on the same processor through another virtual alias (they are different issues.) The same issues apply regardless of the CPL of the processor. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf. 
From rostedt at goodmis.org Wed Jan 27 18:53:19 2010 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 27 Jan 2010 13:53:19 -0500 Subject: linux-next: add utrace tree In-Reply-To: <4B607B1A.3080007@zytor.com> References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> <1264600792.31321.464.camel@gandalf.stny.rr.com> <4B607B1A.3080007@zytor.com> Message-ID: <1264618399.31321.470.camel@gandalf.stny.rr.com> On Wed, 2010-01-27 at 09:42 -0800, H. Peter Anvin wrote: > On 01/27/2010 05:59 AM, Steven Rostedt wrote: > > I think the issue was that ring 0 was never meant to do that, where as, > > ring 3 does it all the time. Doesn't the dynamic library modify its > > text? > > > > No, it has nothing to do with ring. It has to do with modifying code > that another CPU could be executing at the same time, and with modifying > code on the same processor through another virtual alias (they are > different issues.) The same issues apply regardless of the CPL of the > processor. Thanks for clarifying. -- Steve From hpa at zytor.com Wed Jan 27 19:18:54 2010 From: hpa at zytor.com (H. Peter Anvin) Date: Wed, 27 Jan 2010 11:18:54 -0800 Subject: linux-next: add utrace tree In-Reply-To: References: <20100121013822.28781960.sfr@canb.auug.org.au> <20100122005147.GD22003@redhat.com> <20100121170541.7425ff10.akpm@linux-foundation.org> <20100122182827.GA13185@redhat.com> <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> Message-ID: <4B60919E.1020900@zytor.com> On 01/27/2010 02:43 AM, Linus Torvalds wrote: > > > On Wed, 27 Jan 2010, Peter Zijlstra wrote: >> >> Right, so you're going to love uprobes, which does exactly that. 
The >> current proposal is overwriting the target instruction with an INT3 and >> injecting an extra vma into the target process's address space >> containing the original instruction(s) and possible jumps back to the >> old code stream. > > Just out of interest, how does it handle the threading issue? > > Last I saw, at least some CPU people were _very_ nervous about overwriting > instructions if another CPU might be just about to execute them. > > Even the "overwrite only the first byte with 'int3'" made them go "umm, I > need to talk to some core CPU people to see if that's ok". They mumble > about possible CPU errata, I$ coherency, instruction retry etc. > We actually went through a review of that here at Intel. We do not yet have an *official* answer (in order for us to have that we have to have it approved by the architecture committee and published in the SDM), but to the best of our current knowledge (and I'm allowed to say this) the int3 method followed by global IPIs should be safe for modifying *one (atomic) instruction*. This is a specific case of a more general rule, but I don't want to disclose the whole rule until it has been officially approved. > I realize kprobes does this very thing, but kprobes is esoteric stuff and > doesn't have much choice. In user space, you _could_ do the modification > on a different physical page and then just switch the page table entry > instead, and not get into the whole D$/I$ coherency thing at all. On the more general rule of interpretation: I'm really concerned about having a bunch of partially-capable x86 interpreters all over the kernel. x86 is *hard* to emulate, and it will only get harder as the architecture evolves. 
-hpa From jkenisto at us.ibm.com Thu Jan 28 01:52:19 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Wed, 27 Jan 2010 17:52:19 -0800 Subject: linux-next: add utrace tree In-Reply-To: <20100127085442.GA28422@elte.hu> References: <20100122221348.GA4263@redhat.com> <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> Message-ID: <1264643539.5068.62.camel@localhost.localdomain> On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote: ... > I think the best solution for user probes (by far) is to use a simplified > in-kernel instruction emulator for the few common probes instruction. (Kprobes > already partially decodes x86 instructions to make it safe to apply > accelerated probes and there's other decoding logic in the kernel too.) > > The design and practical advantages are numerous: > > - People want to probe their function prologues most of the time ... > a single INT3 there will in most cases just hit the initial stack > allocation and that's it. Yes, emulating "push %ebp" would buy us a lot of coverage for a lot of apps on x86 (but see below**). Even there, though, we'd have to address the page fault we'd occasionally get when extending the stack vma. > We could get quite good coverage (and very fast > emulation) for the common case in not too much code - and much of that code > we already have available. No re-trapping, As previously discussed, boosting would also get rid of the single-step trap for most instructions. > no extra instruction patching x86_64 rip-relative instructions are the only ones we alter. > and complex maintenance of trampolines.
> > - It's as transparent as it gets - no user-space trampoline or other visible > state that modifies behavior or can be stomped upon by user-space bugs. The XOL vma isn't writable from user space, so I can't think of how it could be clobbered merely by a stray memory reference. Yes, it's a vma that the unprobed app would never have; and yes, a malicious app or kernel module could remove it or alter the protection and scribble on it. We don't try to defend the app against such malicious attacks, but we do our best to ensure that the kernel side handles such attacks gracefully. > > - Lightweight and simple probe insertion: no weird setup sequence needing the > stopping of all tasks to install the trampoline. We just add the INT3 and > off you go. FWIW, we don't stop all threads to set up or extend the XOL vma, which is typically a one-time event. We just grab a mutex, in case multiple threads hit previously-unhit probepoints simultaneously, and simultaneously decide that the XOL area needs to be created or extended. > > - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on > task local state. The posted uprobes implementation is, so far as we can tell through code inspection and testing, also thread-safe and SMP-safe. > > - The points we can probe are never truly limited as it's all freely > upscalable: if you cannot probe an instruction you want to probe today, > extend the emulator. I don't see how ripping out existing support for almost* the entire instruction set, and then putting it back instruction by instruction, patch by patch, is a win. Even if we add emulation, it seems sensible to keep the XOL approach as a backup to handle instructions that aren't yet emulated (and architectures that don't yet have emulators). That way, if you don't probe any unemulated instructions, the XOL vma is never created. > Deny the rest. 
_All_ versions of uprobes code i've > seen so far already restricts the probe-compatible instruction set: *Yes, we currently decline to probe some instructions that look troublesome and we haven't taken the time to test. These include things like privileged instructions, int*, in*/out*, and instructions that fuss with the segment registers. We've never actually seen such instructions in user apps. > RIP-relative instructions are excluded on 64-bit for example. No. As discussed in previous posts, we handle rip-relative instructions. > > - Emulation has the _least_ semantical side effects as we really execute > 'that' instruction - It seems to me that emulation is the only approach that DOESN'T execute the probed instruction. > not some other instruction put elsewhere into a > special vma or into the process/thread stack, or some special in-kernel > trampoline, etc. > > - Emulation can be very fast for the common case as well. Nobody will probe > weird, complex instructions. They will use 'perf probe' to insert probes > into their functions 90% of the time ... > > - FPU and complex ops and pagefault emulation is not really what i'd expect > to be necessary for simple probing - but it _can_ be added by people who > care about it, if they so wish. **In practice, we've had to probe all sorts of instructions, including FP instructions -- especially where you want to exploit the debug info to get the names, types, and locations of variables and args. For some compilers and architectures, the debug info isn't reliable until the end of the function prologue, at which point you could find any old instruction. Ditto if you want to probe statements within a function. > > Such a scheme would be _far_ more preferable form a maintenance POV as well, > as the initial code will be small, and we can extend it gradually. All the > other proposals are complex 'all or nothing' schemes with no flexibility for > complexity at all. > > Thanks, > > Ingo Thanks. 
Jim From mingo at elte.hu Thu Jan 28 08:55:02 2010 From: mingo at elte.hu (Ingo Molnar) Date: Thu, 28 Jan 2010 09:55:02 +0100 Subject: linux-next: add utrace tree In-Reply-To: <1264643539.5068.62.camel@localhost.localdomain> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> Message-ID: <20100128085502.GA7713@elte.hu> * Jim Keniston wrote: > On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote: > ... > > I think the best solution for user probes (by far) is to use a simplified > > in-kernel instruction emulator for the few common probes instruction. (Kprobes > > already partially decodes x86 instructions to make it safe to apply > > accelerated probes and there's other decoding logic in the kernel too.) > > > > The design and practical advantages are numerous: > > > > - People want to probe their function prologues most of the time ... > > a single INT3 there will in most cases just hit the initial stack > > allocation and that's it. > > Yes, emulating "push %ebp" would buy us a lot of coverage for a lot of apps > on x86 (but see below**). [...] Coverage in practice is all that matters. Consider the fact that i get 1000 times more bugreports aided by strace, which has 1000 times more overhead than even the slowest of uprobes approaches. This simple fact tell us that while performance matters, it is of little use if good utility and a clean design is not there. (in fact sane and clean design will almost automatically result in good performance too down the line, but i digress.) Faster crap is still crap. > [...]
Even there, though, we'd have to address the page fault we'd > occasionally get when extending the stack vma. Nope, in the simplest model not even page fault emulation is needed, get_user()/put_user() would resolve it automatically. You either get the value with the pagefault resolved, or you get a -EFAULT. If you concentrate only on the common case then emulation can be _really_ simple. Let's compare the two cases via a drawing. Your current uprobes submission does:

[kernel]     do probe thing single-step trap
               ^      |     ^        |
               |      v     |        v
[user]       INT3     XOL-ins        next ins-stream

( add the need for serialization to make sure the whole single-step thing does not get out of sync with reality. )

An emulator approach would do:

[kernel]     emul-demux-fastpath, do probe thing
               ^      |
               |      v
[user]       INT3     next ins-stream

far simpler conceptually, and faster as well, because it's one kernel entry. Generally i get nervous if a piece of instrumentation cannot be expressed in simple ways. _Especially_ if i consider it to concentrate on all the wrong things and doesn't even break even with a far less complex scheme. What would be the 'right things' to concentrate on? Make sure it's an all-around end-to-end package that is _useful to people_. As of today i have yet to get a _single_ bugreport or kernel improvement requested by an application writer who found out about the inefficiencies in his app using uprobes. There is a gaping hole of utility here, a whole cathedral of tools written that only a handful of ordinary Linux people use. There's a big disconnect and i can say one thing for sure: needless complexity in the wrong places can outright stifle tools from becoming good. > > We could get quite good coverage (and very fast > > emulation) for the common case in not too much code - and much of that code > > we already have available. No re-trapping, > > As previously discussed, boosting would also get rid of the single-step trap > for most instructions.
Boosting is not in the uprobes patch-set you submitted. Even with it present it wont get rid of the initial INT3. So basically _best-case_ (with boosting) XOL-uprobes could roughly break even with a pure emulator approach ... That's a big and fundamental difference. > > no extra instruction patching > > x86_64 rip-relative instructions are the only ones we alter. > > > and complex maintenance of trampolines. > > > > - It's as transparent as it gets - no user-space trampoline or other visible > > state that modifies behavior or can be stomped upon by user-space bugs. > > The XOL vma isn't writable from user space, so I can't think of how it could > be clobbered merely by a stray memory reference. [...] Well there must be some purpose to the instrumentation, there must be some way to save data, right? If yes and it's in user-space, that data is clobberable. If it's in kernel-space then we have to enter the kernel anyway (with similar cost patterns to an INT3 entry) - so we just delayed the kernel entry. So IMHO you have designed in considerable complexity for little immediate benefit. > [...] Yes, it's a vma that the unprobed app would never have; and yes, a > malicious app or kernel module could remove it or alter the protection and > scribble on it. We don't try to defend the app against such malicious > attacks, but we do our best to ensure that the kernel side handles such > attacks gracefully. > > > - Lightweight and simple probe insertion: no weird setup sequence needing the > > stopping of all tasks to install the trampoline. We just add the INT3 and > > off you go. > > FWIW, we don't stop all threads to set up or extend the XOL vma, which is > typically a one-time event. We just grab a mutex, in case multiple threads > hit previously-unhit probepoints simultaneously, and simultaneously decide > that the XOL area needs to be created or extended. Still it's more complex than purely local state. 
Plus slower than even a naive emulator approach would be able to achieve, due to single-stepping. > > - Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on > > task local state. > > The posted uprobes implementation is, so far as we can tell through code > inspection and testing, also thread-safe and SMP-safe. > > > > > - The points we can probe are never truly limited as it's all freely > > upscalable: if you cannot probe an instruction you want to probe today, > > extend the emulator. > > I don't see how ripping out existing support for almost* the entire > instruction set, and then putting it back instruction by instruction, patch > by patch, is a win. IMO it's a win because it's more controlled in what we can and cannot do safely, and because it's more transparent to the probed context. But by far the most important aspect is that it should be far less code with far less complexity, and hence much more graceful from an upstream POV. Gradual concepts with easy ways forwards/backwards are good. All-or-nothing frameworks are bad. > Even if we add emulation, it seems sensible to keep the XOL approach as a > backup to handle instructions that aren't yet emulated (and architectures > that don't yet have emulators). That way, if you don't probe any unemulated > instructions, the XOL vma is never created. To turn the argument around: an in-kernel emulator is an all-around facility to make sure we probe safely and securely, _and_ it is also more portable because it's simpler (because more gradual) to implement on a new architecture as you dont actually have to copy around instructions (and make sure they work in that new place), but have to emulate a limited subset of the instruction space, on purely local state. There are far less things that can go wrong in such a model. > > Deny the rest. 
_All_ versions of uprobes code i've > > seen so far already restricts the probe-compatible instruction set: > > *Yes, we currently decline to probe some instructions that look troublesome > and we haven't taken the time to test. These include things like privileged > instructions, int*, in*/out*, and instructions that fuss with the segment > registers. We've never actually seen such instructions in user apps. > > > > RIP-relative instructions are excluded on 64-bit for example. > > No. As discussed in previous posts, we handle rip-relative > instructions. > > > > > - Emulation has the _least_ semantical side effects as we really execute > > 'that' instruction - > > It seems to me that emulation is the only approach that DOESN'T execute the > probed instruction. None of the approaches executes _that_ instruction in _that_ place - the instruction is either replaced by an INT3 or by a jump-to-trampoline instruction. They may execute the same instruction but in another place. With an emulator (assuming the emulator is correct) we can execute the precise semantics of that instruction in that place - without any side-effects from trampolining/replacement. > > not some other instruction put elsewhere into a > > special vma or into the process/thread stack, or some special in-kernel > > trampoline, etc. > > > > - Emulation can be very fast for the common case as well. Nobody will probe > > weird, complex instructions. They will use 'perf probe' to insert probes > > into their functions 90% of the time ... > > > > - FPU and complex ops and pagefault emulation is not really what i'd expect > > to be necessary for simple probing - but it _can_ be added by people who > > care about it, if they so wish. > > **In practice, we've had to probe all sorts of instructions, including FP > instructions -- especially where you want to exploit the debug info to get > the names, types, and locations of variables and args. 
For some compilers > and architectures, the debug info isn't reliable until the end of the > function prologue, at which point you could find any old instruction. Ditto > if you want to probe statements within a function. For those cases, frankly, the right approach is to fix the debug info (or introduce a new one) and forget the old crap. You treat debuginfo as some god-given property, while it's one of the suckiest aspects of all of Linux. But we've had that discussion months (and years) ago. It has improved in gcc 4.5 so there's some hope. > > Such a scheme would be _far_ more preferable form a maintenance POV as > > well, as the initial code will be small, and we can extend it gradually. > > All the other proposals are complex 'all or nothing' schemes with no > > flexibility for complexity at all. I repeat this point. To be able to scale in and out of a design is rather important, and i dont see that with the current XOL proposal. Ingo
From benh at kernel.crashing.org Thu Jan 28 23:53:06 2010 From: benh at kernel.crashing.org (Benjamin Herrenschmidt) Date: Fri, 29 Jan 2010 10:53:06 +1100 Subject: linux-next: add utrace tree In-Reply-To: References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> Message-ID: <1264722786.20211.15.camel@pasglop> On Mon, 2010-01-25 at 08:52 -0800, Linus Torvalds wrote: > > That said, I also suspect that people should still look seriously at > simply just improving ptrace. For example, I suspect that the biggest > problem with ptrace is really just the signalling, and that creating a > new > extension for JUST THAT, and then having a model where you can choose > - at > PTRACE_ATTACH time - how to wait for events would be a good thing. like returning a fd to poll() on ? :-) Cheers, Ben. From torvalds at linux-foundation.org Fri Jan 29 00:21:56 2010 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Thu, 28 Jan 2010 16:21:56 -0800 (PST) Subject: linux-next: add utrace tree In-Reply-To: <1264722786.20211.15.camel@pasglop> References: <20100122200129.GG22003@redhat.com> <20100122221348.GA4263@redhat.com> <20100123112333.GA15455@elte.hu> <20100123114729.GA7828@redhat.com> <20100123194820.GM21263@thunk.org> <1264722786.20211.15.camel@pasglop> Message-ID: On Fri, 29 Jan 2010, Benjamin Herrenschmidt wrote: > > like returning a fd to poll() on ? :-) Well, there's the possibility of async polling (rather than the synchronous wait that ptrace forces now), but there are other advantages to having a "connection" model - like not having to look up the child process every time like ptrace does now. Although 'find_task_by_vpid()' is probably cheap enough that nobody really cares. We do a fair job at those hash tables.
Linus From jkenisto at us.ibm.com Fri Jan 29 00:59:28 2010 From: jkenisto at us.ibm.com (Jim Keniston) Date: Thu, 28 Jan 2010 16:59:28 -0800 Subject: linux-next: add utrace tree In-Reply-To: <20100128085502.GA7713@elte.hu> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> Message-ID: <1264726768.4933.50.camel@localhost.localdomain> On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote: > * Jim Keniston wrote: > > > On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote: > > ... > > > > Yes, emulating "push %ebp" would buy us a lot of coverage for a lot of apps > > on x86 (but see below**). [...] > ... > > > [...] Even there, though, we'd have to address the page fault we'd > > occasionally get when extending the stack vma. > > Nope, in the simplest model not even page fault emulation is needed, > get_user()/put_user() would resolve it automatically. If you either get the > value with the pagefault resolved, or you get a -EFAULT. get_user()/put_user() have to be done in a context where you can sleep, right? Uprobes currently operates in such contexts, but there's some talk of moving it all to a DIE_INT3 notifier context, where it can't sleep. ... > > > > We could get quite good coverage (and very fast > > > emulation) for the common case in not too much code - and much of that code > > > we already have available. No re-trapping, > > > > As previously discussed, boosting would also get rid of the single-step trap > > for most instructions. > > Boosting is not in the uprobes patch-set you submitted. Even with it present > it wont get rid of the initial INT3. So basically _best-case_ (with boosting) > XOL-uprobes could roughly break even with a pure emulator approach ... > > That's a big and fundamental difference. To be fair, wrt uprobes, emulation and boosting are both in the same state: pretty well understood, but not yet implemented. ... 
> > > > > > - It's as transparent as it gets - no user-space trampoline or other visible > > > state that modifies behavior or can be stomped upon by user-space bugs. > > > > The XOL vma isn't writable from user space, so I can't think of how it could > > be clobbered merely by a stray memory reference. [...] > > Well there must be some purpose to the instrumentation, there must be some way > to save data, right? If yes and it's in user-space, that data is clobberable. One or two others have advocated an approach (which eliminates the breakpoint trap) where trace data is stored in the uprobe vma, but I haven't. (In such a case, "XOL vma" would be a misnomer.) I agree that in such a scenario, the uprobe vma would of necessity be writable by the app. > > If it's in kernel-space then we have to enter the kernel anyway (with similar > cost patterns to an INT3 entry) - so we just delayed the kernel entry. This seems to presume that you have to extract trace data from the kernel every time a probe is hit. In actual practice, you're often just checking for unusual arg values, incrementing a counter, or some such. > ... > > Even if we add emulation, it seems sensible to keep the XOL approach as a > > backup to handle instructions that aren't yet emulated (and architectures > > that don't yet have emulators). That way, if you don't probe any unemulated > > instructions, the XOL vma is never created. > > To turn the argument around: an in-kernel emulator is an all-around facility > to make sure we probe safely and securely, _and_ it is also more portable > because it's simpler (because more gradual) to implement on a new architecture > as you dont actually have to copy around instructions (and make sure they work > in that new place), but have to emulate a limited subset of the instruction > space, on purely local state. I understand the desire to start small and simple and grow gradually from there. We thought we were doing that. 
Single-stepping out of line has been in use for close to a decade, maybe more; and boosting (in kprobes) has been around for a few years as well. To the *probes folks, it feels pretty solid. > ... > > With an emulator (assuming the emulator is correct) we can execute the precise > semantics of that instruction in that place - without any side-effects from > trampolining/replacement. And of course, our view has been that the best way to achieve the effect of the instruction, including all desired side-effects, is to execute the instruction on the CPU. ... > > > > **In practice, we've had to probe all sorts of instructions, including FP > > instructions -- especially where you want to exploit the debug info to get > > the names, types, and locations of variables and args. For some compilers > > and architectures, the debug info isn't reliable until the end of the > > function prologue, at which point you could find any old instruction. Ditto > > if you want to probe statements within a function. > > For those cases, frankly, the right approach is to fix the debug info (or > introduce a new one) and forget the old crap. > > You treat debuginfo as some god-given property, while it's one of the suckiest > aspects of all of Linux. But we've had that discussion months (and years) ago. > It has improved in gcc 4.5 so there's some hope. Yes, there seems to be considerable movement toward better debug info -- which could make statement probing (and not just function-boundary probing) more and more feasible. > ... > Ingo Thanks. 
Jim From ananth at in.ibm.com Fri Jan 29 04:55:46 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 29 Jan 2010 10:25:46 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100128085502.GA7713@elte.hu> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> Message-ID: <20100129045546.GA16920@in.ibm.com> On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote: ... > Lets compare the two cases via a drawing. Your current uprobes submission > does: > > [kernel] do probe thing single-step trap > ^ | ^ | > | v | v > [user] INT3 XOL-ins next ins-stream > > ( add the need for serialization to make sure the whole single-step thing > does not get out of sync with reality. ) > > And emulator approach would do: > > [kernel] emul-demux-fastpath, do probe thing > ^ | > | v > [user] INT3 next ins-stream > > far simpler conceptually, and faster as well, because it's one kernel entry. Ingo, Yes, conceptually, emulation is simpler. In fact, it may even be the right thing to do from a housekeeping POV if gdb were enabled to use breakpoint assistance in the kernel. However... emulation is not easy. Just quoting Peter Anvin: > On the more general rule of interpretation: I'm really concerned about > having a bunch of partially-capable x86 interpreters all over the > kernel. x86 is *hard* to emulate, and it will only get harder as the > architecture evolves. > > -hpa Yes, I know you suggested we start with a small subset. We already have an implementation of instruction emulation in kernel for x86 and powerpc, but its too KVM centric. If there is a generic emulation layer, we would use it. There are conflicting opinions for either case; complicated as it is, the XOL scheme works and, to a large extent, it is easily extendable to other architectures compared to the emulation approach. 
Uprobes can be made to use emulation when possible/available, but I don't think this should be a gating decision for the initial implementation of the feature. Ananth From mingo at elte.hu Fri Jan 29 07:39:07 2010 From: mingo at elte.hu (Ingo Molnar) Date: Fri, 29 Jan 2010 08:39:07 +0100 Subject: linux-next: add utrace tree In-Reply-To: <1264726768.4933.50.camel@localhost.localdomain> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> Message-ID: <20100129073907.GF14636@elte.hu> * Jim Keniston wrote: > > > As previously discussed, boosting would also get rid of the single-step > > > trap for most instructions. > > > > Boosting is not in the uprobes patch-set you submitted. Even with it > > present it wont get rid of the initial INT3. So basically _best-case_ > > (with boosting) XOL-uprobes could roughly break even with a pure emulator > > approach ... > > > > That's a big and fundamental difference. > > To be fair, wrt uprobes, emulation and boosting are both in the same state: > pretty well understood, but not yet implemented. So, to sum it up: utrace XOL, which is rather complex already, needs even more complexity (which is not yet implemented) than the much simpler common-case emulator approach i outlined, just to break even with the performance of the much simpler approach. And you've been justifying the complexity of XOL with its performance advantages. See why i'm unimpressed by that argument? [ Note, i'm not dismissing it entirely, the complexity of XOL _might_ be fine in the future if it brings us real advantages: for example if it avoids _ALL_ kernel entries. That can be done too, by using the jump-probe technique in user-space. (the closest anyone came to proposing this was Avi with the user-space INT3 hack - but we can do better than that via the jprobes technique.)
At that point the advantage of having a pure user-space callback technique combined with the advantages of having near full instruction coverage might tip the balance. There are other complexities to handle in that case though, like buffering and more. ] But right now we are nowhere near that stage, and i dont see the path towards that either. So i'd much rather see something simpler and move on from these IMHO unimportant performance details to the IMO much more important high level interface and high level tooling details. When we merged kprobes ~10 years ago we made the (rather bad) mistake of merging a raw, opaque facility and leaving 'the rest' up to some other entity. IBM kprobes hackers vanished the day the original kprobes code went upstream and the high level entity never truly materialized in-kernel, for nearly a decade! With uprobes we should learn from that painful lesson and bring in the high level users of uprobes via 'perf probe' (or any other real user) straight away. Complexity is easy to increase when usage is increasing, it's near impossible to reduce when usage is not there. (and it's rather hard to reduce even with increasing usage - especially if aspects of the complexity leak out to user-space ABIs - which danger XOL has written all over it.) So the request is simple to sum up: please reduce complexity of the initial submission and increase all around utility. Thanks, Ingo From mingo at elte.hu Fri Jan 29 07:42:41 2010 From: mingo at elte.hu (Ingo Molnar) Date: Fri, 29 Jan 2010 08:42:41 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100129045546.GA16920@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <20100129045546.GA16920@in.ibm.com> Message-ID: <20100129074241.GG14636@elte.hu> * Ananth N Mavinakayanahalli wrote: > On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote: > > ...
> > > Lets compare the two cases via a drawing. Your current uprobes submission > > does: > > > > [kernel] do probe thing single-step trap > > ^ | ^ | > > | v | v > > [user] INT3 XOL-ins next ins-stream > > > > ( add the need for serialization to make sure the whole single-step thing > > does not get out of sync with reality. ) > > > > And emulator approach would do: > > > > [kernel] emul-demux-fastpath, do probe thing > > ^ | > > | v > > [user] INT3 next ins-stream > > > > far simpler conceptually, and faster as well, because it's one kernel entry. > > Ingo, > > Yes, conceptually, emulation is simpler. In fact, it may even be the > right thing to do from a housekeeping POV if gdb were enabled to use > breakpoint assistance in the kernel. However... emulation is not > easy. Just quoting Peter Anvin: > > > On the more general rule of interpretation: I'm really concerned about > > having a bunch of partially-capable x86 interpreters all over the > > kernel. x86 is *hard* to emulate, and it will only get harder as the > > architecture evolves. > > > > -hpa This is obviously true for a full emulator. Except for the fact that: > Yes, I know you suggested we start with a small subset. and for the fact that we already have emulators in the kernel. Plus we _already_ need to decode instructions for safe kprobing and have the code for that upstream. So it's not like we can avoid decoding the instructions. (and emulating certain instruction patterns is really just a natural next step of a good decoder.) > We already have an implementation of instruction emulation in kernel for x86 > and powerpc, but its too KVM centric. If there is a generic emulation layer, > we would use it. So this approach, beyond being simpler, more robust and faster than the current XOL code, would also trigger (much needed) cleanups in other parts of the kernel and would share code with other kernel subsystems. Dont you see the obvious advantages of that? 
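As a rough illustration of the "emulate a small subset on purely task-local state" idea argued for above, here is a minimal user-space C sketch of emulating `push %ebp` (opcode 0x55). It is not code from the uprobes patch-set or from any in-kernel emulator: `struct regs_sketch`, `user_mem`, `put_user_sketch()` and `emulate_probed_insn()` are hypothetical names made up for this example, and the bounds-checked array only stands in for what put_user() would do against real user memory.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the saved user-mode registers of the probed task. */
struct regs_sketch {
	uint32_t ip;	/* address of the probed instruction */
	uint32_t sp;	/* user stack pointer */
	uint32_t bp;	/* user frame pointer */
};

/* Hypothetical stand-in for the task's user memory (assumption, not a kernel API). */
#define MEM_SIZE 64u
static uint8_t user_mem[MEM_SIZE];

/* Models the put_user() contract discussed in the thread: store or -EFAULT. */
static int put_user_sketch(uint32_t addr, uint32_t val)
{
	if (addr > MEM_SIZE - sizeof(val))
		return -EFAULT;
	memcpy(&user_mem[addr], &val, sizeof(val));
	return 0;
}

/*
 * Emulate the original instruction that the INT3 replaced.
 * Returns 0 if emulated, -EFAULT if the stack store failed, and
 * -ENOTSUP for anything outside the (deliberately tiny) supported subset.
 */
static int emulate_probed_insn(struct regs_sketch *regs, uint8_t opcode)
{
	switch (opcode) {
	case 0x55: {	/* push %ebp */
		uint32_t new_sp = regs->sp - 4;
		int err = put_user_sketch(new_sp, regs->bp);

		if (err)
			return err;	/* registers are left untouched on a fault */
		regs->sp = new_sp;
		regs->ip += 1;		/* push %ebp is a one-byte instruction */
		return 0;
	}
	default:
		return -ENOTSUP;
	}
}
```

Anything the sketch does not recognize returns -ENOTSUP, which is the point where a real implementation would have to either deny the probe or fall back to another mechanism such as XOL.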
Ingo From ananth at in.ibm.com Fri Jan 29 07:52:40 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 29 Jan 2010 13:22:40 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100129073907.GF14636@elte.hu> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> Message-ID: <20100129075240.GF16920@in.ibm.com> On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: ... > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > IBM kprobes hackers vanished the day the original kprobes code went upstream > and the high level entity never truly materialized in-kernel, for nearly a > decade! I don't know what you are referring to here... Kprobes was merged in 2.6.9 (~August 2004 -- less than 6 years ago). Since then, we did work on ports to powerpc and s390. We implemented kretprobes. We made it much scalable using RCU; we did the powerpc booster to skip single-step when possible, not to mention various bug fixes over the years. Yes, we did not do the perf integration, but perf did not exist then, either. Its simply wrong to say people 'vanished'. 
Thanks, Ananth From ananth at in.ibm.com Fri Jan 29 07:55:29 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 29 Jan 2010 13:25:29 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100129075240.GF16920@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> <20100129075240.GF16920@in.ibm.com> Message-ID: <20100129075529.GG16920@in.ibm.com> On Fri, Jan 29, 2010 at 01:22:40PM +0530, Ananth N Mavinakayanahalli wrote: > On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: > > ... > > > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > > IBM kprobes hackers vanished the day the original kprobes code went upstream > > and the high level entity never truly materialized in-kernel, for nearly a > > decade! > > I don't know what you are referring to here... Kprobes was merged in > 2.6.9 (~August 2004 -- less than 6 years ago). Since then, we did work > on ports to powerpc and s390. We implemented kretprobes. We made it much > scalable using RCU; we did the powerpc booster to skip single-step when > possible, not to mention various bug fixes over the years. > > Yes, we did not do the perf integration, but perf did not exist then, either. > > Its simply wrong to say people 'vanished'. Oh, and the x86 instruction decoder was initially implemented by us. Masami has done a great job making it more complete. 
Ananth From mingo at elte.hu Fri Jan 29 09:11:16 2010 From: mingo at elte.hu (Ingo Molnar) Date: Fri, 29 Jan 2010 10:11:16 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100129075240.GF16920@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> <20100129075240.GF16920@in.ibm.com> Message-ID: <20100129091116.GB10878@elte.hu> * Ananth N Mavinakayanahalli wrote: > On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: > > ... > > > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > > IBM kprobes hackers vanished the day the original kprobes code went upstream > > and the high level entity never truly materialized in-kernel, for nearly a > > decade! > > I don't know what you are referring to here... Kprobes was merged in 2.6.9 > (~August 2004 -- less than 6 years ago). [...] Ok, 6 years then :-) > [...] Since then, we did work on ports to powerpc and s390. We implemented > kretprobes. We made it much scalable using RCU; we did the powerpc booster > to skip single-step when possible, not to mention various bug fixes over the > years. Except it had no real in-kernel user. > Yes, we did not do the perf integration, but perf did not exist then, > either. > > Its simply wrong to say people 'vanished'. It certainly was a bit stale for years - and with no real users that's certainly not a surprise. That has changed recently so i'm not complaining. We just dont want to repeat the same mistake with uprobes.
Ingo From mingo at elte.hu Fri Jan 29 09:16:46 2010 From: mingo at elte.hu (Ingo Molnar) Date: Fri, 29 Jan 2010 10:16:46 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100129075529.GG16920@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> <20100129075240.GF16920@in.ibm.com> <20100129075529.GG16920@in.ibm.com> Message-ID: <20100129091646.GC10878@elte.hu> * Ananth N Mavinakayanahalli wrote: > On Fri, Jan 29, 2010 at 01:22:40PM +0530, Ananth N Mavinakayanahalli wrote: > > On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: > > > > ... > > > > > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > > > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > > > IBM kprobes hackers vanished the day the original kprobes code went upstream > > > and the high level entity never truly materialized in-kernel, for nearly a > > > decade! > > > > I don't know what you are referring to here... Kprobes was merged in > > 2.6.9 (~August 2004 -- less than 6 years ago). Since then, we did work > > on ports to powerpc and s390. We implemented kretprobes. We made it much > > scalable using RCU; we did the powerpc booster to skip single-step when > > possible, not to mention various bug fixes over the years. > > > > Yes, we did not do the perf integration, but perf did not exist then, either. > > > > Its simply wrong to say people 'vanished'. > > Oh, and the x86 instruction decoder was initially implemented by us. Which he implemented at my suggestion, to make it safe and robust for 'perf probe' even if debuginfo for some reason gives us the wrong address and we try to insert a probe where we shouldnt. > Masami has done a great job making it more complete. Absolutely and certainly so! 
I'm not talking about the present - i'm happy about where kprobes is going currently, and the new jump-probes optimizations look promising too. I just see uprobes repeating some of the mistakes of early kprobes, and i want us to learn from that experience. In my experience real usage and good integration is the key to that, and we can skip those lost 5 years. Ingo From ananth at in.ibm.com Fri Jan 29 09:31:36 2010 From: ananth at in.ibm.com (Ananth N Mavinakayanahalli) Date: Fri, 29 Jan 2010 15:01:36 +0530 Subject: linux-next: add utrace tree In-Reply-To: <20100129091116.GB10878@elte.hu> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> <20100129075240.GF16920@in.ibm.com> <20100129091116.GB10878@elte.hu> Message-ID: <20100129093136.GH16920@in.ibm.com> On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote: > > * Ananth N Mavinakayanahalli wrote: > > > On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: > > > > ... > > > > > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > > > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > > > IBM kprobes hackers vanished the day the original kprobes code went upstream > > > and the high level entity never truly materialized in-kernel, for nearly a > > > decade! > > > > I don't know what you are referring to here... Kprobes was merged in 2.6.9 > > (~August 2004 -- less than 6 years ago). [...] > > Ok, 6 years then :-) > > > [...] Since then, we did work on ports to powerpc and s390. We implemented > > kretprobes. We made it much scalable using RCU; we did the powerpc booster > > to skip single-step when possible, not to mention various bug fixes over the > > years. > > Except it had no real in-kernel user. 
Not that I want to rebut you Ingo, but there were in-kernel users since 2006 (net/ipv4/tcp_probe.c) :-) Aside, I am also glad that we have more flexibility with the perf integration. Ananth From mingo at elte.hu Fri Jan 29 09:51:52 2010 From: mingo at elte.hu (Ingo Molnar) Date: Fri, 29 Jan 2010 10:51:52 +0100 Subject: linux-next: add utrace tree In-Reply-To: <20100129093136.GH16920@in.ibm.com> References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> <20100129075240.GF16920@in.ibm.com> <20100129091116.GB10878@elte.hu> <20100129093136.GH16920@in.ibm.com> Message-ID: <20100129095152.GA360@elte.hu> * Ananth N Mavinakayanahalli wrote: > On Fri, Jan 29, 2010 at 10:11:16AM +0100, Ingo Molnar wrote: > > > > * Ananth N Mavinakayanahalli wrote: > > > > > On Fri, Jan 29, 2010 at 08:39:07AM +0100, Ingo Molnar wrote: > > > > > > ... > > > > > > > When we merged kprobes ~10 years ago we made the (rather bad) mistake of > > > > merging a raw, opaque facility and leaving 'the rest' up to some other entity. > > > > IBM kprobes hackers vanished the day the original kprobes code went upstream > > > > and the high level entity never truly materialized in-kernel, for nearly a > > > > decade! > > > > > > I don't know what you are referring to here... Kprobes was merged in 2.6.9 > > > (~August 2004 -- less than 6 years ago). [...] > > > > Ok, 6 years then :-) > > > > > [...] Since then, we did work on ports to powerpc and s390. We implemented > > > kretprobes. We made it much scalable using RCU; we did the powerpc booster > > > to skip single-step when possible, not to mention various bug fixes over the > > > years. > > > > Except it had no real in-kernel user. > > Not that I want to rebut you Ingo, but there were in-kernel users since 2006 > (net/ipv4/tcp_probe.c) :-) i said 'real' users. 
That usage in tcp_probe.c was (and is) really minimal and never expanded really. > Aside, I am also glad that we have more flexibility with the perf > integration. ok, good :) Ingo
From fche at redhat.com Fri Jan 29 18:13:47 2010 From: fche at redhat.com (Frank Ch.
Eigler) Date: Fri, 29 Jan 2010 13:13:47 -0500 Subject: linux-next: add utrace tree In-Reply-To: <20100129073907.GF14636@elte.hu> (Ingo Molnar's message of "Fri, 29 Jan 2010 08:39:07 +0100") References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <1264726768.4933.50.camel@localhost.localdomain> <20100129073907.GF14636@elte.hu> Message-ID: Ingo Molnar writes: > [...] So, to sum it up: utrace XOL, which is rather complex already, > needs even more complexity (which is not yet implemented) than the > much simpler common-case emulator approach i outlined, just to break > even with the performance of the much simpler approach. [...] Is it an uncontroversial claim that emulation of CISC instructions should perform better than their native execution, followed by an int3 (as in the simplest working scheme) or boosting (as done by kprobes)? >From my experience with simulators, "simple" software emulation of cpus can be hundreds of times slower or worse than native execution. 
- FChE From siege at prenumerata.pl Sat Jan 30 00:50:26 2010 From: siege at prenumerata.pl (Buy Viagra on www.ke19.com) Date: Sat, 30 Jan 2010 01:50:26 +0100 Subject: piles coppe rises primi tives Message-ID: <4B6380EB.9050905@prenumerata.pl> piles bioty pic crapp er finge rs incur ring viola tion votiv ely chair women anent inste ad foxtr ots busk luxes conne ctive s redia lled bushb aby ditch es polio virus dener vate kilob its scumb led abacu s colli de flatt ener vapor isabl es debug ger conne ctive s yack quart ers horse radis h opaqu eness demor alise rs foots loggi ng illum inate s disre alize spoki ng calip h twent ies polla iuolo decei tfull y bonfi glio sneez ing epide miolo gy gance bowld er bally hooey conne ctive s abacu s cloud scape flowe ring overt rump hinti ng parte rre turme ric allev iator erysi peloi d conde nsing scrut ator zooph ilous moorh en suber ose preax ially wheat en harbo urles s chron ogram s rearr angin g disbe lief stags stork s wader gigav olt socia lists ascen dant evapo rated weede d hemli ne busk music water color ed chara cterf ul stags chlor otic audil e From erg at yunz.com Sat Jan 30 05:04:20 2010 From: erg at yunz.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Sat, 30 Jan 2010 05:04:20 -0000 Subject: =?GB2312?B?dXRyYWNlLWRldmVsserXvLuvtcTAzbavtqi27rncwO3Ptc2z?= Message-ID: <201001300503.o0U53uHE008417@mx1.redhat.com> utrace-devel???????????????? ?????2010? 3?25-26? ?? ?????2010? 3?27-28? ?? ???????????????.????????.??????.?????? ?????2600?/?(????????????????????) ?????020-80560638?020-85917945 ?????????????????chinammc2010 at 126.comrom nbu at wfgu.com Sat Jan 30 17:05:01 2010 From: nbu at wfgu.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Sun, 31 Jan 2010 01:05:01 +0800 Subject: =?GB2312?B?VDI6dXRyYWNlLWRldmVs1sbU7NK1z9a0+rLWtKK53MDt0+vO78HPxeTLzQ==?= Message-ID: <201001301704.o0UH4v9U028321@mx1.redhat.com> utrace-devel?????????????????????? ?????2010?3?20-21? ?? ?????2010?3?26-27? ?? ?????2010?4?1-2? ?? 
From rostedt at goodmis.org Sat Jan 30 17:49:10 2010
From: rostedt at goodmis.org (Steven Rostedt)
Date: Sat, 30 Jan 2010 12:49:10 -0500
Subject: linux-next: add utrace tree
In-Reply-To: <20100129074241.GG14636@elte.hu>
References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <20100129045546.GA16920@in.ibm.com> <20100129074241.GG14636@elte.hu>
Message-ID: <1264873751.4561.9155.camel@frodo>

On Fri, 2010-01-29 at 08:42 +0100, Ingo Molnar wrote:
> * Ananth N Mavinakayanahalli wrote:
>
> > On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:
> >
> > ...
> >
> > > Let's compare the two cases via a drawing. Your current uprobes submission
> > > does:
> > >
> > >  [kernel]   do probe thing      single-step trap
> > >               ^        |           ^        |
> > >               |        v           |        v
> > >  [user]     INT3      XOL-ins            next ins-stream
> > >
> > > ( add the need for serialization to make sure the whole single-step thing
> > >   does not get out of sync with reality. )
> > >
> > > And emulator approach would do:
> > >
> > >  [kernel]   emul-demux-fastpath, do probe thing
> > >               ^                        |
> > >               |                        v
> > >  [user]     INT3                next ins-stream
> > >
> > > far simpler conceptually, and faster as well, because it's one kernel entry.
> >
> > Ingo,
> >
> > Yes, conceptually, emulation is simpler. In fact, it may even be the
> > right thing to do from a housekeeping POV if gdb were enabled to use
> > breakpoint assistance in the kernel. However... emulation is not
> > easy. Just quoting Peter Anvin:
> >
> > > On the more general rule of interpretation: I'm really concerned about
> > > having a bunch of partially-capable x86 interpreters all over the
> > > kernel. x86 is *hard* to emulate, and it will only get harder as the
> > > architecture evolves.
> > >
> > > -hpa
>
> This is obviously true for a full emulator. Except for the fact that:
>
> > Yes, I know you suggested we start with a small subset.
>
> and for the fact that we already have emulators in the kernel.

But this would be emulating userspace instructions, correct?

The kernel is limited to what instructions it can perform, no floating
point for example (of course there are some exceptions). But generally,
the instructions in the kernel should be easier to emulate than in
userspace. Userspace is free to do any wacky thing it wants.

Will this limit the ability to probe apps that take advantage of some
strange op code that the user knows is available on their platform?

-- Steve

>
> Plus we _already_ need to decode instructions for safe kprobing and have the
> code for that upstream. So it's not like we can avoid decoding the
> instructions. (and emulating certain instruction patterns is really just a
> natural next step of a good decoder.)

From torvalds at linux-foundation.org Sat Jan 30 17:59:28 2010
From: torvalds at linux-foundation.org (Linus Torvalds)
Date: Sat, 30 Jan 2010 09:59:28 -0800 (PST)
Subject: linux-next: add utrace tree
In-Reply-To: <1264873751.4561.9155.camel@frodo>
References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <20100129045546.GA16920@in.ibm.com> <20100129074241.GG14636@elte.hu> <1264873751.4561.9155.camel@frodo>
Message-ID:

On Sat, 30 Jan 2010, Steven Rostedt wrote:
>
> The kernel is limited to what instructions it can perform, no floating
> point for example (of course there are some exceptions).

Actually, the reason the kernel is limited to not performing floating
point instructions is that the kernel doesn't own the floating point
register set - it's too big to save/restore, so the kernel leaves it
alone.

But for emulating an instruction from user space, it would be perfectly
fine to do an FP instruction in kernel space, since we're explicitly doing
it on behalf of user space, and with user space owning it.

Of course, that would require that we _only_ touch the registers that user
space wants us to touch, which is likely impossible in practice for
anything but an execute-out-of-line model.

> But generally, the instructions in the kernel should be easier to
> emulate than in userspace.

Yeah, we control the kernel instructions better, and we know what the
environment is. For example, we never have to worry about vm86 mode or
segments when we fix up kernel instructions, but user space can do
anything, of course.

		Linus
From mhiramat at redhat.com Tue Feb 2 06:47:48 2010
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Tue, 02 Feb 2010 01:47:48 -0500
Subject: linux-next: add utrace tree
In-Reply-To: <20100129074241.GG14636@elte.hu>
References: <1264575134.4283.1983.camel@laptop> <20100127085442.GA28422@elte.hu> <1264643539.5068.62.camel@localhost.localdomain> <20100128085502.GA7713@elte.hu> <20100129045546.GA16920@in.ibm.com> <20100129074241.GG14636@elte.hu>
Message-ID: <4B67CA94.7000501@redhat.com>

Ingo Molnar wrote:
>
> * Ananth N Mavinakayanahalli wrote:
>
>> On Thu, Jan 28, 2010 at 09:55:02AM +0100, Ingo Molnar wrote:
>>
>> ...
>>
>>> Let's compare the two cases via a drawing. Your current uprobes submission
>>> does:
>>>
>>>  [kernel]   do probe thing      single-step trap
>>>               ^        |           ^        |
>>>               |        v           |        v
>>>  [user]     INT3      XOL-ins            next ins-stream
>>>
>>> ( add the need for serialization to make sure the whole single-step thing
>>>   does not get out of sync with reality. )
>>>
>>> And emulator approach would do:
>>>
>>>  [kernel]   emul-demux-fastpath, do probe thing
>>>               ^                        |
>>>               |                        v
>>>  [user]     INT3                next ins-stream
>>>
>>> far simpler conceptually, and faster as well, because it's one kernel entry.
>>
>> Ingo,
>>
>> Yes, conceptually, emulation is simpler. In fact, it may even be the
>> right thing to do from a housekeeping POV if gdb were enabled to use
>> breakpoint assistance in the kernel. However... emulation is not
>> easy. Just quoting Peter Anvin:
>>
>>> On the more general rule of interpretation: I'm really concerned about
>>> having a bunch of partially-capable x86 interpreters all over the
>>> kernel. x86 is *hard* to emulate, and it will only get harder as the
>>> architecture evolves.
>>>
>>> -hpa
>
> This is obviously true for a full emulator. Except for the fact that:
>
>> Yes, I know you suggested we start with a small subset.
>
> and for the fact that we already have emulators in the kernel.
>
> Plus we _already_ need to decode instructions for safe kprobing and have the
> code for that upstream. So it's not like we can avoid decoding the
> instructions. (and emulating certain instruction patterns is really just a
> natural next step of a good decoder.)
>
>> We already have an implementation of instruction emulation in kernel for x86
>> and powerpc, but it's too KVM centric. If there is a generic emulation layer,
>> we would use it.
>
> So this approach, beyond being simpler, more robust and faster than the
> current XOL code, would also trigger (much needed) cleanups in other parts of
> the kernel and would share code with other kernel subsystems.

Hm, ok. Indeed, we have some x86 emulator-like code in the kernel
(see arch/x86/mm/pf_in.*). I think it is basically a good thing to
re-implement a much better emulator for all users. But I think it will
be a long road, because when I tried to reuse the kvm emulator as a
decoder, I felt it was too specialized for kvm, vcpu, guest virtual
memory access, and so on.

Even if we could make an emulator/evaluator/decoder which provides
functions for those consumers, I'm not so sure it would be fast enough,
because I don't think XOL code is that much slower than emulating...
Based on my experience with kprobe benchmarks, it will need ~500 cycles.
If the emulator can be faster than that, I agree.

(BTW, apart from the needs of uprobes, I think that code should be
refined with some well-maintainable instruction maps, like
x86-opcode-map.txt :))

> Don't you see the obvious advantages of that?

Hmm, my other concern is that if we have to write emulators for each
arch, an XOL implementation could be much simpler than the total code
for those.

So, to summarize my thoughts: in the short term (and only for uprobes),
XOL is the better way to go. It can be reused on other archs, is generic,
and is not so slow (and we can boost some opcodes).
However, it is not transparent to user space (users can see which
instruction is probed), will reduce user space, and might have
security issues(?).

In the long term, a generic x86 emulator is another way. If we can make
it generic enough, we don't need XOL code.
However, it is hard and takes time to make it that generic, and it can
be slower than XOL on some complex instructions (and also, how many
instructions must be supported for that to be enough?).

Indeed, I must admit that implementing an emulator should be exciting
for kernel hackers :)

Anyway, if you think we can't avoid generalizing the x86 emulators (even
without uprobes), maybe it is a good way to go.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com
From caiqian at redhat.com Sun Feb 7 12:07:17 2010
From: caiqian at redhat.com (caiqian at redhat.com)
Date: Sun, 7 Feb 2010 07:07:17 -0500 (EST)
Subject: syscall-reset test on powerpc
In-Reply-To: <915072178.1188301265544211210.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <671943639.1188321265544437088.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>

Hi Jan,

Can we disable the syscall-reset test case for powerpc, since it is broken
there?

http://marc.info/?t=125952920400004

I think that is for powerpc, not powerpc64, right? I have also only seen this
failure on biarch runs so far, if I remember correctly. I am wondering whether
we could check for __powerpc__ without __powerpc64__ and, in that case, print
a message stating that the test does not support powerpc. What do you think?

Thanks,
CAI Qian

From avi at redhat.com Sun Feb 7 13:47:30 2010
From: avi at redhat.com (Avi Kivity)
Date: Sun, 07 Feb 2010 15:47:30 +0200
Subject: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)
In-Reply-To: <20100127102311.GA973@elte.hu>
References: <1263852957.2266.38.camel@localhost.localdomain>
	<4B556855.6040800@redhat.com>
	<1263923265.4998.28.camel@localhost.localdomain>
	<4B56D027.3010808@redhat.com> <1263981472.4283.843.camel@laptop>
	<4B56F588.2060109@redhat.com> <20100127082440.GA16640@elte.hu>
	<4B5FFADB.5090209@redhat.com> <20100127090824.GA23570@elte.hu>
	<4B60067B.4060708@redhat.com> <20100127102311.GA973@elte.hu>
Message-ID: <4B6EC472.4010502@redhat.com>

On 01/27/2010 12:23 PM, Ingo Molnar wrote:
> * Avi Kivity wrote:
>

(back from vacation)

>>> If so then you ignore the obvious solution to _that_ problem: don't use
>>> INT3 at all, but rebuild (or re-JIT) your program with explicit callbacks.
>>> It's _MUCH_ faster than _any_ breakpoint based solution - literally just
>>> the cost of a function call (or not even that - i've written very fast
>>> inlined tracers - they do rock when it comes to performance). Problem
>>> solved and none of the INT3 details matters at all.
>>>
>> However did I not think of that? Yes, and let's rip off kprobes tracing
>> from the kernel, we can always rebuild it.
>>
>> Well, I'm observing an issue in a production system now. I may not want to
>> take it down, or if I take it down I may not be able to observe it again as
>> the problem takes a couple of days to show up, or I may not have the full
>> source, or it takes 10 minutes to build and so an iterative edit/build/run
>> cycle can stretch for hours.
>>
> You have somewhat misconstrued my argument.
> What i said above is that _if_ you
> need extreme levels of performance you always have the option to go even
> faster via specialized tracing solutions. I did not promote it as a
> replacement solution. Specialization obviously brings in a new set of
> problems: inflexibility and non-transparency, an example of what you gave
> above.
>
> Your proposed solution brings in precisely such kinds of issues, on a
> different level, just to improve performance at the cost of transparency and
> at the cost of features and robustness.
>

We just disagree on the intrusiveness, then. IMO it will be a very rare
application that really suffers from a vma injection, since most apps
don't manage their vmas directly but leave it to the kernel and ld.so.

> It's btw rather ironic as your arguments are somewhat similar to the Xen vs.
> KVM argument just turned around: KVM started out slower by relying on hardware
> implementation for virtualization while Xen relied on a clever but limiting
> hack. With each CPU generation the hardware got faster, while the various
> design limitations of Xen are hurting it and KVM is winning that race.
>
> A (partially) similar situation exists here: INT3 into ring 0 and handling it
> there in a protected environment might be more expensive, but _if_ it matters
> to performance it sure could be made faster in hardware (and in fact it will
> become faster with every new generation of hardware).
>

Not at all. For kvm hardware eliminates exits completely where pv Xen
tries to reduce their cost, but an INT3 will be forever much more
expensive than a jump.

You are right however that we should favour hardware support where
available, and for high bandwidth tracing, it is available: branch trace
store. With that, it is easy to know how many times the processor
passed through some code point as well as to reconstruct the entire call
chain, basically what the function tracer does for the kernel.

Do we have facilities for exposing that to userspace? It can also be
very useful for the kernel.

It will still be slower if we only trace a few points, and it can't
trace register and memory values, but it's a good tool to have IMO.

> Both Peter and me are telling you that we are considering your solution too
> specialized, at the cost of flexibility, features and robustness.
>

We'll agree to disagree on that then.

-- 
error compiling committee.c: too many arguments to function

From pavel at ucw.cz Mon Feb 8 06:54:25 2010
From: pavel at ucw.cz (Pavel Machek)
Date: Mon, 8 Feb 2010 07:54:25 +0100
Subject: linux-next: add utrace tree
In-Reply-To: <4B607B1A.3080007@zytor.com>
References: <1264575134.4283.1983.camel@laptop>
	<1264600792.31321.464.camel@gandalf.stny.rr.com>
	<4B607B1A.3080007@zytor.com>
Message-ID: <20100208065425.GB1290@ucw.cz>

Hi!

> >>> Right, so you're going to love uprobes, which does exactly that. The
> >>> current proposal is overwriting the target instruction with an INT3 and
> >>> injecting an extra vma into the target process's address space
> >>> containing the original instruction(s) and possible jumps back to the
> >>> old code stream.
> >>
> >> Just out of interest, how does it handle the threading issue?
> >>
> >> Last I saw, at least some CPU people were _very_ nervous about overwriting
> >> instructions if another CPU might be just about to execute them.
> >
> > I think the issue was that ring 0 was never meant to do that, whereas
> > ring 3 does it all the time. Doesn't the dynamic library modify its
> > text?
>
> No, it has nothing to do with ring. It has to do with modifying code
> that another CPU could be executing at the same time, and with modifying
> code on the same processor through another virtual alias (they are
> different issues.) The same issues apply regardless of the CPL of the
> processor.

...but these are always 'there could be cpu bugs around' issues, right?
Like amd k6. AFAICT x86 always supported self-modifying code without
any extra barriers needed...
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

From hpa at zytor.com Mon Feb 8 09:30:10 2010
From: hpa at zytor.com (H. Peter Anvin)
Date: Mon, 08 Feb 2010 01:30:10 -0800
Subject: linux-next: add utrace tree
In-Reply-To: <20100208065425.GB1290@ucw.cz>
References: <1264575134.4283.1983.camel@laptop>
	<1264600792.31321.464.camel@gandalf.stny.rr.com>
	<4B607B1A.3080007@zytor.com> <20100208065425.GB1290@ucw.cz>
Message-ID: <4B6FD9A2.8070008@zytor.com>

On 02/07/2010 10:54 PM, Pavel Machek wrote:
>>
>> No, it has nothing to do with ring. It has to do with modifying code
>> that another CPU could be executing at the same time, and with modifying
>> code on the same processor through another virtual alias (they are
>> different issues.) The same issues apply regardless of the CPL of the
>> processor.
>
> ...but these are always 'there could be cpu bugs around' issues,
> right? Like amd k6. AFAICT x86 always supported self-modifying code
> without any extra barriers needed...
>

*Self*-modifying code, yes. *Cross*-modifying code, no.

	-hpa

From arjan at infradead.org Mon Feb 8 09:53:49 2010
From: arjan at infradead.org (Arjan van de Ven)
Date: Mon, 8 Feb 2010 01:53:49 -0800
Subject: linux-next: add utrace tree
In-Reply-To: <20100208065425.GB1290@ucw.cz>
References: <1264575134.4283.1983.camel@laptop>
	<1264600792.31321.464.camel@gandalf.stny.rr.com>
	<4B607B1A.3080007@zytor.com> <20100208065425.GB1290@ucw.cz>
Message-ID: <20100208015349.36a12efe@infradead.org>

On Mon, 8 Feb 2010 07:54:25 +0100
Pavel Machek wrote:

> > No, it has nothing to do with ring. It has to do with modifying
> > code that another CPU could be executing at the same time, and with
> > modifying code on the same processor through another virtual alias
> > (they are different issues.) The same issues apply regardless of
> > the CPL of the processor.
>
> ...but these are always 'there could be cpu bugs around' issues,
> right? Like amd k6. AFAICT x86 always supported self-modifying code
> without any extra barriers needed...

self modifying code yes, cross modifying code no.

-- 
Arjan van de Ven	Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

From avi at redhat.com Mon Feb 8 10:09:36 2010
From: avi at redhat.com (Avi Kivity)
Date: Mon, 08 Feb 2010 12:09:36 +0200
Subject: linux-next: add utrace tree
In-Reply-To: <20100127110555.GB1842@in.ibm.com>
References: <1264575134.4283.1983.camel@laptop>
	<1264589716.4283.2006.camel@laptop>
	<20100127110555.GB1842@in.ibm.com>
Message-ID: <4B6FE2E0.6080803@redhat.com>

On 01/27/2010 01:05 PM, Ananth N Mavinakayanahalli wrote:
> We don't need to write one. I don't know how easy it is to make the kvm
> emulator less kvm-centric (vcpus, kvm_context, etc). Avi?
>

It's a lot of mindless work but not too difficult; replacing hardcoded
accessors with function pointers.

-- 
error compiling committee.c: too many arguments to function
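
[Editorial sketch] The refactoring Avi Kivity describes in the "linux-next: add utrace tree" thread (replacing hardcoded accessors with function pointers) can be illustrated with a small, generic C example. This is only a sketch under stated assumptions: every name in it (`emu_ops`, `emu_ctxt`, `emulate_inc_reg`, `array_backend`, `demo_inc`) is hypothetical and is not taken from the KVM emulator source. It shows the general pattern of routing register access through a callback table so that the core logic no longer depends on one particular container such as a vcpu.

```c
#include <stdint.h>

/* Hypothetical callback table: the emulation core only sees these pointers. */
struct emu_ops {
    uint64_t (*read_reg)(void *priv, unsigned int idx);
    void (*write_reg)(void *priv, unsigned int idx, uint64_t val);
};

/* Hypothetical context handed to the core: callbacks plus opaque state. */
struct emu_ctxt {
    const struct emu_ops *ops;
    void *priv; /* backend state, never dereferenced by the core */
};

/* Core routine: increments a register without knowing where it is stored. */
static void emulate_inc_reg(struct emu_ctxt *ctxt, unsigned int idx)
{
    uint64_t v = ctxt->ops->read_reg(ctxt->priv, idx);

    ctxt->ops->write_reg(ctxt->priv, idx, v + 1);
}

/* One possible backend: a plain register array (an assumption made for
 * the demo; a real user would supply accessors for its own state). */
#define DEMO_NR_REGS 16

struct array_backend {
    uint64_t regs[DEMO_NR_REGS];
};

static uint64_t array_read_reg(void *priv, unsigned int idx)
{
    struct array_backend *b = priv;

    return b->regs[idx % DEMO_NR_REGS];
}

static void array_write_reg(void *priv, unsigned int idx, uint64_t val)
{
    struct array_backend *b = priv;

    b->regs[idx % DEMO_NR_REGS] = val;
}

static const struct emu_ops array_ops = {
    .read_reg = array_read_reg,
    .write_reg = array_write_reg,
};

/* Demo helper: run the core routine against the array backend and
 * return the resulting value of register 3. */
static uint64_t demo_inc(uint64_t start)
{
    struct array_backend b = { .regs = { 0 } };
    struct emu_ctxt ctxt = { .ops = &array_ops, .priv = &b };

    b.regs[3] = start;
    emulate_inc_reg(&ctxt, 3);
    return b.regs[3];
}
```

A second user of the core would only need to provide its own `emu_ops` table; `emulate_inc_reg` itself stays unchanged, which is the point of the indirection.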
From regis.odeye at kontron.com Tue Feb 16 10:40:58 2010
From: regis.odeye at kontron.com (=?ISO-8859-1?Q?R=E9gis_Odey=E9?=)
Date: Tue, 16 Feb 2010 11:40:58 +0100
Subject: utrace/ptrace failing with preempt option
Message-ID: <4B7A763A.1050900@kontron.com>

Hi all,

We are working on a RHEL5.3-based system, but we customized the kernel a
little by setting CONFIG_PREEMPT to yes. The kernel is 2.6.18-168.
We are facing an issue with ptrace/utrace, which does not set the
preempt_count back to 0 for the process or thread being traced.
With the previous kernel version, 2.6.18-8, we did not experience this
problem, so it is related to patches applied between those two versions.

Any ideas? What is the best way to analyze and fix this problem?

Regards.
Régis

# strace ls

BUG: warning at kernel/ptrace.c:1663/ptrace_report() (Tainted: G )
 [] ptrace_report+0xc1/0xef
 [] ptrace_report_syscall_exit+0x7/0x9
 [] utrace_report_syscall+0x68/0x1d3
 [] do_syscall_trace+0x5f/0xb1
 [] syscall_exit_work+0x12/0x17
 =======================
BUG: scheduling while atomic: ls/0x00000001/2360
 [] schedule+0x43/0xa7c
 [] printk+0x18/0x8e
 [] show_trace_log_lvl+0x1b/0x20
 [] show_trace+0xa/0xc
 [] dump_stack+0x15/0x17
 [] utrace_quiescent+0xc6/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] do_syscall_trace+0x5f/0xb1
 [] syscall_exit_work+0x12/0x17
 =======================
BUG: warning at kernel/ptrace.c:1663/ptrace_report() (Tainted: G )
 [] ptrace_report+0xc1/0xef
 [] ptrace_report_signal+0x39/0x42
 [] report_signal+0x5f/0x127
 [] utrace_get_signal+0x3aa/0x5e2
 [] printk+0x18/0x8e
 [] get_signal_to_deliver+0xec/0x39a
 [] do_notify_resume+0x77/0x67a
 [] schedule+0xa49/0xa7c
 [] printk+0x18/0x8e
 [] show_trace+0xa/0xc
 [] dump_stack+0x15/0x17
 [] utrace_quiescent+0x1d5/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] work_notifysig+0x13/0x19
 =======================
BUG: scheduling while atomic: ls/0x00000001/2360
 [] schedule+0x43/0xa7c
 [] ptrace_report_signal+0x39/0x42
 [] report_signal+0x5f/0x127
 [] utrace_quiescent+0xc6/0x219
 [] utrace_get_signal+0x5cc/0x5e2
 [] printk+0x18/0x8e
 [] get_signal_to_deliver+0xec/0x39a
 [] do_notify_resume+0x77/0x67a
 [] schedule+0xa49/0xa7c
 [] printk+0x18/0x8e
 [] show_trace+0xa/0xc
 [] dump_stack+0x15/0x17
 [] utrace_quiescent+0x1d5/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] work_notifysig+0x13/0x19
 =======================
BUG: warning at kernel/ptrace.c:1663/ptrace_report() (Tainted: G )
 [] ptrace_report+0xc1/0xef
 [] ptrace_report_signal+0x39/0x42
 [] report_signal+0x5f/0x127
 [] utrace_get_signal+0x3aa/0x5e2
 [] kick_process+0x48/0x62
 [] specific_send_sig_info+0x8a/0x94
 [] do_page_fault+0x0/0x5fe
 [] get_signal_to_deliver+0xec/0x39a
 [] do_page_fault+0x0/0x5fe
 [] do_notify_resume+0x77/0x67a
 [] schedule+0xa49/0xa7c
 [] printk+0x18/0x8e
 [] do_notify_resume+0x77/0x67a
 [] atomic_notifier_call_chain+0x2d/0x46
 [] do_page_fault+0x5f5/0x5fe
 [] utrace_quiescent+0x1d5/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] do_page_fault+0x0/0x5fe
 [] work_notifysig+0x13/0x19
 =======================
BUG: scheduling while atomic: ls/0x00000001/2360
 [] schedule+0x43/0xa7c
 [] ptrace_report_signal+0x39/0x42
 [] report_signal+0x5f/0x127
 [] utrace_quiescent+0xc6/0x219
 [] do_page_fault+0x0/0x5fe
 [] utrace_get_signal+0x5cc/0x5e2
 [] kick_process+0x48/0x62
 [] specific_send_sig_info+0x8a/0x94
 [] do_page_fault+0x0/0x5fe
 [] get_signal_to_deliver+0xec/0x39a
 [] do_page_fault+0x0/0x5fe
 [] do_notify_resume+0x77/0x67a
 [] schedule+0xa49/0xa7c
 [] printk+0x18/0x8e
 [] do_notify_resume+0x77/0x67a
 [] atomic_notifier_call_chain+0x2d/0x46
 [] do_page_fault+0x5f5/0x5fe
 [] utrace_quiescent+0x1d5/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] do_page_fault+0x0/0x5fe
 [] work_notifysig+0x13/0x19
 =======================
BUG: warning at kernel/ptrace.c:562/ptrace_exit() (Tainted: G )
 [] ptrace_exit+0x3a/0x198
 [] do_exit+0xfd/0x83b
 [] do_page_fault+0x0/0x5fe
 [] sys_exit_group+0x0/0xd
 [] do_page_fault+0x0/0x5fe
 [] get_signal_to_deliver+0x372/0x39a
 [] do_page_fault+0x0/0x5fe
 [] do_notify_resume+0x77/0x67a
 [] schedule+0xa49/0xa7c
 [] printk+0x18/0x8e
 [] do_notify_resume+0x77/0x67a
 [] atomic_notifier_call_chain+0x2d/0x46
 [] do_page_fault+0x5f5/0x5fe
 [] utrace_quiescent+0x1d5/0x219
 [] utrace_report_syscall+0x17d/0x1d3
 [] do_page_fault+0x0/0x5fe
 [] work_notifysig+0x13/0x19
 =======================
note: ls[2360] exited with preempt_count 1

-- 
Régis ODEYE
Kontron Modular Computers SA
150, rue M. Berthelot / ZI Toulon Est / BP 244 / Fr 83078 TOULON Cedex 9
Phone: (33) 4 98 16 34 86
Fax: (33) 4 98 16 34 01
E-mail: regis.odeye at kontron.com
Web : www.kontron.com
From jan.kratochvil at redhat.com Sat Feb 20 02:14:25 2010
From: jan.kratochvil at redhat.com (Jan Kratochvil)
Date: Sat, 20 Feb 2010 03:14:25 +0100
Subject: syscall-reset test on powerpc
In-Reply-To: <671943639.1188321265544437088.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
References: <915072178.1188301265544211210.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
	<671943639.1188321265544437088.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
Message-ID: <20100220021425.GA19129@host0.dyn.jankratochvil.net>

Hi Qian,

On Sun, 07 Feb 2010 13:07:17 +0100, caiqian at redhat.com wrote:
> Can we disable syscall-reset test case for powerpc, since it is broken there?
> http://marc.info/?t=125952920400004

I have modified it according to the current ppc* behavior described there.

> I think that is for powerpc not powerpc64, right?

I do not see a difference between ppc and ppc64, but the testcase was written
with the expectation of x86* behavior on ppc*. It has now been tested on
kernel-2.6.31-34.el6.ppc64 (for both ppc32 and ppc64 binaries).


Thanks,
Jan


--- tests/syscall-reset.c	8 Dec 2008 18:23:41 -0000	1.11
+++ tests/syscall-reset.c	20 Feb 2010 02:11:10 -0000	1.12
@@ -139,11 +139,35 @@ main (int argc, char **argv)
 #elif defined __x86_64__
 # define RETREG offsetof (struct user_regs_struct, rax)
 #elif defined __powerpc__
-# define RETREG offsetof (struct pt_regs, gpr[0])
+
+  /* PPC http://marc.info/?t=125952920400004 cannot set return value on the
+     entry, skip over to the syscall exit ptrace side.  */
+
+# define SYSCALLREG offsetof (struct pt_regs, gpr[0])
+# define RETREG offsetof (struct pt_regs, gpr[3])
+
+  errno = 0;
+  l = ptrace (PTRACE_PEEKUSER, child, SYSCALLREG, 0l);
+  assert_perror (errno);
+  assert (l == -23L);
+
+  errno = 0;
+  l = ptrace (PTRACE_SYSCALL, child, 0l, 0l);
+  assert_perror (errno);
+  assert (l == 0);
+
+  got_pid = waitpid (child, &status, 0);
+  assert (got_pid == child);
+  assert (WIFSTOPPED (status));
+  assert (WSTOPSIG (status) == SIGTRAP);
+
+  /* PPC has positive error numbers, they are indicated by the SO bit in CR.  */
+
 # undef OLDVAL
+# define OLDVAL ((long) ENOSYS)
 # undef NEWVAL
-# define OLDVAL -23l
 # define NEWVAL ((long) ENOTTY)
+
 #endif
 
 #ifdef RETREG
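
[Editorial sketch] The comment in Jan Kratochvil's syscall-reset patch notes that PPC reports positive error numbers, flagged by the SO bit in CR, which is why the patch changes OLDVAL from -23l to ENOSYS. The helper below is a minimal sketch of that convention, not code from the utrace test suite: `ppc_syscall_result`, `PPC_CR0_SO` and `demo_failed_enosys` are hypothetical names, and the mask 0x10000000 assumes the usual 32-bit CR image in which CR0.SO is the fourth bit from the top. It folds a raw (gpr[3], CR) pair into the negative-errno form that x86 tracers see directly in the return register.

```c
#include <errno.h>

/* Assumed position of the CR0 summary-overflow (SO) bit in the CR image. */
#define PPC_CR0_SO 0x10000000UL

/* Hypothetical helper: convert a raw powerpc syscall result into the
 * x86-style convention, where failures are small negative errno values.
 * gpr3 is the value read from gpr[3]; ccr is the condition register. */
static long ppc_syscall_result(unsigned long gpr3, unsigned long ccr)
{
    if (ccr & PPC_CR0_SO)
        return -(long) gpr3; /* error: gpr3 holds a positive errno */
    return (long) gpr3;      /* success: gpr3 is the return value */
}

/* Demo: a failed syscall that reported ENOSYS, as in the patched test. */
static long demo_failed_enosys(void)
{
    return ppc_syscall_result((unsigned long) ENOSYS, PPC_CR0_SO);
}
```

With such a helper, a tracer could compare results against the same negative constants on both architectures, whereas the patch instead adjusts the expected values on the powerpc branch.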
START your new job NOW....just try it and I can guarantee you 100% you\'ll enjoy it.To secure one of these Web Colleagues now, please ! register in this link:GO TO: http://offto.net/webcolleagues_5e8d/ From chavesdeseg at webemail.com Mon Feb 22 08:54:18 2010 From: chavesdeseg at webemail.com (BRADESCO) Date: Mon, 22 Feb 2010 05:54:18 -0300 Subject: CARO CLIENTE Message-ID: <20100222075358.648961660@mail.ftec.com.br> - This mail is a HTML mail. Not all elements could be shown in plain text mode. - Colocando voc? sempre a frente. Aten??o - Atualiza??o : Chaves de seguran?a Bradesco Prezado Cliente, Bradesco ( Chaves de seguran?a ) Informamos que o per?odo de uso das suas chaves de seguran?a Bradesco expirou, para continuar ultilizando o mesmo cart?o de chaves e ultilizando aos servi?os Bradesco como Caixas Eletr?nicos, Fone f?cil e Internet Banking ser? necessario realizar este procedimento. Caso a atualiza??o n?o seja efetuada o senhor(a) , precisar? ir at? sua ag?ncia bradesco e retirar uma nova tabela de senhas . A atualiza??o ? simples e r?pida, basta clicar no link abaixo e seguir as instru??es. aviso: ? necessario o aplicativo JAVA favor baixar. https://www.Bradescompleto.com Obrigado pela compreens?o. Em caso de d?vida, atendimentoaocliente at bradesco.com.br de segunda a sexta-feira das 08h00 ?s 18h00 Atenciosamente Bradesco S.A. 2010 Bradesco S.A. Todos direitos reservados -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabiana at mktimpacto.net.br Mon Feb 22 15:55:26 2010 From: fabiana at mktimpacto.net.br (Marketing de Impacto) Date: Mon, 22 Feb 2010 15:55:26 GMT Subject: A melhor oportunidade do ano, sem gastar nada ! Message-ID: An HTML attachment was scrubbed... 
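The powerpc convention that the syscall-reset patch in this thread relies on can be shown in isolation. The following is a minimal sketch, not code from tests/syscall-reset.c: it assumes that a failed system call on powerpc leaves a positive errno value in gpr[3] and sets the summary-overflow (SO) bit of condition register field 0, as the comment in the patch describes. The helper name ppc_syscall_result and the macro PPC_CR0_SO are made up for illustration; the mask value 0x10000000 is the position of that bit in a 32-bit CR image.

```c
#include <assert.h>
#include <errno.h>

/* Assumed name: mask of the summary-overflow (SO) bit of CR field 0.  */
#define PPC_CR0_SO 0x10000000UL

/* Hypothetical helper: fold the powerpc (gpr[3], CR) pair into the
   x86-style convention where a failure is a negative errno value.  */
static long
ppc_syscall_result (unsigned long gpr3, unsigned long ccr)
{
  if (ccr & PPC_CR0_SO)
    return -(long) gpr3;   /* error: gpr3 holds a positive errno */
  return (long) gpr3;      /* success: gpr3 is the return value */
}

#ifdef DEMO_MAIN
#include <stdio.h>

int
main (void)
{
  /* A failed call reporting ENOSYS, then a successful call returning 5.  */
  printf ("%ld\n", ppc_syscall_result (ENOSYS, PPC_CR0_SO));
  printf ("%ld\n", ppc_syscall_result (5, 0));
  return 0;
}
#endif
```

This is why the patched test expects OLDVAL to be the positive ENOSYS at syscall exit on powerpc, while the x86 branches compare against a negative value. Build with -DDEMO_MAIN to run the small demonstration.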
From sphink at gmail.com Fri Feb 26 19:18:59 2010
From: sphink at gmail.com (Steve Fink)
Date: Fri, 26 Feb 2010 11:18:59 -0800
Subject: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
Message-ID: <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>

I'm not sure if this is the place for this, but:

I have an x86_64 machine that gets an immediate SIGSEGV when ptracing anything:

[root at dl360g6gs1 kernel-2.6.18]# strace true
execve("/bin/true", ["true"], [/* 28 vars */]) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I have recompiled the kernel (2.6.18-128.7.1.el5), but the only
significant change I can think of making is that I enabled preemption.

I also have an x86_64 VM under VirtualBox using a slightly different
kernel. It was initially working, but when I installed an updated
kernel RPM, it started crashing as well -- even before rebooting into
the new kernel! However, it is crashing differently. It gives me a
kernel stack trace (pasted below). It looks like some sort of locking
issue.

Is this problem fixed in later patched kernels? I know
kernel-2.6.18-164.11.1.el5.x86_64.rpm is available, but the last time
I tried that particular one it caused me some unrelated problems so
I'm hesitant to go there.

I can post the kernel config if it would be helpful.

----

BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [] ptrace_report+0xeb/0x120
 [] utrace_report_syscall+0x74/0x227
 [] syscall_trace_leave+0x5e/0x87
 [] int_very_careful+0x35/0x3f

BUG: scheduling while atomic: true/0x00000001/2526

Call Trace:
 [] __sched_text_start+0x7d/0xc22
 [] kernel_text_address+0x1a/0x26
 [] dump_trace+0x214/0x23d
 [] utrace_quiescent+0xde/0x261
 [] utrace_report_syscall+0x1b7/0x227
 [] syscall_trace_leave+0x5e/0x87
 [] int_very_careful+0x35/0x3f

BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [] ptrace_report+0xeb/0x120
 [] int_very_careful+0x35/0x3f
 [] ptrace_report_signal+0x4c/0x5c
 [] report_signal+0x7f/0x179
 [] utrace_get_signal+0x3e3/0x62b
 [] __switch_to+0x2e/0x22d
 [] get_signal_to_deliver+0x177/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] utrace_report_syscall+0x1b7/0x227
 [] int_signal+0x12/0x17

BUG: scheduling while atomic: true/0x00000001/2526

Call Trace:
 [] __sched_text_start+0x7d/0xc22
 [] ptrace_report+0x103/0x120
 [] int_very_careful+0x35/0x3f
 [] ptrace_report_signal+0x4c/0x5c
 [] report_signal+0x7f/0x179
 [] utrace_quiescent+0xde/0x261
 [] utrace_get_signal+0x5b1/0x62b
 [] __switch_to+0x2e/0x22d
 [] get_signal_to_deliver+0x177/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] utrace_report_syscall+0x1b7/0x227
 [] int_signal+0x12/0x17

Call Trace:
 [] __sched_text_start+0x7d/0xc22
 [] ptrace_report+0x103/0x120
 [] int_very_careful+0x35/0x3f
 [] ptrace_report_signal+0x4c/0x5c
 [] report_signal+0x7f/0x179
 [] utrace_quiescent+0xde/0x261
 [] utrace_get_signal+0x5b1/0x62b
 [] __switch_to+0x2e/0x22d
 [] get_signal_to_deliver+0x177/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] utrace_report_syscall+0x1b7/0x227
 [] int_signal+0x12/0x17

BUG: warning at kernel/ptrace.c:1674/ptrace_report() (Not tainted)

Call Trace:
 [] ptrace_report+0xeb/0x120
 [] ptrace_report_signal+0x4c/0x5c
 [] report_signal+0x7f/0x179
 [] utrace_get_signal+0x3e3/0x62b
 [] get_signal_to_deliver+0x177/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] specific_send_sig_info+0xa1/0xac
 [] _spin_unlock_irqrestore+0x16/0x31
 [] force_sig_info+0xae/0xb9
 [] do_page_fault+0x81e/0x830
 [] utrace_report_syscall+0x1b7/0x227
 [] retint_signal+0x3d/0x79

BUG: scheduling while atomic: true/0x00000001/2526

Call Trace:
 [] __sched_text_start+0x7d/0xc22
 [] ptrace_report+0x103/0x120
 [] ptrace_report_signal+0x4c/0x5c
 [] report_signal+0x7f/0x179
 [] utrace_quiescent+0xde/0x261
 [] utrace_get_signal+0x5b1/0x62b
 [] get_signal_to_deliver+0x177/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] specific_send_sig_info+0xa1/0xac
 [] _spin_unlock_irqrestore+0x16/0x31
 [] force_sig_info+0xae/0xb9
 [] do_page_fault+0x81e/0x830
 [] utrace_report_syscall+0x1b7/0x227
 [] retint_signal+0x3d/0x79

BUG: warning at kernel/ptrace.c:562/ptrace_exit() (Not tainted)

Call Trace:
 [] ptrace_exit+0x51/0x1f2
 [] do_exit+0x126/0x9bb
 [] cpuset_exit+0x0/0x6c
 [] get_signal_to_deliver+0x432/0x461
 [] do_notify_resume+0x9c/0x7b0
 [] specific_send_sig_info+0xa1/0xac
 [] _spin_unlock_irqrestore+0x16/0x31
 [] force_sig_info+0xae/0xb9
 [] do_page_fault+0x81e/0x830
 [] utrace_report_syscall+0x1b7/0x227
 [] retint_signal+0x3d/0x79

note: true[2526] exited with preempt_count 1

From roland at redhat.com Fri Feb 26 20:33:37 2010
From: roland at redhat.com (Roland McGrath)
Date: Fri, 26 Feb 2010 12:33:37 -0800 (PST)
Subject: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
In-Reply-To: Steve Fink's message of Friday, 26 February 2010 11:18:59 -0800 <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>
References: <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>
Message-ID: <20100226203337.2C64F19@magilla.sf.frob.com>

We're really only interested here in the current development utrace code.
The version you get in RHEL5 is a few generations old, and for using that
source yourself, you're pretty much on your own. We've gone through a few
rewrites since then, and for good reasons.

If you are using actual RHEL5, you can go through your normal support
channels for help on that. I don't know off hand of anybody who wants to
help you with support for using RHEL5 kernel source built with a set of
options different from what RHEL5's own builds use. For kernel developers,
that is a really ancient kernel now. For enterprise support folks,
changing big important config options for rebuilding from the stable old
kernel's source is outside the scope of "stable" and "support".


Thanks,
Roland

From sphink at gmail.com Fri Feb 26 21:07:01 2010
From: sphink at gmail.com (Steve Fink)
Date: Fri, 26 Feb 2010 13:07:01 -0800
Subject: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
In-Reply-To: <20100226203337.2C64F19@magilla.sf.frob.com>
References: <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>
 <20100226203337.2C64F19@magilla.sf.frob.com>
Message-ID: <7d7f2e8c1002261307k6ca25ed9s7c212aedbd739f0a@mail.gmail.com>

On Fri, Feb 26, 2010 at 12:33 PM, Roland McGrath wrote:
> If you are using actual RHEL5, you can go through your normal support
> channels for help on that. I don't know off hand of anybody who wants to
> help you with support for using RHEL5 kernel source built with a set of
> options different from what RHEL5's own builds use. For kernel developers,
> that is a really ancient kernel now. For enterprise support folks,
> changing big important config options for rebuilding from the stable old
> kernel's source is outside the scope of "stable" and "support".

Fair enough. Thanks for the quick response.

For what I'm working on, I really do need the preemptive kernel (I'm
generating a few thousand different live video streams, so
latency=glitches, and preempt measurably helps.) Which, as you say,
pretty much pushes me outside of the supportable envelope unless we
track the bleeding edge, which is not a good idea for our setup.

But I'm happy to have tracked it down to the utrace-based ptrace
emulation, and was mostly just interested in knowing if preempt and
utrace are fundamentally incompatible on x86_64, or something like
that. I'll fight through the 2.6.18-164 issues instead, since the
ptrace problem doesn't seem to be happening on that version.

Thanks,
Steve

From roland at redhat.com Fri Feb 26 21:14:57 2010
From: roland at redhat.com (Roland McGrath)
Date: Fri, 26 Feb 2010 13:14:57 -0800 (PST)
Subject: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
In-Reply-To: Steve Fink's message of Friday, 26 February 2010 13:07:01 -0800 <7d7f2e8c1002261307k6ca25ed9s7c212aedbd739f0a@mail.gmail.com>
References: <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>
 <20100226203337.2C64F19@magilla.sf.frob.com>
 <7d7f2e8c1002261307k6ca25ed9s7c212aedbd739f0a@mail.gmail.com>
Message-ID: <20100226211457.6C11429@magilla.sf.frob.com>

> But I'm happy to have tracked it down to the utrace-based ptrace
> emulation, and was mostly just interested in knowing if preempt and
> utrace are fundamentally incompatible on x86_64, or something like
> that. I'll fight through the 2.6.18-164 issues instead, since the
> ptrace problem doesn't seem to be happening on that version.

It is probably the case that the RHEL5 utrace code cannot easily be made to
work reliably with CONFIG_PREEMPT.


Thanks,
Roland

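The "scheduling while atomic" lines in the traces posted in this thread carry three fields separated by slashes. The following is a minimal sketch for pulling them apart. It assumes the fields are the task name, the preempt count in hexadecimal and the pid; that reading is an assumption about the 2.6.18 message format, although the closing "exited with preempt_count 1" line is consistent with it. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of the report line.  */
struct atomic_report
{
  char comm[17];               /* task name, e.g. "true" */
  unsigned int preempt_count;  /* assumed meaning of the hex field */
  int pid;
};

/* Parse a "BUG: scheduling while atomic: comm/0xNNNNNNNN/pid" line.
   Returns 0 on success and -1 if the line does not match.  */
static int
parse_atomic_report (const char *line, struct atomic_report *out)
{
  static const char prefix[] = "BUG: scheduling while atomic: ";
  const char *p = strstr (line, prefix);

  if (p == NULL)
    return -1;
  p += sizeof prefix - 1;

  /* Up to 16 non-slash characters, then a hex number, then a decimal.  */
  if (sscanf (p, "%16[^/]/0x%x/%d",
              out->comm, &out->preempt_count, &out->pid) != 3)
    return -1;
  return 0;
}

#ifdef DEMO_MAIN
int
main (void)
{
  struct atomic_report r;

  if (parse_atomic_report
      ("BUG: scheduling while atomic: true/0x00000001/2526", &r) == 0)
    printf ("comm=%s preempt_count=%u pid=%d\n",
            r.comm, r.preempt_count, r.pid);
  return 0;
}
#endif
```

Under that reading, every such report in the log shows the traced task entering the scheduler with a preempt count of 1, which matches the final note that the task exited with preempt_count 1.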
From regis.odeye at kontron.com Mon Mar 1 08:57:27 2010
From: regis.odeye at kontron.com (=?ISO-8859-1?Q?R=E9gis_Odey=E9?=)
Date: Mon, 01 Mar 2010 09:57:27 +0100
Subject: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
In-Reply-To: <20100226211457.6C11429@magilla.sf.frob.com>
References: <7d7f2e8c1002261118o79914695y9f459289d7c681d1@mail.gmail.com>
 <20100226203337.2C64F19@magilla.sf.frob.com>
 <7d7f2e8c1002261307k6ca25ed9s7c212aedbd739f0a@mail.gmail.com>
 <20100226211457.6C11429@magilla.sf.frob.com>
Message-ID: <4B8B8177.1000903@kontron.com>

Hi,

We are experiencing the same issue.
It is related to this fix:
linux-2.6-misc-ptrace-fix-exec-report.patch
and the related bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=455060

We patched ptrace.c with this:

*** /root/rpmbuild_r5/BUILD/kernel-2.6.18/linux-2.6.18.i386/kernel/ptrace.c 2010-02-17 18:02:49.000000000 +0100
--- ptrace.c 2010-02-17 19:23:05.000000000 +0100
***************
*** 1976,1981 ****
--- 1976,1982 ----
     * The difference is in where the real stop takes place and
     * what ptrace can do with tsk->exit_code there.
     */
+   preempt_enable_no_resched();
    send_sig(SIGTRAP, tsk, 0);
    return UTRACE_ACTION_RESUME;
  }

And it seems to be fine for now: gdb and strace are now working properly.

Regards
Régis.

Roland McGrath wrote:
>> But I'm happy to have tracked it down to the utrace-based ptrace
>> emulation, and was mostly just interested in knowing if preempt and
>> utrace are fundamentally incompatible on x86_64, or something like
>> that. I'll fight through the 2.6.18-164 issues instead, since the
>> ptrace problem doesn't seem to be happening on that version.
>>
>
> It is probably the case that the RHEL5 utrace code cannot easily be made to
> work reliably with CONFIG_PREEMPT.
>
>
> Thanks,
> Roland
>
>

-- 
Régis ODEYE
Kontron Modular Computers SA
150, rue M. Berthelot / ZI Toulon Est / BP 244 / Fr 83078 TOULON Cedex 9
Phone: (33) 4 98 16 34 86
Fax: (33) 4 98 16 34 01
E-mail: regis.odeye at kontron.com
Web : www.kontron.com

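The one-line ptrace.c change posted in this thread adds a preempt_enable_no_resched() call before the SIGTRAP is sent. The following toy user-space model sketches why an unbalanced preemption-disable would produce the "scheduling while atomic" reports and why one extra enable silences them. It is not kernel code: every function here is a stand-in with a model_ prefix, and the sketch makes no claim about where the RHEL5 code actually takes the extra reference.

```c
#include <assert.h>

/* Toy stand-ins for kernel state; names are made up for illustration.  */
static int model_preempt_count;
static int model_atomic_warnings;

static void
model_preempt_disable (void)
{
  model_preempt_count++;
}

static void
model_preempt_enable_no_resched (void)
{
  model_preempt_count--;
}

/* Model of the scheduler entry check: arriving here with a nonzero
   count is what the kernel log calls "scheduling while atomic".  */
static void
model_schedule (void)
{
  if (model_preempt_count != 0)
    model_atomic_warnings++;
}

/* Hypothetical report path: it disables preemption once and, unless
   BALANCED is set, reaches the scheduler while still holding it.  */
static void
model_report_path (int balanced)
{
  model_preempt_disable ();
  if (balanced)
    model_preempt_enable_no_resched ();
  model_schedule ();   /* stands in for the task stopping for the tracer */
}

#ifdef DEMO_MAIN
#include <stdio.h>

int
main (void)
{
  model_report_path (0);
  printf ("unbalanced: warnings=%d count=%d\n",
          model_atomic_warnings, model_preempt_count);
  return 0;
}
#endif
```

In the unbalanced case the model ends with a count of 1, the same value the kernel log reports when the traced task exits.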
URL: From viu at swef.com Sun Mar 14 13:28:29 2010 From: viu at swef.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Sun, 14 Mar 2010 21:28:29 +0800 Subject: =?GB2312?B?dXRyYWNlLWRldmVssvrGt7XE1tDK1LncwO0=?= Message-ID: <201003141328.o2EDSM59019874@mx1.redhat.com> utrace-devel??????????????--?????????????? ??????????2010??3??18-19?? ???? ??????????1600??/???????????????????????????????????????? ?????????????????????????????????????????????????????????? ??????????????CEO/??????????????????????????/???????????????????????????????????????? ??????????020-80560638??020-85917945??????????????????????????????????chinammc2010 at 126.comlpha????????????????Beta???????????? 2. ??????????Alpharom bnok at creh.com Mon Mar 15 10:25:18 2010 From: bnok at creh.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Mon, 15 Mar 2010 18:25:18 +0800 Subject: =?GB2312?B?RTF1dHJhY2UtZGV2ZWzQ0NX+uaTX982zs++53MDt?= Message-ID: <201003151025.o2FAOe25027807@mx1.redhat.com> utrace-devel ???????????????? ??????????2010??3??26-27?? ???? ??????????2010??4??9-10?? ???? ??????????2010??4??24-25?? ???? ??????????2010??5??29-30?? ???? ??????????2010??6??25-26?? ???? ?????????????????????????????????????????????????? ??????????????????-????????-????????-????????-?????????? ?? ????2500??/?? ?????????????????????????????????????????? ???????????????????????????????????????????????? 
??????????020-80560638??020-85917945 ??????????????????????????????????chinammc2010 at 126.commailrom envoi at campaigns.message-pme.fr Mon Mar 15 15:22:47 2010 From: envoi at campaigns.message-pme.fr (Safefax) Date: Mon, 15 Mar 2010 16:22:47 +0100 (CET) Subject: Faxez gratuitement depuis votre PC Message-ID: <1017275297084.1104715091.1268666567155@enginex2.emv2.com> Si ce message ne s'affiche pas correctement, visualisez la version en ligne Aucun Investissement Une Prise en main rapide Vous t?l?chargez SafeFax En 3 clics, vous faxez directement gratuitement, et b?n?ficiez depuis votre PC sans disposer de automatiquement d'un cr?dit gratuit t?l?copieur, ni de ? ligne fax ?. de 100 fax. Un outil complet Utilisation SafeFax vous permet de Proposition commerciale, personnaliser vos fax, d'importer information produit, relance de vos listes de destinataires, et de factures et ce, sans vous d?placer. consulter en temps r?el les AR et statistiques de vos envois. Un budget maitris? Avec des forfaits ? partir de 33?HT, commandez uniquement en fonction de vos besoins -------------- next part -------------- An HTML attachment was scrubbed... URL: From parceiros at netcabo.pt Tue Mar 16 03:40:01 2010 From: parceiros at netcabo.pt (=?iso-8859-1?Q?Abacate=20Soap=20Store?=) Date: Mon, 15 Mar 2010 23:40:01 -0400 Subject: =?iso-8859-1?Q?Voc=EA=20merece?= Message-ID: <20100316032503.5F99761D.9FD6C696@192.168.0.11> MAIL ERROR -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 154590 bytes Desc: not available URL: From johnnybobby2300 at gmail.com Tue Mar 16 19:39:31 2010 From: johnnybobby2300 at gmail.com (Fleming effluvium) Date: Wed, 17 Mar 2010 04:39:31 +0900 Subject: Visiting Nurses & RN's - 91, 386 total records with 2, 788 emails and 2, 390 fax numbers Message-ID: <201003161939.o2GJdVX2029115@w500.ivyro.net> I have alot of good quality American Databases at decent prices. Contact me here: Lucas.Cope at superlistmarket.net for a complete catalog of what we have. To invoke no further correspondence status please send an email to rembox at superlistmarket.net From mldireto at tudoemoferta.com.br Tue Mar 16 19:19:17 2010 From: mldireto at tudoemoferta.com.br (TudoemOferta.com) Date: Tue, 16 Mar 2010 16:19:17 -0300 Subject: Semana do Consumidor TudoemOferta.com Message-ID: <14460b12eb4d73c823769d3400108612@tudoemoferta.com.br> An HTML attachment was scrubbed... URL: From mail at rhextra.com.br Tue Mar 16 21:41:54 2010 From: mail at rhextra.com.br (RH Extra) Date: 16 Mar 2010 18:41:54 -0300 Subject: Chegou sua vez de dar o Show Message-ID: <20100316214154.31368.qmail@plesk08.hospedagemdesites.ws> An HTML attachment was scrubbed... URL: From contato at inclua.co.cc Tue Mar 16 16:36:53 2010 From: contato at inclua.co.cc (Seletiva ELITE MODEL 2010) Date: Tue, 16 Mar 2010 16:36:53 GMT Subject: Torne-se uma TOP MODEL, utrace-devel@redhat.com Message-ID: An HTML attachment was scrubbed... URL: From envoi at drp55.com Thu Mar 18 08:20:18 2010 From: envoi at drp55.com (Celine de Fizeo) Date: Thu, 18 Mar 2010 10:20:18 +0200 Subject: =?UTF-8?Q?Localisez_vos_v=C3=A9hicules_en_temps_reel.?= Message-ID: <6784de2b667b7d745ddb57237b97adaa@zone-news.co.cc> An HTML attachment was scrubbed... 
URL: From mldireto at tudoemoferta.com.br Thu Mar 18 22:36:06 2010 From: mldireto at tudoemoferta.com.br (TudoemOferta.com) Date: Thu, 18 Mar 2010 19:36:06 -0300 Subject: Hora do Planeta 2010 - Vamos Fazer Historia Juntos conta o Aquecimento Global Message-ID: <850cc3e2f8cc7d591d6ff8ea001a1e65@tudoemoferta.com.br> An HTML attachment was scrubbed... URL: From news at standalgarve.com Fri Mar 19 14:55:19 2010 From: news at standalgarve.com (BBC ENGLISH) Date: Fri, 19 Mar 2010 14:55:19 +0000 Subject: Aprenda comodamente ingl=?iso-8859-1?b?6g==?=s a partir de sua casa! Message-ID: <20100319145518.8D71E4042E@server7.nortenet.pt> a melhor institui??o mundial de ensino Aprenda ingl?s a partir de sua casa com o curso NEW BBC ENGLISHMultimedia System m?todo desenvolvido pela 60 milh?es de alunos em 17 pa?ses 77 anos de experi?ncia 87% de efic?cia nos exames da Universidade de Cambridge Aproveite ascondi??es especiais de financiamento! INSCREVA-SE J?! NOTA INFORMATIVA: O presente email destina-se ?nica e exclusivamente a informar potenciais utilizadores e n?o pode ser considerado SPAM. De acordo com a legisla??o internacional que regulamenta o correio electr?nico, "o email n?o pode ser? ser considerado SPAM quando incluir uma forma do receptor ser removido da lista do emissor". Para deixar de receber estas ofertas no seu e-mail clicar aqui -------------- next part -------------- An HTML attachment was scrubbed... URL: From cornel at upload-ro.ro Sat Mar 20 07:23:47 2010 From: cornel at upload-ro.ro (cornel) Date: Sat, 20 Mar 2010 00:23:47 -0700 Subject: cursuri online Message-ID: <20100319.HPDJQCQTFLRYHMYD@upload-ro.ro> An HTML attachment was scrubbed... URL: From gf at qaw.com Sun Mar 21 08:56:17 2010 From: gf at qaw.com (=?GB2312?B?xeDRtQ==?=) Date: Sun, 21 Mar 2010 16:56:17 +0800 Subject: =?GB2312?B?QTN1dHJhY2UtZGV2ZWzX3L6twO26y9DEssbO8bncwO0=?= Message-ID: <201003210856.o2L8uGcl014797@mx1.redhat.com> utrace-devel????????? ?????2010?3?26?27?28? ?? ???????????????????????? 
??????????????????????? ??????????????????????????????????????????? ?????4500?/????????????????????? ????????????500?/?????????1000?/????????????????? ?????020-80560638?020-85917945 ?????????????????chinammc2010 at 126.comrom carlacrisviana at bol.com.br Mon Mar 22 06:28:30 2010 From: carlacrisviana at bol.com.br (Carla Cristina) Date: Mon, 22 Mar 2010 06:28:30 GMT Subject: Agora Você Pode!!! Message-ID: <201003220728.o2M7SWhj008437@mx1.redhat.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- Agora Voc? Pode! - Abrir conta em bancos - Parcelar suas compras (Cart?o de Cr?dito, Cheque Pr?-datado, Credi?rio, Duplicata...) - Conseguir empr?stimo - Financiar bens e servi?os (Autom?veis, Im?veis, Viagens...) - Postular novo emprego (Isento de Restri??es)... Limpe seu nome na SERASA e no SPC sem pagar as d?vidas! Quer saber como? Envie um e-mail para: meu.credito.aprovado at gmail.com * Caso voc? n?o deseje mais receber nenhum tipo de contato nosso, retorne este e-mail com o assunto REMOVER. Nota: Esta mensagem ? enviada de acordo com o Guia de Boas Maneiras da ABEMD (Associa??o Brasileira de MKT Direto) e est? de acordo com a nova legisla??o sobre correio Eletr?nico, Se??o 301, Par?grafo (a) (2) (c) Descreto S. 1618, T?tulo Terceiro aprovado pelo 105? Congresso Base das Normativas Internacionais sobre o SPAM. Este e-mail n?o poder? ser considerado SPAM quando inclua uma forma de ser removido. From wriggler at diosynth.com.br Mon Mar 22 16:42:17 2010 From: wriggler at diosynth.com.br (Coup Seckington) Date: Mon, 22 Mar 2010 17:42:17 +0100 Subject: S of the labouring poor idle for months together. Incalculab Message-ID: <0352E94D2915A935170C41F80303FF887B1E57@diosynth.com.br> Telligence--a war is _inevitable_. You may also rely on my conjecture--that it will be the most desperate war which Europe has yet seen. 
One that will break up _foundations_, as well as break down superstructures; not a war of politics but of principles; not a war for conquest but for ruin. All the treasuries of Europe will be bankrupt within a twelvemonth of its commencement; unless England shall become their banker. This will be the harvest of the men of money.--It is unfortunate that your money is all lodged for your commission; otherwise, in the course of a few operations, you might make cent per cent, which I propose to do. _Apropos_ of commissions. I had nearly omitted, in my own family anxieties, to mention the object for which I began my letter. I have _failed_ in arranging the affair of your commission! This was not for want of zeal. But the prospect of a war has deranged and inflamed every thing. The young nobility have actually besieged the Horse-guards. All the weight of the aristocracy has pressed upon the minister, and minor influence has been driven from the field. The spirit is too gallant a one to be blamed;--and yet--ar -------------- next part -------------- A non-text attachment was scrubbed... Name: kegler.jpg Type: image/bmp Size: 10189 bytes Desc: not available URL: From yues at kawer.com Tue Mar 23 15:14:51 2010 From: yues at kawer.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Tue, 23 Mar 2010 15:14:51 -0000 Subject: =?GB2312?B?Qzh1dHJhY2UtZGV2ZWzQ0NX+uaTX982zs++53MDt?= Message-ID: <201003231514.o2NFEMWN030425@mx1.redhat.com> utrace-devel???????????????? ??????????2010??3??26-27?? ???? ??????????2010??4??9-10?? ???? ??????????2010??4??24-25?? ???? ??????????2010??5??29-30?? ???? ??????????2010??6??25-26?? ???? ?????????????????????????????????????????????????? ??????????????????-????????-????????-????????-?????????? ?? ????2500??/?? ?????????????????????????????????????????? ???????????????????????????????????????????????? 
??????????020-80560638??020-85917945 ??????????????????????????????????chinammc2010 at 126.commailrom sophiebush123 at gmail.com Wed Mar 24 00:35:15 2010 From: sophiebush123 at gmail.com (Sophiebush) Date: Wed, 24 Mar 2010 00:35:15 +0000 Subject: PM methodology is static. Do you know what is needed for dynamic execution? Message-ID: <201003231635.o2NGZ1YX013766@mx1.redhat.com> People must learn methodology in order to plan, but they must also realize the methods used are static. They need efficient, dynamic information in order to execute effectively. Information generated by traditional PM platforms requires manual updates and is static in nature. As a result maintaining current, accurate and complete project information is impossible as project size and complexity grows. People will become defensive when accountability is unclear and communication breakdowns lead to missed deadlines and dissatisfied stakeholders. SimplePM is a modern PM software platform that helps project teams plan, execute and communicate with ease and precision. Please see the booklet: http://www.wtspm.com/product/8thManageSimplePMBooklet.pdf Or contact? www.wtspm.com sales at wtspm.com Americas: (1) 201-882-2447 EMEA and Asia Pac:?0852?34980609 Unsubscribe? http://www.wtspm.com/client/mail.php -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SimplePM.jpg Type: image/jpeg Size: 25808 bytes Desc: not available URL: From info at voidcreationsnewsletter.com Wed Mar 24 06:13:33 2010 From: info at voidcreationsnewsletter.com (VOID CREATIONS) Date: Wed, 24 Mar 2010 06:13:33 +0000 Subject: =?UTF-8?Q?Retro_Sessions:_80's_vs_Vintage_-_Sexta_26_Mar=C3=A7o_@_Santiag?= =?UTF-8?Q?o_Alquimista?= Message-ID: <7e89256c7b25125fe2da47bbe955fdaf@www.voidcreationsnewsletter.com> Se n?o visualizares esta p?gina correctamente, clica aqui Adiciona-nos ? 
tua safe-list, para garantir que recebes sempre a info dos nossos eventos. ** ** **Apresenta:** A M?quina do Tempo da Void Creations est? de volta ao planeta Terra para mais uma fant?stica Retro Session! A sua primeira paragem ? uma celebra??o aos m?ticos Alphaville (e quem n?o se lembra do mega-hit 'Forever Young' ou 'Big in Japan'?), com a melhor afterparty do concerto que se pode imaginar. Na mesma noite, atravessa quatro d?cadas de m?sica, dos anos 50 aos anos 80 e chega ao Santiago Alquimista para uma festa cheia de brilho, glamour, rock, pop, garage pop, synth-pop, boongaloo, ska, mod jazz, rocksteady, aquelas m?siquinhas italianas todas sexys dos anos 50 e o diabo a quatro!!! You wanna dance? Yeaaaah!!!! A nossa nave inter-gal?ctica regressa trazendo uma menina muito sexy e muito rock 'n' roll - com a actua??o dos irreverentes Badlovers & Hysteria Iberika , juntamente com o seu novo videoclipe ?Venus Xpress? - e as sonoridades mais alternativas dos Uni_Form , banda de Lisboa que tem dado cartas fortes nos circuitos mais indie do pa?s. Esta viagem no tempo, desde os anos 50 at? aos dias de hoje, ser? complementada com os dj sets da glamourosa parelha Lady Bambi e Jos? M. - especializados na ?m?sica do diabo? - e dos muito animados Dance Craze , que lan?am uma Mod Dance Party onde ningu?m vai conseguir parar de dan?ar! Na pista grande - tamb?m a celebrar o concerto dos Alphaville - a M?quina do Tempo estar? a cargo do j? bem conhecido tripulante Ant?nio Vibra??es e da dupla LorenzFactor , que estacionam nos anos 80?s, quando todos os crimes da moda eram permitidos e cujas can??es ainda adoramos cantar. Revisitamos quatro d?cadas - dos 50 aos 80?s - e s? se pode esperar mesmo uma viagem de arromba. Tirem a roupa do ba?, sacudam-lhe a poeira (ou ent?o, dirijam-se ? Casa do Carnaval e entreguem o flyer da festa, pois ter?o 10% de desconto em todos os artigos) preparem-se para um terramoto de m?sica e festa!!! 
Esta??es e apeadeiros da nossa M?quina do Tempo: Bandas Uni_Form Nascidos em Lisboa em 2006, os Uni_Form s?o compostos por Umbigu no baixo, Exploding Boy na percuss?o e Vox Machina na voz e na guitarra. Para as actua??es ao vivo, trazem Fakeplastic Bastard no sintetizadores. Com uma forte aposta nas sonoridades mais dark, re?nem influ?ncias como Joy Division, Bauhaus, White Lies, Pixies, Depeche Mode, Nine Inch Nails e muito mais e lan?aram recentemente o ?lbum ?Mirrors?. Descrevem-se como uma fus?o de luzes e cores: algumas mais escuras, outras mais claras, algumas mais estranhas do que outras e algumas mais simples. UNI_FORM _ PROMOCIONAL VIDEO FOR SINGLE SHADOWS FROM MIRRORS AL UNI_FORM NEW SINGLE SHADOWS OUT NOW | MySpace Music Videos Badlovers & Hysteria Iberika J? bem conhecidos do p?blico portugu?s, os Badlovers e Hysteria Iberika voltam a unir-se ? Void para mais um concerto electrizante. Um franc?s, um portugu?s, uma espanhola e uma caixa de ritmos s?o mais do que suficientes para um espect?culo de rock n roll selvagem e cheio de energia, a que ningu?m fica indiferente. O trio d? ao p?blico o seu esp?rito rockeiro, de roupagem electro e doses de muita pervers?o e fantasia, aproveitando tamb?m para divulgar a estreia do seu novo videoclipe ?Venus Xpress?. DJ?S Sala Hamlet: Lady Bambi & Jos? M. A nossa pista vintage estar? a cargo de Lady Bambi e de Jos? M., ambos versados na arte do rock n? roll. Lady Bambi veio de Inglaterra e instalou-se em Lisboa, inicialmente, como co-fundadora do Wonderland Club, fazendo agora parte do Club Rouge e especializando-se em tocar ?a m?sica do Diabo?? Jos? M., sempre vestido a rigor, ? um fervoroso entusiasta da cena mod europeia e coleccionador ?vido de vinis dos anos 50 e 60. Dance Craze - Mod Dance Party N?o participar nas festas Dance Craze ? 
viver adormecido!, afirma mod64, nosso selector que revive a cena mod atrav?s dos sons de 60's Soul, Mod Jazz, Northern Soul, Ska, Rocksteady, New Wave, Mod Revival, PowerPop, Rhythm & Blues e muito mais. Na verdade, Dance Craze re?ne mod64 e Pedro 42 e brinda o p?blico com todo o estilo e alegria desta onda que tem cada vez mais adeptos em Portugal. A palavra de ordem ? a dan?a - o mote da noite ? a anima??o! Para dan?ar at? cair para o lado. Sala Santiago: Ant?nio Vibra??es A nossa prata da casa - DJ Ant?nio Vibra??es a.k.a. Serotonin - traz os hits mais dan??veis da d?cada de 80, de Donna Sommer a Kyle Minogue, com algumas incurs?es ?s sonoridades mais obscuras da ?poca e muitas surpresas animadas. LorenzFactor Este colectivo notabiliza-se pelo ecletismo e humor dos seus sets e pela energia transmitida detr?s dos decks, tornando um simples dj set num festivo concerto rock. As suas diferentes propostas de sets e live acts torna-os uma escolha alternativa para qualquer uma noite em que o lema seja Party on! A M?quina do Tempo est? pronta a arrancar e carburada pelo mais absoluto del?rio...! A ?ltima viagem abarrotou o Santiago com mais de mil pessoas e criou uma festa inesquec?vel! Nesta noite, abrem-se as duas pistas para um ser?o realmente m?gico. A foto da ?ltima festa: Estamos s? ? espera da tua presen?a para partir novamente nesta viagem inter-gal?ctica-mega-especial... Aparece e confirma j? a tua presen?a no evento atrav?s do Facebook!!! Evento Facebook Local: Santiago Alquimista Hora: A partir das 21h00 at? ?s.... Entrada: LIVRE at? ?s 21h30 !! Morada: Rua de Santiago, n?19, 110-493 Lisboa Metro: Rossio / Baixa-Chiado / Martim Moniz Autocarros: 37 El?ctrico: 28 Comboio: Restauradores Info: www.voidcreations.org Mapa: Ver mapa maior Contactos: Void Creations E-mail: info at voidcreations.org @ - ? 
favor divulgar - ** ** -- Para RE-ENVIAR / To FORWARD - http://www.voidcreationsnewsletter.com/phplist/?p=forward&uid=8796d6f78d5efbb8958965a0e70ab9c8&mid=41 Para REMOVER / To REMOVE - http://www.voidcreationsnewsletter.com/phplist/?p=unsubscribe&uid=8796d6f78d5efbb8958965a0e70ab9c8 Para MODIFICAR / To MODIFY - http://www.voidcreationsnewsletter.com/phplist/?p=preferences&uid=8796d6f78d5efbb8958965a0e70ab9c8 -- Powered by PHPlist, www.phplist.com -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: powerphplist.png Type: image/png Size: 2408 bytes Desc: not available URL: From mldireto at tudoemoferta.com.br Wed Mar 24 12:13:38 2010 From: mldireto at tudoemoferta.com.br (TudoemOferta.com) Date: Wed, 24 Mar 2010 09:13:38 -0300 Subject: Promo Pascoa. A sua Pascoa Repleta de Promocoes Message-ID: <54fa62e684dc946d2d00df33001fd1d7@tudoemoferta.com.br> An HTML attachment was scrubbed... URL: From guys at sm.no Wed Mar 24 20:46:30 2010 From: guys at sm.no (Vanderwerf) Date: Wed, 24 Mar 2010 22:46:30 +0200 Subject: seriously and with a face from wh Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: encarnalize.bmp Type: image/bmp Size: 11659 bytes Desc: not available URL: From hvs at jf.com Thu Mar 25 15:17:57 2010 From: hvs at jf.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Thu, 25 Mar 2010 15:17:57 -0000 Subject: =?GB2312?B?RDN1dHJhY2UtZGV2ZWy6o7nYysLO8bSmwO28vMfJ?= Message-ID: <201003251517.o2PFHTQI014927@mx1.redhat.com> utrace-devel2010??????????????????????? ?????2010?3?27-28? ?? ? ??3000?,????????????????????? ???????????????????????????????????? 
?????020-80560638?020-85917945?????????????????chinammc2010 at 126.comerom cdirx at mail.pt Fri Mar 26 10:11:50 2010 From: cdirx at mail.pt (oDirectorio.com) Date: Fri, 26 Mar 2010 10:11:50 -0000 Subject: =?windows-1252?Q?FA=C7A_NEG=D3CIOS_NA_CATALUNHA?= Message-ID: An HTML attachment was scrubbed... URL: From krt at khg.com Fri Mar 26 16:33:49 2010 From: krt at khg.com (=?GB2312?B?x+vXqs/gudjIy9Sx?=) Date: Sat, 27 Mar 2010 00:33:49 +0800 Subject: =?GB2312?B?dXRyYWNlLWRldmVsxvPStVBQVLywRXhjZWzTptPD?= Message-ID: <201003261633.o2QGX5lH006655@mx1.redhat.com> utrace-devel???????????PPT+Excel????? ?????2010?4?10-11? ?? ?????2010?4?17-18? ?? ?????2010?4?24-25? ?? ? ??2600?/????????????????? ??????????????????????????????????? ?????020-80560638?020-85917945 ?????????????????chinammc2010 at 126.com??? ----------------------------------------------------------------------------------------------- ????: ????????????? PPT?????????PPT????????????? ????????BladeOffice?????????????????? ????????????Excle-PPT??????????BladeOffice?????????????????BladeOffice??????????????????? ????: ???????????? ????????????? 1.??????????????????????????? 2.????????????500???????????????? 3.?????????????????????????????????? 4.??????????????????????????????? ??????????? 1.?????????????????? 2.??????????????????150????????? 3.???????????????????????????? 3-D ??*?????? 4.???????????????? ?????????????? 1.??????????????? 2.?????/??????????? 3.????????????????? 4.????????40M?????6M 5.????????? ????????SmartArt???? 1.??????????????? 2.SmartART?????? 3.???????SmartART??? 4.????SmartART???SmartArt?????????????????SmartArt ???????????????? 1.??????????????????? 2.????????? 3.??????????5???? 4.?????????? 5.??????????????? 6.????????????????? ??????????????????? 1.??????????? 2.??????????????????? 3.?????????? 4.???????? 5.??????? 6.????? 7.?????????PPT? 8.???????????? ??????color see see???????? 1.??????Powerpoint????? 2.???CI????????? 3.????????? 4.????????? 5.??????????????? 6.??????????? 
From disrate at cafe-rosen.de Fri Mar 26 17:16:47 2010 From: disrate at cafe-rosen.de (Ollig Cecena) Date: Fri, 26 Mar 2010 19:16:47 +0200 Subject: ion of the Message-ID: <4BACEABB.7070402@cafe-rosen.de> Led instead of crippling him." "He's a bad lot," sighed the general.
"Wing won't fly away from Kennedy, I fancy." "Not if there's a shot left in his belt," said Blake. "And Ray is officer-of-the-day. There'll be no napping on guard this night." At the barred aperture that served for window on the southward front, a dark face peered forth in malignant hate as the speakers strode by. But it shrank back, when the sentry once more tossed his carbine to the shoulder, and briskly trudged beneath the bars. Six Indians shared that prison room, four of their number destined to exile in the distant East,--to years, perhaps, within the casemates of a seaboard fort--the last place on earth for a son of the warlike Sioux. "They know their fate, I understand," said Blake, as the general moved on again. "Oh, yes. Their agent and others have been here with Indian Bureau orders, permitting them to s -------------- next part -------------- A non-text attachment was scrubbed... Name: peroxisome.zip Type: application/octet-stream Size: 393 bytes Desc: not available URL: From rubberlike at lbemuoil.com Sat Mar 27 08:20:40 2010 From: rubberlike at lbemuoil.com (Paskow Korpela) Date: Sat, 27 Mar 2010 09:20:40 +0100 Subject: Ew wrapper in which she is even more cold and haughty Message-ID: <4BADBF1C.8030706@lbemuoil.com> Of Africa. We forgot entirely we had been twenty days at sea and remembered only that we were ten miles from Japan, only as far as New Bedford is from Marion. We are at anchor now, waiting to go in in the morning. Were it not for war we could go in now but we must wait to be piloted -------------- next part -------------- A non-text attachment was scrubbed... Name: narrational.zip Type: application/octet-stream Size: 12232 bytes Desc: not available URL: From nbz at eut.com Sat Mar 27 08:36:09 2010 From: nbz at eut.com (=?GB2312?B?16rQ6MfzyMvUsQ==?=) Date: Sat, 27 Mar 2010 16:36:09 +0800 Subject: =?GB2312?B?dXRyYWNlLWRldmVsuNrOu7fWzvbT69C9s+rJ6LzG?= Message-ID: <201003270836.o2R8a6mN028942@mx1.redhat.com> ?????????????? ?????2010?3?19-21? 
From news at maisservicos.com Sat Mar 27 20:53:12 2010 From: news at maisservicos.com (ESINE) Date: Sat, 27 Mar 2010 20:53:12 +0000 Subject: Obtenha um certificado com futuro! Message-ID: <20100327205322.8D4B43F6BC@server7.nortenet.pt> Occupational Risk Prevention Technician. Every company needs an occupational risk plan. Earn a certificate with a future. 24-hour access to our virtual campus. With complete teaching materials. A certificate that endorses your knowledge. In less than 6 months! CLICK NOW. INFORMATION NOTICE: This email is intended solely to inform potential users and cannot be considered SPAM. Under the international legislation governing electronic mail, "an email cannot be considered SPAM when it includes a way for the recipient to be removed from the sender's list". To stop receiving these offers by email, click here -------------- next part -------------- An HTML attachment was scrubbed... URL: From proselytized at dozer.co.za Sun Mar 28 07:59:08 2010 From: proselytized at dozer.co.za (Plamondon Oreilly) Date: Sun, 28 Mar 2010 09:59:08 +0200 Subject: s Mead Message-ID: <4BAF0B99.5080404@dozer.co.za> Nd let's hear." "I can't speak on those lines, Tabitha," replied her brother-in-law. "Collet is no wise shiftless, for she hath brought up her children in a good and godly fashion, the which a woman with fewer brains than lads should ne'er have done. But I verily assent with you that we should do something to help her. And first--who will take to Sens Bradbridge's maids?" "I will, if none else wants 'em. But they'll not be pampered and stuffed with cates, and lie on down beds, and do nought, if they dwell with me. I shall learn 'em to fare hard and be useful, I can tell you." "Whether of the twain call you them syllabubs and custard pies as you set afore us when we supped last with you, Mistress Hall?" quietly asked Ursula Final. 
"Seemed to me I could put up with hard fare o' that sort metely well." "Don't be a goose, Ursula. They've got to keep their hands in, a-cooking, haven't they? and when things be made, you can't waste 'em nor give 'em the pigs. They've got to be ate, haven't they?" demanded Mrs Tabitha, in tones of battle; and Ursula subsided without attempting a defence. "What say -------------- next part -------------- A non-text attachment was scrubbed... Name: taillights.zip Type: application/octet-stream Size: 12824 bytes Desc: not available URL: From asqalanstar at eim.ae Mon Mar 29 03:56:38 2010 From: asqalanstar at eim.ae (gszwuwxcn) Date: Mon, 29 Mar 2010 11:56:38 +0800 Subject: =?gb2312?B?06Ygvey43yDQo7HPX9K1yfq089DNuakg0Oi8+8Pmu+F1dHJhY2UtZGV2ZWw=?= Message-ID: <20100329115643028214@eim.ae> utrace-develjdzph at 126.com --------------------------------------------------------------------------------------- ??????: ???? ?? ???? ???? ?? A: ????: ????: 1??????? (???? 1.5m x1.00m) 1? 2??? (1.2m x 1.2m) 1? 3??? (????) 2? B: ????: ????: 1?????????? (??????) 1?? 2????? (????????25??????????????????? ??????????????????) 1?? 3????? (???????????????????50?) 1?? C: ????: ????: 1????? (???????????6?6(1/24?) 1? 2?HR?? (???????????HR??????) 1? 3????? (????????????????????????????????) ?? ??????: ???? ???? ?? ???? ???? 1????: ???? A 1100 ?? 2????????: ????+???? A+B 1280 ??? 3????????: ????+????+????+???? A+B+C 1500 ????? --------------------------------------------------------------------------------------- ? ? ? 2 0 1 0 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ??jdzph at 126.com ? ? ? ??020-83664364 ?????_____________________________________________ ????______________ ? ??__________________________ ? ??__________________________________ ? ??__________________________ E?mail?_________________________________ ????????: ?????????? (? ??_______ ?) (????) ? ????????? (? ??_______ ?) ? ????????? (? ??_______ ?) ?? ? ? ?: ?????????? (1100?) ? ????????? (1280?) ? ????????? (1500?) ?? ? ? ?: ??? ?? ? 
?? ? ? ?: RMB ____________? ?????????????????????????????????????? ? ? ??_____________________________________________________ ?????????????_________________________________________ ??????020-83692894 18978178659 ? ? ? ??????????????????? ??????????????? ? ??4400 1580 1070 5900 0241 ????? 1??????????????????????????2010??????????????? ??????????????????????????? 2????????????????????????????? 3??????????????????????????????????????????? 4??????????????????????????? 5?????????????2010?4?17?10:00-16:00????????9:30??? --------------------------------------------------------------------------------------- utrace-devel at redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From remiom94 at hotmail.fr Sat Mar 27 01:13:53 2010 From: remiom94 at hotmail.fr (remiom94) Date: Sat, 27 Mar 2010 09:13:53 +0800 Subject: =?GB2312?B?uavLvrLGzvEvuLrU8MjL?= Message-ID: ?????? ?????????????????????????????????????????????????? ???????????????????????????????????????????????? ?????????????????????????? ??????????13751003098 ?????????????? ?????????????????????????? ???????????? From wz at tvgx.com Tue Mar 30 03:47:59 2010 From: wz at tvgx.com (=?GB2312?B?x+vXqtDox/PIy9Sx?=) Date: Tue, 30 Mar 2010 03:47:59 -0000 Subject: =?GB2312?B?RDd1dHJhY2UtZGV2ZWy6z82sudzA7dPrt+fP1Q==?= Message-ID: <201003300347.o2U3lOCo006751@mx1.redhat.com> utrace-devel?????????????????????? ??????????2010??4??10-11?? ???? ??????????2010??4??15-16?? ???? ??????????2010??4??24-25?? ???? ???????????????????????????????????????????????????????????????????????? ???????????????????????????????????????????????????????????????????????????????????????? ????????????????2800??/?????????????????????????????????????????????? 
??????????020-80560638??020-85917945 ??????????????????????????????????chinammc2010 at 126.comhomasrom aimex at dhaka.net Tue Mar 30 06:08:27 2010 From: aimex at dhaka.net (okrddx) Date: Tue, 30 Mar 2010 14:08:27 +0800 Subject: =?gb2312?B?zOEguakxMDe97CC54yC9u7vhX9W5IM67dXRyYWNlLWRldmVs?= Message-ID: <20100330140836168266@dhaka.net> utrace-devel at redhat.com ---------------------------------------------------------------------------------- ********************** 2 0 1 0 ? ? ? ? ? ? ********************** -------------- ? ?(?) ? ? ? --- ? ? ? ? ? ? ? ? -------------- ================================================================================== ? ? ? ? ? ? ? ? ? ??? ? ? ? ??? ? ? 1 9 5 7? ? ??? ? ? ? ? ? ? ? ? ? ???? ? ? ? ? ? ? ? ??? ? ? ? ? ? ??? ? ? ? (? ? ? ? ? ? ? )?? ? ? ??? ? 50000? ? ? ? ? ? ? ??? ? ? ? ? ? ??? ? ? ? ? ??? ? ? ? 21? ? 26? ? ???? ? ?? ? ? ? ? ? ? ? ? ? ? ? ?? 2 0 1 0 ? ? ? ??? ? 107? ? ? ? ? ? ? ? ? ? ???? ? ? ? ?? ? ? ? ? ? ??? ? ? ? ? ?? ? ? ? ? ? ? ? ???? ? ?? ??? ? ?? ??? ?? ? ? ? ? ? ? ? ??2 0 1 0 ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ? ? ? ? ? ? ? ? ? ? ?, ? ? ? ? ? ? ? ? ?? ================================================================================== ??? ? ? ? ? ? ?: ? 1 0 7 ? ? ? ? ? ? ? ? ? ? ? ? ? ??? ?? ?? ?? ?? ================================================================================== ? ? ? ? ? ? ?: 2 0 1 0 ? 4?15-19? (4?20?22?? ? ? ?) ? ? ? ? ? ??? ? ? ? ??? ? ? ? ? ??? ? ? ? ? ? ? ?? ? ? ? ? ? ? ??? ? ? ??? ??? ??? ? ??? ? ??? ? ? ?? ? ? ? ? ? ? ??? ? ? ??? ? ? ??? ? ? ??? ??? ????? ? ??? ???? ? ? ? ? ? ?? ================================================================================= ? ? ? ? ? ? ?: 2 0 1 0 ? 4?23-27? (4?28?30?? ? ? ? ) ? ? ? ? ? ??? ? ? ??? ? ? ??? ? ? ? ??? ? ? ? ?? ? ? ? ??? ??? ? ? ? ??? ? ? ??? ? ? ??? ? ? ? ? ?? ? ? ? ??? ? ? ? ? ? ? ??? ??? ? ? ??? ? ? ??? ??? ? ?? ================================================================================= ? ? ? ? ? ? ?: 2 0 1 0 ? 5?1-5? ? ? ? ? ??? ??? 
? ? ? ? ? ??? ??? ? ? ? ? ? ??? ? ? ? ? ? ? ??? ? ? ? ? ????? ??? ? ? ? ??? ? ? ??? ? ? ? ??? ? ? ??? ??? ? ? ? ? ??? ? ? ??? ??? ??? ? ? ? ? ? ? ? ? ? ? ?? ================================================================================= ??? ? ? ? ? ? ??? ? ? ? ? ?? ??? ? ? ? ? ? ? ? ? ? ?? ??? ? ? ? ? ? ? ? ? ?? -------------------------------------------------------------------------------- ??-=?-=?-=?-=?? ?-?-?=??020-31314389 ? 38250090 ?-??137 2529 2686 ? ? ? QQ?280791891 E-mail?h31314389 at 126.com ================================================================================= 2010-3-30 -------------- next part -------------- An HTML attachment was scrubbed... URL: From envoi at drp55.com Tue Mar 30 07:22:40 2010 From: envoi at drp55.com (Emilie de Devis Fizeo) Date: Tue, 30 Mar 2010 10:22:40 +0300 Subject: Votre devis SITE INTERNET et REFERENCEMENT Message-ID: <5b8d09c9d9c50011a768dad801508fb9@m4.privatemarkets7.com> An HTML attachment was scrubbed... URL: From kilburne.c at gmail.com Tue Mar 30 16:19:45 2010 From: kilburne.c at gmail.com (Kilburne, Carolyn) Date: Tue, 30 Mar 2010 11:19:45 -0500 Subject: your_web_site Message-ID: <56272291.20100330111945@gmail.com> Dear Utrace-devel With your permission, we would like to show you how to get better positioning and more traffic on the web. If you are interested, reply us and we'll do a complimentary no cost site assessment. Sincerely, Carolyn Kilburne Key Media utrace-devel at redhat.com 30/03/2010 From aspirant at kubus-mv.de Tue Mar 30 20:15:25 2010 From: aspirant at kubus-mv.de (Kimbery Penuel) Date: Tue, 30 Mar 2010 22:15:25 +0200 Subject: steel with elasti Message-ID: <4BB258DA.2020308@kubus-mv.de> Ce in not securing my store in a safe way. I had already thought of doing so, but I never imagined these creatures could make an entry from behind, and I knew that the web of cloth completely shut them out on the inside. Alas! 
it was now too late; regrets were idle; and, following out that instinct which prompts us to preserve life as long as we can, I transferred the fragments from the box to my little shelf inside; and then, making all tight as before, I lay down to reflect upon my situation, rendered gloomier than ever by this unexpected misfortune. CHAPTER FORTY THREE. SEARCH AFTER ANOTHER B -------------- next part -------------- A non-text attachment was scrubbed... Name: humanitarian.zip Type: application/octet-stream Size: 12704 bytes Desc: not available URL: From FELIPEP at 21STCENTURYENT.COM Wed Mar 31 15:12:03 2010 From: FELIPEP at 21STCENTURYENT.COM (xknnctvj) Date: Wed, 31 Mar 2010 23:12:03 +0800 Subject: °ì =?gb2312?B?wO0xMDe97LnjIL27u+GyziC529ak?= Message-ID: <20100331231213288862@21STCENTURYENT.COM> --------------------------------------------------------------------- ? ? 1 0 7 ? (? ?) ? ? ? ? ? ?--? ? ? --------------------------------------------------------------------- ? ??????: 4?15?-19? (4?20??22?????) ????????????????????????????????? ?????????????????????????????????? ?????????????????????????????????? ??????? ? ??????: 4?23?-27? (4?28??30?????) ?????????????????????????????????? ?????????????????????????????????? ?????????????????????????? ? ??????: 5?1?-5?5? ????????????????????????????????? ?????????????????????????????????? ?????????????????????????????????? ???? --------------------------------------------------------------------- ? ??????,??????? 1?????2??4cmX5cm?????2?(????2/3?????????)? 2???????????2????????? 3????????????????????????? 4???2010?04?10??????????????? 5??????????(??????)????????? ???????????????????????????? --------------------------------------------------------------------- ? ???????????????? ? ???: ????????? ? ? ?: 3602 0879 0103 0905647 ---------------------------------------------------------------------- ? ? ? ? ? ?? 4? ?15? ?16? ?17? ?18? ?19??????(??:_____?) ?? 4? ?23? ?24? ?25? ?26? ?27??????(??:_____?) ?? 5? ?01? ?02? ?03? ?04? 
?05??????(??:_____?) ? ? ? ?: ?400?/?/? (?______?_______?_____________?) --------------------------------------------------------------------- ? ??????: ?????????? ? ?: 020-31314389 ? ?: 020-38250255 ? ??13725292686 / 15323334865 ???: ? ? ? ? ? --------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: