From jkenisto at us.ibm.com  Tue Jan  6 22:23:09 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Tue, 06 Jan 2009 14:23:09 -0800
Subject: newly created engine immediately notified of exec already in
	progress
In-Reply-To: <20081217092122.55879FC3D1@magilla.sf.frob.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
	<20081217092122.55879FC3D1@magilla.sf.frob.com>
Message-ID: <1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>

On Wed, 2008-12-17 at 01:21 -0800, Roland McGrath wrote:
> > The current implementation is that if I create a new engine in response
> > to an exec (when called from some other engine's report_exec callback),
> > and set that engine's flags to be notified of execs, the new engine gets
> > notified of the exec that's already underway.  This turns out to be
> > rather inconvenient for uprobes, but is it counterintuitive?
> 
> To clarify, this is not specific to exec.  Every kind of event callback
> constitutes what I call a "reporting pass", and they all behave the same.
> A normal reporting pass is the loop across all engines, in which interested
> ones get first a report_quiesce(eventbit) and then a report_event().  
> A resume reporting pass is the similar loop where engines get either just
> report_quiesce(0) or just report_signal().
> 
> The question is what happens in the current reporting pass when a callback
> attaches a new engine to current and sets its event mask to include the
> event that elicited this reporting pass.
> 
> The current behavior is that the new engine goes immediately on the end of
> the list of engines to get callbacks, so the reporting pass already in
> progress will later get to all the new engines before it's done.
> 
> The alternative behavior would be that any new engines attached after a
> reporting pass has begun will not be included in that pass.  They will be
> included in the next reporting pass of any kind.  A side effect is that if
> there was not going to be any other report before returning to user mode,
> there will be a resume reporting pass (that the new engine will see).
> That is the same effect of utrace_control(UTRACE_REPORT) being done when
> the utrace_attach_task() is done.
> 
> Originally I had thought of the current behavior as being desirably
> consistent with the fact that an engine's report_quiesce(eventbit) callback
> can use utrace_set_events() on that same engine to enable/disable the
> immediately following report_event() callback in the very same step of the
> same reporting pass.
> 
> But another way to look at it is that any utrace_attach_task() call from
> any other task behaves this (alternative) way.  That is, if some reporting
> pass has already begun, the new engine is not included, but a UTRACE_REPORT
> is done instead to get the new engine fully signed on "soon".  So it would
> be simply consistent for any attach made during a reporting pass
> (synchronously or asynchronously) not to take effect during that same pass.
> 
> I was musing about adding a UTRACE_ATTACH_* flag bit to let you select the
> behavior.  But that seems overly fiddly for no good reason.
> 
> So I don't mind changing this as Jim prefers.  The actual change is simple,
> just remove the "splice_attaching" case from utrace_attach_task.

Yes, I'd prefer that you make the requested change, if you haven't
already.  Just before I went on vacation (about when you posted this), I
coded a tentative fix to uprobes to work with the existing utrace
behavior.  It's about a 250-line patch, and I haven't tested it yet.
It'd be nice if I could drop that.

> 
> Jim, can you look through the kerneldoc comments and the Documentation/
> files and cite any places where the description of this behavior now needs
> to be corrected or explained more clearly and explicitly?

1. On the "Events and Callbacks" page, paragraph 3 says: "When a thread
has an event, each engine gets a callback if it has set the event flag
for that event type."  Either here or at the end of that page, you could
add something like:

[If you implement the requested behavior...]
In response to an event, one engine's callback may create a new engine
for the same task.  This new engine will not be notified of the event
already in progress, even if you immediately set its event flag for that
type of event.

[If not...]
In response to an event of type UTRACE_EVENT(x), one engine's callback
may create a new engine for the same task.  This new engine will be
appended to that task's list of engines; and if you set its event flag
for UTRACE_EVENT(x), it will be notified in turn of the event already in
progress.

2. Something similar could be added to the description of
utrace_set_events().
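
To make the two behaviors concrete, here is a small user-space sketch in C.
It is not utrace code: struct task, struct engine, attach_engine(),
reporting_pass() and EVENT_EXEC are hypothetical names made up for this
illustration, and the include_new argument merely selects between the
"current" and the "requested" behavior discussed above.  It also does not
model the UTRACE_REPORT that gets a deferred engine signed on "soon".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENGINES 16
#define EVENT_EXEC  (1u << 0)   /* hypothetical stand-in for UTRACE_EVENT(EXEC) */

struct task;

struct engine {
	unsigned events;        /* mask of events this engine wants */
	int reports;            /* number of callbacks received so far */
	void (*report)(struct task *, struct engine *);   /* may be NULL */
};

struct task {
	struct engine engines[MAX_ENGINES];
	size_t nr_engines;
};

/* Append a new engine to the end of the task's engine list. */
static struct engine *attach_engine(struct task *t, unsigned events,
				    void (*report)(struct task *, struct engine *))
{
	struct engine *e;

	assert(t->nr_engines < MAX_ENGINES);
	e = &t->engines[t->nr_engines++];
	e->events = events;
	e->reports = 0;
	e->report = report;
	return e;
}

/* A callback that attaches a new engine interested in the same event. */
static void attach_on_report(struct task *t, struct engine *self)
{
	(void)self;
	attach_engine(t, EVENT_EXEC, NULL);
}

/*
 * One reporting pass over the task's engines.
 * include_new != 0: engines attached during the pass are reached by it
 *                   (the behavior described as "current" above).
 * include_new == 0: the pass stops at the engines that existed when it
 *                   began (the requested behavior).
 */
static void reporting_pass(struct task *t, unsigned event, int include_new)
{
	size_t snapshot = t->nr_engines;
	size_t i;

	for (i = 0; i < (include_new ? t->nr_engines : snapshot); i++) {
		struct engine *e = &t->engines[i];

		if (!(e->events & event))
			continue;
		e->reports++;
		if (e->report)
			e->report(t, e);
	}
}
```

With include_new set, an engine attached from inside a callback sees the
event already in progress; with it clear, the new engine sees nothing until
the next pass.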

> 
> 
> Thanks,
> Roland

Thanks.
Jim


From dvlasenk at redhat.com  Wed Jan  7 17:30:10 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 07 Jan 2009 18:30:10 +0100
Subject: [PATCH] make strace more fair wrt many traced processes
Message-ID: <1231349410.3464.7.camel@localhost>

Hi,

The attached little program, many_looping_threads.c,
starts N threads and exits (terminating them all)
as soon as they have all started.
Each thread runs an infinite loop calling getuid().
N is given as the first command-line parameter.

Run standalone, it finishes OK, even with a large
number of threads (500).

Currently, strace -f fails miserably starting at approximately
5 threads. After a few threads have been created, strace
is flooded with syscall entry/exit notifications
from these threads, and the main thread (which wants
to create more threads) does not get a chance for its
syscall start/stop notifications to be delivered!

This patch fixes it. Run tested.

The gist of the patch is that we don't wait(2) for just the *first* process
to stop/exit; we wait for them all (calling wait(2) in a loop, with
WNOHANG). Only once we have collected all such processes do we process
them and restart them.

This ensures that one or a few fast stopping/starting/stopping threads
can't usurp strace's attention. Slower threads will always get a chance
to make at least some progress.
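
The collect-then-process idea can be sketched roughly as below.  This is
not the code from 7.patch: struct wait_event, collect_events() and
spawn_exiting_children() are hypothetical names for this illustration, and
plain waitpid(2) is used where strace itself may call a different wait
variant with different flags.  The sketch blocks for the first event only
and then drains whatever else is already pending with WNOHANG; it treats
any waitpid() failure (including EINTR) as "nothing more to collect".

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct wait_event {
	pid_t pid;
	int status;
};

/*
 * Collect up to cap pending child events into buf.
 * Blocks until the first event arrives, then polls with WNOHANG so that
 * every child that has already stopped or exited is gathered before any
 * of them is processed.  Returns the number of events collected (0 if
 * there are no children to wait for).
 */
static size_t collect_events(struct wait_event *buf, size_t cap)
{
	size_t n = 0;
	int flags = 0;          /* block only for the first event */

	assert(cap > 0);
	while (n < cap) {
		int status;
		pid_t pid = waitpid(-1, &status, flags);

		if (pid <= 0)   /* 0: nothing else pending; -1: no children or error */
			break;
		buf[n].pid = pid;
		buf[n].status = status;
		n++;
		flags = WNOHANG;        /* never block again in this round */
	}
	return n;
}

/* Demo helper: fork n children that exit immediately with codes 0..n-1. */
static void spawn_exiting_children(int n)
{
	int i;

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		assert(pid >= 0);
		if (pid == 0)
			_exit(i);
	}
}
```

A caller would handle and restart each collected task only after
collect_events() returns, so a task that stops again immediately cannot
jump ahead of the ones already queued in buf.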

The patch needs some comment removal and re-indentation before it can be
applied to strace cvs, but otherwise seems to be ready.

I'd vote for a subsequent patch to split the trace() function into "collect
stopped tasks" and "process collected tasks" parts, without changing
the logic.
--
vda

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 7.patch
Type: text/x-patch
Size: 4033 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090107/05954f94/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: many_looping_threads.c
Type: text/x-csrc
Size: 692 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090107/05954f94/attachment-0001.bin>

From roland at redhat.com  Wed Jan  7 18:52:17 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed,  7 Jan 2009 10:52:17 -0800 (PST)
Subject: [PATCH] make strace more fair wrt many traced processes
In-Reply-To: Denys Vlasenko's message of  Wednesday,
	7 January 2009 18:30:10 +0100 <1231349410.3464.7.camel@localhost>
References: <1231349410.3464.7.camel@localhost>
Message-ID: <20090107185217.87DC7FC3E0@magilla.sf.frob.com>

Wrong list.
I think you meant CC: <strace-devel at lists.sourceforge.net>


From dvlasenk at redhat.com  Wed Jan  7 19:08:10 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 07 Jan 2009 20:08:10 +0100
Subject: [PATCH] make strace more fair wrt many traced processes
In-Reply-To: <20090107185217.87DC7FC3E0@magilla.sf.frob.com>
References: <1231349410.3464.7.camel@localhost>
	<20090107185217.87DC7FC3E0@magilla.sf.frob.com>
Message-ID: <1231355290.3464.12.camel@localhost>

On Wed, 2009-01-07 at 10:52 -0800, Roland McGrath wrote:
> Wrong list.
> I think you meant CC: <strace-devel at lists.sourceforge.net>

Absolutely.

Just resent it there.

The corresponding bug is
https://bugzilla.redhat.com/show_bug.cgi?id=478419

What do you think about the patch in principle?
--
vda


From roland at redhat.com  Wed Jan  7 20:57:11 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed,  7 Jan 2009 12:57:11 -0800 (PST)
Subject: newly created engine immediately notified of exec already in
	progress
In-Reply-To: Jim Keniston's message of  Tuesday, 6 January 2009 14:23:09 -0800
	<1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
	<20081217092122.55879FC3D1@magilla.sf.frob.com>
	<1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <20090107205711.B82E2FC3E0@magilla.sf.frob.com>

> Yes, I'd prefer that you make the requested change, if you haven't
> already.  Just before I went on vacation (about when you posted this), I
> coded a tentative fix to uprobes to work with the existing utrace
> behavior.  It's about a 250-line patch, and I haven't tested it yet.
> It'd be nice if I could drop that.

I made the change in the git tip (v2.6.28-7153-g87e13f4 from
v2.6.28-7151-gdaf4b80, produces 2.6-current/ patches).  
(I haven't updated the 2.6.28 backport branch.)

> 1. On the "Events and Callbacks" page [...]

Please check the doc changes I made: one there, one in utrace_set_events.


Thanks,
Roland


From jkenisto at us.ibm.com  Thu Jan  8 00:10:45 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 07 Jan 2009 16:10:45 -0800
Subject: newly created engine immediately notified of exec already in
	progress
In-Reply-To: <20090107205711.B82E2FC3E0@magilla.sf.frob.com>
References: <1229117046.3565.9.camel@dyn9047018139.beaverton.ibm.com>
	<20081217092122.55879FC3D1@magilla.sf.frob.com>
	<1231280589.11455.46.camel@dyn9047018139.beaverton.ibm.com>
	<20090107205711.B82E2FC3E0@magilla.sf.frob.com>
Message-ID: <1231373445.8092.6.camel@dyn9047018139.beaverton.ibm.com>

On Wed, 2009-01-07 at 12:57 -0800, Roland McGrath wrote:
> > Yes, I'd prefer that you make the requested change, if you haven't
> > already.  Just before I went on vacation (about when you posted this), I
> > coded a tentative fix to uprobes to work with the existing utrace
> > behavior.  It's about a 250-line patch, and I haven't tested it yet.
> > It'd be nice if I could drop that.
> 
> I made the change in the git tip (v2.6.28-7153-g87e13f4 from
> v2.6.28-7151-gdaf4b80, produces 2.6-current/ patches).  
> (I haven't updated the 2.6.28 backport branch.)

OK, I'll retest with your change, and fix the patch for PR 7082.

> 
> > 1. On the "Events and Callbacks" page [...]
> 
> Please check the doc changes I made: one there, one in utrace_set_events.

Yes, very good.

> 
> 
> Thanks,
> Roland

Many thanks.
Jim


From fche at redhat.com  Sun Jan 11 22:19:13 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sun, 11 Jan 2009 17:19:13 -0500
Subject: request for a mergeable tree
Message-ID: <20090111221913.GD18407@redhat.com>

Hi -

Please consider switching some of the utrace git trees on
git.kernel.org to merge-based rather than rebase-based ones.  This
should make it somewhat easier to develop stuff on top.

- FChE


From roland at redhat.com  Mon Jan 12 02:27:19 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 11 Jan 2009 18:27:19 -0800 (PST)
Subject: request for a mergeable tree
In-Reply-To: Frank Ch. Eigler's message of  Sunday,
	11 January 2009 17:19:13 -0500 <20090111221913.GD18407@redhat.com>
References: <20090111221913.GD18407@redhat.com>
Message-ID: <20090112022719.DCD85FC3C8@magilla.sf.frob.com>

Ok, no problem.  I've switched my main development (back) to using normal
git history-preserving branches for my incremental changes (not with any
old history, though).

The repo now has two main branches:

	utrace-ptrace aka master
	utrace

The "utrace" branch does not have the CONFIG_UTRACE_PTRACE code.
The "utrace-ptrace" (aka master) branch forks from "utrace" and adds that.

In the next few days I will update my scripts to produce patches.
For the moment, my latest code is in GIT but not in patch files yet.


Thanks,
Roland


From roland at redhat.com  Mon Jan 12 02:37:06 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 11 Jan 2009 18:37:06 -0800 (PST)
Subject: utrace/ptrace mutual exclusion
Message-ID: <20090112023706.87E81FC3C8@magilla.sf.frob.com>

I've added a change (only in git) so that with CONFIG_UTRACE=y and
CONFIG_UTRACE_PTRACE=n, ptrace and utrace are mutually exclusive on each
task.  The utrace_attach or PTRACE_ATTACH call fails with a characteristic
EBUSY so that the failure looks new and unusual in an obvious way.

It would be useful if people could try that configuration and see how
annoying it is when e.g. using systemtap with utrace/uprobes stuff.
It will make any "trace everything for a while" kinds of uses annoying,
since they will cause you to be unable to use strace or gdb while the stap
script is running (unless the debugging session is already going first).
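
A small userspace check can make the new failure easy to spot when trying
this configuration.  The sketch below is only an illustration and is not
part of the utrace tree: try_ptrace_attach and describe_attach_result are
hypothetical helper names.  It relies on nothing but the standard ptrace(2)
call and the EBUSY behavior described above; kernels without this change
report an already-traced task as EPERM instead.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: try to PTRACE_ATTACH to pid.
 * Returns 0 on success, or the errno value of the failed attach.
 * On success the task is detached again right away.
 */
int try_ptrace_attach(pid_t pid)
{
	assert(pid > 0);

	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
		return errno;

	/* Attached: wait for the attach stop, then let the task run again. */
	waitpid(pid, NULL, 0);
	ptrace(PTRACE_DETACH, pid, NULL, NULL);
	return 0;
}

/* Hypothetical helper: turn the result above into a short description. */
const char *describe_attach_result(int err)
{
	switch (err) {
	case 0:
		return "attached";
	case EBUSY:
		/* Assumption: the mutual-exclusion change described above. */
		return "busy (task claimed by the other tracing facility)";
	case EPERM:
		return "not permitted";
	case ESRCH:
		return "no such task";
	default:
		return strerror(err);
	}
}
```

Calling describe_attach_result(try_ptrace_attach(pid)) on a task that a
running stap script has already claimed should then report the "busy" case
rather than a generic permission error.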

It's occurred to me that since the CONFIG_UTRACE_PTRACE code is so abysmal,
it might be easier and better to merge utrace upstream alone, with the
mutual exclusion safety feature, and whatever pure-utrace things we have to
merge.  The proper ptrace cooperation is important, but the mutual
exclusion makes it a safe limitation rather than a destabilizer to work on
utrace things without it.

Anyway, it's worth figuring out how annoying this configuration is now
before trying to decide about that.


Thanks,
Roland


From srikar at linux.vnet.ibm.com  Mon Jan 12 09:22:34 2009
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Mon, 12 Jan 2009 14:52:34 +0530
Subject: Utrace in -next tree?
In-Reply-To: <20081017200934.2DF601544CB@magilla.localdomain>
References: <20081017060455.GA2962@in.ibm.com>
	<20081017200934.2DF601544CB@magilla.localdomain>
Message-ID: <20090112092233.GC13305@linux.vnet.ibm.com>

* Roland McGrath <roland at redhat.com> [2008-10-17 13:09:34]:

> > What are your thoughts of getting utrace git tree into linux-next?
> > That way, utrace will have more extensive visibility and testing.
> 
> I would certainly like to.  I hope that after I next post the latest utrace
> patch series for more review, it will make sense to put it into linux-next.

Roland, 

How about getting the utrace git tree into linux-next now?

--
Thanks and Regards
Srikar


From sarangk4586 at gmail.com  Tue Jan 13 04:42:54 2009
From: sarangk4586 at gmail.com (Sarang Kawale)
Date: Tue, 13 Jan 2009 10:12:54 +0530
Subject: which patch to use for 2.6.23!
Message-ID: <8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>

Hello all!
I am a newbie to Linux. I am trying to apply the utrace patches to 2.6.23,
and I have the following problems:

I am using the patch files from roland/utrace/old/2.6.23, but while applying
them I get hunk failures for most of the files. Could you please tell me
what the problem could be, and how to solve it?

I am using the Ubuntu 8.04 distro. The file-creation parts of the patch,
e.g. the one creating /include/linux/tracehook.h, do not seem to take
effect: after applying the patch I don't see any such file.
-- 
With Love,
Sarang

From wenji.huang at oracle.com  Tue Jan 13 05:40:56 2009
From: wenji.huang at oracle.com (Wenji Huang)
Date: Tue, 13 Jan 2009 13:40:56 +0800
Subject: Analysis  of  SINGLESTEP
In-Reply-To: <20081219082938.A068EFC339@magilla.sf.frob.com>
References: <494A13F7.8080209@oracle.com>
	<20081219082938.A068EFC339@magilla.sf.frob.com>
Message-ID: <496C2968.2070309@oracle.com>

Roland McGrath wrote:
[...]
> 
> What's supposed to happen is that ptrace_resume uses ptrace_set_action to
> store UTRACE_SINGLESTEP.  It then actually passes UTRACE_REPORT or
> UTRACE_INTERRUPT to utrace_control (for the reasons explained in the
> comments in the code for each of those cases).
> 
> The child should then get into either ptrace_report_quiesce or
> ptrace_report_signal (ptrace_resumed case).  These both use
> ptrace_resume_action to extract what was saved by ptrace_set_action, which
> should still be UTRACE_SINGLESTEP.  Then whichever of these callbacks it is
> should return that value, UTRACE_SINGLESTEP.  It's that return value that
> is what should ensure that user_enable_single_step actually happens (in
> utrace.c:finish_resume_report).
> 
> I'm not entirely sure I understood your description of what you see
> happening.  But perhaps you can figure out exactly where it differs from
> what I've described that I think it should do.
> 
> 
> Thanks,
> Roland
> 
Understood.
The test step-simple can pass on 2.6.29-rc1+utrace(11 Jan). Seems the 
regression has been fixed.

Regards,
Wenji


From dvlasenk at redhat.com  Wed Jan 14 02:18:27 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 14 Jan 2009 03:18:27 +0100
Subject: Analysis  of  SINGLESTEP
In-Reply-To: <496C2968.2070309@oracle.com>
References: <494A13F7.8080209@oracle.com>
	<20081219082938.A068EFC339@magilla.sf.frob.com>
	<496C2968.2070309@oracle.com>
Message-ID: <1231899507.4285.2.camel@localhost>

On Tue, 2009-01-13 at 13:40 +0800, Wenji Huang wrote:
> Roland McGrath wrote:
> [...]
> > 
> > What's supposed to happen is that ptrace_resume uses ptrace_set_action to
> > store UTRACE_SINGLESTEP.  It then actually passes UTRACE_REPORT or
> > UTRACE_INTERRUPT to utrace_control (for the reasons explained in the
> > comments in the code for each of those cases).
> > 
> > The child should then get into either ptrace_report_quiesce or
> > ptrace_report_signal (ptrace_resumed case).  These both use
> > ptrace_resume_action to extract what was saved by ptrace_set_action, which
> > should still be UTRACE_SINGLESTEP.  Then whichever of these callbacks it is
> > should return that value, UTRACE_SINGLESTEP.  It's that return value that
> > is what should ensure that user_enable_single_step actually happens (in
> > utrace.c:finish_resume_report).
> > 
> > I'm not entirely sure I understood your description of what you see
> > happening.  But perhaps you can figure out exactly where it differs from
> > what I've described that I think it should do.
> > 
> > 
> > Thanks,
> > Roland
> > 
> Understood.
> The test step-simple can pass on 2.6.29-rc1+utrace(11 Jan). Seems the 
> regression has been fixed.

Yes. In my testing, latest Fedora kernels fixed ALL regressions
in utrace testsuite:

http://sourceware.org/systemtap/wiki/utrace/tests

(scroll down)

Fedora 9 (kernel 2.6.29-0.28.rc1.fc11.x86_64) x86_64:
    SKIP: erestart-debugger powerpc-altivec ppc-dabr-race step-to-breakpoint
        user-area-access user-area-padding x86_64-gsbase
    PASS: attach-into-signal attach-sigcont-wait attach-wait-on-stopped
        block-step clone-get-signal clone-multi-ptrace clone-ptrace
        detach-can-signal detach-parting-signal detach-stopped erestartsys
        event-exit-proc-environ event-exit-proc-maps
        late-ptrace-may-attach-check o_tracevfork o_tracevforkdone peekpokeusr
        ppc-ptrace-exec-full-regs ptrace-cont-sigstop-detach ptrace_event_clone
        ptrace-on-job-control-stopped reparent-zombie reparent-zombie-clone
        sa-resethand-on-cont-signal signal-loss step-into-handler step-jump-cont
        step-jump-cont-strict step-simple step-through-sigret
        stop-attach-then-wait syscall-reset tif-syscall-trace-after-detach
        tracer-lockup-on-sighandler-kill user-regs-peekpoke watchpoint
        x86_64-cs x86_64-ia32-gs
    Notes:
        Kernel is from rawhide (note f11 in its name).
        Many messages in kernel log, all like this:
        "WARNING: at kernel/ptrace.c:534 ptrace_report_signal+0x182/0x2a9()"
        Corresponding part of the source code:
        /*
         * We're resuming.  If there's no signal to deliver, just go.
         * If we were given a signal, deliver it now.
         */
        WARN_ON(task->last_siginfo != info);
        task->last_siginfo = NULL;
        if (!task->exit_code)
                return UTRACE_SIGNAL_REPORT | resume;


Not a single one in FAIL category.

Impressive. Thanks a lot Roland.
--
vda


From roland at redhat.com  Wed Jan 14 02:29:03 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 13 Jan 2009 18:29:03 -0800 (PST)
Subject: Analysis  of  SINGLESTEP
In-Reply-To: Denys Vlasenko's message of  Wednesday,
	14 January 2009 03:18:27 +0100 <1231899507.4285.2.camel@localhost>
References: <494A13F7.8080209@oracle.com>
	<20081219082938.A068EFC339@magilla.sf.frob.com>
	<496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost>
Message-ID: <20090114022903.486F2FC3DD@magilla.sf.frob.com>

> Yes. In my testing, latest Fedora kernels fixed ALL regressions
[...]
> Impressive. Thanks a lot Roland.

Don't be so impressed. ;-) 
Last I checked, attach-into-signal failed some of the time.
i.e.

	while ./tests/attach-into-signal; do : ; done

won't go forever.  Perhaps the test itself should do many iterations.
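
The many-iterations idea could be sketched in C roughly as follows.  This is
only an illustration, not code from the test suite: first_failing_run is a
hypothetical helper, and it treats any non-zero exit or death by signal
(such as the abort from a failed assertion) as a failure.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program named by argv[0] up to
 * `iterations` times.  Returns -1 if every run exited with status 0,
 * otherwise the zero-based index of the first run that failed
 * (non-zero exit, death by signal, or a fork/wait error).
 */
int first_failing_run(char *const argv[], int iterations)
{
	assert(argv != NULL && argv[0] != NULL);

	for (int i = 0; i < iterations; i++) {
		int status;
		pid_t pid = fork();

		if (pid == -1)
			return i;
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* exec itself failed */
		}
		if (waitpid(pid, &status, 0) == -1)
			return i;
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			return i;
	}
	return -1;
}
```

With argv set to { "./tests/attach-into-signal", NULL } and a large
iteration count, a return value other than -1 says how many runs it took
to hit the intermittent failure.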


Thanks,
Roland


From roland at redhat.com  Wed Jan 14 02:33:25 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 13 Jan 2009 18:33:25 -0800 (PST)
Subject: which patch to use for 2.6.23!
In-Reply-To: Sarang Kawale's message of  Tuesday,
	13 January 2009 10:12:54 +0530
	<8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>
References: <8a0fa7650901122042y1d2f025dt22bdbc4627fac21e@mail.gmail.com>
Message-ID: <20090114023325.B3FD7FC3DD@magilla.sf.frob.com>

Sorry, I'm not maintaining any patches for kernels that old.
In fact, the only ones I'm really supporting at the moment are
2.6.28 and 2.6.29-rc1/current.


Thanks,
Roland


From dvlasenk at redhat.com  Wed Jan 14 03:20:14 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Wed, 14 Jan 2009 04:20:14 +0100
Subject: Analysis  of  SINGLESTEP
In-Reply-To: <20090114022903.486F2FC3DD@magilla.sf.frob.com>
References: <494A13F7.8080209@oracle.com>
	<20081219082938.A068EFC339@magilla.sf.frob.com>
	<496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost>
	<20090114022903.486F2FC3DD@magilla.sf.frob.com>
Message-ID: <1231903215.3704.0.camel@localhost>

On Tue, 2009-01-13 at 18:29 -0800, Roland McGrath wrote:
> > Yes. In my testing, latest Fedora kernels fixed ALL regressions
> [...]
> > Impressive. Thanks a lot Roland.
> 
> Don't be so impressed. ;-) 
> Last I checked, attach-into-signal failed some of the time.
> i.e.
> 
> 	while ./tests/attach-into-signal; do : ; done
> 
> won't go forever.  Perhaps the test itself should do many iterations.

Indeed.

# while ./tests/attach-into-signal; do echo -n . ; done
.......................................attach-into-signal: attach-into-signal.c:161: reproduce: Unexpected error: No such process.
attach-into-signal: attach-into-signal.c:68: handler_fail: Assertion `0' failed.
/bin/bash: line 1:  8230 Aborted                 ./tests/attach-into-signal

--
vda


From ananth at in.ibm.com  Thu Jan 15 10:25:10 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Thu, 15 Jan 2009 15:55:10 +0530
Subject: build break with CONFIG_UTRACE_PTRACE=n
Message-ID: <20090115102510.GE3624@in.ibm.com>

Roland,

When CONFIG_UTRACE_PTRACE=n, the build breaks thus:

kernel/ptrace.c:87: error: redefinition of 'utrace_engine_put'
include/linux/utrace.h:337: error: previous definition of 'utrace_engine_put' was here
make[1]: *** [kernel/ptrace.o] Error 1
make: *** [kernel] Error 2
make: *** Waiting for unfinished jobs....
---

Fix kernel build when CONFIG_UTRACE_PTRACE=n.

Signed-off-by: Ananth N Mavinakayanahalli <ananth at in.ibm.com>

Index: utrace-15jan/kernel/ptrace.c
===================================================================
--- utrace-15jan.orig/kernel/ptrace.c	2009-01-12 07:40:20.000000000 +0530
+++ utrace-15jan/kernel/ptrace.c	2009-01-15 15:26:43.000000000 +0530
@@ -84,9 +84,11 @@
 	clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
 }
 
+#ifndef CONFIG_UTRACE
 static void utrace_engine_put(struct utrace_attached_engine *engine)
 {
 }
+#endif /* CONFIG_UTRACE */
 
 #else  /* CONFIG_UTRACE_PTRACE */
 

From roland at redhat.com  Thu Jan 15 21:07:20 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 15 Jan 2009 13:07:20 -0800 (PST)
Subject: build break with CONFIG_UTRACE_PTRACE=n
In-Reply-To: Ananth N Mavinakayanahalli's message of  Thursday,
	15 January 2009 15:55:10 +0530 <20090115102510.GE3624@in.ibm.com>
References: <20090115102510.GE3624@in.ibm.com>
Message-ID: <20090115210720.7CDFEFC3DD@magilla.sf.frob.com>

Fixed, thanks.

Roland


From ananth at in.ibm.com  Mon Jan 19 13:28:38 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Mon, 19 Jan 2009 18:58:38 +0530
Subject: [PATCH] Imbed struct utrace in task_struct
Message-ID: <20090119132838.GA3542@in.ibm.com>

Imbed struct utrace in task_struct.

One of the difficulties in debugging utrace problems is the use of RCU
to protect struct utrace and the subtle races it introduces with
task_struct lifetimes. This patch will hopefully push utrace further
along the path to upstream acceptance.

If it's deemed necessary to put struct utrace back under RCU, maybe that
can be done after utrace stabilizes without it.

Tested on x86 (uni/smp) and powerpc -- patch applies on the current
utrace/utrace-ptrace branch.

With this patch, I haven't seen any WARN_ON(task->last_siginfo != info)
on x86; the frequency of its occurrence on powerpc has dropped
considerably. On one make check xcheck run, there were only two such
backtraces, whereas earlier there were many tens of them:

------------[ cut here ]------------
Badness at kernel/ptrace.c:530
NIP: c00000000007e2fc LR: c0000000000c0004 CTR: c00000000007e15c
REGS: c00000005681f800 TRAP: 0700   Tainted: G        W   (2.6.29-rc1-ut)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 44002428  XER: 20000000
TASK = c000000056790000[23664] 'exe' THREAD: c00000005681c000 CPU: 1
NIP [c00000000007e2fc] .ptrace_report_signal+0x1a0/0x2d4
LR [c0000000000c0004] .utrace_get_signal+0x3b0/0x6cc
Call Trace:
[c00000005681fa80] [c000000000956790] klist_remove_waiters+0xf7a8/0x2f8b8 (unreliable)
[c00000005681fb30] [c0000000000c0004] .utrace_get_signal+0x3b0/0x6cc
[c00000005681fc20] [c000000000084a14] .get_signal_to_deliver+0x14c/0x368
[c00000005681fce0] [c000000000014ed4] .do_signal+0x7c/0x338
[c00000005681fe30] [c000000000008a80] do_work+0x24/0x28
Instruction dump:
f81a0020 e87e8008 4857fa59 60000000 2fbd0000 419e0034 e81b12a0 2fa00000 
419e0028 7c00e278 3120ffff 7c090110 <0b000000> e93b0216 3b400000 fb5b12a0 
------------------

Thanks to Alexey Dobriyan for his initial work way back in 2007.

There are no new regressions in the ptrace-utrace tests on x86. However,
on powerpc, two tests consistently fail with the patch (I haven't yet
tested whether they also fail without it):

step-jump-cont: step-jump-cont.c:140: pokeuser: Assertion `l == 0' failed.
/bin/sh: line 4: 32479 Aborted                 ${dir}$tst
FAIL: step-jump-cont
errno 14 (Bad address)
syscall-reset: syscall-reset.c:95: main: Assertion `(*__errno_location ()) == 38' failed.
unexpected child status 67f
FAIL: syscall-reset

Signed-off-by: Ananth N Mavinakayanahalli <ananth at in.ibm.com>
---
 include/linux/sched.h     |    4 
 include/linux/tracehook.h |   16 -
 include/linux/utrace.h    |   69 ++++++--
 kernel/ptrace.c           |   11 +
 kernel/utrace.c           |  385 ++++++++++++----------------------------------
 5 files changed, 166 insertions(+), 319 deletions(-)

Index: utrace-19jan/include/linux/sched.h
===================================================================
--- utrace-19jan.orig/include/linux/sched.h
+++ utrace-19jan/include/linux/sched.h
@@ -88,6 +88,7 @@ struct sched_param {
 #include <linux/kobject.h>
 #include <linux/latencytop.h>
 #include <linux/cred.h>
+#include <linux/utrace.h>
 
 #include <asm/processor.h>
 
@@ -1267,8 +1268,7 @@ struct task_struct {
 	seccomp_t seccomp;
 
 #ifdef CONFIG_UTRACE
-	struct utrace *utrace;
-	unsigned long utrace_flags;
+	struct utrace utrace;
 #endif
 
 /* Thread group tracking */
Index: utrace-19jan/include/linux/utrace.h
===================================================================
--- utrace-19jan.orig/include/linux/utrace.h
+++ utrace-19jan/include/linux/utrace.h
@@ -33,17 +33,62 @@
 #include <linux/list.h>
 #include <linux/kref.h>
 #include <linux/signal.h>
-#include <linux/sched.h>
+#include <linux/pid.h>
 
 struct linux_binprm;
+struct linux_binfmt;
 struct pt_regs;
-struct utrace;
+struct task_struct;
 struct user_regset;
 struct user_regset_view;
+struct seq_file;
+
+#define UTRACE_DEBUG 1
+/*
+ * Per-thread structure task_struct.utrace refers to.
+ *
+ * The two lists @attached and @attaching work together for smooth
+ * asynchronous attaching with low overhead.  Modifying either list
+ * requires @lock.  The @attaching list can be modified any time while
+ * holding @lock.  New engines being attached always go on this list.
+ *
+ * The @attached list is what the task itself uses for its reporting
+ * loops.  When the task itself is not quiescent, it can use the
+ * @attached list without taking any lock.  Noone may modify the list
+ * when the task is not quiescent.  When it is quiescent, that means
+ * that it won't run again without taking @lock itself before using
+ * the list.
+ *
+ * At each place where we know the task is quiescent (or it's current),
+ * while holding @lock, we call splice_attaching(), below.  This moves
+ * the @attaching list members on to the end of the @attached list.
+ * Since this happens at the start of any reporting pass, any new
+ * engines attached asynchronously go on the stable @attached list
+ * in time to have their callbacks seen.
+ */
+struct utrace {
+	unsigned long flags;
+	struct task_struct *cloning;
+	struct list_head attached, attaching;
+	spinlock_t lock;
+#ifdef UTRACE_DEBUG
+	atomic_t check_dead;
+#endif
+
+	struct utrace_attached_engine *reporting;
+
+	unsigned int stopped:1;
+	unsigned int report:1;
+	unsigned int interrupt:1;
+	unsigned int signal_handler:1;
+	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
+	unsigned int death:1;	/* in utrace_report_death() now */
+	unsigned int reap:1;	/* release_task() has run */
+};
 
 /*
  * Event bits passed to utrace_set_events().
- * These appear in &struct task_struct.@utrace_flags
+ * These appear in &struct task_struct.@utrace.flags
  * and &struct utrace_attached_engine.@flags.
  */
 enum utrace_events {
@@ -144,22 +189,10 @@ static inline void task_utrace_proc_stat
 
 #else  /* CONFIG_UTRACE */
 
-static inline unsigned long task_utrace_flags(struct task_struct *task)
-{
-	return task->utrace_flags;
-}
-
-static inline struct utrace *task_utrace_struct(struct task_struct *task)
-{
-	return task->utrace;
-}
-
-static inline void utrace_init_task(struct task_struct *child)
-{
-	child->utrace_flags = 0;
-	child->utrace = NULL;
-}
+#define task_utrace_flags(task)		((task)->utrace.flags)
+#define task_utrace_struct(task)	(&(task)->utrace)
 
+void utrace_init_task(struct task_struct *task);
 void task_utrace_proc_status(struct seq_file *m, struct task_struct *p);
 
 /**
Index: utrace-19jan/kernel/utrace.c
===================================================================
--- utrace-19jan.orig/kernel/utrace.c
+++ utrace-19jan/kernel/utrace.c
@@ -10,21 +10,20 @@
  * Red Hat Author: Roland McGrath.
  */
 
-#include <linux/utrace.h>
+#include <linux/sched.h>
 #include <linux/tracehook.h>
 #include <linux/regset.h>
 #include <asm/syscall.h>
 #include <linux/ptrace.h>
 #include <linux/err.h>
-#include <linux/sched.h>
 #include <linux/freezer.h>
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/seq_file.h>
+#include <linux/utrace.h>
 
 
-#define UTRACE_DEBUG 1
 #ifdef UTRACE_DEBUG
 #define CHECK_INIT(p)	atomic_set(&(p)->check_dead, 1)
 #define CHECK_DEAD(p)	BUG_ON(!atomic_dec_and_test(&(p)->check_dead))
@@ -33,91 +32,25 @@
 #define CHECK_DEAD(p)	do { } while (0)
 #endif
 
-/*
- * Per-thread structure task_struct.utrace points to.
- *
- * The task itself never has to worry about this going away after
- * some event is found set in task_struct.utrace_flags.
- * Once created, this pointer is changed only when the task is quiescent
- * (TASK_TRACED or TASK_STOPPED with the siglock held, or dead).
- *
- * For other parties, the pointer to this is protected by RCU and
- * task_lock.  Since call_rcu is never used while the thread is alive and
- * using this struct utrace, we can overlay the RCU data structure used
- * only for a dead struct with some local state used only for a live utrace
- * on an active thread.
- *
- * The two lists @attached and @attaching work together for smooth
- * asynchronous attaching with low overhead.  Modifying either list
- * requires @lock.  The @attaching list can be modified any time while
- * holding @lock.  New engines being attached always go on this list.
- *
- * The @attached list is what the task itself uses for its reporting
- * loops.  When the task itself is not quiescent, it can use the
- * @attached list without taking any lock.  Noone may modify the list
- * when the task is not quiescent.  When it is quiescent, that means
- * that it won't run again without taking @lock itself before using
- * the list.
- *
- * At each place where we know the task is quiescent (or it's current),
- * while holding @lock, we call splice_attaching(), below.  This moves
- * the @attaching list members on to the end of the @attached list.
- * Since this happens at the start of any reporting pass, any new
- * engines attached asynchronously go on the stable @attached list
- * in time to have their callbacks seen.
- */
-struct utrace {
-	union {
-		struct rcu_head dead;
-		struct {
-			struct task_struct *cloning;
-		} live;
-	} u;
-
-	struct list_head attached, attaching;
-	spinlock_t lock;
-#ifdef UTRACE_DEBUG
-	atomic_t check_dead;
-#endif
-
-	struct utrace_attached_engine *reporting;
-
-	unsigned int stopped:1;
-	unsigned int report:1;
-	unsigned int interrupt:1;
-	unsigned int signal_handler:1;
-	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
-	unsigned int death:1;	/* in utrace_report_death() now */
-	unsigned int reap:1;	/* release_task() has run */
-};
-
-static struct kmem_cache *utrace_cachep;
 static struct kmem_cache *utrace_engine_cachep;
 static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
 
 static int __init utrace_init(void)
 {
-	utrace_cachep = KMEM_CACHE(utrace, SLAB_PANIC);
 	utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC);
 	return 0;
 }
 module_init(utrace_init);
 
-static void utrace_free(struct rcu_head *rhead)
+void utrace_init_task(struct task_struct *task)
 {
-	struct utrace *utrace = container_of(rhead, struct utrace, u.dead);
-	kmem_cache_free(utrace_cachep, utrace);
-}
+	struct utrace *utrace = task_utrace_struct(task);
 
-/*
- * Called with utrace locked.  Clean it up and free it via RCU.
- */
-static void rcu_utrace_free(struct utrace *utrace)
-	__releases(utrace->lock)
-{
-	CHECK_DEAD(utrace);
-	spin_unlock(&utrace->lock);
-	call_rcu(&utrace->u.dead, utrace_free);
+	utrace->flags = 0;
+	utrace->cloning = NULL;
+	INIT_LIST_HEAD(&utrace->attached);
+	INIT_LIST_HEAD(&utrace->attaching);
+	spin_lock_init(&utrace->lock);
 }
 
 /*
@@ -202,8 +135,8 @@ static int utrace_first_engine(struct ta
 	 * report_clone hook has had a chance to run.
 	 */
 	if (target->flags & PF_STARTING) {
-		utrace = current->utrace;
-		if (!utrace || utrace->u.live.cloning != target) {
+		utrace = task_utrace_struct(current);
+		if (utrace->cloning != target) {
 			yield();
 			if (signal_pending(current))
 				return -ERESTARTNOINTR;
@@ -211,14 +144,8 @@ static int utrace_first_engine(struct ta
 		}
 	}
 
-	utrace = kmem_cache_zalloc(utrace_cachep, GFP_KERNEL);
-	if (unlikely(!utrace))
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&utrace->attached);
-	INIT_LIST_HEAD(&utrace->attaching);
+	utrace = task_utrace_struct(target);
 	list_add(&engine->entry, &utrace->attached);
-	spin_lock_init(&utrace->lock);
 	CHECK_INIT(utrace);
 
 	ret = -EAGAIN;
@@ -226,9 +153,7 @@ static int utrace_first_engine(struct ta
 	task_lock(target);
 	if (exclude_utrace(target)) {
 		ret = -EBUSY;
-	} else if (likely(!target->utrace)) {
-		rcu_assign_pointer(target->utrace, utrace);
-
+	} else {
 		/*
 		 * The task_lock protects us against another thread doing
 		 * the same thing.  We might still be racing against
@@ -246,30 +171,20 @@ static int utrace_first_engine(struct ta
 			spin_unlock(&utrace->lock);
 			return 0;
 		}
-
-		/*
-		 * The target has already been through release_task.
-		 * Our caller will restart and notice it's too late now.
-		 */
-		target->utrace = NULL;
 	}
 
 	/*
-	 * Another engine attached first, so there is a struct already.
-	 * A null return says to restart looking for the existing one.
+	 * Another engine attached first.
+	 * Restart looking for the existing one.
 	 */
 	task_unlock(target);
 	spin_unlock(&utrace->lock);
-	kmem_cache_free(utrace_cachep, utrace);
 
 	return ret;
 }
 
 /*
- * Called with rcu_read_lock() held.
- * Lock utrace and verify that it's still installed in target->utrace.
- * If not, return -EAGAIN.
- * Then enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
+ * Enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
  */
 static int utrace_second_engine(struct task_struct *target,
 				struct utrace *utrace,
@@ -282,13 +197,7 @@ static int utrace_second_engine(struct t
 
 	spin_lock(&utrace->lock);
 
-	if (unlikely(rcu_dereference(target->utrace) != utrace)) {
-		/*
-		 * We lost a race with other CPUs doing a sequence
-		 * of detach and attach before we got in.
-		 */
-		ret = -EAGAIN;
-	} else if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
+	if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
 		   unlikely(matching_engine(utrace, flags, ops, data))) {
 		ret = -EEXIST;
 	} else {
@@ -350,18 +259,15 @@ struct utrace_attached_engine *utrace_at
 {
 	struct utrace *utrace;
 	struct utrace_attached_engine *engine;
-	int ret;
+	int ret = 0;
 
 restart:
-	rcu_read_lock();
-	utrace = rcu_dereference(target->utrace);
-	smp_rmb();
+	utrace = task_utrace_struct(target);
 	if (unlikely(target->exit_state == EXIT_DEAD)) {
 		/*
 		 * The target has already been reaped.
 		 * Check this first; a race with reaping may lead to restart.
 		 */
-		rcu_read_unlock();
 		if (!(flags & UTRACE_ATTACH_CREATE))
 			return ERR_PTR(-ENOENT);
 		return ERR_PTR(-ESRCH);
@@ -369,19 +275,14 @@ restart:
 
 	if (!(flags & UTRACE_ATTACH_CREATE)) {
 		engine = NULL;
-		if (utrace) {
-			spin_lock(&utrace->lock);
-			engine = matching_engine(utrace, flags, ops, data);
-			if (engine)
-				utrace_engine_get(engine);
-			spin_unlock(&utrace->lock);
-		}
-		rcu_read_unlock();
+		spin_lock(&utrace->lock);
+		engine = matching_engine(utrace, flags, ops, data);
+		if (engine)
+			utrace_engine_get(engine);
+		spin_unlock(&utrace->lock);
 		return engine ?: ERR_PTR(-ENOENT);
 	}
 
-	rcu_read_unlock();
-
 	if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops))
 		return ERR_PTR(-EINVAL);
 
@@ -404,15 +305,12 @@ restart:
 	engine->ops = ops;
 	engine->data = data;
 
-	rcu_read_lock();
-	utrace = rcu_dereference(target->utrace);
-	if (!utrace) {
-		rcu_read_unlock();
+	if ((ret == 0) && (list_empty(&utrace->attached))) {
+		/* First time here, set engines up */
 		ret = utrace_first_engine(target, engine);
 	} else {
 		ret = utrace_second_engine(target, utrace, engine,
 					   flags, ops, data);
-		rcu_read_unlock();
 	}
 
 	if (unlikely(ret)) {
@@ -561,28 +459,23 @@ static bool utrace_stop(struct task_stru
 	try_to_freeze();
 
 	killed = false;
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (utrace) {
+	/*
+	 * utrace_wakeup() clears @utrace->stopped before waking us up.
+	 * We're officially awake if it's clear.
+	 */
+	spin_lock(&utrace->lock);
+	if (unlikely(utrace->stopped)) {
 		/*
-		 * utrace_wakeup() clears @utrace->stopped before waking us up.
-		 * We're officially awake if it's clear.
+		 * If we're here with it still set, it must have been
+		 * signal_wake_up() instead, waking us up for a SIGKILL.
 		 */
-		spin_lock(&utrace->lock);
-		if (unlikely(utrace->stopped)) {
-			/*
-			 * If we're here with it still set, it must have been
-			 * signal_wake_up() instead, waking us up for a SIGKILL.
-			 */
-			spin_lock_irq(&task->sighand->siglock);
-			WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
-			spin_unlock_irq(&task->sighand->siglock);
-			utrace->stopped = 0;
-			killed = true;
-		}
-		spin_unlock(&utrace->lock);
+		spin_lock_irq(&task->sighand->siglock);
+		WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
+		spin_unlock_irq(&task->sighand->siglock);
+		utrace->stopped = 0;
+		killed = true;
 	}
-	rcu_read_unlock();
+	spin_unlock(&utrace->lock);
 
 	/*
 	 * While we were in TASK_TRACED, complete_signal() considered
@@ -619,6 +512,7 @@ static struct utrace *get_utrace_lock(st
 	__acquires(utrace->lock)
 {
 	struct utrace *utrace;
+	int ret = 0;
 
 	/*
 	 * You must hold a ref to be making a call.  A call from within
@@ -650,7 +544,7 @@ static struct utrace *get_utrace_lock(st
 		return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS);
 	}
 
-	utrace = rcu_dereference(target->utrace);
+	utrace = task_utrace_struct(target);
 	smp_rmb();
 	if (unlikely(!utrace) || unlikely(target->exit_state == EXIT_DEAD)) {
 		/*
@@ -659,24 +553,26 @@ static struct utrace *get_utrace_lock(st
 		 * have started.  A call to this engine's report_reap
 		 * callback might already be in progress.
 		 */
-		utrace = ERR_PTR(-ESRCH);
+		ret = -ESRCH;
 	} else {
 		spin_lock(&utrace->lock);
-		if (unlikely(rcu_dereference(target->utrace) != utrace) ||
-		    unlikely(!engine->ops) ||
+		if (unlikely(!engine->ops) ||
 		    unlikely(engine->ops == &utrace_detached_ops)) {
 			/*
 			 * By the time we got the utrace lock,
 			 * it had been reaped or detached already.
 			 */
 			spin_unlock(&utrace->lock);
-			utrace = ERR_PTR(-ESRCH);
+			ret = -ESRCH;
 			if (!attached && engine->ops == &utrace_detached_ops)
-				utrace = ERR_PTR(-ERESTARTSYS);
+				ret = -ERESTARTSYS;
 		}
 	}
 	rcu_read_unlock();
 
+	if (ret)
+		return ERR_PTR(ret);
+
 	return utrace;
 }
 
@@ -732,8 +628,8 @@ restart:
 		goto restart;
 	}
 
-	rcu_utrace_free(utrace); /* Releases the lock.  */
-
+	CHECK_DEAD(utrace);
+	spin_unlock(&utrace->lock);
 	put_detached_list(&detached);
 }
 
@@ -744,15 +640,7 @@ restart:
  */
 void utrace_release_task(struct task_struct *target)
 {
-	struct utrace *utrace;
-
-	task_lock(target);
-	utrace = rcu_dereference(target->utrace);
-	rcu_assign_pointer(target->utrace, NULL);
-	task_unlock(target);
-
-	if (unlikely(!utrace))
-		return;
+	struct utrace *utrace = task_utrace_struct(target);
 
 	spin_lock(&utrace->lock);
 	/*
@@ -763,7 +651,7 @@ void utrace_release_task(struct task_str
 	if (likely(!list_empty(&utrace->attached))) {
 		utrace->reap = 1;
 
-		if (!(target->utrace_flags & DEATH_EVENTS)) {
+		if (!(utrace->flags & DEATH_EVENTS)) {
 			utrace_reap(target, utrace); /* Unlocks and frees.  */
 			return;
 		}
@@ -853,7 +741,7 @@ int utrace_set_events(struct task_struct
 	if (unlikely(IS_ERR(utrace)))
 		return PTR_ERR(utrace);
 
-	old_utrace_flags = target->utrace_flags;
+	old_utrace_flags = utrace->flags;
 	set_utrace_flags = events;
 	old_flags = engine->flags;
 
@@ -899,12 +787,12 @@ int utrace_set_events(struct task_struct
 			spin_unlock(&utrace->lock);
 			return -EALREADY;
 		}
-		target->utrace_flags |= set_utrace_flags;
+		utrace->flags |= set_utrace_flags;
 		read_unlock(&tasklist_lock);
 	}
 
 	engine->flags = events | (engine->flags & ENGINE_STOP);
-	target->utrace_flags |= set_utrace_flags;
+	utrace->flags |= set_utrace_flags;
 
 	if ((set_utrace_flags & UTRACE_EVENT_SYSCALL) &&
 	    !(old_utrace_flags & UTRACE_EVENT_SYSCALL))
@@ -961,7 +849,7 @@ static bool utrace_do_stop(struct task_s
 	 * through utrace_get_signal() before doing anything else.
 	 */
 	if (task_is_stopped(target) &&
-	    !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
+	    !(utrace->flags & UTRACE_EVENT(JCTL))) {
 		utrace->stopped = 1;
 		return true;
 	}
@@ -974,10 +862,10 @@ static bool utrace_do_stop(struct task_s
 		 * if it has already been through
 		 * utrace_report_death(), or never will.
 		 */
-		if (!(target->utrace_flags & DEATH_EVENTS))
+		if (!(utrace->flags & DEATH_EVENTS))
 			utrace->stopped = stopped = true;
 	} else if (task_is_stopped(target)) {
-		if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
+		if (!(utrace->flags & UTRACE_EVENT(JCTL)))
 			utrace->stopped = stopped = true;
 	} else if (!utrace->report && !utrace->interrupt) {
 		utrace->report = 1;
@@ -1017,7 +905,7 @@ static void utrace_wakeup(struct task_st
 
 /*
  * This is called when there might be some detached engines on the list or
- * some stale bits in @task->utrace_flags.  Clean them up and recompute the
+ * some stale bits in @task->utrace.flags.  Clean them up and recompute the
  * flags.
  *
  * @action is NULL when @task is stopped and @utrace->stopped is set; wake
@@ -1064,7 +952,7 @@ static void utrace_reset(struct task_str
 		clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
 	}
 
-	task->utrace_flags = flags;
+	utrace->flags = flags;
 
 	if (wake)
 		utrace_wakeup(task, utrace);
@@ -1075,21 +963,8 @@ static void utrace_reset(struct task_str
 	if (flags) {
 		spin_unlock(&utrace->lock);
 	} else {
-		/*
-		 * No more engines, clear out the utrace.  Here we can race
-		 * with utrace_release_task().  If it gets task_lock()
-		 * first, then it cleans up this struct for us.
-		 */
-
-		task_lock(task);
-		if (unlikely(task->utrace != utrace)) {
-			task_unlock(task);
-			spin_unlock(&utrace->lock);
-		} else {
-			rcu_assign_pointer(task->utrace, NULL);
-			task_unlock(task);
-			rcu_utrace_free(utrace);
-		}
+		CHECK_DEAD(utrace);
+		spin_unlock(&utrace->lock);
 
 		if (action)
 			*action = UTRACE_RESUME;
@@ -1241,7 +1116,7 @@ int utrace_control(struct task_struct *t
 		    unlikely(utrace->reap)) {
 			spin_unlock(&utrace->lock);
 			return -ESRCH;
-		} else if (unlikely(target->utrace_flags & DEATH_EVENTS) ||
+		} else if (unlikely(utrace->flags & DEATH_EVENTS) ||
 			   unlikely(utrace->death)) {
 			/*
 			 * We have already started the death report, or
@@ -1464,7 +1339,7 @@ static void start_report(struct utrace *
  * returns from engine callbacks.  If any engine's last callback used
  * UTRACE_STOP, we do UTRACE_REPORT here to ensure we stop before user
  * mode.  If there were no callbacks made, it will recompute
- * @task->utrace_flags to avoid another false-positive.
+ * @task->utrace.flags to avoid another false-positive.
  */
 static void finish_report(struct utrace_report *report,
 			  struct task_struct *task, struct utrace *utrace)
@@ -1627,7 +1502,7 @@ void utrace_report_exec(struct linux_bin
 			struct pt_regs *regs)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
 	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
@@ -1641,7 +1516,7 @@ void utrace_report_exec(struct linux_bin
 bool utrace_report_syscall_entry(struct pt_regs *regs)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
 	start_report(utrace);
@@ -1684,7 +1559,7 @@ bool utrace_report_syscall_entry(struct 
 void utrace_report_syscall_exit(struct pt_regs *regs)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
 	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
@@ -1700,23 +1575,23 @@ void utrace_report_syscall_exit(struct p
 void utrace_report_clone(unsigned long clone_flags, struct task_struct *child)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
 	/*
 	 * We don't use the REPORT() macro here, because we need
-	 * to clear utrace->u.live.cloning before finish_report().
+	 * to clear utrace->cloning before finish_report().
 	 * After finish_report(), utrace can be a stale pointer
 	 * in cases when report.action is still UTRACE_RESUME.
 	 */
 	start_report(utrace);
-	utrace->u.live.cloning = child;
+	utrace->cloning = child;
 
 	REPORT_CALLBACKS(task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
-	utrace->u.live.cloning = NULL;
+	utrace->cloning = NULL;
 	finish_report(&report, task, utrace);
 
 	/*
@@ -1739,7 +1614,7 @@ void utrace_report_clone(unsigned long c
  */
 void utrace_finish_vfork(struct task_struct *task)
 {
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 
 	spin_lock(&utrace->lock);
 	if (!utrace->vfork_stop)
@@ -1757,7 +1632,7 @@ void utrace_finish_vfork(struct task_str
 void utrace_report_jctl(int notify, int what)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 	bool was_stopped = task_is_stopped(task);
 
@@ -1768,29 +1643,17 @@ void utrace_report_jctl(int notify, int 
 	 *
 	 * While in TASK_STOPPED, we can be considered safely
 	 * stopped by utrace_do_stop() and detached asynchronously.
-	 * If we woke up and checked task->utrace_flags before that
+	 * If we woke up and checked task->utrace.flags before that
 	 * was finished, we might be here with utrace already
 	 * removed or in the middle of being removed.
 	 *
-	 * RCU makes it safe to get the utrace->lock even if it's
-	 * being freed.  Once we have that lock, either an external
-	 * detach has finished and this struct has been freed, or
-	 * else we know we are excluding any other detach attempt.
-	 *
 	 * If we are indeed attached, then make sure we are no
 	 * longer considered stopped while we run callbacks.
 	 */
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (unlikely(!utrace)) {
-		rcu_read_unlock();
-		return;
-	}
 	spin_lock(&utrace->lock);
 	utrace->stopped = 0;
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);
-	rcu_read_unlock();
 
 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
@@ -1825,7 +1688,7 @@ void utrace_report_jctl(int notify, int 
 void utrace_report_exit(long *exit_code)
 {
 	struct task_struct *task = current;
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
@@ -1935,7 +1798,7 @@ static void finish_resume_report(struct 
  */
 void utrace_resume(struct task_struct *task, struct pt_regs *regs)
 {
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 	struct utrace_attached_engine *engine, *next;
 
@@ -1987,13 +1850,13 @@ void utrace_resume(struct task_struct *t
 /*
  * Return true if current has forced signal_pending().
  *
- * This is called only when current->utrace_flags is nonzero, so we know
+ * This is called only when current->utrace.flags is nonzero, so we know
  * that current->utrace must be set.  It's not inlined in tracehook.h
  * just so that struct utrace can stay opaque outside this file.
  */
 bool utrace_interrupt_pending(void)
 {
-	return current->utrace->interrupt;
+	return current->utrace.interrupt;
 }
 
 /*
@@ -2034,7 +1897,7 @@ int utrace_get_signal(struct task_struct
 	__releases(task->sighand->siglock)
 	__acquires(task->sighand->siglock)
 {
-	struct utrace *utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	struct k_sigaction *ka;
 	INIT_REPORT(report);
 	struct utrace_attached_engine *engine, *next;
@@ -2043,44 +1906,13 @@ int utrace_get_signal(struct task_struct
 	u32 ret;
 	int signr;
 
-	/*
-	 * We could have been considered quiescent while we were in
-	 * TASK_STOPPED, and detached asynchronously.  If we woke up
-	 * and checked task->utrace_flags before that was finished,
-	 * we might be here with utrace already removed or in the
-	 * middle of being removed.
-	 */
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (unlikely(!utrace)) {
-		rcu_read_unlock();
-		return 0;
-	}
-
 	if (utrace->interrupt || utrace->report || utrace->signal_handler) {
 		/*
 		 * We've been asked for an explicit report before we
 		 * even check for pending signals.
 		 */
-
 		spin_unlock_irq(&task->sighand->siglock);
-
-		/*
-		 * RCU makes it safe to get the utrace->lock even if
-		 * it's being freed.  Once we have that lock, either an
-		 * external detach has finished and this struct has been
-		 * freed, or else we know we are excluding any other
-		 * detach attempt.
-		 */
 		spin_lock(&utrace->lock);
-		rcu_read_unlock();
-
-		if (unlikely(task->utrace != utrace)) {
-			spin_unlock(&utrace->lock);
-			cond_resched();
-			return -1;
-		}
-
 		splice_attaching(utrace);
 
 		if (unlikely(!utrace->interrupt) && unlikely(!utrace->report))
@@ -2123,12 +1955,11 @@ int utrace_get_signal(struct task_struct
 		event = 0;
 		ka = NULL;
 		memset(return_ka, 0, sizeof *return_ka);
-	} else if ((task->utrace_flags & UTRACE_EVENT_SIGNAL_ALL) == 0) {
+	} else if ((utrace->flags & UTRACE_EVENT_SIGNAL_ALL) == 0) {
 		/*
 		 * If noone is interested in intercepting signals,
 		 * let the caller just dequeue them normally.
 		 */
-		rcu_read_unlock();
 		return 0;
 	} else {
 		if (unlikely(utrace->stopped)) {
@@ -2147,17 +1978,9 @@ int utrace_get_signal(struct task_struct
 			 */
 			spin_unlock_irq(&task->sighand->siglock);
 			spin_lock(&utrace->lock);
-			rcu_read_unlock();
-			if (unlikely(task->utrace != utrace)) {
-				spin_unlock(&utrace->lock);
-				cond_resched();
-				return -1;
-			}
 			utrace->stopped = 0;
 			spin_unlock(&utrace->lock);
 			spin_lock_irq(&task->sighand->siglock);
-		} else {
-			rcu_read_unlock();
 		}
 
 		/*
@@ -2209,7 +2032,7 @@ int utrace_get_signal(struct task_struct
 		 * Now that we know what event type this signal is,
 		 * we can short-circuit if noone cares about those.
 		 */
-		if ((task->utrace_flags & (event | UTRACE_EVENT(QUIESCE))) == 0)
+		if ((utrace->flags & (event | UTRACE_EVENT(QUIESCE))) == 0)
 			return signr;
 
 		/*
@@ -2398,7 +2221,7 @@ int utrace_get_signal(struct task_struct
  */
 void utrace_signal_handler(struct task_struct *task, int stepping)
 {
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 
 	spin_lock(&utrace->lock);
 
@@ -2544,23 +2367,19 @@ EXPORT_SYMBOL_GPL(task_user_regset_view)
  */
 struct task_struct *utrace_tracer_task(struct task_struct *target)
 {
-	struct utrace *utrace;
+	struct utrace *utrace = task_utrace_struct(target);
 	struct task_struct *tracer = NULL;
+	struct list_head *pos, *next;
+	struct utrace_attached_engine *engine;
+	const struct utrace_engine_ops *ops;
 
-	utrace = rcu_dereference(target->utrace);
-	if (utrace != NULL) {
-		struct list_head *pos, *next;
-		struct utrace_attached_engine *engine;
-		const struct utrace_engine_ops *ops;
-		list_for_each_safe(pos, next, &utrace->attached) {
-			engine = list_entry(pos, struct utrace_attached_engine,
-					    entry);
-			ops = rcu_dereference(engine->ops);
-			if (ops->tracer_task) {
-				tracer = (*ops->tracer_task)(engine, target);
-				if (tracer != NULL)
-					break;
-			}
+	list_for_each_safe(pos, next, &utrace->attached) {
+		engine = list_entry(pos, struct utrace_attached_engine, entry);
+		ops = rcu_dereference(engine->ops);
+		if (ops->tracer_task) {
+			tracer = (*ops->tracer_task)(engine, target);
+			if (tracer != NULL)
+				break;
 		}
 	}
 
@@ -2573,7 +2392,7 @@ struct task_struct *utrace_tracer_task(s
  */
 int utrace_unsafe_exec(struct task_struct *task)
 {
-	struct utrace *utrace = task->utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	struct utrace_attached_engine *engine, *next;
 	const struct utrace_engine_ops *ops;
 	int unsafe = 0;
@@ -2592,11 +2411,11 @@ int utrace_unsafe_exec(struct task_struc
  */
 void task_utrace_proc_status(struct seq_file *m, struct task_struct *p)
 {
-	struct utrace *utrace = rcu_dereference(p->utrace);
-	if (unlikely(utrace))
-		seq_printf(m, "Utrace: %lx%s%s%s\n",
-			   p->utrace_flags,
-			   utrace->stopped ? " (stopped)" : "",
-			   utrace->report ? " (report)" : "",
-			   utrace->interrupt ? " (interrupt)" : "");
+	struct utrace *utrace = task_utrace_struct(p);
+
+	seq_printf(m, "Utrace: %lx%s%s%s\n",
+			utrace->flags,
+			utrace->stopped ? " (stopped)" : "",
+			utrace->report ? " (report)" : "",
+			utrace->interrupt ? " (interrupt)" : "");
 }
Index: utrace-19jan/include/linux/tracehook.h
===================================================================
--- utrace-19jan.orig/include/linux/tracehook.h
+++ utrace-19jan/include/linux/tracehook.h
@@ -370,8 +370,7 @@ static inline void tracehook_report_vfor
 static inline void tracehook_prepare_release_task(struct task_struct *task)
 {
 	smp_mb();
-	if (task_utrace_struct(task) != NULL)
-		utrace_release_task(task);
+	utrace_release_task(task);
 }
 
 /**
@@ -385,21 +384,8 @@ static inline void tracehook_prepare_rel
  */
 static inline void tracehook_finish_release_task(struct task_struct *task)
 {
-	int bad = 0;
 	ptrace_release_task(task);
 	BUG_ON(task->exit_state != EXIT_DEAD);
-	if (unlikely(task_utrace_struct(task) != NULL)) {
-		/*
-		 * In a race condition, utrace_attach() will temporarily set
-		 * it, but then check @task->exit_state and clear it.  It does
-		 * all this under task_lock(), so we take the lock to check
-		 * that there is really a bug and not just that known race.
-		 */
-		task_lock(task);
-		bad = unlikely(task_utrace_struct(task) != NULL);
-		task_unlock(task);
-	}
-	BUG_ON(bad);
 }
 
 /**
Index: utrace-19jan/kernel/ptrace.c
===================================================================
--- utrace-19jan.orig/kernel/ptrace.c
+++ utrace-19jan/kernel/ptrace.c
@@ -778,7 +778,16 @@ static inline bool exclude_ptrace(struct
  */
 static inline bool exclude_ptrace(struct task_struct *task)
 {
-	return unlikely(!!task_utrace_struct(task));
+	struct utrace *utrace = task_utrace_struct(task);
+
+	spin_lock(&utrace->lock);
+	if (list_empty(&utrace->attached) && list_empty(&utrace->attaching)) {
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
+	spin_unlock(&utrace->lock);
+	return true;
 }
 #endif
 

From roland at redhat.com  Mon Jan 19 23:20:31 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon, 19 Jan 2009 15:20:31 -0800 (PST)
Subject: [PATCH] Imbed struct utrace in task_struct
In-Reply-To: Ananth N Mavinakayanahalli's message of  Monday,
	19 January 2009 18:58:38 +0530 <20090119132838.GA3542@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
Message-ID: <20090119232031.82675FC3C6@magilla.sf.frob.com>

Thanks for working on this, Ananth.  (Btw, it's "embed.")

I think it would be less disruptive (and materially no different)
to leave utrace_flags as it is.  That field is the one (and only)
that is used in hot paths (or used anywhere outside utrace.c).
It might in future get moved around to stay in a cache-hot part
of task_struct, for example.

The long comment above struct utrace is really all about implementation
details inside utrace.c and I don't think you should move that commentary
to the header file.  Instead, put a comment saying that the contents of
struct utrace and their use is entirely private to kernel/utrace.c and it
is only defined in the header to make its size known for struct task_struct
layout (and init_task.h).
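
As a rough sketch of that arrangement (struct task_sketch, struct utrace_sketch
and the sketch_* helpers are made-up stand-ins for illustration, not the kernel
definitions): the struct is defined where the containing struct needs its size,
the flags word stays a separate field, and callers go through accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct utrace; the real fields are private to kernel/utrace.c. */
struct utrace_sketch {
	void *cloning;
	unsigned int stopped:1;
	unsigned int report:1;
	unsigned int interrupt:1;
};

/* Hypothetical stand-in for struct task_struct. */
struct task_sketch {
	struct utrace_sketch utrace;	/* embedded by value, so its size must be known here */
	unsigned long utrace_flags;	/* left as its own field for hot-path checks */
};

/* Accessors in the style of the patch: callers never name the fields directly. */
#define sketch_utrace_flags(task)	((task)->utrace_flags)
#define sketch_utrace_struct(task)	(&(task)->utrace)

/* Initialize the embedded state; nothing is allocated, so nothing can fail. */
static void sketch_init_task(struct task_sketch *task)
{
	struct utrace_sketch *utrace = sketch_utrace_struct(task);

	task->utrace_flags = 0;
	utrace->cloning = NULL;
	utrace->stopped = 0;
	utrace->report = 0;
	utrace->interrupt = 0;
}

/* Hot-path style check: reads only the flags word, not the embedded struct. */
static int sketch_wants_event(const struct task_sketch *task, unsigned long event)
{
	return (sketch_utrace_flags(task) & event) != 0;
}
```

With this shape the flags field could later be moved within the containing
struct without touching any caller that uses the accessor.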

I committed some cosmetic changes that will make for a little less flutter
in your patch.


Thanks,
Roland


From dvlasenk at redhat.com  Tue Jan 20 11:24:27 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Tue, 20 Jan 2009 12:24:27 +0100
Subject: Analysis  of  SINGLESTEP
In-Reply-To: <1231903215.3704.0.camel@localhost>
References: <494A13F7.8080209@oracle.com>
	<20081219082938.A068EFC339@magilla.sf.frob.com>
	<496C2968.2070309@oracle.com> <1231899507.4285.2.camel@localhost>
	<20090114022903.486F2FC3DD@magilla.sf.frob.com>
	<1231903215.3704.0.camel@localhost>
Message-ID: <1232450667.3797.8.camel@localhost>

Hi Roland,

On Wed, 2009-01-14 at 04:20 +0100, Denys Vlasenko wrote:
> On Tue, 2009-01-13 at 18:29 -0800, Roland McGrath wrote:
> > > Yes. In my testing, latest Fedora kernels fixed ALL regressions
> > [...]
> > > Impressive. Thanks a lot Roland.
> > 
> > Don't be so impressed. ;-) 
> > Last I checked, attach-into-signal failed some of the time.
> > i.e.
> > 
> > 	while ./tests/attach-into-signal; do : ; done
> > 
> > won't go forever.  Perhaps the test itself should do many iterations.
> 
> Indeed.
> 
> # while ./tests/attach-into-signal; do echo -n . ; done
> .......................................attach-into-signal:
> attach-into-signal.c:161: reproduce: Unexpected error: No such process.
> attach-into-signal: attach-into-signal.c:68: handler_fail: Assertion `0'
> failed.
> /bin/bash: line 1:  8230
> Aborted                 ./tests/attach-into-signal

Forgot to email you last week:

I modified this test to do more iterations, and to be affected by
$TESTTIME. With TESTTIME >= 60 it fails for me fairly reliably.
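
A rough sketch of what such a time-bounded loop could look like. This is not
the actual change to attach-into-signal.c; the sketch_* names and the
placeholder iteration function are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Parse a seconds value; return fallback if the string is missing or invalid. */
static long sketch_parse_seconds(const char *s, long fallback)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return fallback;
	v = strtol(s, &end, 10);
	if (*end != '\0' || v <= 0)
		return fallback;
	return v;
}

/* Read the time budget from $TESTTIME, as the modified test is described to do. */
static long sketch_testtime(long fallback)
{
	return sketch_parse_seconds(getenv("TESTTIME"), fallback);
}

/*
 * Call one_iteration() repeatedly until it fails or the budget runs out.
 * Returns the nonzero failure code, or 0 if every iteration passed.
 * The number of passing iterations is stored in *count.
 */
static int sketch_run_for(long seconds, int (*one_iteration)(void),
			  unsigned long *count)
{
	time_t deadline = time(NULL) + seconds;
	int ret;

	*count = 0;
	do {
		ret = one_iteration();
		if (ret != 0)
			return ret;
		++*count;
	} while (time(NULL) < deadline);
	return 0;
}

/* Placeholder for one attach attempt; fails on the third call to show the early exit. */
static int sketch_fail_on_third(void)
{
	static int calls;

	return ++calls == 3 ? 1 : 0;
}
```

A caller would pass sketch_testtime() with some default as the budget, so a
longer TESTTIME gives a rare race more chances to show up in a single run.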
--
vda


From ananth at in.ibm.com  Tue Jan 20 16:30:24 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 20 Jan 2009 22:00:24 +0530
Subject: [PATCH] Imbed struct utrace in task_struct
In-Reply-To: <20090119232031.82675FC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
Message-ID: <20090120163024.GA5289@in.ibm.com>

On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote:

> (Btw, it's "embed.")

Indeed :-)

> I think it would be less disruptive (and materially no different)
> to leave utrace_flags as it is.  That field is the one (and only)
> that is used in hot paths (or used anywhere outside utrace.c).
> It might in future get moved around to stay in a cache-hot part
> of task_struct, for example.
> 
> The long comment above struct utrace is really all about implementation
> details inside utrace.c and I don't think you should move that commentary
> to the header file.  Instead, put a comment saying that the contents of
> struct utrace and their use is entirely private to kernel/utrace.c and it
> is only defined in the header to make its size known for struct task_struct
> layout (and init_task.h).

Agreed.

> I committed some cosmetic changes that will make for a little less flutter
> in your patch.

Thanks! Working on it at the moment. I was able to test the new patch on
powerpc without issues, but haven't been able to test it on x86
successfully yet. Will post the patch soon.

Ananth


From ananth at in.ibm.com  Wed Jan 21 06:28:25 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Wed, 21 Jan 2009 11:58:25 +0530
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090119232031.82675FC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
Message-ID: <20090121062825.GD3251@in.ibm.com>

On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote:
> Thanks for working on this, Ananth.  (Btw, it's "embed.")
> 
> I think it would be less disruptive (and materially no different)
> to leave utrace_flags as it is.  That field is the one (and only)
> that is used in hot paths (or used anywhere outside utrace.c).
> It might in future get moved around to stay in a cache-hot part
> of task_struct, for example.
> 
> The long comment above struct utrace is really all about implementation
> details inside utrace.c and I don't think you should move that commentary
> to the header file.  Instead, put a comment saying that the contents of
> struct utrace and their use is entirely private to kernel/utrace.c and it
> is only defined in the header to make its size known for struct task_struct
> layout (and init_task.h).
> 
> I committed some cosmetic changes that will make for a little less flutter
> in your patch.

Here is V2 of the patch. Tested and works fine. The same two tests fail
on powerpc, and all tests pass on x86, though there are some occurrences
of the ptrace.c WARN_ON.

Roland,
I've tried to tweak the comments appropriately. Please feel free to
modify them as you consider fit.

Signed-off-by: Ananth N Mavinakayanahalli <ananth at in.ibm.com>

---
 include/linux/sched.h     |    3 
 include/linux/tracehook.h |   16 --
 include/linux/utrace.h    |   48 ++++--
 kernel/ptrace.c           |   11 +
 kernel/utrace.c           |  331 +++++++++++-----------------------------------
 5 files changed, 126 insertions(+), 283 deletions(-)

Index: utrace-20jan/include/linux/sched.h
===================================================================
--- utrace-20jan.orig/include/linux/sched.h
+++ utrace-20jan/include/linux/sched.h
@@ -88,6 +88,7 @@ struct sched_param {
 #include <linux/kobject.h>
 #include <linux/latencytop.h>
 #include <linux/cred.h>
+#include <linux/utrace.h>
 
 #include <asm/processor.h>
 
@@ -1267,7 +1268,7 @@ struct task_struct {
 	seccomp_t seccomp;
 
 #ifdef CONFIG_UTRACE
-	struct utrace *utrace;
+	struct utrace utrace;
 	unsigned long utrace_flags;
 #endif
 
Index: utrace-20jan/include/linux/utrace.h
===================================================================
--- utrace-20jan.orig/include/linux/utrace.h
+++ utrace-20jan/include/linux/utrace.h
@@ -33,13 +33,37 @@
 #include <linux/list.h>
 #include <linux/kref.h>
 #include <linux/signal.h>
-#include <linux/sched.h>
+#include <linux/pid.h>
 
 struct linux_binprm;
+struct linux_binfmt;
 struct pt_regs;
-struct utrace;
+struct task_struct;
 struct user_regset;
 struct user_regset_view;
+struct seq_file;
+
+/*
+ * Per-thread structure task_struct.utrace refers to.
+ *
+ * The structure and its contents are private to kernel/utrace.c; it is
+ * defined here only so its size is known for struct task_struct layout.
+ */
+struct utrace {
+	struct task_struct *cloning;
+	struct list_head attached, attaching;
+	spinlock_t lock;
+
+	struct utrace_attached_engine *reporting;
+
+	unsigned int stopped:1;
+	unsigned int report:1;
+	unsigned int interrupt:1;
+	unsigned int signal_handler:1;
+	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
+	unsigned int death:1;	/* in utrace_report_death() now */
+	unsigned int reap:1;	/* release_task() has run */
+};
 
 /*
  * Event bits passed to utrace_set_events().
@@ -133,7 +157,7 @@ static inline struct utrace *task_utrace
 {
 	return NULL;
 }
-static inline void utrace_init_task(struct task_struct *child)
+static inline void utrace_init_task(struct task_struct *task)
 {
 }
 
@@ -144,22 +168,10 @@ static inline void task_utrace_proc_stat
 
 #else  /* CONFIG_UTRACE */
 
-static inline unsigned long task_utrace_flags(struct task_struct *task)
-{
-	return task->utrace_flags;
-}
-
-static inline struct utrace *task_utrace_struct(struct task_struct *task)
-{
-	return task->utrace;
-}
-
-static inline void utrace_init_task(struct task_struct *child)
-{
-	child->utrace_flags = 0;
-	child->utrace = NULL;
-}
+#define task_utrace_flags(task)		((task)->utrace_flags)
+#define task_utrace_struct(task)	(&(task)->utrace)
 
+void utrace_init_task(struct task_struct *task);
 void task_utrace_proc_status(struct seq_file *m, struct task_struct *p);
 
 /**
Index: utrace-20jan/kernel/utrace.c
===================================================================
--- utrace-20jan.orig/kernel/utrace.c
+++ utrace-20jan/kernel/utrace.c
@@ -10,103 +10,56 @@
  * Red Hat Author: Roland McGrath.
  */
 
-#include <linux/utrace.h>
+#include <linux/sched.h>
 #include <linux/tracehook.h>
 #include <linux/regset.h>
 #include <asm/syscall.h>
 #include <linux/ptrace.h>
 #include <linux/err.h>
-#include <linux/sched.h>
 #include <linux/freezer.h>
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/seq_file.h>
+#include <linux/utrace.h>
 
 
 /*
- * Per-thread structure task_struct.utrace points to.
+ * struct utrace, defined in utrace.h, is private to this file.  It's
+ * defined there just so struct task_struct knows its size.
  *
- * The task itself never has to worry about this going away after
- * some event is found set in task_struct.utrace_flags.
- * Once created, this pointer is changed only when the task is quiescent
- * (TASK_TRACED or TASK_STOPPED with the siglock held, or dead).
- *
- * For other parties, the pointer to this is protected by RCU and
- * task_lock.  Since call_rcu is never used while the thread is alive and
- * using this struct utrace, we can overlay the RCU data structure used
- * only for a dead struct with some local state used only for a live utrace
- * on an active thread.
- *
- * The two lists @attached and @attaching work together for smooth
- * asynchronous attaching with low overhead.  Modifying either list
- * requires @lock.  The @attaching list can be modified any time while
- * holding @lock.  New engines being attached always go on this list.
- *
- * The @attached list is what the task itself uses for its reporting
- * loops.  When the task itself is not quiescent, it can use the
- * @attached list without taking any lock.  Noone may modify the list
- * when the task is not quiescent.  When it is quiescent, that means
- * that it won't run again without taking @lock itself before using
- * the list.
+ * The two lists @utrace->attached and @utrace->attaching work together
+ * for smooth asynchronous attaching with low overhead.  Modifying
+ * either list requires @utrace->lock.  The @utrace->attaching list
+ * can be modified any time while holding @utrace->lock.  New engines
+ * being attached always go on this list.
+ *
+ * The @utrace->attached list is what the task itself uses for its
+ * reporting loops.  When the task itself is not quiescent, it can
+ * use the @utrace->attached list without taking any lock.  No one
+ * may modify the list when the task is not quiescent.  When it is
+ * quiescent, that means that it won't run again without taking
+ * @utrace->lock itself before using the list.
  *
  * At each place where we know the task is quiescent (or it's current),
- * while holding @lock, we call splice_attaching(), below.  This moves
- * the @attaching list members on to the end of the @attached list.
- * Since this happens at the start of any reporting pass, any new
- * engines attached asynchronously go on the stable @attached list
- * in time to have their callbacks seen.
- */
-struct utrace {
-	union {
-		struct rcu_head dead;
-		struct {
-			struct task_struct *cloning;
-		} live;
-	} u;
-
-	struct list_head attached, attaching;
-	spinlock_t lock;
-
-	struct utrace_attached_engine *reporting;
-
-	unsigned int stopped:1;
-	unsigned int report:1;
-	unsigned int interrupt:1;
-	unsigned int signal_handler:1;
-	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
-	unsigned int death:1;	/* in utrace_report_death() now */
-	unsigned int reap:1;	/* release_task() has run */
-};
+ * while holding @utrace->lock, we call splice_attaching(), below.
+ * This moves the @utrace->attaching list members on to the end of
+ * the @utrace->attached list. Since this happens at the start of
+ * any reporting pass, any new engines attached asynchronously go
+ * on the stable @utrace->attached list in time to have their
+ * callbacks seen.
+ */
 
-static struct kmem_cache *utrace_cachep;
 static struct kmem_cache *utrace_engine_cachep;
 static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
 
 static int __init utrace_init(void)
 {
-	utrace_cachep = KMEM_CACHE(utrace, SLAB_PANIC);
 	utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC);
 	return 0;
 }
 module_init(utrace_init);
 
-static void utrace_free(struct rcu_head *rhead)
-{
-	struct utrace *utrace = container_of(rhead, struct utrace, u.dead);
-	kmem_cache_free(utrace_cachep, utrace);
-}
-
-/*
- * Called with utrace locked.  Clean it up and free it via RCU.
- */
-static void rcu_utrace_free(struct utrace *utrace)
-	__releases(utrace->lock)
-{
-	spin_unlock(&utrace->lock);
-	call_rcu(&utrace->u.dead, utrace_free);
-}
-
 /*
  * This is called with @utrace->lock held when the task is safely
  * quiescent, i.e. it won't consult utrace->attached without the lock.
@@ -172,8 +125,12 @@ static inline bool exclude_utrace(struct
 /*
  * Initialize the struct, initially zero'd.
  */
-static inline void init_utrace_struct(struct utrace *utrace)
+void utrace_init_task(struct task_struct *task)
 {
+	struct utrace *utrace = task_utrace_struct(task);
+
+	task->utrace_flags = 0;
+	utrace->cloning = NULL;
 	INIT_LIST_HEAD(&utrace->attached);
 	INIT_LIST_HEAD(&utrace->attaching);
 	spin_lock_init(&utrace->lock);
@@ -181,8 +138,6 @@ static inline void init_utrace_struct(st
 
 /*
  * Called without locks.
- * Allocate target->utrace and install engine in it.  If we lose a race in
- * setting it up, return -EAGAIN.  This function mediates startup races.
  * The creating parent task has priority, and other callers will delay here
  * to let its call succeed and take the new utrace lock first.
  */
@@ -199,8 +154,8 @@ static int utrace_first_engine(struct ta
 	 * report_clone hook has had a chance to run.
 	 */
 	if (target->flags & PF_STARTING) {
-		utrace = current->utrace;
-		if (!utrace || utrace->u.live.cloning != target) {
+		utrace = task_utrace_struct(current);
+		if (utrace->cloning != target) {
 			yield();
 			if (signal_pending(current))
 				return -ERESTARTNOINTR;
@@ -208,11 +163,7 @@ static int utrace_first_engine(struct ta
 		}
 	}
 
-	utrace = kmem_cache_zalloc(utrace_cachep, GFP_KERNEL);
-	if (unlikely(!utrace))
-		return -ENOMEM;
-	init_utrace_struct(utrace);
-
+	utrace = task_utrace_struct(target);
 	list_add(&engine->entry, &utrace->attached);
 
 	ret = -EAGAIN;
@@ -220,9 +171,7 @@ static int utrace_first_engine(struct ta
 	task_lock(target);
 	if (exclude_utrace(target)) {
 		ret = -EBUSY;
-	} else if (likely(!target->utrace)) {
-		rcu_assign_pointer(target->utrace, utrace);
-
+	} else {
 		/*
 		 * The task_lock protects us against another thread doing
 		 * the same thing.  We might still be racing against
@@ -240,30 +189,20 @@ static int utrace_first_engine(struct ta
 			spin_unlock(&utrace->lock);
 			return 0;
 		}
-
-		/*
-		 * The target has already been through release_task.
-		 * Our caller will restart and notice it's too late now.
-		 */
-		target->utrace = NULL;
 	}
 
 	/*
-	 * Another engine attached first, so there is a struct already.
-	 * A null return says to restart looking for the existing one.
+	 * Another engine attached first.
+	 * Restart looking for the existing one.
 	 */
 	task_unlock(target);
 	spin_unlock(&utrace->lock);
-	kmem_cache_free(utrace_cachep, utrace);
 
 	return ret;
 }
 
 /*
- * Called with rcu_read_lock() held.
- * Lock utrace and verify that it's still installed in target->utrace.
- * If not, return -EAGAIN.
- * Then enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
+ * Enqueue engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
  */
 static int utrace_second_engine(struct task_struct *target,
 				struct utrace *utrace,
@@ -276,13 +215,7 @@ static int utrace_second_engine(struct t
 
 	spin_lock(&utrace->lock);
 
-	if (unlikely(rcu_dereference(target->utrace) != utrace)) {
-		/*
-		 * We lost a race with other CPUs doing a sequence
-		 * of detach and attach before we got in.
-		 */
-		ret = -EAGAIN;
-	} else if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
+	if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
 		   unlikely(matching_engine(utrace, flags, ops, data))) {
 		ret = -EEXIST;
 	} else {
@@ -344,18 +277,15 @@ struct utrace_attached_engine *utrace_at
 {
 	struct utrace *utrace;
 	struct utrace_attached_engine *engine;
-	int ret;
+	int ret = 0;
 
 restart:
-	rcu_read_lock();
-	utrace = rcu_dereference(target->utrace);
-	smp_rmb();
+	utrace = task_utrace_struct(target);
 	if (unlikely(target->exit_state == EXIT_DEAD)) {
 		/*
 		 * The target has already been reaped.
 		 * Check this first; a race with reaping may lead to restart.
 		 */
-		rcu_read_unlock();
 		if (!(flags & UTRACE_ATTACH_CREATE))
 			return ERR_PTR(-ENOENT);
 		return ERR_PTR(-ESRCH);
@@ -363,19 +293,14 @@ restart:
 
 	if (!(flags & UTRACE_ATTACH_CREATE)) {
 		engine = NULL;
-		if (utrace) {
-			spin_lock(&utrace->lock);
-			engine = matching_engine(utrace, flags, ops, data);
-			if (engine)
-				utrace_engine_get(engine);
-			spin_unlock(&utrace->lock);
-		}
-		rcu_read_unlock();
+		spin_lock(&utrace->lock);
+		engine = matching_engine(utrace, flags, ops, data);
+		if (engine)
+			utrace_engine_get(engine);
+		spin_unlock(&utrace->lock);
 		return engine ?: ERR_PTR(-ENOENT);
 	}
 
-	rcu_read_unlock();
-
 	if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops))
 		return ERR_PTR(-EINVAL);
 
@@ -398,15 +323,12 @@ restart:
 	engine->ops = ops;
 	engine->data = data;
 
-	rcu_read_lock();
-	utrace = rcu_dereference(target->utrace);
-	if (!utrace) {
-		rcu_read_unlock();
+	if ((ret == 0) && (list_empty(&utrace->attached))) {
+		/* First time here, set engines up */
 		ret = utrace_first_engine(target, engine);
 	} else {
 		ret = utrace_second_engine(target, utrace, engine,
 					   flags, ops, data);
-		rcu_read_unlock();
 	}
 
 	if (unlikely(ret)) {
@@ -555,28 +477,23 @@ static bool utrace_stop(struct task_stru
 	try_to_freeze();
 
 	killed = false;
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (utrace) {
+	/*
+	 * utrace_wakeup() clears @utrace->stopped before waking us up.
+	 * We're officially awake if it's clear.
+	 */
+	spin_lock(&utrace->lock);
+	if (unlikely(utrace->stopped)) {
 		/*
-		 * utrace_wakeup() clears @utrace->stopped before waking us up.
-		 * We're officially awake if it's clear.
+		 * If we're here with it still set, it must have been
+		 * signal_wake_up() instead, waking us up for a SIGKILL.
 		 */
-		spin_lock(&utrace->lock);
-		if (unlikely(utrace->stopped)) {
-			/*
-			 * If we're here with it still set, it must have been
-			 * signal_wake_up() instead, waking us up for a SIGKILL.
-			 */
-			spin_lock_irq(&task->sighand->siglock);
-			WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
-			spin_unlock_irq(&task->sighand->siglock);
-			utrace->stopped = 0;
-			killed = true;
-		}
-		spin_unlock(&utrace->lock);
+		spin_lock_irq(&task->sighand->siglock);
+		WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
+		spin_unlock_irq(&task->sighand->siglock);
+		utrace->stopped = 0;
+		killed = true;
 	}
-	rcu_read_unlock();
+	spin_unlock(&utrace->lock);
 
 	/*
 	 * While we were in TASK_TRACED, complete_signal() considered
@@ -613,6 +530,7 @@ static struct utrace *get_utrace_lock(st
 	__acquires(utrace->lock)
 {
 	struct utrace *utrace;
+	int ret = 0;
 
 	/*
 	 * You must hold a ref to be making a call.  A call from within
@@ -644,33 +562,34 @@ static struct utrace *get_utrace_lock(st
 		return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS);
 	}
 
-	utrace = rcu_dereference(target->utrace);
+	utrace = task_utrace_struct(target);
 	smp_rmb();
-	if (unlikely(!utrace) || unlikely(target->exit_state == EXIT_DEAD)) {
+	if (unlikely(target->exit_state == EXIT_DEAD)) {
 		/*
 		 * If all engines detached already, utrace is clear.
 		 * Otherwise, we're called after utrace_release_task might
 		 * have started.  A call to this engine's report_reap
 		 * callback might already be in progress.
 		 */
-		utrace = ERR_PTR(-ESRCH);
+		ret = -ESRCH;
 	} else {
 		spin_lock(&utrace->lock);
-		if (unlikely(rcu_dereference(target->utrace) != utrace) ||
-		    unlikely(!engine->ops) ||
+		if (unlikely(!engine->ops) ||
 		    unlikely(engine->ops == &utrace_detached_ops)) {
 			/*
 			 * By the time we got the utrace lock,
 			 * it had been reaped or detached already.
 			 */
 			spin_unlock(&utrace->lock);
-			utrace = ERR_PTR(-ESRCH);
+			ret = -ESRCH;
 			if (!attached && engine->ops == &utrace_detached_ops)
-				utrace = ERR_PTR(-ERESTARTSYS);
+				ret = -ERESTARTSYS;
 		}
 	}
 	rcu_read_unlock();
 
+	if (ret)
+		return ERR_PTR(ret);
 	return utrace;
 }
 
@@ -690,7 +609,7 @@ static void put_detached_list(struct lis
 
 /*
  * Called with utrace->lock held.
- * Notify and clean up all engines, then free utrace.
+ * Notify and clean up all engines.
  */
 static void utrace_reap(struct task_struct *target, struct utrace *utrace)
 	__releases(utrace->lock)
@@ -726,33 +645,23 @@ restart:
 		goto restart;
 	}
 
-	rcu_utrace_free(utrace); /* Releases the lock.  */
-
+	spin_unlock(&utrace->lock);
 	put_detached_list(&detached);
 }
 
 #define DEATH_EVENTS (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE))
 
 /*
- * Called by release_task.  After this, target->utrace must be cleared.
+ * Called by release_task.
  */
 void utrace_release_task(struct task_struct *target)
 {
-	struct utrace *utrace;
-
-	task_lock(target);
-	utrace = rcu_dereference(target->utrace);
-	rcu_assign_pointer(target->utrace, NULL);
-	task_unlock(target);
-
-	if (unlikely(!utrace))
-		return;
+	struct utrace *utrace = task_utrace_struct(target);
 
 	spin_lock(&utrace->lock);
 	/*
-	 * If the list is empty, utrace is already on its way to be freed.
 	 * We raced with detach and we won the task_lock race but lost the
-	 * utrace->lock race.  All we have to do is let RCU run.
+	 * utrace->lock race.
 	 */
 	if (likely(!list_empty(&utrace->attached))) {
 		utrace->reap = 1;
@@ -1066,25 +975,8 @@ static void utrace_reset(struct task_str
 	/*
 	 * If any engines are left, we're done.
 	 */
-	if (flags) {
-		spin_unlock(&utrace->lock);
-	} else {
-		/*
-		 * No more engines, clear out the utrace.  Here we can race
-		 * with utrace_release_task().  If it gets task_lock()
-		 * first, then it cleans up this struct for us.
-		 */
-
-		task_lock(task);
-		if (unlikely(task->utrace != utrace)) {
-			task_unlock(task);
-			spin_unlock(&utrace->lock);
-		} else {
-			rcu_assign_pointer(task->utrace, NULL);
-			task_unlock(task);
-			rcu_utrace_free(utrace);
-		}
-
+	spin_unlock(&utrace->lock);
+	if (!flags) {
 		if (action)
 			*action = UTRACE_RESUME;
 	}
@@ -1699,18 +1591,18 @@ void utrace_report_clone(unsigned long c
 
 	/*
 	 * We don't use the REPORT() macro here, because we need
-	 * to clear utrace->u.live.cloning before finish_report().
+	 * to clear utrace->cloning before finish_report().
 	 * After finish_report(), utrace can be a stale pointer
 	 * in cases when report.action is still UTRACE_RESUME.
 	 */
 	start_report(utrace);
-	utrace->u.live.cloning = child;
+	utrace->cloning = child;
 
 	REPORT_CALLBACKS(task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
-	utrace->u.live.cloning = NULL;
+	utrace->cloning = NULL;
 	finish_report(&report, task, utrace);
 
 	/*
@@ -1766,25 +1658,13 @@ void utrace_report_jctl(int notify, int 
 	 * was finished, we might be here with utrace already
 	 * removed or in the middle of being removed.
 	 *
-	 * RCU makes it safe to get the utrace->lock even if it's
-	 * being freed.  Once we have that lock, either an external
-	 * detach has finished and this struct has been freed, or
-	 * else we know we are excluding any other detach attempt.
-	 *
 	 * If we are indeed attached, then make sure we are no
 	 * longer considered stopped while we run callbacks.
 	 */
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (unlikely(!utrace)) {
-		rcu_read_unlock();
-		return;
-	}
 	spin_lock(&utrace->lock);
 	utrace->stopped = 0;
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);
-	rcu_read_unlock();
 
 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
@@ -1987,7 +1867,7 @@ void utrace_resume(struct task_struct *t
  */
 bool utrace_interrupt_pending(void)
 {
-	return current->utrace->interrupt;
+	return current->utrace.interrupt;
 }
 
 /*
@@ -2028,7 +1908,7 @@ int utrace_get_signal(struct task_struct
 	__releases(task->sighand->siglock)
 	__acquires(task->sighand->siglock)
 {
-	struct utrace *utrace;
+	struct utrace *utrace = task_utrace_struct(task);
 	struct k_sigaction *ka;
 	INIT_REPORT(report);
 	struct utrace_attached_engine *engine, *next;
@@ -2037,44 +1917,13 @@ int utrace_get_signal(struct task_struct
 	u32 ret;
 	int signr;
 
-	/*
-	 * We could have been considered quiescent while we were in
-	 * TASK_STOPPED, and detached asynchronously.  If we woke up
-	 * and checked task->utrace_flags before that was finished,
-	 * we might be here with utrace already removed or in the
-	 * middle of being removed.
-	 */
-	rcu_read_lock();
-	utrace = rcu_dereference(task->utrace);
-	if (unlikely(!utrace)) {
-		rcu_read_unlock();
-		return 0;
-	}
-
 	if (utrace->interrupt || utrace->report || utrace->signal_handler) {
 		/*
 		 * We've been asked for an explicit report before we
 		 * even check for pending signals.
 		 */
-
 		spin_unlock_irq(&task->sighand->siglock);
-
-		/*
-		 * RCU makes it safe to get the utrace->lock even if
-		 * it's being freed.  Once we have that lock, either an
-		 * external detach has finished and this struct has been
-		 * freed, or else we know we are excluding any other
-		 * detach attempt.
-		 */
 		spin_lock(&utrace->lock);
-		rcu_read_unlock();
-
-		if (unlikely(task->utrace != utrace)) {
-			spin_unlock(&utrace->lock);
-			cond_resched();
-			return -1;
-		}
-
 		splice_attaching(utrace);
 
 		if (unlikely(!utrace->interrupt) && unlikely(!utrace->report))
@@ -2122,7 +1971,6 @@ int utrace_get_signal(struct task_struct
 		 * If noone is interested in intercepting signals,
 		 * let the caller just dequeue them normally.
 		 */
-		rcu_read_unlock();
 		return 0;
 	} else {
 		if (unlikely(utrace->stopped)) {
@@ -2141,17 +1989,9 @@ int utrace_get_signal(struct task_struct
 			 */
 			spin_unlock_irq(&task->sighand->siglock);
 			spin_lock(&utrace->lock);
-			rcu_read_unlock();
-			if (unlikely(task->utrace != utrace)) {
-				spin_unlock(&utrace->lock);
-				cond_resched();
-				return -1;
-			}
 			utrace->stopped = 0;
 			spin_unlock(&utrace->lock);
 			spin_lock_irq(&task->sighand->siglock);
-		} else {
-			rcu_read_unlock();
 		}
 
 		/*
@@ -2542,11 +2382,7 @@ struct task_struct *utrace_tracer_task(s
 	struct utrace_attached_engine *engine;
 	const struct utrace_engine_ops *ops;
 	struct task_struct *tracer = NULL;
-	struct utrace *utrace;
-
-	utrace = rcu_dereference(target->utrace);
-	if (!utrace)
-		return NULL;
+	struct utrace *utrace = task_utrace_struct(target);
 
 	list_for_each_safe(pos, next, &utrace->attached) {
 		engine = list_entry(pos, struct utrace_attached_engine,
@@ -2587,9 +2423,8 @@ int utrace_unsafe_exec(struct task_struc
  */
 void task_utrace_proc_status(struct seq_file *m, struct task_struct *p)
 {
-	struct utrace *utrace = rcu_dereference(p->utrace);
-	if (likely(!utrace))
-		return;
+	struct utrace *utrace = task_utrace_struct(p);
+
 	seq_printf(m, "Utrace: %lx%s%s%s\n",
 		   p->utrace_flags,
 		   utrace->stopped ? " (stopped)" : "",
Index: utrace-20jan/include/linux/tracehook.h
===================================================================
--- utrace-20jan.orig/include/linux/tracehook.h
+++ utrace-20jan/include/linux/tracehook.h
@@ -370,8 +370,7 @@ static inline void tracehook_report_vfor
 static inline void tracehook_prepare_release_task(struct task_struct *task)
 {
 	smp_mb();
-	if (task_utrace_struct(task) != NULL)
-		utrace_release_task(task);
+	utrace_release_task(task);
 }
 
 /**
@@ -385,21 +384,8 @@ static inline void tracehook_prepare_rel
  */
 static inline void tracehook_finish_release_task(struct task_struct *task)
 {
-	int bad = 0;
 	ptrace_release_task(task);
 	BUG_ON(task->exit_state != EXIT_DEAD);
-	if (unlikely(task_utrace_struct(task) != NULL)) {
-		/*
-		 * In a race condition, utrace_attach() will temporarily set
-		 * it, but then check @task->exit_state and clear it.  It does
-		 * all this under task_lock(), so we take the lock to check
-		 * that there is really a bug and not just that known race.
-		 */
-		task_lock(task);
-		bad = unlikely(task_utrace_struct(task) != NULL);
-		task_unlock(task);
-	}
-	BUG_ON(bad);
 }
 
 /**
Index: utrace-20jan/kernel/ptrace.c
===================================================================
--- utrace-20jan.orig/kernel/ptrace.c
+++ utrace-20jan/kernel/ptrace.c
@@ -778,7 +778,16 @@ static inline bool exclude_ptrace(struct
  */
 static inline bool exclude_ptrace(struct task_struct *task)
 {
-	return unlikely(!!task_utrace_struct(task));
+	struct utrace *utrace = task_utrace_struct(task);
+
+	spin_lock(&utrace->lock);
+	if (list_empty(&utrace->attached) && list_empty(&utrace->attaching)) {
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
+	spin_unlock(&utrace->lock);
+	return true;
 }
 #endif
 

From de_amigo_para_amigo-owner at yahoogrupos.com.br  Wed Jan 21 11:14:14 2009
From: de_amigo_para_amigo-owner at yahoogrupos.com.br (Moderador do grupo de_amigo_para_amigo)
Date: 21 Jan 2009 11:14:14 -0000
Subject: Bem-vindo ao grupo de_amigo_para_amigo! 
Message-ID: <1232536454.186.25968.m44@yahoogrupos.com.br>


Hello,

Welcome to the de_amigo_para_amigo group on Yahoo! Grupos.

You are ready to connect with your group; just get started!
Check out all the simple (and free) ways to communicate, share, and discover:

* You choose when and how to stay in touch
* Share photos, files, polls, calendars, links, and more
* Quickly download new messages and find detailed archives
* Enjoy many other ways to communicate, 24/7

Get started: visit de_amigo_para_amigo now!
http://us.rd.yahoo.com/evt=42879/*http://br.groups.yahoo.com/group/de_amigo_para_amigo


Regards,
Moderator
de_amigo_para_amigo

 
Complete your Yahoo! Grupos account now:
----------------------------------------------------------------------
Your e-mail address has been added to the mailing list of a Yahoo!
group. To access all the web features available for the group
(message archive, photo and file sharing, calendar, etc.) and, in
addition, to have more control over your message delivery options,
we recommend that you complete your account by associating your
e-mail address with a Yahoo! account. Doing so is quick, easy, and
free. Visit the link below to learn more:
http://br.groups.yahoo.com/convacct?email=utrace-devel%40redhat.com&list=de_amigo_para_amigo

Your use of Yahoo! Grupos is subject to the terms at http://br.yahoo.com/info/utos.html
 

From shunt at recordsreduction.com  Thu Jan 22 16:37:35 2009
From: shunt at recordsreduction.com (Shane Hunt)
Date: Thu, 22 Jan 2009 08:37:35 -0800
Subject: =?utf-8?q?Do_you_dread_moving_the_=E2=80=9908_files_to_make_room?=
	=?utf-8?b?IGZvciDigJkwOT8=?=
Message-ID: <200901221711.n0MGoLDM000797@mx2.redhat.com>

Let us do it for you... FREE of charge.

Records Reduction, Inc. is offering FREE pickup for new customers
in January & February, 2009.  In addition, we will also pull the
files from the filing cabinets and box them at NO CHARGE!

That's right, this year you will not have to touch a file to get
ready for '09 files.  It's the perfect time for you to begin
using our services.

Scanning - This is the best solution for files that you must
keep long term, or that require a lot of retrievals. Records
Reduction, Inc. will scan them in and provide a legal copy on 
disk.  You can save the files on your system and have a networked
imaging solution with no additional software.

Off site file storage - This is the most economical solution for
files that you don't have to keep long term and for those that
are rarely retrieved.

Shredding - If you have files that no longer have to be kept, let
us pick them up and provide secure shredding.  It's also a great
solution for any documents that contain Names, Social Security 
Numbers, or other identifying information.  We can do large 
purges, or provide secure bins for ongoing shredding.

Please call Shane Hunt @ 704-724-3313, or email 
shunt at recordsreduction.com for more information.

www.recordsreduction.com


Electronic filing (scanning/imaging) is the best long-term 
storage solution for any files that you must keep long term, or 
if you do a lot of retrievals from them.
 
Examples include, but are not limited to:
 
Accounts Payable
Human Resources
Medical Charts
EOBs
Sales Files
Job Files
Accounts Receivable
Engineering Drawings
School Records
Educational Materials
Legal Files
Real Estate Files
Bills of Lading
Workers Comp Files


Which Service is Right for You?

Document Scanning
 
Document scanning is perfect for files that you must store for a 
long time - typically five years or more. Also, if you have to
do many retrievals, scanning will pay for itself by increasing 
efficiencies in the office.  With scanning, there are no ongoing 
costs.  You pay once and you have a legal copy of your business 
documents forever.  Some examples where scanning makes sense 
include Accounts Payables, Job Files, Corporate Financials, 
Medical Files, Legal Files, Insurance Documents, Human 
Resources, etc.
 
www.recordsreduction.com
 
Offsite Record Storage
 
Offsite document storage is best for files that you do not have
to keep forever and that need very few retrievals.  Records
Reduction, Inc. provides records storage, retrieval, delivery and
pick-up services for companies in the Carolinas.  Records are 
stored at our secure service center where our team members 
retrieve boxes or individual files as requested by our clients. 
Records are normally delivered the next day & emergency delivery
options are also available.  We can always retrieve the file, 
scan it and email or fax it to you within minutes. Records 
Reduction, Inc. will become an extension to your existing file
room or storage area by providing: 

- Secure, confidential document storage 
- Efficient retrieval of records 
- Next-day & emergency deliveries 
- The highest level of customer service in the industry 
 
We manage your records inventory through a computer software
tracking system. Once records are entered into our database and
placed into storage, our customers can simply call or email and 
have their files physically or electronically delivered. 
 
www.recordsreduction.com
  
 Ongoing, Onsite Document Destruction
 
Identity theft is the fastest rising crime in America. Companies 
can be found liable if they do not protect information that can 
be used in identity theft.  You can use our secure bins for paper
that contains information that might be used for identity theft.
Many companies now use the bins for ALL of their discarded paper 
- sensitive or not - simply because they know it will be 
recycled. It's just another way to help protect our planet!
 
Records Reduction, Inc. provides FREE locked, secure containers 
for the storage of your confidential material while awaiting
destruction. The containers are attractive and fit in well with 
all office environments. Our containers will segregate and secure
sensitive materials in between our service visits. The containers
are locked and can only be opened by authorized personnel, 
eliminating the chance of sensitive documents being made public 
or falling into the wrong hands. The locked containers will be
picked up and placed in a secure document shredding system.

In addition to paper document shredding services, Records 
Reduction provides secure destruction services for X-Rays, 
Computer Hard Drives, CDs, and Magnetic Media Tapes.

www.recordsreduction.com
 
Bulk Purge Shredding Services

Companies file away storage boxes year after year. Often, they 
are kept long after their legal requirement.  Shredding has 
become a necessary business service to not only comply with 
regulatory requirements but to protect your business, employees 
and customers from identity theft.  Experts recommend that you 
shred most files as soon as it is legally permissible.  
 
Records Reduction, Inc. can provide onsite or offsite secure 
shredding services. 
 
www.recordsreduction.com

eDocHealth - Electronic Medical Records Solution
 
Enhance Patient Care, reduce cost of operations and increase 
revenues through eDocHealth.

eDocHealth is a proven medical document management solution that
instantly improves medical office document access as well as 
practice workflow by electronically scanning and filing your 
documents and making them accessible to your entire staff 
regardless of their location. When you minimize paper-based 
activity and work within a digital environment, you trim overhead
costs by reducing reliance on paper, streamline workflow with
quick access to information, and protect patient records with 
strict user-control.

The burden of administrative and clinical documents in a medical 
practice is considerable. Busy offices lead to inaccessible 
administrative documents and charts; whether misplaced, lost, or 
in use by another staff member. Physician practices continue to 
seek a solution to reduce or eliminate the increasing volumes of 
paper within their organizations. The optimal product would 
eliminate the issues of overcrowded office space and storage
facilities as well as the problems associated with paper medical 
records such as lost or misplaced patient charts, patient EOBs, 
etc. Medical staff and providers demand a user friendly HIPAA 
compliant solution that enhances patient care, and reduces cost 
of operations while increasing revenue and generating a rapid 
return on investment (ROI).

eDocHealth is a cost-effective way to meet those needs, by 
automation of administrative and clinical documents management. 
eDocHealth does not force you to change your office workflow;
instead, it can adapt to it or be configured for "best
operational practices".

eDocHealth can work in conjunction with your Practice Management 
software and Electronic Medical Records software (EMR/EHR). In 
most cases document management solutions are better suited to 
manage medical records than traditional EMR/EHR. It is a known fact
that document management solutions have near 98% implementation 
success while traditional EMR/EHR solutions are more challenging
endeavors. 

www.recordsreduction.com

PO Box 3322, Matthews, NC 28106


http://app.streamsend.com/private/tF8d/2bm/cAm25g7/unsubscribe/2511712
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090122/3e00d3ff/attachment.htm>

From trujillo_shiloh at sm.sexsm.org  Fri Jan 23 19:41:03 2009
From: trujillo_shiloh at sm.sexsm.org (Booty Lox)
Date: Sat, 24 Jan 2009 02:41:03 +0700
Subject: What are you waiting for ?
Message-ID: <38eb01c97dcd$07395279$5a862f59@[89.47.134.90]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From eliseo.salijs at srvkiit3sc1vzs0emm4p8h7.usercash.com  Fri Jan 23 22:33:48 2009
From: eliseo.salijs at srvkiit3sc1vzs0emm4p8h7.usercash.com (Jeams)
Date: Sat, 24 Jan 2009 03:33:48 +0500
Subject: Fw: Passed up for the promotion ... right ?
Message-ID: <18e301c97dd4$021622f7$2bc49fc8@BRTEL196043.res-com.brtel.com.br>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From dore_marius at rlwd.com  Sat Jan 24 01:11:31 2009
From: dore_marius at rlwd.com (Fale Danil)
Date: Sat, 24 Jan 2009 04:11:31 +0300
Subject: Fw: Degree = prestige !
Message-ID: <261001c97dd9$17222139$d70c7cd4@[212.124.12.215]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From deluxe_khurram at spacemanetc.com  Fri Jan 23 22:48:16 2009
From: deluxe_khurram at spacemanetc.com (Huseyin)
Date: Sat, 24 Jan 2009 04:48:16 +0600
Subject: but I'm only missing twenty credits ...
Message-ID: <3c2f01c97dde$3c3bae40$39c3a24e@[78.162.195.57]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From katty at ruiterlighting.nl  Fri Jan 23 22:00:26 2009
From: katty at ruiterlighting.nl (Bogner Freddy)
Date: Sat, 24 Jan 2009 05:00:26 +0700
Subject: Degree = advancement !
Message-ID: <1b9601c97de0$0115dfa0$abd0557c@p5171-ipbf506souka.saitama.ocn.ne.jp>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From jacinto.oganezov at sposabellanoivas.com  Sat Jan 24 02:24:26 2009
From: jacinto.oganezov at sposabellanoivas.com (Newman)
Date: Sat, 24 Jan 2009 05:24:26 +0300
Subject: Fw: Door-unlocker !
Message-ID: <0dae01c97de4$05e1d2f0$038b603b@[59.96.139.3]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From ragazzone_shelby at simson-maxwell.net  Fri Jan 23 22:29:01 2009
From: ragazzone_shelby at simson-maxwell.net (Vanilson)
Date: Sat, 24 Jan 2009 05:29:01 +0700
Subject: Fw: Get a better position !
Message-ID: <245601c97de4$0e311802$0dce2f5c@[92.47.206.13]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From tomlinson_menno at sexxxyvideos.com  Sat Jan 24 01:29:49 2009
From: tomlinson_menno at sexxxyvideos.com (Yarkova Caprio)
Date: Sat, 24 Jan 2009 05:29:49 +0400
Subject: Fw: Get the recognition that you deserve !
Message-ID: <48e801c97de4$2408636a$3e0c97c1@62-12.alba.ua>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From jastrzebska_krish at sinotek.net  Sat Jan 24 01:34:14 2009
From: jastrzebska_krish at sinotek.net (Entchen Crenshaw)
Date: Sat, 24 Jan 2009 05:34:14 +0400
Subject: Fw: Passed up, again ?
Message-ID: <5cc901c97de5$2f40015c$28b0b95a@0133300159.0.fullrate.dk>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From vanguers.vaz at sec7.com  Sat Jan 24 00:44:07 2009
From: vanguers.vaz at sec7.com (Slavnova Nika)
Date: Sat, 24 Jan 2009 05:44:07 +0500
Subject: Do you have anough life experience ?
Message-ID: <72cd01c97de6$0554dbdf$02871dd0@[208.29.135.2]>

Are you more than qualified with your experience, but lacking that
prestigious piece of paper known as a diploma that is often the passport to
success?

We provide a concept that will allow anyone with sufficient work experience 
to obtain a fully verifiable University Degree - Bachelors, Masters or even 
a Doctorate.

Within four to six weeks, you will be a college graduate.

Many people are doing the work of the person that has the degree and the 
person that has the degree is getting all the money. Don't you think that
it is time you were paid fair compensation for the level of work you are 
already doing?

This is your chance to finally make the right move and receive your due 
benefits.

CALL US TODAY AND GIVE YOUR WORK
EXPERIENCE THE CHANCE TO EARN YOU
THE HIGHER COMPENSATION YOU DESERVE!

Ring Anytime +1-904-346-1158 


From correo.comercial at adistech.net  Sat Jan 24 08:51:20 2009
From: correo.comercial at adistech.net (.)
Date: Sat, 24 Jan 2009 09:51:20 +0100
Subject: It doesn't get any better... last stock on clearance...
Message-ID: <20090124085118.51FE78F59B@svr.adistech.net>

Advertisement


Promotion valid from 21/01/2009
         Adistech Europe, S.L.
             adistech.europesl at gmail.com
PS: If you have any questions, you can contact our team at
tel. (+34) 93 481 4162
If you wish to unsubscribe from our mailing lists, please click here (putting the word "baja" in the subject line).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090124/ab89afb4/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Promocion.jpg
Type: image/jpeg
Size: 94903 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090124/ab89afb4/attachment.jpg>

From morgrimm_tanya at uook-s.com  Mon Jan 26 08:46:49 2009
From: morgrimm_tanya at uook-s.com (Filemon)
Date: Mon, 26 Jan 2009 09:46:49 +0100
Subject: Smile and dial !
Message-ID: <300001c97f9b$1549608c$c2bf514d@[77.81.191.194]>

     
        If you are more than qualified with your experience, but are lacking that prestigious piece of paper known as a diploma that is often the passport to success.

      We provide a concept that will allow anyone with sufficient work experience to obtain a fully verifiable University Degree - Bachelors, Masters or even a Doctorate.


      Within four to six weeks, you will be a college graduate.

      Many people are doing the work of the person that has the degree and the person that has the degree is getting all the money. Don't you think that it is time you were paid fair compensation for the level of work you are already doing?


      This is your chance to finally make the right move and receive your due benefits.

      Ring Anytime +19043461158

      CALL US TODAY AND GIVE YOUR WORK EXPERIENCE THE CHANCE
      TO EARN YOU THE HIGHER COMPENSATION YOU DESERVE!

       
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090126/264ff05a/attachment.htm>

From graficarmc at pop.com.br  Mon Jan 26 02:59:27 2009
From: graficarmc at pop.com.br (RMC Visual)
Date: Mon, 26 Jan 2009 02:59:27 GMT
Subject: Communicating !!! Makes the Difference.
Message-ID: <20090126025934.9606652F9D42@postfix41.rmcvisual.com>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090126/d63f3034/attachment.htm>

From rumen at thefordadvantage.com  Mon Jan 26 14:52:14 2009
From: rumen at thefordadvantage.com (Scavetta Deian)
Date: Mon, 26 Jan 2009 15:52:14 +0100
Subject: Schaaf Annya  VIP world
Message-ID: <263901c97fce$0629a420$a67497c1@harrier.sx5.cable.tolna.net>


      We don't accept just anyone...
      For the most prestigious gaming experience around, visit Exclusive Club Casino.

      http://www.best-winner-casino-usa.com/

     
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090126/0703c84e/attachment.htm>

From alti_hitoshi at stocktradeinsider.com  Mon Jan 26 14:59:37 2009
From: alti_hitoshi at stocktradeinsider.com (Romanyuk)
Date: Mon, 26 Jan 2009 15:59:37 +0100
Subject: Romanyuk Dreseler  VIP club
Message-ID: <38af01c97fcf$003bae9e$154e0abe@[190.10.78.21]>


      We don't accept just anyone...
      For the most prestigious gaming experience around, visit Exclusive Club Casino.

      http://www.casino-usa-online.com/

     
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090126/16dbf1a5/attachment.htm>

From napoli.arlen at vanrollovers.com  Mon Jan 26 15:28:09 2009
From: napoli.arlen at vanrollovers.com (Ejsotet Fleerackers)
Date: Mon, 26 Jan 2009 16:28:09 +0100
Subject: Ejsotet Kurt  VIP club
Message-ID: <6b8401c97fd3$0aab023c$496b18bd@18924107073.user.veloxzone.com.br>


      We don't accept just anyone...
      For the most prestigious gaming experience around, visit Exclusive Club Casino.

      http://www.the-online-usa-casino-club.com/

     
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090126/8b05f490/attachment.htm>

From speedpromarketing at speedpromarketing.com  Tue Jan 27 17:54:17 2009
From: speedpromarketing at speedpromarketing.com (Fabiano Couto)
Date: Tue, 27 Jan 2009 17:54:17 GMT
Subject: TV via Internet, 3000 channels, 24 hours
Message-ID: <200901271811.n0RIBWNv028873@mx3.redhat.com>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090127/2d3d2e39/attachment.htm>

From fche at redhat.com  Tue Jan 27 19:54:26 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 27 Jan 2009 14:54:26 -0500
Subject: proof-of-concept, utrace->ftrace engine
Message-ID: <20090127195425.GF32568@redhat.com>

Hi -

Here's the start of a little ditty that ties process-related events,
as hooked by Roland McGrath's utrace code, into the ftrace
buffer/control widgetry.  If nothing else, think of it as one
potential in-tree user of utrace.


Script started on Tue 27 Jan 2009 02:39:06 PM EST

[root@vm-fed10-64 tracing]# cat available_tracers 
process wakeup irqsoff sysprof sched_switch nop
[root@vm-fed10-64 tracing]# echo process > current_tracer 
[root@vm-fed10-64 tracing]# echo 500 > process_trace_uid_filter 
[root@vm-fed10-64 tracing]# cat trace
# tracer: process
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
[root@vm-fed10-64 tracing]# su - fche
%                                                                               
vm-fed10-64 /home/fche
[14:39:50] % pwd
/home/fche
%                                                                               
vm-fed10-64 /home/fche
[14:39:52] % ls /tmp
firstbootX.log     pulse-PKdhtXMmr18n  stapbXg0xB  stapUniATd
foo                stap6cNJ5M          stapl9Ww2f  virtual-fche.4SkpzQ
kerneloops.pxnITL  stap9MajHI          stapT1LKnQ
%                                                                               
vm-fed10-64 /home/fche
[14:39:59] % df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      13706328  11417980   2149176  85% /
/dev/sda1               194442     34259    150144  19% /boot
tmpfs                   382320         0    382320   0% /dev/shm
super:/home          1300999168 496440320 750835712  40% /home
%                                                                               
vm-fed10-64 /home/fche
[14:40:03] % exit
Tue Jan 27 14:40:05 EST 2009
[root@vm-fed10-64 tracing]# cat trace
# tracer: process
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
             zsh  2091   0   2616701.950948 exec
             zsh  2091   0   2616701.966410 fork 2092 flags 0x1200011
          whoami  2092   1   2616702.005276 exec
          whoami  2092   0   2616702.008612 exit 0
             zsh  2091   0   2616702.009193 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.011385 fork 2093 flags 0x1200011
           mkdir  2093   1   2616702.013701 exec
           mkdir  2093   0   2616702.017300 exit 0
             zsh  2091   0   2616702.018133 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.018951 fork 2094 flags 0x1200011
          whoami  2094   0   2616702.023867 exec
          whoami  2094   0   2616702.026108 exit 0
             zsh  2091   0   2616702.026567 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.027358 fork 2095 flags 0x1200011
           mkdir  2095   1   2616702.029712 exec
           mkdir  2095   1   2616702.031703 exit 0
             zsh  2091   0   2616702.032275 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.035062 fork 2096 flags 0x1200011
             zsh  2096   1   2616702.036457 exit 0
             zsh  2091   0   2616702.037344 fork 2097 flags 0x1200011
             zsh  2091   0   2616702.038959 signal 17 errno 0 code 262145
           egrep  2097   1   2616702.039692 exec
           egrep  2097   1   2616702.041620 exit 256
             zsh  2091   0   2616702.042150 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.043095 fork 2098 flags 0x1200011
             zsh  2098   1   2616702.044435 exit 0
             zsh  2091   0   2616702.045329 fork 2099 flags 0x1200011
             zsh  2091   0   2616702.046846 signal 17 errno 0 code 262145
           egrep  2099   1   2616702.047646 exec
           egrep  2099   1   2616702.049571 exit 0
             zsh  2091   0   2616702.050141 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.051020 fork 2100 flags 0x1200011
             zsh  2100   0   2616702.052046 exit 0
             zsh  2091   0   2616702.053346 fork 2101 flags 0x1200011
             zsh  2091   0   2616702.054672 signal 17 errno 0 code 262145
           egrep  2101   1   2616702.055515 exec
           egrep  2101   1   2616702.057346 exit 0
             zsh  2091   0   2616702.057907 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.058982 fork 2102 flags 0x1200011
              id  2102   1   2616702.064822 exec
              id  2102   1   2616702.067609 exit 0
             zsh  2091   0   2616702.068307 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.069246 fork 2103 flags 0x1200011
        hostname  2103   0   2616702.072067 exec
        hostname  2103   0   2616702.074154 exit 0
             zsh  2091   0   2616702.074766 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.076529 fork 2104 flags 0x1200011
             zsh  2104   1   2616702.077982 exit 0
             zsh  2091   0   2616702.079742 fork 2105 flags 0x1200011
             zsh  2091   0   2616702.081672 signal 17 errno 0 code 262145
            grep  2105   1   2616702.082929 exec
            grep  2105   0   2616702.087867 exec
            grep  2105   0   2616702.089716 exit 256
             zsh  2091   0   2616702.090205 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.092925 fork 2106 flags 0x1200011
            tput  2106   1   2616702.099077 exec
            tput  2106   1   2616702.100918 exit 0
             zsh  2091   0   2616702.101588 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.102659 fork 2107 flags 0x1200011
       dircolors  2107   1   2616702.108917 exec
       dircolors  2107   1   2616702.110359 exit 0
             zsh  2091   0   2616702.110997 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.134110 fork 2108 flags 0x1200011
           egrep  2108   0   2616702.136910 exec
           egrep  2108   0   2616702.138921 exit 256
             zsh  2091   0   2616702.139430 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.141230 fork 2109 flags 0x1200011
             zsh  2109   1   2616702.142714 exit 0
             zsh  2091   0   2616702.143685 fork 2110 flags 0x1200011
             zsh  2091   0   2616702.145204 signal 17 errno 0 code 262145
            grep  2110   1   2616702.145974 exec
            grep  2110   1   2616702.147934 exit 256
             zsh  2091   0   2616702.150523 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.151842 fork 2111 flags 0x1200011
             zsh  2111   1   2616702.153271 exit 0
             zsh  2091   0   2616702.154703 fork 2112 flags 0x1200011
             zsh  2091   0   2616702.156063 signal 17 errno 0 code 262145
            grep  2112   1   2616702.157028 exec
            grep  2112   1   2616702.158834 exit 256
             zsh  2091   0   2616702.159476 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.160319 fork 2113 flags 0x1200011
              id  2113   1   2616702.162848 exec
              id  2113   1   2616702.165115 exit 0
             zsh  2091   0   2616702.165872 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.168590 fork 2114 flags 0x1200011
     consoletype  2114   1   2616702.171021 exec
     consoletype  2114   1   2616702.171988 exit 512
             zsh  2091   0   2616702.172443 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.181959 fork 2115 flags 0x1200011
          whoami  2115   1   2616702.188936 exec
          whoami  2115   1   2616702.191366 exit 0
             zsh  2091   0   2616702.192051 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.194605 fork 2116 flags 0x1200011
           mkdir  2116   0   2616702.197377 exec
           mkdir  2116   0   2616702.199480 exit 0
             zsh  2091   0   2616702.200084 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.201017 fork 2117 flags 0x1200011
          whoami  2117   0   2616702.206033 exec
          whoami  2117   0   2616702.208245 exit 0
             zsh  2091   0   2616702.208888 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.209836 fork 2118 flags 0x1200011
           mkdir  2118   0   2616702.212527 exec
           mkdir  2118   0   2616702.214474 exit 0
             zsh  2091   0   2616702.215117 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.217011 fork 2119 flags 0x1200011
            stty  2119   0   2616702.220137 exec
            stty  2119   0   2616702.223496 exit 0
             zsh  2091   0   2616702.223977 signal 17 errno 0 code 262145
             zsh  2091   0   2616702.229063 fork 2120 flags 0x1200011
            mesg  2120   0   2616702.232073 exec
            mesg  2120   0   2616702.233994 exit 0
             zsh  2091   0   2616702.234454 signal 17 errno 0 code 262145
             zsh  2091   0   2616711.333172 fork 2121 flags 0x1200011
              ls  2121   0   2616711.336055 exec
              ls  2121   0   2616711.356496 exit 0
             zsh  2091   0   2616711.364547 signal 17 errno 0 code 262145
             zsh  2091   0   2616714.474787 fork 2125 flags 0x1200011
              df  2125   0   2616714.479280 exec
              df  2125   0   2616714.483010 exit 0
             zsh  2091   0   2616714.483701 signal 17 errno 0 code 262145
             zsh  2091   0   2616716.594615 fork 2126 flags 0x1200011
           clear  2126   0   2616716.598083 exec
           clear  2126   0   2616716.599856 exit 0
             zsh  2091   0   2616716.600439 signal 17 errno 0 code 262145
             zsh  2091   0   2616716.601532 fork 2127 flags 0x1200011
            date  2127   0   2616716.613852 exec
            date  2127   0   2616716.619608 exit 0
             zsh  2091   0   2616716.620334 signal 17 errno 0 code 262145
             zsh  2091   0   2616716.632090 fork 2128 flags 0x1200011
           clear  2128   0   2616716.634284 exec
           clear  2128   0   2616716.636012 exit 0
             zsh  2091   0   2616716.636775 signal 17 errno 0 code 262145
             zsh  2091   0   2616716.637448 exit 0
[root@vm-fed10-64 tracing]# echo nop > current_tracer 
[root@vm-fed10-64 tracing]# cat trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
[root@vm-fed10-64 tracing]# exit

Script done on Tue 27 Jan 2009 02:40:26 PM EST


diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h
new file mode 100644
index 0000000..f902443
--- /dev/null
+++ b/include/linux/processtrace.h
@@ -0,0 +1,33 @@
+#ifndef PROCESSTRACE_H
+#define PROCESSTRACE_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+struct process_trace_entry {
+	unsigned char opcode;	/* one of _UTRACE_EVENT_* */
+        char comm[TASK_COMM_LEN]; /* XXX: should be in/via trace_entry */
+        union {
+                struct {
+                        pid_t child;
+                        unsigned long flags;
+                } trace_clone;
+                struct {
+                        long code;
+                } trace_exit;
+                struct {
+                } trace_exec;
+                struct {
+                        int si_signo;
+                        int si_errno;
+                        int si_code;
+                } trace_signal;
+        };
+};
+
+/* in kernel/trace/trace_process.c */
+
+extern void enable_process_trace (void);
+extern void disable_process_trace (void);
+
+#endif /* PROCESSTRACE_H */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 33dbefd..9276863 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -119,6 +119,15 @@ config CONTEXT_SWITCH_TRACER
 	  This tracer gets called from the context switch and records
 	  all switching of tasks.
 
+config PROCESS_TRACER
+	bool "Trace process events via utrace"
+	depends on DEBUG_KERNEL
+	select TRACING
+	select UTRACE
+	help
+	  This tracer provides trace records from process events
+	  accessible to utrace: lifecycle, system calls, and signals.
+
 config BOOT_TRACER
 	bool "Trace boot initcalls"
 	depends on DEBUG_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c8228b1..b06a5d6 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -24,5 +24,6 @@ obj-$(CONFIG_NOP_TRACER) += trace_nop.o
 obj-$(CONFIG_STACK_TRACER) += trace_stack.o
 obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_BOOT_TRACER) += trace_boot.o
+obj-$(CONFIG_PROCESS_TRACER) += trace_process.o
 
 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8465ad0..7c0cd57 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -7,6 +7,7 @@
 #include <linux/clocksource.h>
 #include <linux/ring_buffer.h>
 #include <linux/mmiotrace.h>
+#include <linux/processtrace.h>
 #include <linux/ftrace.h>
 
 enum trace_type {
@@ -22,6 +23,7 @@ enum trace_type {
 	TRACE_MMIO_RW,
 	TRACE_MMIO_MAP,
 	TRACE_BOOT,
+	TRACE_PROCESS,
 
 	__TRACE_LAST_TYPE
 };
@@ -117,6 +119,11 @@ struct trace_boot {
 	struct boot_trace	initcall;
 };
 
+struct trace_process {
+        struct trace_entry		ent;
+	struct process_trace_entry	event;
+};
+
 /*
  * trace_flag_type is an enumeration that holds different
  * states when a trace occurs. These are:
@@ -219,6 +226,7 @@ extern void __ftrace_bad_type(void);
 		IF_ASSIGN(var, ent, struct trace_mmiotrace_map,		\
 			  TRACE_MMIO_MAP);				\
 		IF_ASSIGN(var, ent, struct trace_boot, TRACE_BOOT);	\
+		IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \
 		__ftrace_bad_type();					\
 	} while (0)
 
diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c
new file mode 100644
index 0000000..10c2c3c
--- /dev/null
+++ b/kernel/trace/trace_process.c
@@ -0,0 +1,440 @@
+/*
+ * utrace-based process event tracing
+ * Copyright (C) 2009 Red Hat Inc.
+ * By Frank Ch. Eigler <fche@redhat.com>
+ */
+
+#define DEBUG 1
+
+#include <linux/kernel.h>
+#include <linux/utrace.h>
+#include <linux/uaccess.h>
+#include <linux/debugfs.h>
+
+#include "trace.h"
+
+/* A process must match these filters in order to be traced. */
+static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */
+static u32 trace_taskuid_filter = -1; /* -1: unrestricted */
+
+/* A process must be a direct child of given pid in order to be
+   followed. */ 
+static u32 process_follow_pid; /* 0: unrestricted/systemwide */
+
+/* XXX: lock the above? */
+
+
+/* trace data collection */
+
+static struct trace_array *process_trace_array;
+
+static void process_reset_data(struct trace_array *tr)
+{
+	int cpu;
+
+	pr_debug("in %s\n", __func__);
+	tr->time_start = ftrace_now(tr->cpu);
+	for_each_online_cpu(cpu)
+		tracing_reset(tr, cpu);
+}
+
+static void process_trace_init(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_trace_array = tr;
+	if (tr->ctrl) {
+		process_reset_data(tr);
+		enable_process_trace();
+	}
+}
+
+static void process_trace_reset(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	if (tr->ctrl)
+		disable_process_trace();
+	process_reset_data(tr);
+	process_trace_array = NULL;
+}
+
+static void process_trace_ctrl_update(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	if (tr->ctrl) {
+		process_reset_data(tr);
+		enable_process_trace();
+	} else {
+		disable_process_trace();
+	}
+}
+
+static void __trace_processtrace(struct trace_array *tr,
+				struct trace_array_cpu *data,
+				struct process_trace_entry *ent)
+{
+	struct ring_buffer_event *event;
+	struct trace_process *entry;
+	unsigned long irq_flags;
+
+	event	= ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+					   &irq_flags);
+	if (!event)
+		return;
+	entry	= ring_buffer_event_data(event);
+	tracing_generic_entry_update(&entry->ent, 0, preempt_count());
+        entry->ent.cpu                  = raw_smp_processor_id();
+	entry->ent.type			= TRACE_PROCESS;
+        strlcpy (ent->comm, current->comm, TASK_COMM_LEN);
+	entry->event			= *ent;
+	ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+	trace_wake_up();
+}
+
+void process_trace(struct process_trace_entry *ent)
+{
+	struct trace_array *tr = process_trace_array;
+	struct trace_array_cpu *data = tr->data[smp_processor_id()];
+
+	__trace_processtrace(tr, data, ent);
+}
+
+
+/* trace data rendering */
+
+static void process_pipe_open(struct trace_iterator *iter)
+{
+	struct trace_seq *s = &iter->seq;
+	pr_debug("in %s\n", __func__);
+	trace_seq_printf(s, "VERSION 200901\n");
+}
+
+static void process_close(struct trace_iterator *iter)
+{
+	iter->private = NULL;
+}
+
+static ssize_t process_read(struct trace_iterator *iter, struct file *filp,
+				char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	ssize_t ret;
+	struct trace_seq *s = &iter->seq;
+	ret = trace_seq_to_user(s, ubuf, cnt);
+	return (ret == -EBUSY) ? 0 : ret;
+}
+
+static enum print_line_t process_print(struct trace_iterator *iter)
+{
+	struct trace_entry *entry = iter->ent;
+	struct trace_process *field;
+	struct trace_seq *s	= &iter->seq;
+	unsigned long long t	= ns2usecs(iter->ts);
+	unsigned long usec_rem	= do_div(t, 1000000ULL);
+	unsigned long secs	= (unsigned long)t;
+	int ret = 1;
+
+	pr_debug("in %s\n", __func__);
+	trace_assign_type(field, entry);
+
+        /* XXX: If print_lat_fmt() were not static, we wouldn't have
+           to duplicate this. */
+        trace_seq_printf(s, "%16s %5d %3d %9lu.%06lu ",
+                         field->event.comm,
+                         entry->pid, entry->cpu,
+                         secs,
+                         usec_rem);
+
+	switch (field->event.opcode) {
+	case _UTRACE_EVENT_CLONE:
+		ret = trace_seq_printf(s, "fork %d flags 0x%lx\n",
+                                       field->event.trace_clone.child,
+                                       field->event.trace_clone.flags);
+		break;
+	case _UTRACE_EVENT_EXEC:
+		ret = trace_seq_printf(s, "exec\n");
+		break;
+	case _UTRACE_EVENT_EXIT:
+		ret = trace_seq_printf(s, "exit %ld\n",
+                                       field->event.trace_exit.code);
+		break;
+	case _UTRACE_EVENT_SIGNAL:
+		ret = trace_seq_printf(s, "signal %d errno %d code %d\n",
+                                       field->event.trace_signal.si_signo,
+                                       field->event.trace_signal.si_errno,
+                                       field->event.trace_signal.si_code);
+		break;
+	default:
+		ret = trace_seq_printf(s, "process code %d?\n", field->event.opcode);
+		break;
+	}
+	if (ret)
+		return TRACE_TYPE_HANDLED;
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static enum print_line_t process_print_line(struct trace_iterator *iter)
+{
+	switch (iter->ent->type) {
+	case TRACE_PROCESS:
+		return process_print(iter);
+	default:
+		return TRACE_TYPE_HANDLED; /* ignore unknown entries */
+	}
+}
+
+static struct tracer process_tracer __read_mostly =
+{
+	.name		= "process",
+	.init		= process_trace_init,
+	.reset		= process_trace_reset,
+	.pipe_open	= process_pipe_open,
+	.close		= process_close,
+	.read		= process_read,
+	.ctrl_update	= process_trace_ctrl_update,
+	.print_line	= process_print_line,
+};
+
+
+
+/* utrace backend */
+
+/* Should tracing apply to given task?  Compare against filter
+   values. */
+static int trace_test (struct task_struct *tsk) 
+{
+        if (trace_taskcomm_filter[0]
+            && strcmp (trace_taskcomm_filter, tsk->comm))
+                return 0;
+        if (trace_taskuid_filter != (u32)-1 
+            && trace_taskuid_filter != task_uid (tsk))
+                return 0;
+
+        return 1;
+}
+
+
+static struct utrace_engine_ops process_trace_ops __read_mostly;
+
+static void process_trace_tryattach (struct task_struct *tsk) 
+{
+        struct utrace_attached_engine *engine;
+        
+        pr_debug("in %s\n", __func__);
+        engine = utrace_attach_task (tsk, UTRACE_ATTACH_CREATE,
+                                     & process_trace_ops, NULL);
+        if (IS_ERR(engine) || (engine == NULL)) {
+                pr_warning ("utrace_attach_task %d (rc %p)\n",
+                            tsk->pid, engine);
+        } else {
+                int rc;
+
+                /* XXX: Why is this not implicit from the fields
+                   set in the process_trace_ops? */
+                rc = utrace_set_events (tsk, engine,
+                                        UTRACE_EVENT(CLONE) |
+                                        UTRACE_EVENT(EXEC) |
+                                        UTRACE_EVENT(SIGNAL) |
+                                        UTRACE_EVENT(EXIT));
+                if (rc == -EINPROGRESS)
+                        rc = utrace_barrier (tsk, engine);
+                if (rc)
+                        pr_warning ("utrace_set_events/barrier rc %d\n", rc);
+                
+                utrace_engine_put (engine);
+                pr_debug("attached in %s to %s(%d)\n", __func__, tsk->comm, tsk->pid);
+        }
+}
+
+
+u32 process_trace_report_clone (enum utrace_resume_action action,
+                                struct utrace_attached_engine *engine,
+                                struct task_struct *parent,
+                                unsigned long clone_flags,
+                                struct task_struct *child) 
+{
+        if (trace_test (parent)) {
+                struct process_trace_entry ent;
+                ent.opcode = _UTRACE_EVENT_CLONE;
+                ent.trace_clone.child = child->pid;
+                ent.trace_clone.flags = clone_flags;
+                process_trace(& ent);
+        }
+
+        process_trace_tryattach (child);
+                        
+        return action;
+}
+
+
+u32 process_trace_report_exec (enum utrace_resume_action action,
+                               struct utrace_attached_engine *engine,
+                               struct task_struct *task,
+                               const struct linux_binfmt *fmt,
+                               const struct linux_binprm *bprm,
+                               struct pt_regs *regs) 
+{
+        if (trace_test (task)) {
+                struct process_trace_entry ent;
+                ent.opcode = _UTRACE_EVENT_EXEC;
+                process_trace(& ent);
+        }
+
+        /* We're already attached; no need for a new tryattach. */
+
+        return action;
+}
+
+
+u32 process_trace_report_signal (u32 action,
+                                 struct utrace_attached_engine *engine,
+                                 struct task_struct *task,
+                                 struct pt_regs *regs,
+                                 siginfo_t *info,
+                                 const struct k_sigaction *orig_ka,
+                                 struct k_sigaction *return_ka)
+{
+        if (trace_test (task)) {
+                struct process_trace_entry ent;
+                ent.opcode = _UTRACE_EVENT_SIGNAL;
+                ent.trace_signal.si_signo = info->si_signo;
+                ent.trace_signal.si_errno = info->si_errno;
+                ent.trace_signal.si_code = info->si_code;
+                process_trace(& ent);
+        }
+
+        /* We're already attached; no need for a new tryattach. */
+
+        return action;
+}
+
+
+u32 process_trace_report_exit (enum utrace_resume_action action,
+                               struct utrace_attached_engine *engine,
+                               struct task_struct *task,
+                               long orig_code, long *code) 
+{
+        if (trace_test (task)) {
+                struct process_trace_entry ent;
+                ent.opcode = _UTRACE_EVENT_EXIT;
+                ent.trace_exit.code = orig_code;
+                process_trace(& ent);
+        }
+
+        /* There is no need to explicitly attach or detach here. */
+
+        return action;
+}
+
+
+void enable_process_trace () { 
+        struct task_struct *grp, *tsk;
+
+        pr_debug("in %s\n", __func__);
+        rcu_read_lock();
+        do_each_thread(grp, tsk) {
+                /* Skip over kernel threads, which have no mm.
+                   Only peek at tsk->mm here: get_task_mm() would
+                   take an mm reference that this loop never drops
+                   with mmput(). */
+                if (!tsk->mm)
+                        continue;
+
+                if (process_follow_pid) {
+                        if (tsk->tgid == process_follow_pid ||
+                            tsk->parent->tgid == process_follow_pid)
+                                process_trace_tryattach (tsk);
+                } else {
+                        process_trace_tryattach (tsk);
+                }
+        } while_each_thread(grp, tsk);
+        rcu_read_unlock();
+}
+
+void disable_process_trace () {
+        struct utrace_attached_engine *engine;
+        struct task_struct *grp, *tsk;
+        int rc;
+
+        pr_debug("in %s\n", __func__);
+        rcu_read_lock();
+        do_each_thread(grp, tsk) {
+                if (tsk->pid <= 1)
+                        continue;
+
+                /* Find matching engine, if any.  Returns -ENOENT for
+                   unattached threads. */ 
+                engine = utrace_attach_task (tsk, UTRACE_ATTACH_MATCH_OPS,
+                                             & process_trace_ops, 0);
+                if (IS_ERR(engine)) {
+                        if (PTR_ERR(engine) != -ENOENT)
+                                pr_warning ("utrace_attach_task %d (rc %ld)\n",
+                                            tsk->pid, -PTR_ERR(engine));
+                } else if (engine == NULL) {
+                        pr_warning ("utrace_attach_task %d (null engine)\n",
+                                    tsk->pid);
+                } else {
+                        /* Found one of our own engines.  Detach.  */
+                        rc = utrace_control (tsk, engine, UTRACE_DETACH);
+                        switch (rc) {
+                        case 0:             /* success */
+                                break;
+                        case -ESRCH:        /* REAP callback already begun */
+                        case -EALREADY:     /* DEATH callback already begun */
+                                break;
+                        default:
+                                rc = -rc;
+                                pr_warning ("utrace_detach %d (rc %d)\n",
+                                            tsk->pid, rc);
+                                break;
+                        }
+                        utrace_engine_put(engine);
+                        pr_debug("detached in %s from %s(%d)\n", __func__, tsk->comm, tsk->pid);
+                }
+        } while_each_thread(grp, tsk);
+        rcu_read_unlock();
+}
+
+
+static struct utrace_engine_ops process_trace_ops __read_mostly = {
+        .report_clone = process_trace_report_clone,
+        .report_exec = process_trace_report_exec,
+        .report_exit = process_trace_report_exit,
+        .report_signal = process_trace_report_signal,
+};
+
+
+
+/* control interfaces */
+
+static struct debugfs_blob_wrapper trace_taskcomm_filter_blob = {
+        .data = trace_taskcomm_filter,
+        .size = sizeof (trace_taskcomm_filter),
+};
+
+static __init int init_process_trace(void)
+{
+        struct dentry *d_tracer;
+        struct dentry *entry;
+
+        d_tracer = tracing_init_dentry();
+
+        entry = debugfs_create_blob("process_trace_taskcomm_filter", 0644, d_tracer,
+                                    & trace_taskcomm_filter_blob);
+        if (!entry)
+                pr_warning("Could not create debugfs 'process_trace_taskcomm_filter' entry\n");
+
+        entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer,
+                                   & trace_taskuid_filter);
+        if (!entry)
+                pr_warning("Could not create debugfs 'process_trace_uid_filter' entry\n");
+
+        entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer,
+                                   & process_follow_pid);
+        if (!entry)
+                pr_warning("Could not create debugfs 'process_follow_pid' entry\n");
+
+	return register_tracer(&process_tracer);
+}
+
+device_initcall(init_process_trace);
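
For readers skimming the patch, the return-code handling in
disable_process_trace() can be modelled in plain userspace C. This is only
a sketch: the helper name detach_rc_needs_warning() is hypothetical and does
not appear in the patch. It mirrors the switch statement, treating 0, -ESRCH
and -EALREADY as benign (the task was detached, or is already being reaped or
dying) and everything else as worth a warning.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * result of a detach request deserves a warning in the log.
 */
static bool detach_rc_needs_warning(int rc)
{
	switch (rc) {
	case 0:			/* detached cleanly */
	case -ESRCH:		/* target is already being reaped */
	case -EALREADY:		/* target is already dying */
		return false;
	default:		/* anything else is unexpected */
		return true;
	}
}
```

Keeping the classification in one place makes it easy to see that only
unexpected errors reach the log.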


From fche at redhat.com  Wed Jan 28 00:43:32 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 27 Jan 2009 19:43:32 -0500
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090127224303.GB5850@nowhere> (Frederic Weisbecker's message of
	"Tue, 27 Jan 2009 23:43:05 +0100")
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
Message-ID: <y0mtz7ks1jf.fsf@ton.toronto.redhat.com>

Frederic Weisbecker <fweisbec at gmail.com> writes:

> [...]
> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> with the function-graph-tracer.
> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
> be used by all tracers that would like to inspect syscalls.
> [...]
> Just a change is needed: Steven requested that the part inside
> syscall_trace_enter become a tracepoint, making it totally shareable
> between tracers and easy to turn on and off.

Alternately, you could just rely on utrace's hooks.  They were thought
out more fully with respect to parameter access, manipulation, and
programmatic control befitting even a debugger.


- FChE


From fweisbec at gmail.com  Wed Jan 28 13:58:28 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Wed, 28 Jan 2009 14:58:28 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <y0mtz7ks1jf.fsf@ton.toronto.redhat.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<y0mtz7ks1jf.fsf@ton.toronto.redhat.com>
Message-ID: <c62985530901280558o7b7572efl8afeebb33b463e2b@mail.gmail.com>

2009/1/28 Frank Ch. Eigler <fche at redhat.com>:
> Frederic Weisbecker <fweisbec at gmail.com> writes:
>
>> [...]
>> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
>> with the function-graph-tracer.
>> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
>> be used by all tracers that would like to inspect syscalls.
>> [...]
>> Just a change is needed: Steven requested that the part inside
>> syscall_trace_enter become a tracepoint, making it totally shareable
>> between tracers and easy to turn on and off.
>
> Alternately, you could just rely on utrace's hooks.  They were thought
> out more fully with respect to parameter access, manipulation, and
> programmatic control befitting even a debugger.
>
>
> - FChE
>

I don't know much about it, but I will soon have some time to look at
your patch, which uses ftrace from utrace.
Anyway, are there any plans for utrace to be merged? Otherwise I
wouldn't be able to use it...


From acme at redhat.com  Wed Jan 28 14:29:28 2009
From: acme at redhat.com (Arnaldo Carvalho de Melo)
Date: Wed, 28 Jan 2009 12:29:28 -0200
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <c62985530901280558o7b7572efl8afeebb33b463e2b@mail.gmail.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<y0mtz7ks1jf.fsf@ton.toronto.redhat.com>
	<c62985530901280558o7b7572efl8afeebb33b463e2b@mail.gmail.com>
Message-ID: <20090128142928.GF15877@ghostprotocols.net>

On Wed, Jan 28, 2009 at 02:58:28PM +0100, Frédéric Weisbecker wrote:
> 2009/1/28 Frank Ch. Eigler <fche at redhat.com>:
> > Frederic Weisbecker <fweisbec at gmail.com> writes:
> >
> >> [...]
> >> Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> >> with the function-graph-tracer.
> >> http://lkml.org/lkml/2008/12/30/267 This low-level part can easily
> >> be used by all tracers that would like to inspect syscalls.
> >> [...]
> >> Just a change is needed: Steven requested that the part inside
> >> syscall_trace_enter become a tracepoint, making it totally shareable
> >> between tracers and easy to turn on and off.
> >
> > Alternately, you could just rely on utrace's hooks.  They were thought
> > out more fully with respect to parameter access, manipulation, and
> > programmatic control befitting even a debugger.
> >
> >
> > - FChE
> >
> 
> I don't know much it. But I will soon have some time to look at your
> patch which uses ftrace from utrace.
> Anyway, are there some plans about utrace to be merged? Unless I
> couldn't be able to use
> it...

Well, one of the reasons for utrace not to be merged, IIRC, was that
there would be no users in-kernel. With Frank's ftrace plugin that is
not true anymore.

- Arnaldo


From fweisbec at gmail.com  Thu Jan 29 14:29:15 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Thu, 29 Jan 2009 15:29:15 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129140451.GM24391@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
Message-ID: <c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>

2009/1/29 Ingo Molnar <mingo at elte.hu>:
>
> * Frederic Weisbecker <fweisbec at gmail.com> wrote:
>
>> On Tue, Jan 27, 2009 at 11:43:03PM +0100, Frederic Weisbecker wrote:
>> > On Tue, Jan 27, 2009 at 12:08:04PM -0800, Kok, Auke wrote:
>> > >
>> > > This tracer monitors regular file open() syscalls. This is a fast
>> > > and low-overhead alternative to strace, and does not allow or
>> > > require to be attached to every process.
>> > >
>> > > The tracer only logs succesfull calls, as those are the only ones we
>> > > are currently interested in, and we can determine the absolute path
>> > > of these files as we log.
>> > >
>> > > Signed-off-by: Auke Kok <auke-jan.h.kok at intel.com>
>> >
>> >
>> > Hi Auke,
>> >
>> > Speaking about a global syscall tracer, I made a patch to trace only the syscalls
>> > with the function-graph-tracer.
>> >
>> > http://lkml.org/lkml/2008/12/30/267
>> >
>> > Its approach and purpose is different than a tracer dedicated only to syscalls.
>> > The function graph tracer traces execution graph of the functions and is more about
>> > execution time spent and code flow whereas a syscall tracer can provide more specific
>> > informations about syscalls.
>> >
>> > So both are not overlaping.
>> >
>> > But the low level part of my patch creates a thread flag _TIF_SYSCALL_TRACE which triggers
>>
>> s/_TIF_SYSCALL_TRACE/_TIF_SYSCALL_FTRACE
>
>> > Once we have it, I think a syscall tracer can be fed with new syscalls
>> > events through several patch iterations, starting with the open and
>> > close one :-)
>> >
>> > Are you ok with that?
>> >
>> > Steven, Ingo, do you agree?
>
> yes. We definitely need this on the asm syscall level, to not contaminate
> hundreds of syscalls with tracepoints.
>
> Auke's sys_open() plugin would be a nice prototype for that concept - but
> in generally it would be useful to be able to augment kernel tracer output
> with all syscall events that occur.
>
> The output would be something like a slimmed-down strace, but for the
> whole kernel and not tied to ptrace semantics (which are crippling).
>
> Would you be interested in extending your syscall tracing concept with
> those bits and would you be interested in integrating Auke's plugin into
> that
>
>        Ingo


Several people told me about utrace and gave some examples of it
in this discussion.
The API is very convenient for fetching syscall numbers, arguments and
return values.
And the hooks are done in the generic core code, so it is arch-independent.

The only drawback I can see is that it is not yet merged upstream, for
lack of in-kernel users.
If it only depends on this condition, we could be these users...

What do you think?
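
As a rough userspace model of what hooking in generic core code buys: all
calls funnel through one dispatcher, so a single hook there can record the
syscall number, an argument and the return value without touching the
individual syscalls. This is not the utrace API; every name below
(do_syscall, trace_log, tracing_enabled, the toy sys_* functions) is made up
for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SYSCALLS 3
#define MAX_EVENTS  16

typedef long (*syscall_fn)(const long args[2]);

struct trace_event {
	int  nr;	/* syscall number */
	long arg0;	/* first argument, as passed in */
	long ret;	/* return value, as seen at exit */
};

static struct trace_event trace_log[MAX_EVENTS];
static size_t trace_len;
static bool tracing_enabled;	/* stands in for a per-task trace flag */

/* Toy "syscalls": none of them contains any tracing code. */
static long sys_add(const long a[2]) { return a[0] + a[1]; }
static long sys_sub(const long a[2]) { return a[0] - a[1]; }
static long sys_neg(const long a[2]) { return -a[0]; }

static const syscall_fn syscall_table[NR_SYSCALLS] = {
	sys_add, sys_sub, sys_neg,
};

/* The only instrumented spot: every call passes through here. */
static long do_syscall(int nr, const long args[2])
{
	long ret;

	if (nr < 0 || nr >= NR_SYSCALLS)
		return -ENOSYS;

	ret = syscall_table[nr](args);

	/* Record the event only when tracing is on and the log has room. */
	if (tracing_enabled && trace_len < MAX_EVENTS)
		trace_log[trace_len++] =
			(struct trace_event){ nr, args[0], ret };
	return ret;
}
```

With tracing_enabled left false the dispatcher pays only one flag test, which
is the property that makes a single entry/exit hook attractive compared with
a tracepoint in each syscall.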


From mingo at elte.hu  Thu Jan 29 14:31:20 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Thu, 29 Jan 2009 15:31:20 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
Message-ID: <20090129143120.GS24391@elte.hu>


* Frédéric Weisbecker <fweisbec at gmail.com> wrote:

> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
> >
> > * Frederic Weisbecker <fweisbec at gmail.com> wrote:
> >
> >> On Tue, Jan 27, 2009 at 11:43:03PM +0100, Frederic Weisbecker wrote:
> >> > On Tue, Jan 27, 2009 at 12:08:04PM -0800, Kok, Auke wrote:
> >> > >
> >> > > This tracer monitors regular file open() syscalls. This is a fast
> >> > > and low-overhead alternative to strace, and does not allow or
> >> > > require to be attached to every process.
> >> > >
> >> > > The tracer only logs succesfull calls, as those are the only ones we
> >> > > are currently interested in, and we can determine the absolute path
> >> > > of these files as we log.
> >> > >
> >> > > Signed-off-by: Auke Kok <auke-jan.h.kok at intel.com>
> >> >
> >> >
> >> > Hi Auke,
> >> >
> >> > Speaking about a global syscall tracer, I made a patch to trace only the syscalls
> >> > with the function-graph-tracer.
> >> >
> >> > http://lkml.org/lkml/2008/12/30/267
> >> >
> >> > Its approach and purpose is different than a tracer dedicated only to syscalls.
> >> > The function graph tracer traces execution graph of the functions and is more about
> >> > execution time spent and code flow whereas a syscall tracer can provide more specific
> >> > informations about syscalls.
> >> >
> >> > So both are not overlaping.
> >> >
> >> > But the low level part of my patch creates a thread flag _TIF_SYSCALL_TRACE which triggers
> >>
> >> s/_TIF_SYSCALL_TRACE/_TIF_SYSCALL_FTRACE
> >
> >> > Once we have it, I think a syscall tracer can be fed with new syscalls
> >> > events through several patch iterations, starting with the open and
> >> > close one :-)
> >> >
> >> > Are you ok with that?
> >> >
> >> > Steven, Ingo, do you agree?
> >
> > yes. We definitely need this on the asm syscall level, to not contaminate
> > hundreds of syscalls with tracepoints.
> >
> > Auke's sys_open() plugin would be a nice prototype for that concept - but
> > in generally it would be useful to be able to augment kernel tracer output
> > with all syscall events that occur.
> >
> > The output would be something like a slimmed-down strace, but for the
> > whole kernel and not tied to ptrace semantics (which are crippling).
> >
> > Would you be interested in extending your syscall tracing concept with
> > those bits and would you be interested in integrating Auke's plugin into
> > that
> >
> >        Ingo
> 
> 
> Several people talked me about utrace and gave some examples about it in 
> this discussion. The Api is very convenient to fetch syscall numbers, 
> arguments and return values. And the hooks are done in the generic core 
> code, so it is arch independent.
> 
> The only drawback I can see is that it is not yet merged upstream, in 
> need of in-kernel users. If it only depends on this condition, we could 
> be these users...
> 
> What do you think?

sure - what do the minimal bits/callbacks that enable syscall tracing
look like?

	Ingo


From fweisbec at gmail.com  Thu Jan 29 14:48:41 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Thu, 29 Jan 2009 15:48:41 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
	<c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
Message-ID: <c62985530901290648r4a97bdb1i783f77da7ec48e34@mail.gmail.com>

2009/1/29 Frédéric Weisbecker <fweisbec at gmail.com>:
> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
>>>
>>> Several people talked me about utrace and gave some examples about it in
>>> this discussion. The Api is very convenient to fetch syscall numbers,
>>> arguments and return values. And the hooks are done in the generic core
>>> code, so it is arch independent.
>>>
>>> The only drawback I can see is that it is not yet merged upstream, in
>>> need of in-kernel users. If it only depends on this condition, we could
>>> be these users...
>>>
>>> What do you think?
>>
>> sure - how do the minimal bits/callbacks look like which enable syscall
>> tracing?
>>
>>        Ingo
>
>
> There is a very straightforward example provided by Ananth in there:
> http://lkml.org/lkml/2009/1/28/59
>

One other drawback may be that utrace itself will be traced by the
function tracers, adding some junk to their traces.
But I guess it is just a matter of a few patches to make it not traced.

BTW, there is an interesting proof of concept here:
http://lkml.org/lkml/2009/1/27/294


From mingo at elte.hu  Thu Jan 29 15:09:34 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Thu, 29 Jan 2009 16:09:34 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
	<c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
Message-ID: <20090129150934.GF6512@elte.hu>


* Frédéric Weisbecker <fweisbec at gmail.com> wrote:

> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
> >>
> >> Several people talked me about utrace and gave some examples about it in
> >> this discussion. The Api is very convenient to fetch syscall numbers,
> >> arguments and return values. And the hooks are done in the generic core
> >> code, so it is arch independent.
> >>
> >> The only drawback I can see is that it is not yet merged upstream, in
> >> need of in-kernel users. If it only depends on this condition, we could
> >> be these users...
> >>
> >> What do you think?
> >
> > sure - how do the minimal bits/callbacks look like which enable syscall
> > tracing?
> >
> >        Ingo
> 
> 
> There is a very straightforward example provided by Ananth in there:
> http://lkml.org/lkml/2009/1/28/59

I mean, what does the infrastructure patch look like - what code does this 
add to the kernel - just to get the syscall tracing bits? Let's make some 
progress here - it's clear that tracing syscalls is good, we just need to 
do it and look at actual patches.

	Ingo


From fweisbec at gmail.com  Thu Jan 29 14:40:55 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Thu, 29 Jan 2009 15:40:55 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129143120.GS24391@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
Message-ID: <c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>

2009/1/29 Ingo Molnar <mingo at elte.hu>:
>>
>> Several people talked me about utrace and gave some examples about it in
>> this discussion. The Api is very convenient to fetch syscall numbers,
>> arguments and return values. And the hooks are done in the generic core
>> code, so it is arch independent.
>>
>> The only drawback I can see is that it is not yet merged upstream, in
>> need of in-kernel users. If it only depends on this condition, we could
>> be these users...
>>
>> What do you think?
>
> sure - how do the minimal bits/callbacks look like which enable syscall
> tracing?
>
>        Ingo


There is a very straightforward example provided by Ananth here:
http://lkml.org/lkml/2009/1/28/59


From fweisbec at gmail.com  Thu Jan 29 15:17:54 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Thu, 29 Jan 2009 16:17:54 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129150934.GF6512@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
	<c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
	<20090129150934.GF6512@elte.hu>
Message-ID: <c62985530901290717j1d72d26vdfa77d02a8d6fef5@mail.gmail.com>

2009/1/29 Ingo Molnar <mingo at elte.hu>:
>
> * Frédéric Weisbecker <fweisbec at gmail.com> wrote:
>
>> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
>> >>
>> >> Several people talked me about utrace and gave some examples about it in
>> >> this discussion. The Api is very convenient to fetch syscall numbers,
>> >> arguments and return values. And the hooks are done in the generic core
>> >> code, so it is arch independent.
>> >>
>> >> The only drawback I can see is that it is not yet merged upstream, in
>> >> need of in-kernel users. If it only depends on this condition, we could
>> >> be these users...
>> >>
>> >> What do you think?
>> >
>> > sure - how do the minimal bits/callbacks look like which enable syscall
>> > tracing?
>> >
>> >        Ingo
>>
>>
>> There is a very straightforward example provided by Ananth in there:
>> http://lkml.org/lkml/2009/1/28/59
>
> I mean, how does the infrastructure patch look like - what code does this
> add to the kernel - just to get the syscall tracing bits. Lets get some
> progress here - it's clear that tracing syscalls is good, we just need to
> do it and look at actual patches.
>
>        Ingo
>

The latest snapshot version I've found is here:
http://people.redhat.com/roland/utrace/2.6-current/utrace.patch
This is mostly independent core code and a good number of hooks inside ptrace.

But I don't know much about the overhead it potentially brings on ptrace.


From fweisbec at gmail.com  Thu Jan 29 15:34:46 2009
From: fweisbec at gmail.com (=?ISO-8859-1?Q?Fr=E9d=E9ric_Weisbecker?=)
Date: Thu, 29 Jan 2009 16:34:46 +0100
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <20090129150934.GF6512@elte.hu>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
	<c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
	<20090129150934.GF6512@elte.hu>
Message-ID: <c62985530901290734i26c5b664m2b0342b29cc95807@mail.gmail.com>

2009/1/29 Ingo Molnar <mingo at elte.hu>:
>
> * Frédéric Weisbecker <fweisbec at gmail.com> wrote:
>
>> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
>> >>
>> >> Several people talked me about utrace and gave some examples about it in
>> >> this discussion. The Api is very convenient to fetch syscall numbers,
>> >> arguments and return values. And the hooks are done in the generic core
>> >> code, so it is arch independent.
>> >>
>> >> The only drawback I can see is that it is not yet merged upstream, in
>> >> need of in-kernel users. If it only depends on this condition, we could
>> >> be these users...
>> >>
>> >> What do you think?
>> >
>> > sure - how do the minimal bits/callbacks look like which enable syscall
>> > tracing?


I know you are asking about only the bits of utrace needed for
syscall tracing.
But the utrace people can answer that better than I can.

And actually I'm not sure the utrace bits for syscall tracing can be
isolated from the rest of its
core.

Anyway, I will let the utrace guys answer that :-)


>> There is a very straightforward example provided by Ananth in there:
>> http://lkml.org/lkml/2009/1/28/59
>
> I mean, how does the infrastructure patch look like - what code does this
> add to the kernel - just to get the syscall tracing bits. Lets get some
> progress here - it's clear that tracing syscalls is good, we just need to
> do it and look at actual patches.
>
>        Ingo
>


From fche at redhat.com  Thu Jan 29 15:53:42 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Thu, 29 Jan 2009 10:53:42 -0500
Subject: [PATCH] tracer for sys_open() - sreadahead
In-Reply-To: <c62985530901290734i26c5b664m2b0342b29cc95807@mail.gmail.com>
References: <497F69A4.2070007@intel.com> <20090127224303.GB5850@nowhere>
	<20090127225048.GA4652@nowhere> <20090129140451.GM24391@elte.hu>
	<c62985530901290629i168ac12cjc4b22140caceff58@mail.gmail.com>
	<20090129143120.GS24391@elte.hu>
	<c62985530901290640j10e63127s59ce22e860a508f8@mail.gmail.com>
	<20090129150934.GF6512@elte.hu>
	<c62985530901290734i26c5b664m2b0342b29cc95807@mail.gmail.com>
Message-ID: <20090129155341.GB20679@redhat.com>

Hi -

On Thu, Jan 29, 2009 at 04:34:46PM +0100, Frédéric Weisbecker wrote:
> 2009/1/29 Ingo Molnar <mingo at elte.hu>:
> [...]
> >> > sure - how do the minimal bits/callbacks look like which enable syscall
> >> > tracing?

> I know you are talking about the only necessary bits from utrace to
> have the syscalls tracing.  But I can't answer you better than would
> the utrace people.  And actually I'm not sure the utrace bits for
> syscall tracing can be isolated from the rest of its core.

My understanding is that the parts of utrace that remain out-of-tree
are relatively integrated, and just present the programmatic callback
API to the already merged "tracehook" layer.

- FChE


From ananth at in.ibm.com  Thu Jan 29 16:32:34 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Thu, 29 Jan 2009 22:02:34 +0530
Subject: [PATCH] Track live engines and their refcounts
Message-ID: <20090129163234.GA26777@in.ibm.com>

Here is a patch that tracks live engines and exposes them via debugfs.
It shows whether there are stale engines, along with their refcounts, and
helps determine whether there are any engine slab leaks.

This is just for debugging purposes. It needs tweaking (ifdefs, etc.) if it
is to become part of the core patch.

Applies atop the rcu removal patch sent last week:
https://www.redhat.com/archives/utrace-devel/2009-January/msg00075.html

Signed-off-by: Ananth N Mavinakayanahalli <ananth at in.ibm.com>
---
 include/linux/utrace.h |    1 
 kernel/utrace.c        |   99 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 1 deletion(-)

Index: utrace-20jan/include/linux/utrace.h
===================================================================
--- utrace-20jan.orig/include/linux/utrace.h
+++ utrace-20jan/include/linux/utrace.h
@@ -317,6 +317,7 @@ struct utrace_attached_engine {
 /* private: */
 	struct kref kref;
 	struct list_head entry;
+	struct list_head live;
 
 /* public: */
 	const struct utrace_engine_ops *ops;
Index: utrace-20jan/kernel/utrace.c
===================================================================
--- utrace-20jan.orig/kernel/utrace.c
+++ utrace-20jan/kernel/utrace.c
@@ -21,8 +21,10 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/seq_file.h>
+#include <linux/debugfs.h>
 #include <linux/utrace.h>
 
+#include <asm/atomic.h>
 
 /*
  * struct utrace, defined in utrace.h is private to this file. Its
@@ -50,11 +52,16 @@
  * callbacks seen.
  */
 
+static spinlock_t live_lock;
+static struct list_head live_engines;
+
 static struct kmem_cache *utrace_engine_cachep;
 static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
 
 static int __init utrace_init(void)
 {
+	INIT_LIST_HEAD(&live_engines);
+	spin_lock_init(&live_lock);
 	utrace_engine_cachep = KMEM_CACHE(utrace_attached_engine, SLAB_PANIC);
 	return 0;
 }
@@ -79,6 +86,9 @@ void __utrace_engine_release(struct kref
 	struct utrace_attached_engine *engine =
 		container_of(kref, struct utrace_attached_engine, kref);
 	BUG_ON(!list_empty(&engine->entry));
+	spin_lock(&live_lock);
+	list_del(&engine->live);
+	spin_unlock(&live_lock);
 	kmem_cache_free(utrace_engine_cachep, engine);
 }
 EXPORT_SYMBOL_GPL(__utrace_engine_release);
@@ -322,6 +332,7 @@ restart:
 	engine->flags = 0;
 	engine->ops = ops;
 	engine->data = data;
+	INIT_LIST_HEAD(&engine->live);
 
 	if ((ret == 0) && (list_empty(&utrace->attached))) {
 		/* First time here, set engines up */
@@ -338,8 +349,12 @@ restart:
 			goto restart;
 		}
 		engine = ERR_PTR(ret);
+	} else {
+		/* Debugging... engine leaks */
+		spin_lock(&live_lock);
+		list_add(&engine->live, &live_engines);
+		spin_unlock(&live_lock);
 	}
-
 	return engine;
 }
 EXPORT_SYMBOL_GPL(utrace_attach_task);
@@ -2431,3 +2446,85 @@ void task_utrace_proc_status(struct seq_
 		   utrace->report ? " (report)" : "",
 		   utrace->interrupt ? " (interrupt)" : "");
 }
+
+#ifdef CONFIG_DEBUG_FS
+/* Similar to what's in net/core/sock.c */
+static void *ut_eng_seq_start(struct seq_file *s, loff_t *pos)
+{
+	rcu_read_lock();
+	spin_lock(&live_lock);
+
+	return seq_list_start_head(&live_engines, *pos);
+}
+
+static void *ut_eng_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	return seq_list_next(v, &live_engines, pos);
+}
+
+static void ut_eng_seq_stop(struct seq_file *s, void *v)
+{
+	spin_unlock(&live_lock);
+	rcu_read_unlock();
+}
+
+void ut_eng_seq_printf(struct seq_file *seq,
+		struct utrace_attached_engine *engine)
+{
+	seq_printf(seq, "%p		%d\n",
+			engine, atomic_read(&engine->kref.refcount));
+}
+
+static int ut_eng_seq_show(struct seq_file *seq, void *v)
+{
+	if (v == &live_engines)
+		seq_printf(seq, "engine			ref_cnt\n");
+	else
+		ut_eng_seq_printf(seq, list_entry(v,
+					struct utrace_attached_engine,
+					live));
+	return 0;
+}
+
+static const struct seq_operations ut_eng_seq_ops = {
+	.start = ut_eng_seq_start,
+	.next = ut_eng_seq_next,
+	.stop = ut_eng_seq_stop,
+	.show = ut_eng_seq_show
+};
+
+static int utrace_eng_open(struct inode *inode, struct file *filp)
+{
+	return seq_open(filp, &ut_eng_seq_ops);
+}
+
+struct file_operations debugfs_utrace_ops = {
+	.open		= utrace_eng_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
+static int debugfs_utrace_init(void)
+{
+	struct dentry *dir, *file;
+
+	dir = debugfs_create_dir("utrace", NULL);
+	if (!dir) {
+		printk(KERN_INFO "Unable to create utrace dir\n");
+		return -ENOMEM;
+	}
+
+	file = debugfs_create_file("engines", 0440, dir, NULL,
+			&debugfs_utrace_ops);
+	if (!file) {
+		printk(KERN_INFO "Unable to create engines file\n");
+		debugfs_remove(dir);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+late_initcall(debugfs_utrace_init);
+
+#endif /* CONFIG_DEBUG_FS */


From renzo at cs.unibo.it  Wed Feb  4 11:35:07 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 4 Feb 2009 12:35:07 +0100
Subject: Utrace and process (partial) virtualization
Message-ID: <20090204113507.GE17452@cs.unibo.it>

Dear Roland and dear utrace developers,

I am already having some problems regarding utrace, and more
specifically the utrace interface for (partial) virtual machines and 
(again) the support for utrace engines nesting.

I am writing my point of view here for a general discussion.

This is the summary:
1- Virtual Machines may need to change the system call

2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?

3- Nesting: is it really useful to run all the reports in a row and 
(eventually) stop at the end, waiting for all the engines?

4- report_syscall_entry engines evaluation order should be reversed

----
1- This is the simplest suggestion/request.
Sometimes virtual machine engines need to change the system call 
(e.g. the process calls "creat", but the kernel must run "open" instead).
I suggest adding some useful inline functions to arch/*/include/asm/syscall.h:
syscall_set_nr // to set the system call number
syscall_get_pc // to get/set the program counter
syscall_set_pc
syscall_get_sp // to get/set the stack pointer
syscall_set_sp
These inline calls would help to create architecture independent
virtual machine engines.
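
The accessor idea above can be sketched in userspace C. This is only an
illustration under stated assumptions: struct mock_regs stands in for the
per-architecture struct pt_regs, the mock_syscall_* helpers are hypothetical
names modelled on the proposed syscall_set_nr/syscall_get_pc/syscall_set_pc,
and the two syscall numbers are placeholders. None of this is existing
kernel API.

```c
/* Placeholder syscall numbers, used only for this illustration. */
#define MOCK_NR_OPEN	5
#define MOCK_NR_CREAT	8

/* Userspace stand-in for the per-architecture struct pt_regs. */
struct mock_regs {
	long nr;		/* system call number (orig_ax on x86) */
	unsigned long pc;	/* program counter */
	unsigned long sp;	/* stack pointer */
};

/* Hypothetical accessors mirroring the names proposed above. */
static inline long mock_syscall_get_nr(const struct mock_regs *regs)
{
	return regs->nr;
}

static inline void mock_syscall_set_nr(struct mock_regs *regs, long nr)
{
	regs->nr = nr;
}

static inline unsigned long mock_syscall_get_pc(const struct mock_regs *regs)
{
	return regs->pc;
}

static inline void mock_syscall_set_pc(struct mock_regs *regs,
				       unsigned long pc)
{
	regs->pc = pc;
}

/*
 * Replace one system call with another, as a VM engine might do at
 * syscall entry.  Returns 1 if the number was rewritten, 0 otherwise.
 */
static int redirect_syscall(struct mock_regs *regs, long from, long to)
{
	if (mock_syscall_get_nr(regs) != from)
		return 0;
	mock_syscall_set_nr(regs, to);
	return 1;
}
```

An engine written against such accessors never touches a register name
directly, which is the architecture-independence argument made above.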

Now the "hard" part:
2- What is the scenario for virtual machines based on utrace?

In my mind there are two or three actors.
K- At the lowest layer there is the kernel providing utrace
M- There is a module which uses utrace and virtualizes something.
   M can do all the virtualization at kernel level, but it may also use:
U- A userland Virtual Machine Monitor.

So we have K,M and U.

When a virtualized process does a syscall, K calls the report_syscall_entry 
function of M.
If M is entirely at kernel level it can decide whether to abort the syscall
(setting UTRACE_SYSCALL_ABORT) or not but there is no (clean) way to forward 
the request to U and wait for U's decision about the syscall.
SYSEMU can be implemented with the current utrace interface, as it aborts 
*all* the syscalls.
View-OS cannot use it: km-view is a userland VM which needs to 
decide which system calls must be skipped and which executed. 
This is not specific to View-OS:
whoever tries to implement similar features will run into the same problem.

Maybe even VMMs entirely implemented in the kernel module need to delay
the decision about the action. I think UTRACE_STOP has exactly this
meaning: in Roland's ptrace implementation UTRACE_STOP is used in this way.
User-mode Linux running on ptrace does change the registers of the process
while the process is in the STOP state.

I am currently trying to implement a new kmview module using UTRACE_STOP.
When I need to skip the syscall I change the syscall (orig_ax in x86) number 
to -1 while the process is stopped.
Utrace believes that the syscall is *not* aborted, so it passes orig_ax
(return ret ?: regs->orig_ax; in arch/x86/kernel/ptrace.c)
to the "entry_{32/64}.s" layer, causing the syscall to be skipped.
This is a dirty workaround.

I think that the specific actions (for syscalls, signals) should be
accepted during a utrace_control(..., UTRACE_RESUME).
In this way:
** K calls report_syscall_entry
** M sends the request to U and returns UTRACE_STOP.
   (M can then process requests for many other processes and many userland VMM)
** U receives the request, decides syscall abort or execute
** U sends its reply to M
** M calls utrace_control UTRACE_RESUME setting the action flag needed (e.g.
   UTRACE_SYSCALL_ABORT).
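
The K/M/U exchange listed above can be modelled as a small state machine.
The sketch below is a userspace simulation, not utrace code: the sim_* types,
m_report_syscall_entry(), u_decide() and m_resume() are hypothetical names,
and the policy in u_decide() is arbitrary. It only shows the proposed shape,
where the verdict travels with the resume request rather than with the
return value of the report callback.

```c
/* Hypothetical actions; the UTRACE_* names in comments are the analogues. */
enum sim_action {
	SIM_STOP,		/* like returning UTRACE_STOP from a report */
	SIM_SYSCALL_RUN,	/* let the system call execute */
	SIM_SYSCALL_ABORT	/* like UTRACE_SYSCALL_ABORT */
};

struct sim_task {
	long nr;	/* pending system call number */
	int stopped;	/* 1 while waiting for U's verdict */
	int executed;	/* 1 if the last system call was run */
};

/* M: called by K at syscall entry; records the request and stops. */
static enum sim_action m_report_syscall_entry(struct sim_task *t, long nr)
{
	t->nr = nr;
	t->stopped = 1;
	t->executed = 0;
	return SIM_STOP;
}

/* U: example policy, virtualize (abort) every syscall numbered >= 100. */
static enum sim_action u_decide(long nr)
{
	return nr >= 100 ? SIM_SYSCALL_ABORT : SIM_SYSCALL_RUN;
}

/*
 * M: resume the task, carrying U's verdict with the resume request.
 * Returns 0 on success, -1 if the task was not stopped or the verdict
 * is not a final decision.
 */
static int m_resume(struct sim_task *t, enum sim_action verdict)
{
	if (!t->stopped || verdict == SIM_STOP)
		return -1;
	t->stopped = 0;
	t->executed = (verdict == SIM_SYSCALL_RUN);
	return 0;
}
```

Between m_report_syscall_entry() and m_resume() nothing blocks inside M,
so M stays free to serve other processes and other userland monitors.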

The same scenario can apply to userland management of signals: the
VMM or debugger may need to delay the decision among the UTRACE_SIGNAL* cases,
and it is hard to keep the monitor inside the report_signal
upcall waiting to return a value. It would need another implementation of some
kind of process stop/quiescence inside the module.

3- Following the KMU schema above, let us now depict a scenario where
there are multiple M engines and multiple U VMMs on the same process.

If I have correctly understood the code, the current implementation
runs all the report upcalls in a row. If some of the report upcalls return
UTRACE_STOP, utrace waits for all the stopped engines to send a UTRACE_RESUME.
(from utrace.c:
If another engine is keeping @target stopped, then it remains stopped until 
all engines let it resume.)

All the M engines may try to change the status of the process concurrently,
as each engine thinks the process has been stopped for its management.

Maybe we have two different ideas of the STOP state and of process
virtualization.
For me a process in STOP state is blocked for inspection. During the STOP
state a module M can change the process status.
By "virtualized process" I mean a process that "sees" an environment 
different from that provided by the hosting kernel.
A user-mode linux process is a virtualized process.
In my mind several engines working on a process implement several layers
of virtualization.
The first engine provides the process a modified virtual world.
If a second engine gets loaded on the same process, the first engine
provides its modified world to the second engine, which implements a
further virtualization for the process and so on.

In this perspective I think that the useful sequence (for kernel generated
events)	is:
K calls the report upcall of the first engine
if M returns UTRACE_STOP wait for UTRACE_RESUME from the first engine
K calls the report upcall of the second engine
if M returns UTRACE_STOP wait for UTRACE_RESUME from the second engine
and so on.

In this way each engine can safely change the state (based on its virtual
perspective of the world maybe provided by the previous engine) and notify its
action before next engine start working. The next engine "sees" the world as
it has been modified by the previous one.
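
The sequential delivery described above can be illustrated with a toy
chain of engines. This is a userspace sketch with hypothetical names
(sim_engine_chain, sim_attach, sim_report_event); the "state" is a single
number and each engine finishes synchronously, where the real proposal
would wait for that engine's UTRACE_RESUME before moving on.

```c
#define SIM_MAX_ENGINES 8

/* Each engine sees the state left by the previous one and may rewrite it. */
typedef long (*sim_report_fn)(long state);

struct sim_engine_chain {
	sim_report_fn report[SIM_MAX_ENGINES];
	int count;
};

/* Attach an engine at the end of the chain; returns 0, or -1 if full. */
static int sim_attach(struct sim_engine_chain *c, sim_report_fn fn)
{
	if (c->count >= SIM_MAX_ENGINES)
		return -1;
	c->report[c->count++] = fn;
	return 0;
}

/*
 * Deliver a kernel-generated event: run one engine to completion, then
 * hand the modified state to the next engine in attach order.
 */
static long sim_report_event(const struct sim_engine_chain *c, long state)
{
	int i;

	for (i = 0; i < c->count; i++)
		state = c->report[i](state);
	return state;
}

/* Two toy virtualization layers. */
static long layer_add_ten(long state)
{
	return state + 10;
}

static long layer_double(long state)
{
	return state * 2;
}
```

With layer_add_ten attached first and layer_double second, an initial
state of 1 becomes 11 and then 22: the second engine works on the world
as modified by the first, not on the original one.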

4- utrace_report_syscall_entry must scan the list of engines in the reverse
order (it is the only event type which is process generated).

It follows from the idea of nested virtualization that the process request
to run a system call must be processed by the outer (latest) engine first
and then down to the inner/first.

Utrace uses "list_for_each_entry_safe" for the list scan.
"list_for_each_entry_safe_reverse" does exist, so maybe it can be used.
I haven't tested it yet.
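
The ordering rule in point 4 can be written down independently of the
kernel list macros. In this minimal sketch (hypothetical names throughout)
engines are numbered by attach order, 0 being the first attached, and the
function only computes the order in which callbacks would run. It is an
untested idea in the same sense as the suggestion above, not a patch.

```c
#include <stddef.h>

enum sim_event { SIM_EV_SIGNAL, SIM_EV_SYSCALL_ENTRY };

/*
 * Fill order[] with the indices of n engines in the order their
 * callbacks would run for the given event.  Process-generated syscall
 * entries visit the outermost (latest) engine first; kernel-generated
 * events visit the innermost (first) engine first.
 */
static void sim_dispatch_order(enum sim_event ev, size_t n, size_t *order)
{
	size_t i;

	for (i = 0; i < n; i++)
		order[i] = (ev == SIM_EV_SYSCALL_ENTRY) ? n - 1 - i : i;
}
```

For three engines, a syscall entry yields the order 2, 1, 0, while a
signal report yields 0, 1, 2.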

Interested readers may refer also to my previous postings on the same subject.
(July 2008)
-------
Thank you if you have read up to here.
ciao
	renzo


From jkenisto at us.ibm.com  Thu Feb  5 00:18:56 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 04 Feb 2009 16:18:56 -0800
Subject: instruction-analysis API(s)
Message-ID: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>

Hi, Roland.  Back in a conference call in December, we discussed
approaches to refactoring utrace-related code such as uprobes, to
make some of the services provided there more generally available.
In particular, you suggested an "instruction analysis" service that
various subsystems could exploit -- kprobes and uprobes/ubp at first,
and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.

I decided to survey the kernel for subsystems that parse and/or analyze
CPU instructions.  I hoped to review the various approaches -- perhaps
finding one that's widely accepted -- and to evaluate possible clients
for our instruction-analysis service.

The results were discouraging, as summarized below.  I see no
promise of an architecture-agnostic instruction-analysis API.
Within each architecture, I think the best we could do would be an
(architecture-specific) instruction-parsing API.  (And even within
an architecture, different subsystems look at different aspects of
an instruction.)

Srikar Dronamraju and I are exploring two different approaches to an
x86 instruction-parsing service.  Since x86 kvm seems to have one of
the most systematic and thorough approaches, Srikar is prototyping a
generalization of kvm's x86_decode_insn() to make it support kprobes,
and eventually uprobes.  (Note that kvm does NOT appear to be a good
starting place on powerpc and s390.)  Approaching from the minimalist
side, I've implemented an x86 instruction-parsing API with just enough
smarts (so far) to support kprobes and uprobes.

We'd be interested to know whether these efforts are consistent
with what you have in mind.

See more details below.

Jim

Intro
-----
"Instruction analysis" refers to the analysis of a CPU instruction
in the kernel or a user program.  Typically, the instruction must
be analyzed so that it can be properly emulated (in the case of
SSOL, by executing the same instruction at a different address),
or so a fault caused by the instruction can be properly handled.
There are other uses as well -- see below.

Possible Clients of an Instruction-Analysis Service
---------------------------------------------------
Where in the kernel is instruction analysis currently used?
- kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
- uprobes (ia64, powerpc, s390, x86)
- hypervisors:
	- kvm (ia64, powerpc, s390, x86)
	- powerpc Cell Beat hypervisor
- floating-point unit emulation (arm, s390, sparc, x86)
- exception handling:
	- page fault (powerpc, x86)
	- illegal instruction (s390)
	- unaligned trap (ia64)
	- vm86 fault (x86)
- disassembly (powerpc, s390)
- powerpc: xmon, code patching (for crash dump?)
- ia64: emulation of brl instruction
- x86: alternative-instruction patching (replacing instructions that are
inappropriate for the CPU rev), fault injection
- djprobes (not in kernel, not sure of status)

Note: I looked in detail only at the architectures that implement
kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86.
And I note in passing that sh does a lot of instruction analysis --
as does mips -- but I skipped sh for now.

Note: Roland also listed gdb, perfmon, and ftrace as subsystems that
do instruction analysis.  I think that oprofile has also been suggested.
- I haven't investigated gdb, but I have no reason to think Roland is
wrong about it.
- I've looked briefly at the various components of perfmon[2] and
oprofile, but I don't see any instruction analysis per se; and the
perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any.
- Similarly, I don't see instruction analysis per se in ftrace.

Prospects/Problems
------------------
What are the prospects for adapting these various subsystems to use
a common instruction-analysis service?  Typically, not very good.
Here are some of the problems:
- Different architectures have very different instruction-analysis
needs.
- Different architectures have very different instruction formats and
instruction attributes.  Consequently, the opportunities for common
code shared by multiple architectures are few.
- Different subsystems are interested in different instruction
attributes, and/or classify instructions differently.
- Some subsystems are interested in only certain instructions.
- Some subsystems, such as fault handlers, want to maximize efficiency
by examining as little of the instruction as possible; while others,
such as *probes, take a more leisurely approach (e.g., reading enough
bytes to capture the largest possible instruction, even if that means
faulting in a page).


From mhiramat at redhat.com  Fri Feb  6 20:49:12 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 06 Feb 2009 15:49:12 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <498CA248.2090708@redhat.com>

Hi Jim,

I'm also interested in the instruction decoder.
If you don't mind, could we share the API specification?
I'd like to port djprobe to it.

Thanks!

Jim Keniston wrote:
> Hi, Roland.  Back in a conference call in December, we discussed
> approaches to refactoring utrace-related code such as uprobes, to
> make some of the services provided there more generally available.
> In particular, you suggested an "instruction analysis" service that
> various subsystems could exploit -- kprobes and uprobes/ubp at first,
> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
> 
> I decided to survey the kernel for subsystems that parse and/or analyze
> CPU instructions.  I hoped to review the various approaches -- perhaps
> finding one that's widely accepted -- and to evaluate possible clients
> for our instruction-analysis service.
> 
> The results were discouraging, as summarized below.  I see no
> promise of an architecture-agnostic instruction-analysis API.
> Within each architecture, I think the best we could do would be an
> (architecture-specific) instruction-parsing API.  (And even within
> an architecture, different subsystems look at different aspects of
> an instruction.)
> 
> Srikar Dronamraju and I are exploring two different approaches to an
> x86 instruction-parsing service.  Since x86 kvm seems to have one of
> the most systematic and thorough approaches, Srikar is prototyping a
> generalization of kvm's x86_decode_insn() to make it support kprobes,
> and eventually uprobes.  (Note that kvm does NOT appear to be a good
> starting place on powerpc and s390.)  Approaching from the minimalist
> side, I've implemented an x86 instruction-parsing API with just enough
> smarts (so far) to support kprobes and uprobes.
> 
> We'd be interested to know whether these efforts are consistent
> with what you have in mind.
> 
> See more details below.
> 
> Jim
> 
> Intro
> -----
> "Instruction analysis" refers to the analysis of a CPU instruction
> in the kernel or a user program.  Typically, the instruction must
> be analyzed so that it can be properly emulated (in the case of
> SSOL, by executing the same instruction at a different address),
> or so a fault caused by the instruction can be properly handled.
> There are other uses as well -- see below.
> 
> Possible Clients of an Instruction-Analysis Service
> ---------------------------------------------------
> Where in the kernel is instruction analysis currently used?
> - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
> - uprobes (ia64, powerpc, s390, x86)
> - hypervisors:
> 	- kvm (ia64, powerpc, s390, x86)
> 	- powerpc Cell Beat hypervisor
> - floating-point unit emulation (arm, s390, sparc, x86)
> - exception handling:
> 	- page fault (powerpc, x86)
> 	- illegal instruction (s390)
> 	- unaligned trap (ia64)
> 	- vm86 fault (x86)
> - disassembly (powerpc, s390)
> - powerpc: xmon, code patching (for crash dump?)
> - ia64: emulation of brl instruction
> - x86: alternative-instruction patching (replacing instructions that are
> inappropriate for the CPU rev), fault injection
> - djprobes (not in kernel, not sure of status)
> 
> Note: I looked in detail only at the architectures that implement
> kprobes: arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, and x86.
> And I note in passing that sh does a lot of instruction analysis --
> as does mips -- but I skipped sh for now.
> 
> Note: Roland also listed gdb, perfmon, and ftrace as subsystems that
> do instruction analysis.  I think that oprofile has also been suggested.
> - I haven't investigated gdb, but I have no reason to think Roland is
> wrong about it.
> - I've looked briefly at the various components of perfmon[2] and
> oprofile, but I don't see any instruction analysis per se; and the
> perfmon/oprofile expert I asked (IBM's Carl Love) isn't aware of any.
> - Similarly, I don't see instruction analysis per se in ftrace.
> 
> Prospects/Problems
> ------------------
> What are the prospects for adapting these various subsystems to use
> a common instruction-analysis service?  Typically, not very good.
> Here are some of the problems:
> - Different architectures have very different instruction-analysis
> needs.
> - Different architectures have very different instruction formats and
> instruction attributes.  Consequently, the opportunities for common
> code shared by multiple architectures are few.
> - Different subsystems are interested in different instruction
> attributes, and/or classify instructions differently.
> - Some subsystems are interested in only certain instructions.
> - Some subsystems, such as fault handlers, want to maximize efficiency
> by examining as little of the instruction as possible; while others,
> such as *probes, take a more leisurely approach (e.g., reading enough
> bytes to capture the largest possible instruction, even if that means
> faulting in a page).
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


From jkenisto at us.ibm.com  Fri Feb  6 23:58:58 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Fri, 06 Feb 2009 15:58:58 -0800
Subject: instruction-analysis API(s)
In-Reply-To: <498CA248.2090708@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
Message-ID: <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>

On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> Hi Jim,
> 
> I'm also interested in the instruction decoder.
> If you don't mind, could we share the API specification?
> I'd like to port djprobe on it.

I'm enclosing the little x86 instruction-analysis prototype I hacked
together (insn_x86.*), along with a copy of systemtap's
runtime/uprobes2/uprobes_x86.c, which I modified to use it.

But again, we haven't really settled on an API.  For example, my x86
prototype doesn't collect all the info that kvm needs.  We're thinking
that adapting some existing code (like kvm in the x86 case) might be
more palatable to LKML.

Jim

> 
> Thanks!
> 
> Jim Keniston wrote:
> > Hi, Roland.  Back in a conference call in December, we discussed
> > approaches to refactoring utrace-related code such as uprobes, to
> > make some of the services provided there more generally available.
> > In particular, you suggested an "instruction analysis" service that
> > various subsystems could exploit -- kprobes and uprobes/ubp at first,
> > and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
> > 
...
> > Srikar Dronamraju and I are exploring two different approaches to an
> > x86 instruction-parsing service.  Since x86 kvm seems to have one of
> > the most systematic and thorough approaches, Srikar is prototyping a
> > generalization of kvm's x86_decode_insn() to make it support kprobes,
> > and eventually uprobes.  (Note that kvm does NOT appear to be a good
> > starting place on powerpc and s390.)  Approaching from the minimalist
> > side, I've implemented an x86 instruction-parsing API with just enough
> > smarts (so far) to support kprobes and uprobes.
> > 
> > We'd be interested to know whether these efforts are consistent
> > with what you have in mind.
> > 
> > See more details below.
> > 
> > Jim
...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insn_x86.c
Type: text/x-csrc
Size: 7705 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090206/ad686f67/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: insn_x86.h
Type: text/x-chdr
Size: 3060 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090206/ad686f67/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uprobes_x86.c
Type: text/x-csrc
Size: 19871 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090206/ad686f67/attachment-0002.bin>

From mhiramat at redhat.com  Sat Feb  7 00:40:43 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 06 Feb 2009 19:40:43 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <498CD88B.3070209@redhat.com>

Jim Keniston wrote:
[...]
> Possible Clients of an Instruction-Analysis Service
> ---------------------------------------------------
> Where in the kernel is instruction analysis currently used?

I think we also need to clarify why they need it (what information/action
they require), because that defines the API.

> - kprobes (arm, avr32, ia64, mn10300, powerpc, s390, sh, sparc, x86)
> - uprobes (ia64, powerpc, s390, x86)
> - djprobes (not in kernel, not sure of status)

They need 'static' analysis of instructions to get the parameters below.
 - length
 - attribute (prefixes, etc)
 - type (jump/accumulation/memory access/flag change, etc)
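
As a rough illustration of what such a static analyzer returns, here is a
toy decoder for a handful of one-byte x86 opcodes. It is not the insn_x86
prototype mentioned in this thread and not kvm's decoder: struct toy_insn,
toy_decode() and the TOY_* names are made up for this sketch, and every
opcode outside the tiny subset is rejected rather than guessed at.

```c
#include <stddef.h>

enum toy_insn_type { TOY_OTHER, TOY_JUMP, TOY_RETURN, TOY_TRAP };

#define TOY_PFX_LOCK	0x1	/* 0xf0 */
#define TOY_PFX_REPNE	0x2	/* 0xf2 */
#define TOY_PFX_REP	0x4	/* 0xf3 */

struct toy_insn {
	size_t length;			/* total bytes, prefixes included */
	unsigned int prefixes;		/* TOY_PFX_* bits */
	enum toy_insn_type type;
};

/*
 * Decode one instruction from buf[0..len).  Returns 0 on success, -1 if
 * the bytes run out or the opcode is outside the toy subset.
 */
static int toy_decode(const unsigned char *buf, size_t len,
		      struct toy_insn *out)
{
	size_t i;

	out->prefixes = 0;
	/* Collect leading prefix bytes. */
	for (i = 0; i < len; i++) {
		if (buf[i] == 0xf0)
			out->prefixes |= TOY_PFX_LOCK;
		else if (buf[i] == 0xf2)
			out->prefixes |= TOY_PFX_REPNE;
		else if (buf[i] == 0xf3)
			out->prefixes |= TOY_PFX_REP;
		else
			break;
	}
	if (i >= len)
		return -1;

	switch (buf[i]) {
	case 0x90:	/* nop */
		out->type = TOY_OTHER;
		out->length = i + 1;
		break;
	case 0xc3:	/* ret */
		out->type = TOY_RETURN;
		out->length = i + 1;
		break;
	case 0xcc:	/* int3 */
		out->type = TOY_TRAP;
		out->length = i + 1;
		break;
	case 0xeb:	/* jmp rel8: opcode plus one displacement byte */
		out->type = TOY_JUMP;
		out->length = i + 2;
		break;
	default:
		return -1;
	}
	/* Reject instructions truncated by the end of the buffer. */
	return out->length <= len ? 0 : -1;
}
```

The three fields map onto the three parameters listed above: length,
attribute (here only the prefixes) and type.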

> - hypervisors:
> 	- kvm (ia64, powerpc, s390, x86)
> 	- powerpc Cell Beat hypervisor
> - floating-point unit emulation (arm, s390, sparc, x86)

They need 'dynamic' instruction emulation.

> - exception handling:
> 	- page fault (powerpc, x86)
> 	- illegal instruction (s390)
> 	- unaligned trap (ia64)
> 	- vm86 fault (x86)

It depends on the case; however, some of them just need the instruction
type and length, and this should be done very quickly.
So, they need a lightweight and specialized analyzer/emulator.

> - disassembly (powerpc, s390)
> - powerpc: xmon, code patching (for crash dump?)

Maybe, static analysis is enough?

> - ia64: emulation of brl instruction

Dynamic emulation.

> - x86: alternative-instruction patching (replacing instructions that are
> inappropriate for the CPU rev), fault injection

Static analysis.

[...]
> Prospects/Problems
> ------------------
> What are the prospects for adapting these various subsystems to use
> a common instruction-analysis service?  Typically, not very good.
> Here are some of the problems:
> - Different architectures have very different instruction-analysis
> needs.

IMHO, we just need two types of interfaces: a static analyzer
and a dynamic emulator.

> - Different architectures have very different instruction formats and
> instruction attributes.  Consequently, the opportunities for common
> code shared by multiple architectures are few.
> - Different subsystems are interested in different instruction
> attributes, and/or classify instructions differently.
> - Some subsystems are interested in only certain instructions.

Indeed. I think we don't need to care about all of them at the start.
My recommendation is to start simply and evolve the code upstream.

> - Some subsystems, such as fault handlers, want to maximize efficiency
> by examining as little of the instruction as possible; while others,
> such as *probes, take a more leisurely approach (e.g., reading enough
> bytes to capture the largest possible instruction, even if that means
> faulting in a page).

Indeed. I think those efficiency-critical subsystems are so arch-dependent
that we can (just) share instruction bitmaps or provide special interfaces.

Thank you for your work!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


From renzo at cs.unibo.it  Sat Feb  7 11:07:10 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Sat, 7 Feb 2009 12:07:10 +0100
Subject: utrace@FOSDEM
Message-ID: <20090207110710.GA11184@cs.unibo.it>

I am at FOSDEM in Brussels.

(I'll give a talk tomorrow 11:00, not directly related to utrace).

If there are other utrace developers around here, we can meet in person
for some brainstorming....

	renzo


From roland at redhat.com  Mon Feb  9 07:22:18 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun,  8 Feb 2009 23:22:18 -0800 (PST)
Subject: proof-of-concept, utrace->ftrace engine
In-Reply-To: Frank Ch. Eigler's message of  Tuesday,
	27 January 2009 14:54:26 -0500 <20090127195425.GF32568@redhat.com>
References: <20090127195425.GF32568@redhat.com>
Message-ID: <20090209072218.214FBFC35D@magilla.sf.frob.com>

> Here's the start of a little ditty that ties process-related events as
> hooked by the Roland McGrath's utrace code into the ftrace
> buffer/control widgetry.  If nothing else, think of it as one
> potential in-tree user of utrace.

Cool!  I won't comment on the use of the tracer or its interface code,
I'll leave that to others.  (It's simplistic and kludgey, but I know
that's what it's going for.)  I'll just review your use of the utrace API.

> +/* Should tracing apply to given task?  Compare against filter
> +   values. */
> +static int trace_test (struct task_struct *tsk) 
> +{
> +        if (trace_taskcomm_filter[0]
> +            && strcmp (trace_taskcomm_filter, tsk->comm))
> +                return 0;

Note that this is the most simple-minded approach for this.  The ->comm
value only changes at exec.  So the "normal", slightly more sophisticated,
way to approach this would be to check at attach time if ->comm matches.
If so, enable full tracing.  If not, enable only EXEC and CLONE events.  In
your report_exec callback, check ->comm to see if the task now should be
filtered in or now should be filtered out, and call utrace_set_events with
more or fewer bits set accordingly.  You always need the report_clone
callback to attach the new child so you can see when it execs; give the new
child the thin or fat event mask as its parent has.

This way, you don't go off the fast paths in signals, etc. when you are
never going to care about those events.  For a trivial hack like this one,
you might not care.  But for more serious use, you want to bother doing it
the fancier way.  If you added syscall tracing support, you probably would
care about the overhead of enabling that on all the uninteresting tasks.
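
A minimal sketch of the thin/fat mask choice described above, written as a
plain userspace C model rather than kernel code.  The EV_* bits and
pick_event_mask() are hypothetical stand-ins for the UTRACE_EVENT() bits and
for whatever helper would feed utrace_set_events(); only the decision logic
is the point.

```c
#include <string.h>

/* Hypothetical stand-ins for UTRACE_EVENT(...) bits; the values are illustrative. */
enum {
	EV_EXEC   = 1u << 0,
	EV_CLONE  = 1u << 1,
	EV_SIGNAL = 1u << 2,
	EV_EXIT   = 1u << 3,
};

#define EV_THIN (EV_EXEC | EV_CLONE)		/* just enough to notice a later exec */
#define EV_FAT  (EV_THIN | EV_SIGNAL | EV_EXIT)	/* full tracing */

/*
 * Decide which mask a task should get.  An empty filter matches every task.
 * Meant to be called at attach time and again from the exec callback,
 * since ->comm only changes at exec.
 */
static unsigned int pick_event_mask(const char *filter, const char *comm)
{
	if (filter[0] != '\0' && strcmp(filter, comm) != 0)
		return EV_THIN;	/* uninteresting: watch only exec and clone */
	return EV_FAT;		/* interesting: enable everything we handle */
}
```

A clone callback would give the new child whichever mask its parent has,
and the exec callback would call pick_event_mask() again with the new ->comm.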

> +        if (trace_taskuid_filter != (u32)-1 
> +            && trace_taskuid_filter != task_uid (tsk))
> +                return 0;

We don't have a utrace event for uid changes, so this one you do have to do
"eagerly".  (Some day in the future, we might well have an event for this
so it can be treated intelligently on transitions as with exec as the
"->comm change event".)

> +static struct utrace_engine_ops process_trace_ops __read_mostly;

This is normally const.  utrace never touches it (all const pointers).
You could change it yourself, but that would not be a normal way to do things.

> +        engine = utrace_attach_task (tsk, UTRACE_ATTACH_CREATE,
> +                                     & process_trace_ops, NULL);

Given how you use UTRACE_ATTACH_MATCH_OPS to effect detach, you might want
to use UTRACE_ATTACH_MATCH_OPS|UTRACE_ATTACH_EXCLUSIVE here.  It's probably
impossible to have another call than yours with the same ops pointer, but
if not then it probably indicates that your later detach could well foul
something up.

> +                /* XXX: Why is this not implicit from the fields
> +                   set in the process_trace_ops? */
> +                rc = utrace_set_events (tsk, engine,

The same reason FWRITE on a struct file is not implicit from having a
.write field set in your struct file_operations.  Your ops struct says
statically what your code is written to handle.  An engine's event mask
says what callbacks you want from that specific thread to that specific
engine at the moment.
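
To illustrate the distinction, here is a small userspace model (a sketch,
not the utrace implementation).  All model_* names are hypothetical: the ops
struct stands for what the code is written to handle, and the per-engine
flags stand for what is wanted from one thread right now.  A callback is
made only when both agree.

```c
#include <stddef.h>

typedef void (*model_cb)(void);

/* Static: which events this code knows how to handle. */
struct model_ops {
	model_cb report_exec;
	model_cb report_clone;
};

/* Dynamic: which events this engine wants from this thread at the moment. */
struct model_engine {
	const struct model_ops *ops;
	unsigned long flags;
};

enum { MODEL_EV_EXEC = 1u << 0, MODEL_EV_CLONE = 1u << 1 };

/* Sample handler so the ops table has something to point at. */
static void model_on_exec(void)
{
}

/* An exec report happens only if the bit is set and a handler exists. */
static int model_would_report_exec(const struct model_engine *e)
{
	return (e->flags & MODEL_EV_EXEC) != 0 && e->ops->report_exec != NULL;
}
```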

> +                                        UTRACE_EVENT(SIGNAL) |

Note this means (exactly as documented):

	_UTRACE_EVENT_SIGNAL,	/* Signal delivery will run a user handler.  */

You might have had UTRACE_EVENT_SIGNAL_ALL in mind.  
That is the union of the five different kinds of SIGNAL* event.

> +u32 process_trace_report_clone (enum utrace_resume_action action,
[...]
> +        return action;
> +}

This is wrong.  If you have nothing special you want to do (just
observing, not perturbing), then "return UTRACE_RESUME;" is what you say.
In report_signal, the non-utrace_resume_action part of the return value
matters, so:
	return UTRACE_RESUME | utrace_signal_action(action);
is what doesn't change anything there.

As documented under 'struct utrace_engine_ops', the action argument is
what other engines are causing to be done independent of what your
engine does.  The utrace_resume_action part of the return value is
what *your engine* wants done, independent of what other engines say.
Your choices might be informed by what other engines are doing in some
cases, but it is not right to mimic what they said.  If some other
engine said UTRACE_STOP, then now you say UTRACE_STOP, but you'll
never call utrace_control to resume, and the thread will be stopped
forever.  If he says UTRACE_STOP and you don't care, you say
UTRACE_RESUME, and the thread stops (UTRACE_STOP < UTRACE_RESUME).
When he calls utrace_control in the future, the thread resumes because
there is no engine left whose last command was UTRACE_STOP.
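
The combining rule can be sketched in a few lines of userspace C.  This is
a hypothetical two-value model: only the ordering UTRACE_STOP < UTRACE_RESUME
is taken from the text above, and the names are made up for the example.

```c
#include <stddef.h>

/* Hypothetical model; only the ordering STOP < RESUME comes from the text. */
enum model_action { MODEL_STOP = 0, MODEL_RESUME = 1 };

/* The thread does the most restrictive (smallest) thing any engine asked for. */
static enum model_action combine_actions(const enum model_action *wants, size_t n)
{
	enum model_action result = MODEL_RESUME;
	size_t i;

	for (i = 0; i < n; i++)
		if (wants[i] < result)
			result = wants[i];
	return result;
}
```

With wants = { MODEL_STOP, MODEL_RESUME } the thread stops; once the first
engine changes its entry to MODEL_RESUME the thread runs again.  If the
second engine had mimicked the first and also said MODEL_STOP, the result
would stay MODEL_STOP after the first engine resumed, which is the
"stopped forever" case.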

The non-utrace_resume_action part of the return value (only nonempty for
SIGNAL* and SYSCALL* events) is different.  Unlike utrace_resume_action,
the different choices of different engines can't be combined into a
"least common denominator".  The choice of utrace_signal_action or
utrace_syscall_action setting is what the user-visible disposition
resolving the event will be; all the choices are mutually exclusive and
their effects final.  The last callback to run chooses the final answer.
So each callback has to decide something.  It gets the incoming choice
in its action argument, either from the preceding callback or from the
original normal default (what prevails in the absence of tracing).  The
idiom above passes through the incoming value to leave that choice alone.
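
A sketch of that pass-through idiom as a userspace model.  The bit layout
and the model_* names are assumptions for illustration; the real masks and
utrace_signal_action() are defined in <linux/utrace.h>.

```c
/* Hypothetical bit layout; the real masks are defined by utrace itself. */
#define MODEL_RESUME_MASK 0x0fu	/* what this engine wants done */
#define MODEL_SIGNAL_MASK 0xf0u	/* disposition chosen for the signal */
#define MODEL_RESUME      0x05u	/* arbitrary value inside MODEL_RESUME_MASK */

/* Counterpart of utrace_signal_action(): keep only the disposition bits. */
static unsigned int model_signal_action(unsigned int action)
{
	return action & MODEL_SIGNAL_MASK;
}

/*
 * An observe-only report_signal: state our own resume choice and hand the
 * incoming disposition on unchanged, ignoring other engines' resume bits.
 */
static unsigned int observe_only_report_signal(unsigned int action)
{
	return MODEL_RESUME | model_signal_action(action);
}
```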

> +                /* Skip over kernel threads. */
> +                mm = get_task_mm (tsk);
> +                if (!mm)
> +                        continue;

This should just check PF_KTHREAD.  (As it is, you leak an mm ref here.)
Or just don't bother and handle utrace_attach returning ERR_PTR(-EPERM),
which it will for a kernel thread.


Thanks,
Roland


From mhiramat at redhat.com  Mon Feb  9 23:05:56 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Mon, 09 Feb 2009 18:05:56 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <4990B6D4.2020907@redhat.com>

Jim Keniston wrote:
> On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
>> Hi Jim,
>>
>> I'm also interested in the instruction decoder.
>> If you don't mind, could we share the API specification?
>> I'd like to port djprobe on it.
> 
> I'm enclosing the little x86 instruction-analysis protoype I hacked
> together (insn_x86.*), along with a copy of systemtap's
> runtime/uprobes2/uprobes_x86.c, which I modified to use it.

Hmm, actually, djprobe needs both the length and the type of each
instruction, since it has to know how many bytes must be copied
and replaced by a long jump.

> But again, we haven't really settled on an API.  For example, my x86
> prototype doesn't collect all the info that kvm needs.  We're thinking
> that adapting some existing code (like kvm in the x86 case) might be
> more palatable to LKML.

Sure, since kvm and emulators have to fetch the values of src/dst
for the emulation, they need actual register values. On the other hand,
the disassemblers and *probes have to analyze code before it is hit, so
they don't know the actual values of the registers.

So, I think we should split x86_decode_insn() into 2 parts, static
analysis and emulation preparation.

For example:
1) analyzing code statically (x86_analyze_insn)
   - just decoding an instruction
   - this phase may consist of several sub-functions.

2) preparing emulation (x86_evaluate_insn)
   - evaluating src/dst based on current(vcpu) registers

3) executing emulation (x86_emulate_insn)
   - emulating an analyzed instruction
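
To make the split concrete, here is a toy userspace sketch of the three
phases.  Only the three function names come from the proposal above; the
structs, signatures and the two-opcode coverage (nop and mov r32, imm32)
are assumptions made up for the example, not the kvm code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output of phase 1: purely static facts about one instruction. */
struct insn_info {
	size_t   length;	/* bytes the instruction occupies */
	int      dst_reg;	/* destination register index, or -1 */
	uint32_t imm;		/* immediate operand, if any */
};

/* Hypothetical output of phase 2: operands bound to a concrete register file. */
struct insn_operands {
	uint32_t *dst;
	uint32_t  src;
};

/* Phase 1: decode only.  Toy coverage: nop (0x90) and mov r32, imm32 (0xB8+r). */
static int x86_analyze_insn(const uint8_t *code, size_t avail, struct insn_info *info)
{
	if (avail >= 1 && code[0] == 0x90) {
		info->length = 1;
		info->dst_reg = -1;
		info->imm = 0;
		return 0;
	}
	if (avail >= 5 && code[0] >= 0xB8 && code[0] <= 0xBF) {
		info->length = 5;
		info->dst_reg = code[0] - 0xB8;
		info->imm = (uint32_t)code[1] | (uint32_t)code[2] << 8 |
			    (uint32_t)code[3] << 16 | (uint32_t)code[4] << 24;
		return 0;
	}
	return -1;	/* not handled by this toy decoder */
}

/* Phase 2: bind operands to the current (vcpu) registers. */
static int x86_evaluate_insn(const struct insn_info *info, uint32_t *regs,
			     struct insn_operands *ops)
{
	if (info->dst_reg < 0)
		return -1;	/* nothing to evaluate, e.g. for nop */
	ops->dst = &regs[info->dst_reg];
	ops->src = info->imm;
	return 0;
}

/* Phase 3: carry out the analyzed and evaluated instruction. */
static void x86_emulate_insn(const struct insn_operands *ops)
{
	*ops->dst = ops->src;
}
```

A probe-style user would stop after x86_analyze_insn() and look only at
info.length; an emulator would go on through the other two phases.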

Thanks,

> 
> Jim
> 
>> Thanks!
>>
>> Jim Keniston wrote:
>>> Hi, Roland.  Back in a conference call in December, we discussed
>>> approaches to refactoring utrace-related code such as uprobes, to
>>> make some of the services provided there more generally available.
>>> In particular, you suggested an "instruction analysis" service that
>>> various subsystems could exploit -- kprobes and uprobes/ubp at first,
>>> and eventually perhaps gdb, perfmon, kvm, ftrace, and djprobes.
>>>
> ...
>>> Srikar Dronamraju and I are exploring two different approaches to an
>>> x86 instruction-parsing service.  Since x86 kvm seems to have one of
>>> the most systematic and thorough approaches, Srikar is prototyping a
>>> generalization of kvm's x86_decode_insn() to make it support kprobes,
>>> and eventually uprobes.  (Note that kvm does NOT appear to be a good
>>> starting place on powerpc and s390.)  Approaching from the minimalist
>>> side, I've implemented an x86 instruction-parsing API with just enough
>>> smarts (so far) to support kprobes and uprobes.
>>>
>>> We'd be interested to know whether these efforts are consistent
>>> with what you have in mind.
>>>
>>> See more details below.
>>>
>>> Jim
> ...
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


From ananth at in.ibm.com  Tue Feb 10 04:42:30 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 10 Feb 2009 10:12:30 +0530
Subject: instruction-analysis API(s)
In-Reply-To: <4990B6D4.2020907@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
	<4990B6D4.2020907@redhat.com>
Message-ID: <20090210044230.GB12811@in.ibm.com>

On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
> Jim Keniston wrote:
> > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> >> Hi Jim,
> >>
> >> I'm also interested in the instruction decoder.
> >> If you don't mind, could we share the API specification?
> >> I'd like to port djprobe on it.
> > 
> > I'm enclosing the little x86 instruction-analysis protoype I hacked
> > together (insn_x86.*), along with a copy of systemtap's
> > runtime/uprobes2/uprobes_x86.c, which I modified to use it.
> 
> Hmm, actually, djprobe needs both of the length and the type of
> instructions, since it has to know how many bytes must be copied
> and be replaced by a long jump.
> 
> > But again, we haven't really settled on an API.  For example, my x86
> > prototype doesn't collect all the info that kvm needs.  We're thinking
> > that adapting some existing code (like kvm in the x86 case) might be
> > more palatable to LKML.
> 
> Sure, since kvm and emulators have to fetch the values of src/dst
> for the emulation, they need actual register values. On the other hand,
> the disasm/*probe have to analysis code before hitting, so they
> don't know the actual value of the registers.
> 
> So, I think we should split x86_decode_insn() into 2 parts, static
> analysis and emulation preparation.
> 
> For example:
> 1) analyzing code statically (x86_analyze_insn)
>    - just decoding an instruction
>    - this phase may consist of several sub-functions.
> 
> 2) preparing emulation (x86_evaluate_insn)
>    - evaluating src/dst based on current(vcpu) registers
> 
> 3) executing emulation (x86_emulate_insn)
>    - emulating an analyzed instruction

Right, that surely sounds like the way to go. However, we've been
cautioned that the instruction emulation area of the kvm code is very
performance sensitive. But, there is no harm in prototyping the above
and then worrying about any optimizations so there isn't a performance
issue -- in any case, I guess [ku]probes are very infrequent users of
this compared to KVM.

Ananth


From renzo at cs.unibo.it  Wed Feb 11 09:59:46 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 11 Feb 2009 10:59:46 +0100
Subject: UTRACE_STOP race condition?
Message-ID: <20090211095946.GA2597@cs.unibo.it>

Dear Roland and dear utrace developers,

please help me. Either I have not understood the meaning of UTRACE_STOP
or it is completely useless due to a race condition.

There are always two entities in a utrace interaction: the traced
process and the tracing module.

When a traced event occurs in the traced process the corresponding
report function gets called in the module.

If the report function returns UTRACE_STOP the traced process stays in a
quiescent state and the module wakes it up by a 
utrace_control(...,UTRACE_RESUME) call *later*.

This *later* is the problem.

If the module wakes the traced process too quickly, utrace has not yet put
it into a "stopped" state, therefore UTRACE_RESUME gets lost.
As a consequence, the execution is blocked.

IMHO, given the current utrace code, there is no way to set up some kind
of synchronization in the module to prevent this error.

-------

For the sake of simplicity let us assume one engine attached to the
traced process (the problem is the same for more engines).

The point is: when a report function returns UTRACE_STOP and the module later
calls utrace_control(...,UTRACE_RESUME), the traced process must not stay stopped.

t=0: Before the report function calling loop utrace->stopped=0;
     (In start_report: BUG_ON(utrace->stopped);)
t=1: REPORT FUNCTION CALL(no lock!):
t=2: When the report function returns UTRACE_STOP
     In finish_callback:
t=3: spin_lock(&utrace->lock);
     mark_engine_wants_stop(engine);
     spin_unlock(&utrace->lock);
t=4: in utrace_stop(..):
	   spin_lock(&utrace->lock);
	   utrace->stopped=1;
	   __set_current_state(TASK_TRACED);
	   spin_unlock(&utrace->lock);
	   schedule(); --> now the traced process is blocked.

The module has "decided" UTRACE_STOP at t=1, then the module can call
utrace_control(...,UTRACE_RESUME) at any t>1.
If the resume call takes place before t=4 the request is lost and
the race condition causes the traced process to stop anyway.
In fact for 1<t<4 utrace_control finds that the process has not been
stopped:
     resume = utrace->stopped;
		 ...
and therefore it does nothing.
	 /*
	  * Let the thread resume running.  If it's not stopped now,
	  * there is nothing more we need to do.
	  */
	if (resume)
		utrace_reset(target, utrace, NULL);
	else
		spin_unlock(&utrace->lock);
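
The timeline can be replayed deterministically in a small userspace model.
This is only a sketch of the sequence as described in this message, with
hypothetical model_* names; it is not the kernel code and ignores locking.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the state discussed above. */
struct model_utrace {
	bool engine_wants_stop;	/* set by finish_callback at t=3 */
	bool stopped;		/* set by utrace_stop at t=4 */
	bool blocked;		/* thread sleeping in TASK_TRACED */
};

/* utrace_control(..., UTRACE_RESUME) as described: it only acts on a stopped thread. */
static void model_control_resume(struct model_utrace *u)
{
	if (!u->stopped)
		return;		/* "nothing more we need to do": request dropped */
	u->engine_wants_stop = false;
	u->stopped = false;
	u->blocked = false;
}

/*
 * Replay the timeline.  resume_early selects whether the module's resume
 * arrives between t=3 and t=4, or only after the thread has blocked.
 * Returns true if the thread ends up blocked.
 */
static bool model_ends_blocked(bool resume_early)
{
	struct model_utrace u = { false, false, false };

	u.engine_wants_stop = true;		/* t=3: finish_callback */
	if (resume_early)
		model_control_resume(&u);	/* resume arrives before t=4 */
	u.stopped = true;			/* t=4: utrace_stop */
	u.blocked = true;
	if (!resume_early)
		model_control_resume(&u);	/* resume arrives after t=4 */
	return u.blocked;
}
```

model_ends_blocked(true) returns true (the early resume is lost), while
model_ends_blocked(false) returns false.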

-----
There are two solutions:

1- (slow & dirty): some sort of synchronization: no utrace_control (or
  utrace_set_events) call should take place during the whole sequence
  from the report function call to the utrace->stopped=1 assignment.

2- (the nice one): add another flag named ENGINE_RESUME (like ENGINE_STOP).
  That flag must be cleared before calling the report function:
  t=0.5: clear_engine_wants_resume(engine);

  utrace_control(...,UTRACE_RESUME) should set the flag:
	        spin_lock(&utrace->lock);
	        mark_engine_wants_resume(engine);
		spin_unlock(&utrace->lock);
	 
  utrace_stop at t=4 (inside the lock) must check whether the traced process
  has already been resumed:
  spin_lock(&utrace->lock);
  spin_lock_irq(&task->sighand->siglock);
  /* final check: do we really need to stop? */
  list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
	  if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
		  if (engine_wants_resume(engine))
			  clear_engine_wants_stop(engine);
		  else
			  utrace->stopped = 1;
	  }
  }
  if (unlikely(!utrace->stopped)) {
	  spin_unlock_irq(&task->sighand->siglock);
	  spin_unlock(&utrace->lock);
	  return false;
  }

  In this way the race condition should be eliminated.
  (it was eliminated in my proof-of-concept utrace patched implementation)
  If utrace_stop discovers that a resume request is already pending
  the traced process is not blocked.
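
  The effect of the extra flag can be checked with a small userspace model.
  This is a sketch under simplifying assumptions (one engine, no locking),
  and the fix_* names are hypothetical; it clears both flags in the final
  check, which is one possible reading of the snippet above.

```c
#include <stdbool.h>

/* Hypothetical single-engine model of the proposed scheme. */
struct fix_utrace {
	bool wants_stop;	/* ENGINE_STOP */
	bool wants_resume;	/* proposed ENGINE_RESUME */
	bool stopped;
	bool blocked;
};

/* Resume always records the request, even if the thread has not stopped yet. */
static void fix_control_resume(struct fix_utrace *u)
{
	u->wants_resume = true;
	if (u->stopped) {
		u->wants_stop = false;
		u->wants_resume = false;
		u->stopped = false;
		u->blocked = false;
	}
}

/* utrace_stop with the final check; returns true if the thread blocked. */
static bool fix_utrace_stop(struct fix_utrace *u)
{
	if (u->wants_stop && u->wants_resume) {
		u->wants_stop = false;	/* a resume is already pending */
		u->wants_resume = false;
		return false;
	}
	if (!u->wants_stop)
		return false;
	u->stopped = true;
	u->blocked = true;
	return true;
}

/* Same timeline as in the race description, with the flag in place. */
static bool fix_ends_blocked(bool resume_early)
{
	struct fix_utrace u = { false, false, false, false };

	u.wants_resume = false;		/* t=0.5: cleared before the report call */
	u.wants_stop = true;		/* t=3: callback returned UTRACE_STOP */
	if (resume_early)
		fix_control_resume(&u);	/* resume arrives before t=4 */
	fix_utrace_stop(&u);		/* t=4 */
	if (!resume_early)
		fix_control_resume(&u);	/* resume arrives after t=4 */
	return u.blocked;
}
```

  In this model the thread ends up running whichever side wins the race.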

-----
Ptrace on utrace works because there is a workaround: 
the notification to the ptracer is called from within the utrace_stop
function *after utrace->stopped has been set*.
Ptrace would suffer from the same race condition otherwise.

I am looking forward to hearing some comments on this. From what I see,
Kmview cannot be implemented on the current utrace implementation.

renzo


From fche at redhat.com  Wed Feb 11 14:45:15 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Wed, 11 Feb 2009 09:45:15 -0500
Subject: UTRACE_STOP race condition?
In-Reply-To: <20090211095946.GA2597@cs.unibo.it> (Renzo Davoli's message of
	"Wed, 11 Feb 2009 10:59:46 +0100")
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID: <y0mvdrh9gn8.fsf@ton.toronto.redhat.com>

Renzo Davoli <renzo at cs.unibo.it> writes:

> [...]
> If the report function returns UTRACE_STOP the traced process stays in a
> quiescent state and the module wakes it up by a 
> utrace_control(...,UTRACE_RESUME) call *later*.
> [...]
> If the module wakes the traced process too quickly, utrace has not yet put
> it into a "stopped" state, therefore UTRACE_RESUME gets lost.
> [...]
> The module has "decided" UTRACE_STOP at t=1, then the module can call
> utrace_control(...,UTRACE_RESUME) at any t>1. [...]

This may not answer your question, but I believe it is not proper to
make this call at any time t>1, only once you receive the quiesce
callback.


- FChE


From renzo at cs.unibo.it  Wed Feb 11 17:02:15 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Wed, 11 Feb 2009 18:02:15 +0100
Subject: UTRACE_STOP race condition?
In-Reply-To: <y0mvdrh9gn8.fsf@ton.toronto.redhat.com>
References: <20090211095946.GA2597@cs.unibo.it>
	<y0mvdrh9gn8.fsf@ton.toronto.redhat.com>
Message-ID: <20090211170215.GA23914@cs.unibo.it>

On Wed, Feb 11, 2009 at 09:45:15AM -0500, Frank Ch. Eigler wrote:
> This may not answer your question, but I believe it is not proper to
> to make this call at any time t>1, only once you receive the quiesce
> callback.

Maybe I am wrong, but the quiesce callback gets called *before* the other
report_* callbacks (say syscall_entry).

So when I capture UTRACE_QUIESCE, I get the report call before t=1.

Some communication from utrace to the module should happen *after* 
utrace->stopped is set to 1 
(something similar to the code Roland added for ptrace).

----

Even if it worked this way (i.e. return UTRACE_STOP and wait for
report_quiesce; I think the race condition is there in any case), the
interface to the module would be horrible.

When the module receives a report callback, it returns UTRACE_STOP and
then it needs some data structure to wait for a report_quiesce before
restarting the traced process.

With the patch idea included in my previous mail there is no need for
such complexity.

Thank you for taking part in this discussion.

	renzo


From renzo at cs.unibo.it  Fri Feb 13 20:29:25 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 13 Feb 2009 21:29:25 +0100
Subject: [PATCH] UTRACE_STOP race condition?
In-Reply-To: <20090211095946.GA2597@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID: <20090213202925.GE28685@cs.unibo.it>

Dear Roland, dear utrace developers,

I now have a complete patch that seems to be quite stable.
At least Kmview has passed its tests without randomly getting stuck on the race condition.

All the other comments about utrace & virtualization (see my message of Feb 04) are still pending:
1- Virtual Machines may need to change the system call
2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
report_syscall_entry?
3- Nesting: is it really useful to run all the reports in a row and
(eventually) stop at the end waiting for all the engines?
4- report_syscall_entry engines evaluation order should be reversed

ciao
	renzo
----
--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath	2009-02-13 18:28:25.000000000 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c	2009-02-13 19:14:18.000000000 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)
 
+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}
 
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);
 
@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -1050,6 +1091,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1282,6 +1327,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;


From renzo at cs.unibo.it  Sat Feb 14 09:11:55 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Sat, 14 Feb 2009 10:11:55 +0100
Subject: [PATCH] #2 UTRACE_STOP race condition & nesting
In-Reply-To: <20090213202925.GE28685@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
	<20090213202925.GE28685@cs.unibo.it>
Message-ID: <20090214091155.GA3582@cs.unibo.it>

Dear Roland, dear utrace developers,
 
This is an updated patch. It solves the race condition and gives a quick (if slightly dirty)
solution to issues 3 and 4.
	3- Nesting, is it really useful to run all the reports in a row and
	(eventually) stop at the end waiting for all the engines?
The patch waits for each engine to resume before notifying the next registered engine.
	4- report_syscall_entry engines evaluation order should be reversed
The REPORT macros take an extra "reverse" argument, which they paste onto the
list_for_each_entry_safe macro name. Every macro call leaves this argument empty except
the one in report_syscall_entry, where it is set to _reverse.
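
The suffix trick relies on the preprocessor's ## operator together with an empty macro
argument (allowed since C99). Below is a minimal user-space sketch of the same idea, not
the kernel code itself: walk, walk_reverse, COLLECT, collect_forward and collect_backward
are hypothetical names standing in for list_for_each_entry_safe,
list_for_each_entry_safe_reverse and the REPORT macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two kernel list iterators:
 * they walk an array index forwards or backwards. */
#define walk(n, i)          for ((i) = 0; (i) < (int)(n); (i)++)
#define walk_reverse(n, i)  for ((i) = (int)(n) - 1; (i) >= 0; (i)--)

/* The first argument is pasted onto "walk". An empty argument leaves
 * plain "walk"; "_reverse" selects "walk_reverse". */
#define COLLECT(suffix, src, n, dst)			\
	do {						\
		int i_;					\
		size_t k_ = 0;				\
		walk ## suffix(n, i_)			\
			(dst)[k_++] = (src)[i_];	\
	} while (0)

/* Copy src to dst in the original order (empty suffix). */
void collect_forward(const int *src, size_t n, int *dst)
{
	assert(src != NULL && dst != NULL);
	COLLECT(, src, n, dst);
}

/* Copy src to dst in reverse order (suffix "_reverse"). */
void collect_backward(const int *src, size_t n, int *dst)
{
	assert(src != NULL && dst != NULL);
	COLLECT(_reverse, src, n, dst);
}
```

Calling collect_forward on {1, 2, 3} fills dst in the same order, while collect_backward
fills it in the opposite order, which is how a single macro body can serve both traversal
directions.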

With this patch it is possible to run nested kmview machines and ptrace works inside
the virtual machines.

This patch is "a bit dirty" because the variables and code sections that count and test
the stopped engines become redundant here: a task can be kept stopped for at most one
engine at a time.

This patch is a proof of concept to show what I meant in my previous message.
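
The "final check" that the patch adds to utrace_stop() can be modelled in user space
roughly as follows. This is only a sketch under stated assumptions: struct engine,
NEVENTS and must_stop are hypothetical names, NEVENTS is an arbitrary stand-in for
_UTRACE_NEVENTS, and the locking and detached-engine test of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEVENTS		8	/* hypothetical stand-in for _UTRACE_NEVENTS */
#define ENGINE_STOP	(1UL << NEVENTS)
#define ENGINE_RESUME	(1UL << (NEVENTS + 1))

/* Hypothetical, reduced model of an attached engine: flags only. */
struct engine {
	unsigned long flags;
};

/*
 * Decide whether the task still has to stop. An engine that asked for
 * a stop but has already been told to resume has both bits cleared and
 * no longer holds the task; any other engine with ENGINE_STOP set
 * keeps the task stopped.
 */
bool must_stop(struct engine *engines, size_t n)
{
	bool stopped = false;
	size_t i;

	assert(engines != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct engine *e = &engines[i];

		if (!(e->flags & ENGINE_STOP))
			continue;
		if (e->flags & ENGINE_RESUME)
			e->flags &= ~(ENGINE_STOP | ENGINE_RESUME);
		else
			stopped = true;
	}
	return stopped;
}
```

The point of the second bit is ordering: a resume request that arrives before the task
has actually blocked is recorded in the flags instead of being lost, so the later check
can cancel the pending stop.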

As for issues 1 and 2 (not included in this patch):
	1- Virtual Machines may need to change the system call
This is just to simplify the implementation of architecture-independent virtual machines.
I have kept the definitions of the missing functions in the kmview module code.
	2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
	report_syscall_entry?
It is useless for kmview, because the decision to abort the system call is taken while
the process is stopped. I am currently setting the syscall number to -1 to skip the syscall.

For the sake of completeness, there is another way to implement the partial virtual machine
support: introducing another "quiescence" state inside the report upcalls.
I mean: when utrace calls a report function (say, report_syscall_entry), the function
in the module puts the process in a stopped state (perhaps TASK_TRACED) and calls schedule().
From utrace's point of view the report function does not return until all the changes to
the task state have been completed and the decision UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
In this way UTRACE_STOP is never used, because the module has to implement its own feature
similar to UTRACE_STOP. So what is UTRACE_STOP for?

ciao
	renzo

----
--- linux-2.6.29-rc4-utrace/kernel/utrace.c.mcgrath	2009-02-13 18:28:25.000000000 +0100
+++ linux-2.6.29-rc4-utrace/kernel/utrace.c	2009-02-14 09:17:31.000000000 +0100
@@ -491,6 +491,13 @@
 #define DEAD_FLAGS_MASK	(UTRACE_EVENT(REAP))
 #define LIVE_FLAGS_MASK	(~0UL)
 
+static void mark_engine_wants_stop(struct utrace_attached_engine *engine);
+static void clear_engine_wants_stop(struct utrace_attached_engine *engine);
+static bool engine_wants_stop(struct utrace_attached_engine *engine);
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine);
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine);
+static bool engine_wants_resume(struct utrace_attached_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -500,6 +507,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_attached_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -521,6 +529,23 @@
 		return true;
 	}
 
+	/* final check: do we really need to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	utrace->stopped = 1;
 	__set_current_state(TASK_TRACED);
 
@@ -784,6 +809,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_attached_engine *engine)
 {
@@ -800,6 +826,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_attached_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -1050,6 +1091,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1282,6 +1327,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;
@@ -1497,6 +1543,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_attached_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1529,6 +1576,7 @@
 			spin_lock(&utrace->lock);
 			mark_engine_wants_stop(engine);
 			spin_unlock(&utrace->lock);
+			utrace_stop(task, utrace);
 		}
 	} else if (engine_wants_stop(engine)) {
 		spin_lock(&utrace->lock);
@@ -1567,7 +1615,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1596,25 +1644,25 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...)		      \
+#define REPORT(reverse, task, utrace, report, event, callback, ...)		      \
 	do {								      \
 		start_report(utrace);					      \
-		REPORT_CALLBACKS(task, utrace, report, event, callback,	      \
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback,	      \
 				 (report)->action, engine, current,	      \
 				 ## __VA_ARGS__);  	   		      \
 		finish_report(report, task, utrace);			      \
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	      \
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...)	      \
 	do {								      \
 		struct utrace_attached_engine *engine, *next;		      \
 		const struct utrace_engine_ops *ops;			      \
-		list_for_each_entry_safe(engine, next,			      \
+		list_for_each_entry_safe ## reverse(engine, next,			      \
 					 &utrace->attached, entry) {	      \
 			ops = start_callback(utrace, report, engine, task,    \
 					     event);			      \
 			if (!ops)					      \
 				continue;				      \
-			finish_callback(utrace, report, engine,		      \
+			finish_callback(utrace, report, engine, task,		      \
 					(*ops->callback)(__VA_ARGS__));	      \
 		}							      \
 	} while (0)
@@ -1629,7 +1677,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1644,7 +1692,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse,task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1686,7 +1734,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1711,7 +1759,7 @@
 	start_report(utrace);
 	utrace->u.live.cloning = child;
 
-	REPORT_CALLBACKS(task, utrace, &report,
+	REPORT_CALLBACKS(,task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
@@ -1791,7 +1839,7 @@
 	spin_unlock(&utrace->lock);
 	rcu_read_unlock();
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 	if (was_stopped && !task_is_stopped(task)) {
@@ -1828,7 +1876,7 @@
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXIT),
 	       report_exit, orig_code, exit_code);
 
 	if (report.action == UTRACE_STOP)
@@ -1867,7 +1915,7 @@
 	utrace->interrupt = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+	REPORT_CALLBACKS(,task, utrace, &report, UTRACE_EVENT(DEATH),
 			 report_death, engine, task, group_dead, signal);
 
 	spin_lock(&utrace->lock);
@@ -2259,7 +2307,7 @@
 			break;
 		}
 
-		finish_callback(utrace, &report, engine, ret);
+		finish_callback(utrace, &report, engine, task, ret);
 	}
 
 	/*


From fche at redhat.com  Sun Feb 22 22:22:27 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sun, 22 Feb 2009 17:22:27 -0500
Subject: utrace-based ftrace "process" engine, v2
In-Reply-To: <20090209072218.214FBFC35D@magilla.sf.frob.com>
References: <20090127195425.GF32568@redhat.com>
	<20090209072218.214FBFC35D@magilla.sf.frob.com>
Message-ID: <20090222222227.GB31207@redhat.com>

Hi -

This is v2 of the prototype utrace-ftrace interface.  This code is
based on Roland McGrath's utrace API, which provides programmatic
hooks to the in-tree tracehook layer.  This new patch interfaces many
of those events to ftrace, as configured by a small number of debugfs
controls.  Here's the /debugfs/tracing/process_trace_README:

process event tracer mini-HOWTO

1. Select process hierarchy to monitor.  Other processes will be
   completely unaffected.  Leave at 0 for system-wide tracing.
#  echo NNN > process_follow_pid

2. Determine which process event traces are potentially desired.
   syscall and signal tracing slow down monitored processes.
#  echo 0 > process_trace_{syscalls,signals,lifecycle}

3. Add any final uid- or taskcomm-based filtering.  Non-matching
   processes will skip trace messages, but will still be slowed.
#  echo NNN > process_trace_uid_filter # -1: unrestricted 
#  echo ls > process_trace_taskcomm_filter # empty: unrestricted

4. Start tracing.
#  echo process > current_tracer

5. Examine trace.
#  cat trace

6. Stop tracing.
#  echo nop > current_tracer
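
The filter semantics described in step 3 can be sketched in user-space C as below. This
is an illustration only, assuming the conventions stated in the HOWTO (-1 means any uid,
an empty string means any command name); struct trace_filters and task_matches are
hypothetical names, not the functions in trace_process.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16	/* same size as the kernel's TASK_COMM_LEN */

/* Hypothetical container for the two filter settings. */
struct trace_filters {
	unsigned int uid;	/* (unsigned int)-1: unrestricted */
	char comm[COMM_LEN];	/* empty string: unrestricted */
};

/* Return true if a task with this uid and command name passes both filters. */
bool task_matches(const struct trace_filters *f, unsigned int uid,
		  const char *comm)
{
	assert(f != NULL && comm != NULL);
	if (f->uid != (unsigned int)-1 && f->uid != uid)
		return false;
	if (f->comm[0] != '\0' && strncmp(f->comm, comm, COMM_LEN) != 0)
		return false;
	return true;
}
```

With both fields left at their unrestricted values every task matches; setting the uid to
0 and the command name to "ls" would narrow the trace to ls run by root.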


Signed-off-by: Frank Ch. Eigler <fche at redhat.com>
---

 include/linux/processtrace.h |   41 +++
 kernel/trace/Kconfig         |    9 +
 kernel/trace/Makefile        |    1 +
 kernel/trace/trace.h         |   30 ++-
 kernel/trace/trace_process.c |  591 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 661 insertions(+), 11 deletions(-)

diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h
new file mode 100644
index 0000000..f2b7d94
--- /dev/null
+++ b/include/linux/processtrace.h
@@ -0,0 +1,41 @@
+#ifndef PROCESSTRACE_H
+#define PROCESSTRACE_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+struct process_trace_entry {
+	unsigned char opcode;	/* one of _UTRACE_EVENT_* */
+	char comm[TASK_COMM_LEN]; /* XXX: should be in/via trace_entry */
+	union {
+		struct {
+			pid_t child;
+			unsigned long flags;
+		} trace_clone;
+		struct {
+			long code;
+		} trace_exit;
+		struct {
+		} trace_exec;
+		struct {
+			int si_signo;
+			int si_errno;
+			int si_code;
+		} trace_signal;
+		struct {
+			long callno;
+			unsigned long args[6];
+		} trace_syscall_entry;
+		struct {
+			long rc;
+			long error;
+		} trace_syscall_exit;
+	};
+};
+
+/* in kernel/trace/trace_process.c */
+
+extern void enable_process_trace(void);
+extern void disable_process_trace(void);
+
+#endif /* PROCESSTRACE_H */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e2a4ff6..3ff727e 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -149,6 +149,15 @@ config CONTEXT_SWITCH_TRACER
 	  This tracer gets called from the context switch and records
 	  all switching of tasks.
 
+config PROCESS_TRACER
+	bool "Trace process events via utrace"
+	depends on DEBUG_KERNEL
+	select TRACING
+	select UTRACE
+	help
+	  This tracer provides trace records from process events
+	  accessible to utrace: lifecycle, system calls, and signals.
+
 config BOOT_TRACER
 	bool "Trace boot initcalls"
 	depends on DEBUG_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 349d5a9..a774db2 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -33,5 +33,6 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_HW_BRANCH_TRACER) += trace_hw_branches.o
 obj-$(CONFIG_POWER_TRACER) += trace_power.o
+obj-$(CONFIG_PROCESS_TRACER) += trace_process.o
 
 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4d3d381..b4ebccb 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -7,6 +7,7 @@
 #include <linux/clocksource.h>
 #include <linux/ring_buffer.h>
 #include <linux/mmiotrace.h>
+#include <linux/processtrace.h>
 #include <linux/ftrace.h>
 #include <trace/boot.h>
 
@@ -30,6 +31,7 @@ enum trace_type {
 	TRACE_USER_STACK,
 	TRACE_HW_BRANCHES,
 	TRACE_POWER,
+	TRACE_PROCESS,
 
 	__TRACE_LAST_TYPE
 };
@@ -38,7 +40,7 @@ enum trace_type {
  * The trace entry - the most basic unit of tracing. This is what
  * is printed in the end as a single line in the trace output, such as:
  *
- *     bash-15816 [01]   235.197585: idle_cpu <- irq_enter
+ *     bash-15816 [01]	 235.197585: idle_cpu <- irq_enter
  */
 struct trace_entry {
 	unsigned char		type;
@@ -153,7 +155,7 @@ struct trace_boot_ret {
 #define TRACE_FILE_SIZE 20
 struct trace_branch {
 	struct trace_entry	ent;
-	unsigned	        line;
+	unsigned		line;
 	char			func[TRACE_FUNC_SIZE+1];
 	char			file[TRACE_FILE_SIZE+1];
 	char			correct;
@@ -170,11 +172,16 @@ struct trace_power {
 	struct power_trace	state_data;
 };
 
+struct trace_process {
+	struct trace_entry		ent;
+	struct process_trace_entry	event;
+};
+
 /*
  * trace_flag_type is an enumeration that holds different
  * states when a trace occurs. These are:
  *  IRQS_OFF		- interrupts were disabled
- *  IRQS_NOSUPPORT 	- arch does not support irqs_disabled_flags
+ *  IRQS_NOSUPPORT	- arch does not support irqs_disabled_flags
  *  NEED_RESCED		- reschedule is requested
  *  HARDIRQ		- inside an interrupt handler
  *  SOFTIRQ		- inside a softirq handler
@@ -279,7 +286,8 @@ extern void __ftrace_bad_type(void);
 		IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,	\
 			  TRACE_GRAPH_RET);		\
 		IF_ASSIGN(var, ent, struct hw_branch_entry, TRACE_HW_BRANCHES);\
- 		IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \
+		IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \
+		IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \
 		__ftrace_bad_type();					\
 	} while (0)
 
@@ -297,8 +305,8 @@ enum print_line_t {
  * flags value in struct tracer_flags.
  */
 struct tracer_opt {
-	const char 	*name; /* Will appear on the trace_options file */
-	u32 		bit; /* Mask assigned in val field in tracer_flags */
+	const char	*name; /* Will appear on the trace_options file */
+	u32		bit; /* Mask assigned in val field in tracer_flags */
 };
 
 /*
@@ -307,7 +315,7 @@ struct tracer_opt {
  */
 struct tracer_flags {
 	u32			val;
-	struct tracer_opt 	*opts;
+	struct tracer_opt	*opts;
 };
 
 /* Makes more easy to define a tracer opt */
@@ -339,7 +347,7 @@ struct tracer {
 	int			(*set_flag)(u32 old_flags, u32 bit, int set);
 	struct tracer		*next;
 	int			print_max;
-	struct tracer_flags 	*flags;
+	struct tracer_flags	*flags;
 };
 
 struct trace_seq {
@@ -561,7 +569,7 @@ static inline int ftrace_trace_task(struct task_struct *task)
  * positions into trace_flags that controls the output.
  *
  * NOTE: These bits must match the trace_options array in
- *       trace.c.
+ *	 trace.c.
  */
 enum trace_iterator_flags {
 	TRACE_ITER_PRINT_PARENT		= 0x01,
@@ -578,8 +586,8 @@ enum trace_iterator_flags {
 	TRACE_ITER_PREEMPTONLY		= 0x800,
 	TRACE_ITER_BRANCH		= 0x1000,
 	TRACE_ITER_ANNOTATE		= 0x2000,
-	TRACE_ITER_USERSTACKTRACE       = 0x4000,
-	TRACE_ITER_SYM_USEROBJ          = 0x8000,
+	TRACE_ITER_USERSTACKTRACE	= 0x4000,
+	TRACE_ITER_SYM_USEROBJ		= 0x8000,
 	TRACE_ITER_PRINTK_MSGONLY	= 0x10000
 };
 
diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c
new file mode 100644
index 0000000..038ff36
--- /dev/null
+++ b/kernel/trace/trace_process.c
@@ -0,0 +1,591 @@
+/*
+ * utrace-based process event tracing
+ * Copyright (C) 2009 Red Hat Inc.
+ * By Frank Ch. Eigler <fche at redhat.com>
+ *
+ * Based on mmio ftrace engine by Pekka Paalanen
+ * and utrace-syscall-tracing prototype by Ananth Mavinakayanahalli
+ */
+
+/* #define DEBUG 1 */
+
+#include <linux/kernel.h>
+#include <linux/utrace.h>
+#include <linux/uaccess.h>
+#include <linux/debugfs.h>
+#include <asm/syscall.h>
+
+#include "trace.h"
+
+/* A process must match these filters in order to be traced. */
+static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */
+static u32 trace_taskuid_filter = -1; /* -1: unrestricted */
+static u32 trace_lifecycle_p = 1;
+static u32 trace_syscalls_p = 1;
+static u32 trace_signals_p = 1;
+
+/* A process must be a direct child of given pid in order to be
+   followed. */
+static u32 process_follow_pid; /* 0: unrestricted/systemwide */
+
+/* XXX: lock the above? */
+
+
+/* trace data collection */
+
+static struct trace_array *process_trace_array;
+
+static void process_reset_data(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	tracing_reset_online_cpus(tr);
+}
+
+static int process_trace_init(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_trace_array = tr;
+	process_reset_data(tr);
+	enable_process_trace();
+	return 0;
+}
+
+static void process_trace_reset(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	disable_process_trace();
+	process_reset_data(tr);
+	process_trace_array = NULL;
+}
+
+static void process_trace_start(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_reset_data(tr);
+}
+
+static void __trace_processtrace(struct trace_array *tr,
+				struct trace_array_cpu *data,
+				struct process_trace_entry *ent)
+{
+	struct ring_buffer_event *event;
+	struct trace_process *entry;
+	unsigned long irq_flags;
+
+	event	= ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+					   &irq_flags);
+	if (!event)
+		return;
+	entry	= ring_buffer_event_data(event);
+	tracing_generic_entry_update(&entry->ent, 0, preempt_count());
+	entry->ent.cpu			= raw_smp_processor_id();
+	entry->ent.type			= TRACE_PROCESS;
+	strlcpy(ent->comm, current->comm, TASK_COMM_LEN);
+	entry->event			= *ent;
+	ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+	trace_wake_up();
+}
+
+void process_trace(struct process_trace_entry *ent)
+{
+	struct trace_array *tr = process_trace_array;
+	struct trace_array_cpu *data;
+
+	preempt_disable();
+	data = tr->data[smp_processor_id()];
+	__trace_processtrace(tr, data, ent);
+	preempt_enable();
+}
+
+
+/* trace data rendering */
+
+static void process_pipe_open(struct trace_iterator *iter)
+{
+	struct trace_seq *s = &iter->seq;
+	pr_debug("in %s\n", __func__);
+	trace_seq_printf(s, "VERSION 200901\n");
+}
+
+static void process_close(struct trace_iterator *iter)
+{
+	iter->private = NULL;
+}
+
+static ssize_t process_read(struct trace_iterator *iter, struct file *filp,
+				char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	ssize_t ret;
+	struct trace_seq *s = &iter->seq;
+	ret = trace_seq_to_user(s, ubuf, cnt);
+	return (ret == -EBUSY) ? 0 : ret;
+}
+
+static enum print_line_t process_print(struct trace_iterator *iter)
+{
+	struct trace_entry *entry = iter->ent;
+	struct trace_process *field;
+	struct trace_seq *s	= &iter->seq;
+	unsigned long long t	= ns2usecs(iter->ts);
+	unsigned long usec_rem	= do_div(t, 1000000ULL);
+	unsigned long secs	= (unsigned long)t;
+	int ret = 1;
+
+	trace_assign_type(field, entry);
+
+	/* XXX: If print_lat_fmt() were not static, we wouldn't have
+	   to duplicate this. */
+	trace_seq_printf(s, "%16s %5d %3d %9lu.%06lu ",
+			 field->event.comm,
+			 entry->pid, entry->cpu,
+			 secs,
+			 usec_rem);
+
+	switch (field->event.opcode) {
+	case _UTRACE_EVENT_CLONE:
+		ret = trace_seq_printf(s, "fork %d flags 0x%lx\n",
+				       field->event.trace_clone.child,
+				       field->event.trace_clone.flags);
+		break;
+	case _UTRACE_EVENT_EXEC:
+		ret = trace_seq_printf(s, "exec\n");
+		break;
+	case _UTRACE_EVENT_EXIT:
+		ret = trace_seq_printf(s, "exit %ld\n",
+				       field->event.trace_exit.code);
+		break;
+	case _UTRACE_EVENT_SIGNAL:
+		ret = trace_seq_printf(s, "signal %d errno %d code 0x%x\n",
+				       field->event.trace_signal.si_signo,
+				       field->event.trace_signal.si_errno,
+				       field->event.trace_signal.si_code);
+		break;
+	case _UTRACE_EVENT_SYSCALL_ENTRY:
+		ret = trace_seq_printf(s, "syscall %ld [0x%lx 0x%lx 0x%lx 0x%lx 0x%lx 0x%lx]\n",
+				       field->event.trace_syscall_entry.callno,
+				       field->event.trace_syscall_entry.args[0],
+				       field->event.trace_syscall_entry.args[1],
+				       field->event.trace_syscall_entry.args[2],
+				       field->event.trace_syscall_entry.args[3],
+				       field->event.trace_syscall_entry.args[4],
+				       field->event.trace_syscall_entry.args[5]);
+		break;
+	case _UTRACE_EVENT_SYSCALL_EXIT:
+		ret = trace_seq_printf(s, "syscall rc %ld error %ld\n",
+				       field->event.trace_syscall_exit.rc,
+				       field->event.trace_syscall_exit.error);
+		break;
+	default:
+		ret = trace_seq_printf(s, "process code %d?\n",
+				       field->event.opcode);
+		break;
+	}
+	if (ret)
+		return TRACE_TYPE_HANDLED;
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static enum print_line_t process_print_line(struct trace_iterator *iter)
+{
+	switch (iter->ent->type) {
+	case TRACE_PROCESS:
+		return process_print(iter);
+	default:
+		return TRACE_TYPE_HANDLED; /* ignore unknown entries */
+	}
+}
+
+static struct tracer process_tracer = {
+	.name		= "process",
+	.init		= process_trace_init,
+	.reset		= process_trace_reset,
+	.start		= process_trace_start,
+	.pipe_open	= process_pipe_open,
+	.close		= process_close,
+	.read		= process_read,
+	.print_line	= process_print_line,
+};
+
+
+
+/* utrace backend */
+
+/* Should tracing apply to given task?	Compare against filter
+   values. */
+static int trace_test(struct task_struct *tsk)
+{
+	if (trace_taskcomm_filter[0]
+	    && strncmp(trace_taskcomm_filter, tsk->comm, TASK_COMM_LEN))
+		return 0;
+
+	if (trace_taskuid_filter != (u32)-1
+	    && trace_taskuid_filter != task_uid(tsk))
+		return 0;
+
+	return 1;
+}
+
+
+static const struct utrace_engine_ops process_trace_ops;
+
+static void process_trace_tryattach(struct task_struct *tsk)
+{
+	struct utrace_attached_engine *engine;
+
+	pr_debug("in %s\n", __func__);
+	engine = utrace_attach_task(tsk,
+				    UTRACE_ATTACH_CREATE |
+				    UTRACE_ATTACH_EXCLUSIVE,
+				    &process_trace_ops, NULL);
+	if (IS_ERR(engine) || (engine == NULL)) {
+		pr_warning("utrace_attach_task %d (rc %p)\n",
+			   tsk->pid, engine);
+	} else {
+		int rc;
+
+		/* We always hook cost-free events. */
+		unsigned long events =
+			UTRACE_EVENT(CLONE) |
+			UTRACE_EVENT(EXEC) |
+			UTRACE_EVENT(EXIT);
+
+		/* Penalizing events are individually controlled, so that
+		   utrace doesn't even take the monitored threads off their
+		   fast paths, nor bother call our callbacks. */
+		if (trace_syscalls_p)
+			events |= UTRACE_EVENT_SYSCALL;
+		if (trace_signals_p)
+			events |= UTRACE_EVENT_SIGNAL_ALL;
+
+		rc = utrace_set_events(tsk, engine, events);
+		if (rc == -EINPROGRESS)
+			rc = utrace_barrier(tsk, engine);
+		if (rc)
+			pr_warning("utrace_set_events/barrier rc %d\n", rc);
+
+		utrace_engine_put(engine);
+		pr_debug("attached in %s to %s(%d)\n", __func__,
+			 tsk->comm, tsk->pid);
+	}
+}
+
+
+u32 process_trace_report_clone(enum utrace_resume_action action,
+			       struct utrace_attached_engine *engine,
+			       struct task_struct *parent,
+			       unsigned long clone_flags,
+			       struct task_struct *child)
+{
+	if (trace_lifecycle_p && trace_test(parent)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_CLONE;
+		ent.trace_clone.child = child->pid;
+		ent.trace_clone.flags = clone_flags;
+		process_trace(&ent);
+	}
+
+	process_trace_tryattach(child);
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_syscall_entry(u32 action,
+				       struct utrace_attached_engine *engine,
+				       struct task_struct *task,
+				       struct pt_regs *regs)
+{
+	if (trace_syscalls_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SYSCALL_ENTRY;
+		ent.trace_syscall_entry.callno = syscall_get_nr(task, regs);
+		syscall_get_arguments(task, regs, 0, 6,
+				      ent.trace_syscall_entry.args);
+		process_trace(&ent);
+	}
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_syscall_exit(enum utrace_resume_action action,
+				   struct utrace_attached_engine *engine,
+				   struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	if (trace_syscalls_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SYSCALL_EXIT;
+		ent.trace_syscall_exit.rc = syscall_get_return_value(task, regs);
+		ent.trace_syscall_exit.error = syscall_get_error(task, regs);
+		process_trace(&ent);
+	}
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_exec(enum utrace_resume_action action,
+			      struct utrace_attached_engine *engine,
+			      struct task_struct *task,
+			      const struct linux_binfmt *fmt,
+			      const struct linux_binprm *bprm,
+			      struct pt_regs *regs)
+{
+	if (trace_lifecycle_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXEC;
+		process_trace(&ent);
+	}
+
+	/* We're already attached; no need for a new tryattach. */
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_signal(u32 action,
+				struct utrace_attached_engine *engine,
+				struct task_struct *task,
+				struct pt_regs *regs,
+				siginfo_t *info,
+				const struct k_sigaction *orig_ka,
+				struct k_sigaction *return_ka)
+{
+	if (trace_signals_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SIGNAL;
+		ent.trace_signal.si_signo = info->si_signo;
+		ent.trace_signal.si_errno = info->si_errno;
+		ent.trace_signal.si_code = info->si_code;
+		process_trace(&ent);
+	}
+
+	/* We're already attached, so no need for a new tryattach. */
+
+	return UTRACE_RESUME | utrace_signal_action(action);
+}
+
+
+u32 process_trace_report_exit(enum utrace_resume_action action,
+			      struct utrace_attached_engine *engine,
+			      struct task_struct *task,
+			      long orig_code, long *code)
+{
+	if (trace_lifecycle_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXIT;
+		ent.trace_exit.code = orig_code;
+		process_trace(&ent);
+	}
+
+	/* There is no need to explicitly attach or detach here. */
+
+	return UTRACE_RESUME;
+}
+
+
+void enable_process_trace(void)
+{
+	struct task_struct *grp, *tsk;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		/* Skip over kernel threads. */
+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
+		if (process_follow_pid) {
+			if (tsk->tgid == process_follow_pid ||
+			    tsk->parent->tgid == process_follow_pid)
+				process_trace_tryattach(tsk);
+		} else {
+			process_trace_tryattach(tsk);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+void disable_process_trace(void)
+{
+	struct utrace_attached_engine *engine;
+	struct task_struct *grp, *tsk;
+	int rc;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		/* Find matching engine, if any.  Returns -ENOENT for
+		   unattached threads. */
+		engine = utrace_attach_task(tsk, UTRACE_ATTACH_MATCH_OPS,
+					    &process_trace_ops, NULL);
+		if (IS_ERR(engine)) {
+			if (PTR_ERR(engine) != -ENOENT)
+				pr_warning("utrace_attach_task %d (rc %ld)\n",
+					   tsk->pid, -PTR_ERR(engine));
+		} else if (engine == NULL) {
+			pr_warning("utrace_attach_task %d (null engine)\n",
+				   tsk->pid);
+		} else {
+			/* Found one of our own engines.  Detach.  */
+			rc = utrace_control(tsk, engine, UTRACE_DETACH);
+			switch (rc) {
+			case 0:		    /* success */
+				break;
+			case -ESRCH:	    /* REAP callback already begun */
+			case -EALREADY:	    /* DEATH callback already begun */
+				break;
+			default:
+				rc = -rc;
+				pr_warning("utrace_detach %d (rc %d)\n",
+					   tsk->pid, rc);
+				break;
+			}
+			utrace_engine_put(engine);
+			pr_debug("detached in %s from %s(%d)\n", __func__,
+				 tsk->comm, tsk->pid);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+
+static const struct utrace_engine_ops process_trace_ops = {
+	.report_clone = process_trace_report_clone,
+	.report_exec = process_trace_report_exec,
+	.report_exit = process_trace_report_exit,
+	.report_signal = process_trace_report_signal,
+	.report_syscall_entry = process_trace_report_syscall_entry,
+	.report_syscall_exit = process_trace_report_syscall_exit,
+};
+
+
+
+/* control interfaces */
+
+
+static ssize_t
+trace_taskcomm_filter_read(struct file *filp, char __user *ubuf,
+			   size_t cnt, loff_t *ppos)
+{
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       trace_taskcomm_filter, TASK_COMM_LEN);
+}
+
+
+static ssize_t
+trace_taskcomm_filter_write(struct file *filp, const char __user *ubuf,
+			    size_t cnt, loff_t *fpos)
+{
+	char *end;
+
+	if (cnt > TASK_COMM_LEN - 1)
+		cnt = TASK_COMM_LEN - 1;
+
+	if (copy_from_user(trace_taskcomm_filter, ubuf, cnt))
+		return -EFAULT;
+
+	/* Cut from the first nil or newline. */
+	trace_taskcomm_filter[cnt] = '\0';
+	end = strchr(trace_taskcomm_filter, '\n');
+	if (end)
+		*end = '\0';
+
+	*fpos += cnt;
+	return cnt;
+}
+
+
+static const struct file_operations trace_taskcomm_filter_fops = {
+	.open		= tracing_open_generic,
+	.read		= trace_taskcomm_filter_read,
+	.write		= trace_taskcomm_filter_write,
+};
+
+
+
+static char README_text[] =
+	"process event tracer mini-HOWTO\n"
+	"\n"
+	"1. Select process hierarchy to monitor.  Other processes will be\n"
+	"   completely unaffected.  Leave at 0 for system-wide tracing.\n"
+	"#  echo NNN > process_follow_pid\n"
+	"\n"
+	"2. Determine which process event traces are potentially desired.\n"
+	"   syscall and signal tracing slow down monitored processes.\n"
+	"#  echo 0 > process_trace_{syscalls,signals,lifecycle}\n"
+	"\n"
+	"3. Add any final uid- or taskcomm-based filtering.  Non-matching\n"
+	"   processes will skip trace messages, but will still be slowed.\n"
+	"#  echo NNN > process_trace_uid_filter # -1: unrestricted\n"
+	"#  echo ls > process_trace_taskcomm_filter # empty: unrestricted\n"
+	"\n"
+	"4. Start tracing.\n"
+	"#  echo process > current_tracer\n"
+	"\n"
+	"5. Examine trace.\n"
+	"#  cat trace\n"
+	"\n"
+	"6. Stop tracing.\n"
+	"#  echo nop > current_tracer\n"
+	;
+
+static struct debugfs_blob_wrapper README_blob = {
+	.data = README_text,
+	.size = sizeof(README_text),
+};
+
+
+static __init int init_process_trace(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	entry = debugfs_create_blob("process_trace_README", 0444, d_tracer,
+				    &README_blob);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_README' entry\n");
+
+	/* Control for scoping process following. */
+	entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer,
+				   &process_follow_pid);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_follow_pid' entry\n");
+
+	/* Process-level filters */
+	entry = debugfs_create_file("process_trace_taskcomm_filter", 0644,
+				    d_tracer, NULL, &trace_taskcomm_filter_fops);
+	/* XXX: it'd be nice to have a read/write debugfs_create_blob. */
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_taskcomm_filter' entry\n");
+
+	entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer,
+				   &trace_taskuid_filter);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_uid_filter' entry\n");
+
+	/* Event-level filters. */
+	entry = debugfs_create_u32("process_trace_lifecycle", 0644, d_tracer,
+				   &trace_lifecycle_p);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_lifecycle' entry\n");
+
+	entry = debugfs_create_u32("process_trace_syscalls", 0644, d_tracer,
+				   &trace_syscalls_p);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_syscalls' entry\n");
+
+	entry = debugfs_create_u32("process_trace_signals", 0644, d_tracer,
+				   &trace_signals_p);
+	if (!entry)
+		pr_warning("Could not create debugfs 'process_trace_signals' entry\n");
+
+	return register_tracer(&process_tracer);
+}
+
+device_initcall(init_process_trace);
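
The taskcomm filter handling in the patch above (bounded copy from the writer, cut at the first newline, then the strncmp() comparison in trace_test()) can be tried out in user space. What follows is only a minimal sketch under stated assumptions, not the patch's code: set_comm_filter() and comm_matches() are hypothetical names, memcpy() stands in for copy_from_user(), and TASK_COMM_LEN is assumed to be 16. The sketch reserves one byte for the terminator so that the store of '\0' always lands inside the buffer.

```c
#include <stddef.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* assumed: size of a kernel task name buffer */

static char comm_filter[TASK_COMM_LEN];	/* empty string: unrestricted */

/*
 * Hypothetical user-space stand-in for the filter write handler.
 * Keeps at most TASK_COMM_LEN - 1 bytes so the terminator always fits,
 * then cuts the string at the first newline.  Returns bytes consumed.
 */
static size_t set_comm_filter(const char *buf, size_t cnt)
{
	char *end;

	if (cnt > TASK_COMM_LEN - 1)
		cnt = TASK_COMM_LEN - 1;
	memcpy(comm_filter, buf, cnt);	/* stands in for copy_from_user() */
	comm_filter[cnt] = '\0';
	end = strchr(comm_filter, '\n');
	if (end)
		*end = '\0';
	return cnt;
}

/*
 * Hypothetical stand-in for the comm half of trace_test():
 * an empty filter matches every task name.
 */
static int comm_matches(const char *comm)
{
	if (comm_filter[0] && strncmp(comm_filter, comm, TASK_COMM_LEN))
		return 0;
	return 1;
}
```

Writing "ls\n" leaves the filter at "ls", so only tasks named ls match; writing a lone newline empties the filter and makes it unrestricted again, which is what "echo > process_trace_taskcomm_filter" would do.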


From ananth at in.ibm.com  Mon Feb 23 07:47:17 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Mon, 23 Feb 2009 13:17:17 +0530
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090121062825.GD3251@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
Message-ID: <20090223074717.GA3340@in.ibm.com>

On Wed, Jan 21, 2009 at 11:58:25AM +0530, Ananth N Mavinakayanahalli wrote:
> On Mon, Jan 19, 2009 at 03:20:31PM -0800, Roland McGrath wrote:
> > Thanks for working on this, Ananth.  (Btw, it's "embed.")
> > 
> > I think it would be less disruptive (and materially no different)
> > to leave utrace_flags as it is.  That field is the one (and only)
> > that is used in hot paths (or used anywhere outside utrace.c).
> > It might in future get moved around to stay in a cache-hot part
> > of task_struct, for example.
> > 
> > The long comment above struct utrace is really all about implementation
> > details inside utrace.c and I don't think you should move that commentary
> > to the header file.  Instead, put a comment saying that the contents of
> > struct utrace and their use is entirely private to kernel/utrace.c and it
> > is only defined in the header to make its size known for struct task_struct
> > layout (and init_task.h).
> > 
> > I committed some cosmetic changes that will make for a little less flutter
> > in your patch.
> 
> Here is V2 of the patch. Tested and works fine. Same two tests on
> powerpc fail, all tests pass on x86, while there are some occurrences of
> the ptrace.c WARN_ON.
> 
> Roland,
> I've tried to tweak the comments appropriately. Please feel free to
> modify them as you consider fit.

Roland,

Any updates on this and the utrace upstream integration front?

Ananth


From jkenisto at us.ibm.com  Wed Feb 25 19:53:48 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 25 Feb 2009 11:53:48 -0800
Subject: instruction-analysis API(s)
In-Reply-To: <20090210044230.GB12811@in.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
	<4990B6D4.2020907@redhat.com>  <20090210044230.GB12811@in.ibm.com>
Message-ID: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>

On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
> > Jim Keniston wrote:
> > > On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
> > >> Hi Jim,
> > >>
> > >> I'm also interested in the instruction decoder.
> > >> If you don't mind, could we share the API specification?
> > >> I'd like to port djprobe on it.
> > > 
> > > I'm enclosing the little x86 instruction-analysis prototype I hacked
> > > together (insn_x86.*), along with a copy of systemtap's
> > > runtime/uprobes2/uprobes_x86.c, which I modified to use it.
> > 
> > Hmm, actually, djprobe needs both the length and the type of
> > instructions, since it has to know how many bytes must be copied
> > and be replaced by a long jump.
> > 
> > > But again, we haven't really settled on an API.  For example, my x86
> > > prototype doesn't collect all the info that kvm needs.  We're thinking
> > > that adapting some existing code (like kvm in the x86 case) might be
> > > more palatable to LKML.
> > 
> > Sure, since kvm and emulators have to fetch the values of src/dst
> > for the emulation, they need actual register values. On the other hand,
> > the disasm/*probe have to analyze code before hitting, so they
> > don't know the actual value of the registers.
> > 
> > So, I think we should split x86_decode_insn() into 2 parts, static
> > analysis and emulation preparation.
> > 
> > For example:
> > 1) analyzing code statically (x86_analyze_insn)
> >    - just decoding an instruction
> >    - this phase may consist of several sub-functions.
> > 
> > 2) preparing emulation (x86_evaluate_insn)
> >    - evaluating src/dst based on current(vcpu) registers
> > 
> > 3) executing emulation (x86_emulate_insn)
> >    - emulating an analyzed instruction
> 
> Right, that surely sounds like the way to go. However, we've been
> cautioned that the instruction emulation area of the kvm code is very
> performance sensitive. But, there is no harm in prototyping the above
> and then worrying about any optimizations so there isn't a performance
> issue -- in any case, I guess [ku]probes are very infrequent users of
> this compared to KVM.
> 
> Ananth

Hi, Masami.

Ananth, Srikar, Maneesh, and I talked about this last night.  While I
was on vacation, Srikar did further investigation into adapting x86
kvm's instruction analysis for more general use, and he's not optimistic.
For the short term, at least (i.e., between now and the Linux Foundation
Collaboration Summit in April), we're going to proceed based on the
prototype I developed.

As you noted, djprobes needs instruction lengths, and my prototype
doesn't provide that info.  (Uprobes computes instruction lengths for
rip-relative x86_64 instructions, but that's only a subset of what you
need.)  Are you interested in extending/enhancing my prototype to make
it useful for djprobes?  If so, I'd be happy to consult.

Thanks.
Jim
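
The three-way split proposed in the quoted text (static analysis, emulation preparation, emulation) can be illustrated on a toy instruction set. This is only a sketch under stated assumptions: the two-opcode encoding, struct toy_insn, and the toy_*() functions are all hypothetical and are not the kvm or insn_x86 interfaces, whose APIs had not been settled in this thread. The point it shows is that the first phase needs only the instruction bytes, so a probe can learn the type and length before the code is ever hit, while only the later phases touch register state.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy encoding (hypothetical): 0x01 reg imm8 = ADD, 0x02 reg = INC. */
enum toy_type { TOY_INVALID, TOY_INC, TOY_ADD_IMM };

struct toy_insn {
	enum toy_type type;
	size_t length;		/* bytes occupied by the instruction */
	uint8_t reg;		/* destination register index, 0..3 */
	uint8_t imm;		/* immediate operand, ADD only */
	uint32_t src_val;	/* filled in by toy_evaluate_insn() */
};

/* Phase 1: static analysis.  Uses the instruction bytes only. */
static int toy_analyze_insn(const uint8_t *code, size_t avail,
			    struct toy_insn *insn)
{
	insn->type = TOY_INVALID;
	insn->length = 0;
	insn->imm = 0;
	insn->src_val = 0;

	if (avail >= 2 && code[0] == 0x02 && code[1] < 4) {
		insn->type = TOY_INC;
		insn->length = 2;
	} else if (avail >= 3 && code[0] == 0x01 && code[1] < 4) {
		insn->type = TOY_ADD_IMM;
		insn->length = 3;
		insn->imm = code[2];
	} else {
		return -1;	/* unknown opcode or truncated bytes */
	}
	insn->reg = code[1];
	return 0;
}

/* Phase 2: emulation preparation.  Reads the current register values. */
static void toy_evaluate_insn(struct toy_insn *insn, const uint32_t regs[4])
{
	insn->src_val = regs[insn->reg];
}

/* Phase 3: emulation.  Applies the analyzed instruction to the registers. */
static void toy_emulate_insn(const struct toy_insn *insn, uint32_t regs[4])
{
	switch (insn->type) {
	case TOY_INC:
		regs[insn->reg] = insn->src_val + 1;
		break;
	case TOY_ADD_IMM:
		regs[insn->reg] = insn->src_val + insn->imm;
		break;
	default:
		break;
	}
}
```

A probe-style user would stop after toy_analyze_insn() and read insn.length and insn.type; an emulator-style user would go on to call the other two phases with its register file.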


From mhiramat at redhat.com  Thu Feb 26 15:29:14 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Thu, 26 Feb 2009 10:29:14 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	
	<498CA248.2090708@redhat.com>	
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>	
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <49A6B54A.9050408@redhat.com>

Jim Keniston wrote:
> Hi, Masami.
> 
> Ananth, Srikar, Maneesh, and I talked about this last night.  While I
> was on vacation, Srikar did further investigation into adapting x86
> kvm's instruction analysis for more general use, and he's not optimistic.
> For the short term, at least (i.e., between now and the Linux Foundation
> Collaboration Summit in April), we're going to proceed based on the
> prototype I developed.
> 
> As you noted, djprobes needs instruction lengths, and my prototype
> doesn't provide that info.  (Uprobes computes instruction lengths for
> rip-relative x86_64 instructions, but that's only a subset of what you
> need.)  Are you interested in extending/enhancing my prototype to make
> it useful for djprobes?  If so, I'd be happy to consult.

Hi Jim,

Thank you for considering djprobe.
Actually, I'm developing insn_get_length() based on your prototype and
porting djprobe to it. After testing the code, I'd like to post the insn_x86 code.

Thank you,

> 
> Thanks.
> Jim
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


     * uly_paya001 at yahoo.de
     * umfunkcb at cc.umanitoba.ca
     * umuariki at xtra.co.nz
     * umuscm01 at yahoo.com
     * un4gettable_grl at yahoo.ca
     * uniek at yahoo.com
     * unifeibr at yahoo.com.br
     * uniqua69 at yahoo.com
     * uofs_apala at yahoo.com
     * uprguad at gvtc.com
     * uraniumnews at yahoo.ca
     * urpinforma at comunevalmontone.it
     * usa at hotmail.com
     * users at tomcat.apache.org
     * uta_distro at yahoo.ca
     * utaeick at yahoo.de
     * utrace-devel at redhat.com
     * uwe_dornbusch at yahoo.de
     * uxf39ftjmcw at yahoo.co.uk
     * uyiyot at hotmail.com
     * uyiyot at yahoo.ca
     * v.chirkov at usask.ca
     * valenciapeete at yahoo.com
     * valorz_09 at yahoo.com
     * vancouver_doula at yahoo.ca
     * vanvlietp at yahoo.ca
     * vanyounker at yahoo.ca
     * vclarsen at smig.net
     * vdl1 at leicester.ac.uk
     * vecassell at yahoo.ca
     * veena.aumyogatherapy at yahoo.ca
     * vera.rosendahl at bmz.bund.de
     * verlyn at votevo.ca
     * vfoleybourgon at yahoo.ca
     * vfranz82 at yahoo.it
     * vicokojie at yahoo.com
     * vicsanvic at yahoo.com
     * videodansedubreuil at yahoo.com
     * viestards.lists at gmail.com
     * vikeda at ccsf.org
     * vilegarret at yahoo.de
     * vilmamiriam at yahoo.com.br
     * vilmamiriam at yahoo.com.brchris
     * vinay_sajip at yahoo.co.uk
     * vinids at pucrs.br
     * vinids at terra.com.br
     * virginia_seabrook at yahoo.com
     * lydiadaniels01 at yahoo.com

invitation_add_to_your_yahoo_calendar:

     http://calendar.yahoo.com/?v=60&ST=20090227T170000%2B0000&TITLE=URGENT+RESPOND+PLEASE&DUR=0100&VIEW=d&in_st=I+AM+MR.+MADU+SALIF+A+BANKER+IN+ONE+OF+THE+REPUTABLE+BANK+IN+BURKINA+FASO+(A.D.B.).+I+HAVE+DECIDED+TO+CONTACT+YOU+ON+A+BUSINESS+PROPOSAL+OF+US$15M+(FIFTEEN+MILLION+UNITED+STATES+DOLLAR,+THE+DEPOSITOR+OF+THE+SAID+FUND+DIED+WITH+HIS+ENTIRE+FAMILY+DURING+THE+IRAQ+WAR+IN+2006.+THE+DECEASED+CUSTOMER+USED+HIS+WIFE+AS+THE+NEXT+OF+KIN+BUT+UNFORTUNATELY,+THE+WIFE+DIED+ALONG+SIDE+WITH+HIM+LEAVING+NOBODY+FOR+THE+CLAIM.+ACCORDING+TO+OUR+BANKING+LAW,+IF+THE+FUND+REMAIN+UNCLAIMED+FOR+TWO+(3)+TRANSFEYEARS+THEN,+THE+FUND+WILL+BE+INTO+THE+RESERVE+BANK+TREASURY+AS+UNCLAIMED+BILL.+I+DON%27T+WANT+THE+FUND+TO+GO+INTO+THE+BANK+TREASURY+AND+AS+SUCH,YOUR+PERCENTAGE+WILL+BE+30%25,10%25+WILL+BE+FOR+EXPENSES+WHILE+60%25+WILL+BE+FOR+ME,+PLEASE+REPLY+ME+THROUGH+THIS+MY+PRIVATE+EMAIL+ADDRESS%3aprivatemadusalif at yahoo.com&TYPE=10

    
Copyright ? 2009 All Rights Reserved
 www.yahoo.com

Privacy Policy:
 http://privacy.yahoo.com/privacy/us

Terms of Service:
 http://docs.yahoo.com/info/terms/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090227/da3bb06a/attachment.htm>

From mhiramat at redhat.com  Fri Feb 27 21:20:02 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Fri, 27 Feb 2009 16:20:02 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	
	<498CA248.2090708@redhat.com>	
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>	
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
Message-ID: <49A85902.8000306@redhat.com>

Jim Keniston wrote:
> On Tue, 2009-02-10 at 10:12 +0530, Ananth N Mavinakayanahalli wrote:
>> On Mon, Feb 09, 2009 at 06:05:56PM -0500, Masami Hiramatsu wrote:
>>> Jim Keniston wrote:
>>>> On Fri, 2009-02-06 at 15:49 -0500, Masami Hiramatsu wrote:
>>>>> Hi Jim,
>>>>>
>>>>> I'm also interested in the instruction decoder.
>>>>> If you don't mind, could we share the API specification?
>>>>> I'd like to port djprobe on it.
>>>> I'm enclosing the little x86 instruction-analysis prototype I hacked
>>>> together (insn_x86.*), along with a copy of systemtap's
>>>> runtime/uprobes2/uprobes_x86.c, which I modified to use it.
>>> Hmm, actually, djprobe needs both the length and the type of
>>> instructions, since it has to know how many bytes must be copied
>>> and be replaced by a long jump.
>>>
>>>> But again, we haven't really settled on an API.  For example, my x86
>>>> prototype doesn't collect all the info that kvm needs.  We're thinking
>>>> that adapting some existing code (like kvm in the x86 case) might be
>>>> more palatable to LKML.
>>> Sure, since kvm and emulators have to fetch the values of src/dst
>>> for the emulation, they need actual register values. On the other hand,
>>> the disasm/*probe have to analyze code before it is hit, so they
>>> don't know the actual value of the registers.
>>>
>>> So, I think we should split x86_decode_insn() into 2 parts, static
>>> analysis and emulation preparation.
>>>
>>> For example:
>>> 1) analyzing code statically (x86_analyze_insn)
>>>    - just decoding an instruction
>>>    - this phase may consist of several sub-functions.
>>>
>>> 2) preparing emulation (x86_evaluate_insn)
>>>    - evaluating src/dst based on current(vcpu) registers
>>>
>>> 3) executing emulation (x86_emulate_insn)
>>>    - emulating an analyzed instruction
>> Right, that surely sounds like the way to go. However, we've been
>> cautioned that the instruction emulation area of the kvm code is very
>> performance sensitive. But, there is no harm in prototyping the above
>> and then worrying about any optimizations so there isn't a performance
>> issue -- in any case, I guess [ku]probes are very infrequent users of
>> this compared to KVM.
>>
>> Ananth
> 
> Hi, Masami.
> 
> Ananth, Srikar, Maneesh, and I talked about this last night.  While I
> was on vacation, Srikar did further investigation into adapting x86
> kvm's instruction analysis for more general use, and he's not optimistic.
> For the short term, at least (i.e., between now and the Linux Foundation
> Collaboration Summit in April), we're going to proceed based on the
> prototype I developed.
> 
> As you noted, djprobes needs instruction lengths, and my prototype
> doesn't provide that info.  (Uprobes computes instruction lengths for
> rip-relative x86_64 instructions, but that's only a subset of what you
> need.)  Are you interested in extending/enhancing my prototype to make
> it useful for djprobes?  If so, I'd be happy to consult.

Here are a patch against your code and example code for an
instruction length decoder.
Curiously, KVM's instruction decoder does not completely
cover all instructions (especially Jcc/test...).
I had to refer to the Intel manuals.

Moreover, even with this patch, the decoder is incomplete.
- this doesn't cover 3-byte opcodes yet.
- this doesn't decode SIB, displacement, and immediate.
- might have some bugs :-(
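
To make the missing pieces concrete, here is a minimal sketch of how an
instruction length can be computed once the ModRM, SIB, displacement, and
immediate bytes are taken into account. It is not the code from
insn_x86.patch or insndec.c. The names (insn_len, one_byte_attr, the F_*
flags) are hypothetical, it assumes 64-bit mode, it knows only a handful
of opcodes, and it ignores the effect of the 0x66 prefix on immediate
sizes. Unknown or truncated instructions yield 0.

```c
#include <assert.h>
#include <stddef.h>

/* Per-opcode attributes used by this sketch (hypothetical names). */
#define F_VALID 0x1	/* opcode is known to this table */
#define F_MODRM 0x2	/* a ModRM byte follows the opcode */
#define F_IMM8  0x4	/* 1-byte immediate or rel8 follows */
#define F_IMM32 0x8	/* 4-byte immediate or rel32 follows */

static int is_legacy_prefix(unsigned char b)
{
	switch (b) {
	case 0x26: case 0x2e: case 0x36: case 0x3e:	/* segment overrides */
	case 0x64: case 0x65:				/* fs, gs */
	case 0x66: case 0x67:				/* operand/address size */
	case 0xf0: case 0xf2: case 0xf3:		/* lock, repne, rep */
		return 1;
	}
	return 0;
}

/* Deliberately tiny opcode table; a real decoder covers the whole map. */
static unsigned int one_byte_attr(unsigned char op)
{
	switch (op) {
	case 0x55: case 0x90: case 0xc3:	/* push %rbp, nop, ret */
		return F_VALID;
	case 0x89: case 0x8b:			/* mov r/m,r and mov r,r/m */
		return F_VALID | F_MODRM;
	case 0x83:				/* group 1 r/m, imm8 */
		return F_VALID | F_MODRM | F_IMM8;
	case 0xeb:				/* jmp rel8 */
		return F_VALID | F_IMM8;
	case 0xe8: case 0xe9:			/* call rel32, jmp rel32 */
		return F_VALID | F_IMM32;
	}
	return 0;
}

/* Return the length in bytes of the instruction at buf, or 0 if it is
 * unknown to this sketch or does not fit in max bytes. */
size_t insn_len(const unsigned char *buf, size_t max)
{
	size_t i = 0;
	unsigned int attr;
	unsigned char modrm, mod, rm, sib;

	assert(buf != NULL);

	while (i < max && is_legacy_prefix(buf[i]))
		i++;
	if (i < max && (buf[i] & 0xf0) == 0x40)	/* REX prefix (64-bit mode) */
		i++;
	if (i >= max)
		return 0;

	if (buf[i] == 0x0f) {
		/* Two-byte opcodes: only Jcc rel32 (0f 80..8f) is handled. */
		if (i + 1 >= max || (buf[i + 1] & 0xf0) != 0x80)
			return 0;
		i += 2;
		attr = F_VALID | F_IMM32;
	} else {
		attr = one_byte_attr(buf[i++]);
	}
	if (!(attr & F_VALID))
		return 0;

	if (attr & F_MODRM) {
		if (i >= max)
			return 0;
		modrm = buf[i++];
		mod = modrm >> 6;
		rm = modrm & 7;
		if (mod != 3 && rm == 4) {	/* SIB byte present */
			if (i >= max)
				return 0;
			sib = buf[i++];
			if (mod == 0 && (sib & 7) == 5)
				i += 4;		/* disp32 with no base */
		} else if (mod == 0 && rm == 5) {
			i += 4;			/* rip-relative disp32 */
		}
		if (mod == 1)
			i += 1;			/* disp8 */
		else if (mod == 2)
			i += 4;			/* disp32 */
	}
	if (attr & F_IMM8)
		i += 1;
	if (attr & F_IMM32)
		i += 4;

	return i <= max ? i : 0;
}
```

For djprobe's purpose, a caller would sum insn_len() over consecutive
instructions until at least five bytes (the size of a jmp rel32) are
covered, and give up if any call returns 0.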


Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: insn_x86.patch
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090227/0e11f8e8/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: insndec.c
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090227/0e11f8e8/attachment.c>

From roland at redhat.com  Mon Mar  2 12:07:54 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon,  2 Mar 2009 04:07:54 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Ananth N Mavinakayanahalli's message of  Monday,
	23 February 2009 13:17:17 +0530 <20090223074717.GA3340@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
Message-ID: <20090302120754.9A64AFC3C6@magilla.sf.frob.com>

Hi, Ananth.  Sorry everything has slid so long (again).
(I have far too many hats and the past month not so many brains!)

Here is my immediate agenda for utrace hacking:

* I have incorporated the "embed struct utrace" changes.

  I did various small bits of reorganization and cosmetic cleanup
  first to make the actual data structure change a smaller patch.
  Since things had changed around, I didn't actually use your patch.
  I just did it over myself, but I think it's nearly the same.

  After this change, we now need some fresh testing of things like Frank's
  ftrace widget and stap's utrace-using modes.  (Nothing should have
  changed from the utrace API perspective.)

  I've created the new branch "utrace-indirect" with a revert of the
  change.  I think this is really the better way to organize the data
  structures, as I've mentioned before.  After we get an initial utrace
  merged in upstream, I intend to revive this branch and turn it into an
  incremental patch to (re-)improve the data structures later on.  That's
  for later; for the time being, the branch will just sit idle.

* I've renamed "struct utrace_attached_engine" to "struct utrace_engine".
  This was a cosmetic suggestion in an earlier LKML review, and I could not
  really find any good reason to keep the longer name.  We all seem to say
  "a utrace engine" in conversation when talking about this object.

  I added the UTRACE_API_VERSION macro to ease existing utrace-using code
  adapting to old/new names.

* I'll shortly scour the old review comments for more cosmetic things we
  might change.

* I would like to have a final "in-team" top-to-bottom review of the main
  utrace patch before sending to LKML.  i.e. maybe by you, Frank, me, and Oleg.
  Each pair of eyeballs should:  
  * make sure all barriers and other kinds of magic have adequate comments
    explaining why they are there and why they are correct
  * cite anything else that sticks out and maybe needs more comments
  * make sure all comments are accurate and understandable
  
* I want to resolve the UTRACE_STOP issues Renzo Davoli raised.
  (We don't have to get these API things perfect before posting upstream.
  I'm sure that once utrace is accepted on queue for merging, that later
  tweaks to its details will not meet particular resistance.)  But if there
  are problems and changes we can identify and work out now, we might as
  well get that done before posting upstream.

* When we on the team think the utrace patch is ready to post, we need to
  do a coordinated post of Frank's ftrace widget.  That is the first thing
  ready for upstream submission that uses utrace, and kernel people tell me
  they don't want to see utrace without also merging something that uses
  it.  I don't really want to get involved with that widget's code myself
  (got my hands full in the utrace layer), so others on the team should
  back Frank up on the review, testing, and fixing of the ftrace widget.


Thanks,
Roland


From cmoller at redhat.com  Mon Mar  2 15:08:01 2009
From: cmoller at redhat.com (Chris Moller)
Date: Mon, 02 Mar 2009 10:08:01 -0500
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>	<20090119232031.82675FC3C6@magilla.sf.frob.com>	<20090121062825.GD3251@in.ibm.com>	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <49ABF651.2060708@redhat.com>

Roland,

Is this going to make it into F11?  Or is it too early to tell?

Roland McGrath wrote:
> Hi, Ananth.  Sorry everything has slid so long (again).
> (I have far too many hats and the past month not so many brains!)
>
> Here is my immediate agenda for utrace hacking:
>
>   

-- 
Chris Moller

  I know that you believe you understand what you think I said, but
  I'm not sure you realize that what you heard is not what I meant.
      -- Robert McCloskey



From roland at redhat.com  Mon Mar  2 20:12:35 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon,  2 Mar 2009 12:12:35 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Chris Moller's message of  Monday,
	2 March 2009 10:08:01 -0500 <49ABF651.2060708@redhat.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<49ABF651.2060708@redhat.com>
Message-ID: <20090302201235.36E4EFC3C6@magilla.sf.frob.com>

> Is this going to make it into F11?  Or is it too early to tell?

F11 will have the latest utrace code at the time F11 freezes, certainly.

Thanks,
Roland


From ananth at in.ibm.com  Tue Mar  3 07:51:29 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 3 Mar 2009 13:21:29 +0530
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <20090303075129.GD22517@in.ibm.com>

On Mon, Mar 02, 2009 at 04:07:54AM -0800, Roland McGrath wrote:
> Hi, Ananth.  Sorry everything has slid so long (again).
> (I have far too many hats and the past month not so many brains!)

I understand. Thanks for the work, Roland.

> Here is my immediate agenda for utrace hacking:
> 
> * I have incorporated the "embed struct utrace" changes.
> 
>   I did various small bits of reorganization and cosmetic cleanup
>   first to make the actual data structure change a smaller patch.
>   Since things had changed around, I didn't actually use your patch.
>   I just did it over myself, but I think it's nearly the same.

The changes look simple and straightforward.

>   After this change, we now need some fresh testing of things like Frank's
>   ftrace widget and stap's utrace-using modes.  (Nothing should have
>   changed from the utrace API perspective.)

There is at least one change from the earlier behaviour -- rather than
utrace_attach_task() retrying by itself on a !parent attach, -EAGAIN is
returned to the user. That may need changes to the utrace client side.
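
As a rough illustration of what that client-side change could look like,
here is a minimal sketch of a bounded retry loop around an attach call
that may fail with -EAGAIN. It is a user-space model, not utrace code:
attach_fn, attach_with_retry, and flaky_attach are hypothetical names, and
the real utrace_attach_task() has a different signature and return
convention.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an attach call: returns 0 on success or a
 * negative errno value such as -EAGAIN. */
typedef int (*attach_fn)(void *target);

/* Call attach() until it stops returning -EAGAIN or max_tries attempts
 * have been made. Returns the result of the last attempt. */
int attach_with_retry(attach_fn attach, void *target, int max_tries)
{
	int ret = -EAGAIN;
	int tries;

	assert(attach != 0 && max_tries > 0);

	for (tries = 0; tries < max_tries && ret == -EAGAIN; tries++)
		ret = attach(target);
	return ret;
}

/* Test double: target points to a countdown; the attach fails with
 * -EAGAIN until the countdown reaches zero. */
int flaky_attach(void *target)
{
	int *countdown = target;

	if (*countdown > 0) {
		(*countdown)--;
		return -EAGAIN;
	}
	return 0;
}
```

Whether a client should retry immediately, yield first, or pass -EAGAIN
up to its own caller depends on the client; the bound on attempts is only
there so that the loop cannot spin forever.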

>   I've created the new branch "utrace-indirect" with a revert of the
>   change.  I think this is really the better way to organize the data
>   structures, as I've mentioned before.  After we get an initial utrace
>   merged in upstream, I intend to revive this branch and turn it into an
>   incremental patch to (re-)improve the data structures later on.  That's
>   for later; for the time being, the branch will just sit idle.
> 
> * I've renamed "struct utrace_attached_engine" to "struct utrace_engine".
>   This was a cosmetic suggestion in an earlier LKML review, and I could not
>   really find any good reason to keep the longer name.  We all seem to say
>   "a utrace engine" in conversation when talking about this object.
> 
>   I added the UTRACE_API_VERSION macro to ease existing utrace-using code
>   adapting to old/new names.
> 
> * I'll shortly scour the old review comments for more cosmetic things we
>   might change.
> 
> * I would like to have a final "in-team" top-to-bottom review of the main
>   utrace patch before sending to LKML.  i.e. maybe by you, Frank, me, and Oleg.
>   Each pair of eyeballs should:  
>   * make sure all barriers and other kinds of magic have adequate comments
>     explaining why they are there and why they are correct
>   * cite anything else that sticks out and maybe needs more comments
>   * make sure all comments are accurate and understandable

I have just started staring at the new code and will pitch in with my
comments.
   
> * I want to resolve the UTRACE_STOP issues Renzo Davoli raised.
>   (We don't have to get these API things perfect before posting upstream.
>   I'm sure that once utrace is accepted on queue for merging, that later
>   tweaks to its details will not meet particular resistance.)  But if there
>   are problems and changes we can identify and work out now, we might as
>   well get that done before posting upstream.
> 
> * When we on the team think the utrace patch is ready to post, we need to
>   do a coordinated post of Frank's ftrace widget.  That is the first thing
>   ready for upstream submission that uses utrace, and kernel people tell me
>   they don't want to see utrace without also merging something that uses
>   it.  I don't really want to get involved with that widget's code myself
>   (got my hands full in the utrace layer), so others on the team should
>   back Frank up on the review, testing, and fixing of the ftrace widget.

I've just started implementing a non-disruptive application core
dump. It's probably too early to commit, but it could also be a potential
in-kernel user of utrace. I've started with quiescing all threads
but need to plug in the core-generating infrastructure for it. I am looking at
the possibility of tweaking do_coredump() to reuse it for this, while the
workhorse can just be binfmt->core_dump() itself. It's still in the
early prototype stage -- I'll post it when there is something more
concrete. Ideas/suggestions welcome!

Ananth


From srikar at linux.vnet.ibm.com  Tue Mar  3 13:26:53 2009
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Tue, 3 Mar 2009 18:56:53 +0530
Subject: Running gdb and uprobes on the same program [ bug 9826 ]
Message-ID: <20090303132653.GA4464@linux.vnet.ibm.com>

Hi Roland, 

Here is an analysis of bug 9826. Can you please let me know your
thoughts?

Summary of the problem:
Probing a program started by gdb causes the traced program to receive
thousands of SIGSEGV signals.

Consider two engines: the first engine (gdb) hasn't inserted any
breakpoints, and the second engine (uprobes) has inserted one breakpoint.  On
hitting a breakpoint, the first engine (gdb) sets a UTRACE_STOP action while
the second engine (uprobes) sets a UTRACE_SINGLESTEP action.  The second
engine also shows interest in the "quiesce" event. The quiesce handler would
return UTRACE_SINGLESTEP if the quiesce were to happen after the
UTRACE_SINGLESTEP has been requested. 

As expected this results in the traced program being stopped.  Once the
traced process is resumed, the UTRACE_SINGLESTEP action seems to be
ignored. Is this expected? 

1. How do we keep the singlestep from being ignored after resume?
2. Shouldn't gdb be interested only in breakpoint events that it has set
   earlier?
3. Is there a way for an engine to tell other engines that certain
   events are exclusively its own and that the other engines
   need not bother? 

This is on a Fedora 10 kernel.

Details:
1. stap -ve 'probe process("ls").function("main") { print("hello world\n") }'

2. (In another window) gdb /bin/ls

3. run at gdb prompt.

	A. uprobes has inserted one breakpoint. 
	B. gdb has not inserted any breakpoints.
	C. Once the breakpoint gets hit:
		I. The ptrace engine (gdb), through its report_signal
		callback (ptrace_report_signal()), sets the action to
		UTRACE_STOP.

		II. The uprobes report_signal callback notices that the
		breakpoint is one of its own, sets the instruction
		pointer to the SSOL area, and requests UTRACE_SINGLESTEP.  It
		also shows interest in the quiesce event, and the quiesce
		handler returns UTRACE_SINGLESTEP if the singlestep
		operation is not complete.


	D. Since UTRACE_STOP is preferred over UTRACE_SINGLESTEP, the
	   traced program ("ls") is stopped and gdb prompt comes up.
	   with the message "

4. continue at gdb prompt
	A. uprobe_report_quiesce doesn't get called 
	B. does a resume and not a singlestep.
	C. Can result in SIGSEGV/SIGILL.
	D. report_signal callback for both engines run but for a
	different signal. 
		I. gdb engine sets UTRACE_STOP.
		II. The uprobes engine sets UTRACE_RESUME as it is in a
		different event (not a breakpoint or singlestep event).
	E. uprobes cannot complete singlestep and hence cannot change
	the instruction pointer to the main instruction stream.
	F. traced program is stopped and gdb prompt comes up with
	message  " ".

5. repeat step 4. 
	A. Same as in Step 4.
	B. process is in UTRACE_STOP hence has to be SIGKILLED.
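
The interaction in steps 3 and 4 can be modelled with a small sketch. It
is a simplified user-space model, not utrace code: the enum, its
ordering, and combine_actions() are assumptions for illustration. The
only rule taken from the analysis above is that a stop request takes
precedence over a singlestep request.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of per-engine resume requests. A lower value takes
 * precedence. Only "stop beats singlestep" comes from the analysis; the
 * rest of the ordering is assumed. */
enum resume_action {
	ACT_STOP,
	ACT_SINGLESTEP,
	ACT_RESUME
};

/* Combine the requests of all engines: the most restrictive one wins.
 * With no requests at all, the task simply resumes. */
enum resume_action combine_actions(const enum resume_action *req, size_t n)
{
	enum resume_action result = ACT_RESUME;
	size_t i;

	assert(req != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (req[i] < result)
			result = req[i];
	return result;
}
```

In this model, step 3 is { ACT_STOP, ACT_SINGLESTEP }, which combines to
ACT_STOP. At step 4 only gdb's resume is seen, because the uprobes quiesce
callback is not called, so the result is ACT_RESUME and the pending
singlestep is lost. Had the quiesce callback run and returned its
singlestep request, the combination would have been ACT_SINGLESTEP.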

--
Thanks and Regards
Srikar 


From srikar at linux.vnet.ibm.com  Tue Mar  3 13:43:42 2009
From: srikar at linux.vnet.ibm.com (Srikar Dronamraju)
Date: Tue, 3 Mar 2009 19:13:42 +0530
Subject: Running gdb and uprobes on the same program [ bug 9826 ]
In-Reply-To: <20090303132653.GA4464@linux.vnet.ibm.com>
References: <20090303132653.GA4464@linux.vnet.ibm.com>
Message-ID: <20090303134342.GC26404@linux.vnet.ibm.com>

* Srikar Dronamraju <srikar at linux.vnet.ibm.com> [2009-03-03 18:56:53]:

> Hi Roland, 
> 
> Here is an analysis of bug 9826. Can you please let me know your
> thoughts?
> 
> Summary of the problem:
> Probing a program started by gdb causes the traced program to receive
> thousands of SIGSEGV signals.
> 
> Consider two engines: the first engine (gdb) hasn't inserted any
> breakpoints, and the second engine (uprobes) has inserted one breakpoint.  On
> hitting a breakpoint, the first engine (gdb) sets a UTRACE_STOP action while
> the second engine (uprobes) sets a UTRACE_SINGLESTEP action.  The second
> engine also shows interest in the "quiesce" event. The quiesce handler would
> return UTRACE_SINGLESTEP if the quiesce were to happen after the
> UTRACE_SINGLESTEP has been requested. 
> 
> As expected this results in the traced program being stopped.  Once the
> traced process is resumed, the UTRACE_SINGLESTEP action seems to be
> ignored. Is this expected? 
> 
> 1. How do we avoid singlestep from being ignored after resume?
> 2. Shouldn't gdb be interested only in breakpoint events that it has set
>    earlier?
> 3. Is there a way for the engines to communicate to other engines that
>    these engines and events are exclusively for itself and other engines
>    need not bother? 
> 
> This is on a Fedora 10 kernel.
> 
> Details:
> 1. stap -ve 'probe process("ls").function("main") { print("hello world\n") }'
> 
> 2. (In another window) gdb /bin/ls
> 
> 3. run at gdb prompt.
> 
> 	A. uprobes has inserted one breakpoint. 
> 	B. gdb has not inserted any breakpoints.
> 	C. Once breakpoint gets hit.
> 		I. ptrace engine (gdb) thro report_signal callback
> 		(ptrace_report_signal()) (gdb) sets the action to
> 		UTRACE_STOP.
> 
> 		II. report_signal (uprobes) callback noticies that the
> 		breakpoint is of its interest and sets the instruction
> 		pointer to SSOL area and requests UTRACE_SINGLESTEP.  It
> 		also shows interest in quiesce event and the quiesce
> 		handler returns UTRACE_SINGLESTEP if the singlestep
> 		operation  is not complete.
> 
> 
> 	D. Since UTRACE_STOP is preferred over UTRACE_SINGLESTEP, the
> 	   traced program ("ls") is stopped and gdb prompt comes up.
> 	   with the message"
"Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000000110020 in ?? () "

> 
> 4. continue at gdb prompt
> 	A. uprobe_report_quiesce doesn't get called 
> 	B. does a resume and not a singlestep.
> 	C. Can result in SIGSEGV/SIGILL.
> 	D. report_signal callback for both engines run but for a
> 	different signal. 
> 		I. gdb engine sets UTRACE_STOP.
> 		II. uprobe engines set UTRACE_RESUME as it is in a
> 		different event (not a breakpoint or singlestep event).
> 	E. uprobes cannot complete singlestep and hence cannot change
> 	the instruction pointer to the main instruction stream.
> 	F. traced program is stopped and gdb prompt comes up with
> 	message  " ".

Program received signal SIGSEGV, Segmentation fault.
0x0000000000111000 in ?? ()

> 
> 5. repeat step 4. 
> 	A. Same as in Step 4.
> 	B. process is in UTRACE_STOP hence has to be SIGKILLED.
> 

However, if we use ni instead of continue at step 4 and then use
continue at step 5, the traced process runs to completion without any
issues. 

It looks like, with the latest utrace code, using utrace and ptrace on the
same task is disabled.

--
Thanks and Regards
Srikar


From fche at redhat.com  Tue Mar  3 15:47:37 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 03 Mar 2009 10:47:37 -0500
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090302120754.9A64AFC3C6@magilla.sf.frob.com> (Roland McGrath's
	message of "Mon, 2 Mar 2009 04:07:54 -0800 (PST)")
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com> <20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <y0mljrmk3qe.fsf@ton.toronto.redhat.com>


roland wrote:

>   After this change, we now need some fresh testing of things like Frank's
>   ftrace widget and stap's utrace-using modes.  (Nothing should have
>   changed from the utrace API perspective.)

Righto.

> * I've renamed "struct utrace_attached_engine" to "struct utrace_engine".
>   This was a cosmetic suggestion in an earlier LKML review, and I could not
>   really find any good reason to keep the longer name.  We all seem to say
>   "a utrace engine" in conversation when talking about this object.
>
>   I added the UTRACE_API_VERSION macro to ease existing utrace-using code
>   adapting to old/new names.

After a corresponding s/// of the ftrace patch, the code appears to
build fine.  I'll add an uglier #ifdef to the systemtap runtime and
will test the lot.
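
A minimal sketch of the kind of #ifdef meant here, for code that has to
build against both the old and the new struct name. It is a self-contained
model: the struct definition below stands in for an old utrace header, and
treating "UTRACE_API_VERSION is not defined" as "old names" is an
assumption. The actual systemtap runtime change may test the macro
differently.

```c
#include <assert.h>

/* Stand-in for an old utrace header in which only the long name exists.
 * In a real build this would come from <linux/utrace.h>. */
struct utrace_attached_engine {
	void *data;
};

/* Compatibility shim: if the header predates UTRACE_API_VERSION
 * (assumed to imply the old naming), map the new name to the old one. */
#ifndef UTRACE_API_VERSION
#define utrace_engine utrace_attached_engine
#endif

/* Client code can now be written against the new name only. */
void *engine_data(struct utrace_engine *engine)
{
	assert(engine != 0);
	return engine->data;
}
```

With a new header, UTRACE_API_VERSION is defined, the #define is skipped,
and struct utrace_engine refers to the real type.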

> * I would like to have a final "in-team" top-to-bottom review of the main
>   utrace patch before sending to LKML.  i.e. maybe by you, Frank, me, and Oleg.
>   [...]

I'll try to review it today.

> * When we on the team think the utrace patch is ready to post, we need to
>   do a coordinated post of Frank's ftrace widget.  [...]

Would you consider simply merging it into your git tree / patch suite?


- FChE


From oleg at redhat.com  Tue Mar  3 20:09:07 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 3 Mar 2009 21:09:07 +0100
Subject: [PATCH] tracehooks: kill death_cookie
Message-ID: <20090303200907.GA19207@redhat.com>

If I understand correctly death_cookie was needed before
"[PATCH] Embed struct utrace in task_struct - V2".

tracehook_report_death() could race with utrace_release_task() which
cleared ->utrace, that is why tracehook_notify_death() had to read
task_utrace_struct() in advance and then pass this argument to
utrace_report_death().

Looks like this is not needed any longer, kill this awful cookie.

Signed-off-by: Oleg Nesterov <oleg at redhat.com>

--- xxx/include/linux/utrace.h~KILL_COOKIE	2009-03-03 18:11:47.000000000 +0100
+++ xxx/include/linux/utrace.h	2009-03-03 20:43:43.000000000 +0100
@@ -100,7 +100,7 @@ void utrace_finish_vfork(struct task_str
 	__attribute__((weak));
 void utrace_report_exit(long *exit_code)
 	__attribute__((weak));
-void utrace_report_death(struct task_struct *, struct utrace *, bool, int)
+void utrace_report_death(struct task_struct *, bool, int)
 	__attribute__((weak));
 void utrace_report_jctl(int notify, int type)
 	__attribute__((weak));
--- xxx/include/linux/tracehook.h~KILL_COOKIE	2009-03-03 18:11:47.000000000 +0100
+++ xxx/include/linux/tracehook.h	2009-03-03 20:40:57.000000000 +0100
@@ -534,7 +534,6 @@ static inline int tracehook_notify_jctl(
 /**
  * tracehook_notify_death - task is dead, ready to notify parent
  * @task:		@current task now exiting
- * @death_cookie:	value to pass to tracehook_report_death()
  * @group_dead:		nonzero if this was the last thread in the group to die
  *
  * A return value >= 0 means call do_notify_parent() with that signal
@@ -546,10 +545,8 @@ static inline int tracehook_notify_jctl(
  * Called with write_lock_irq(&tasklist_lock) held.
  */
 static inline int tracehook_notify_death(struct task_struct *task,
-					 void **death_cookie, int group_dead)
+					 int group_dead)
 {
-	*death_cookie = task_utrace_struct(task);
-
 	if (task->exit_signal == -1)
 		return task->ptrace ? SIGCHLD : DEATH_REAP;
 
@@ -568,14 +565,12 @@ static inline int tracehook_notify_death
  * tracehook_report_death - task is dead and ready to be reaped
  * @task:		@current task now exiting
  * @signal:		return value from tracheook_notify_death()
- * @death_cookie:	value passed back from tracehook_notify_death()
  * @group_dead:		nonzero if this was the last thread in the group to die
  *
  * Thread has just become a zombie or is about to self-reap.  If positive,
  * @signal is the signal number just sent to the parent (usually %SIGCHLD).
  * If @signal is %DEATH_REAP, this thread will self-reap.  If @signal is
  * %DEATH_DELAYED_GROUP_LEADER, this is a delayed_group_leader() zombie.
- * The @death_cookie was passed back by tracehook_notify_death().
  *
  * If normal reaping is not inhibited, @task->exit_state might be changing
  * in parallel.
@@ -583,13 +578,12 @@ static inline int tracehook_notify_death
  * Called without locks.
  */
 static inline void tracehook_report_death(struct task_struct *task,
-					  int signal, void *death_cookie,
-					  int group_dead)
+					  int signal, int group_dead)
 {
 	smp_mb();
 	if (task_utrace_flags(task) & (UTRACE_EVENT(DEATH) |
 				       UTRACE_EVENT(QUIESCE)))
-		utrace_report_death(task, death_cookie, group_dead, signal);
+		utrace_report_death(task, group_dead, signal);
 }
 
 #ifdef TIF_NOTIFY_RESUME
--- xxx/kernel/exit.c~KILL_COOKIE	2009-03-03 18:11:47.000000000 +0100
+++ xxx/kernel/exit.c	2009-03-03 20:42:20.000000000 +0100
@@ -917,7 +917,6 @@ static void forget_original_parent(struc
 static void exit_notify(struct task_struct *tsk, int group_dead)
 {
 	int signal;
-	void *cookie;
 
 	/*
 	 * This does two things:
@@ -954,7 +953,7 @@ static void exit_notify(struct task_stru
 	    !capable(CAP_KILL))
 		tsk->exit_signal = SIGCHLD;
 
-	signal = tracehook_notify_death(tsk, &cookie, group_dead);
+	signal = tracehook_notify_death(tsk, group_dead);
 	if (signal >= 0)
 		signal = do_notify_parent(tsk, signal);
 
@@ -968,7 +967,7 @@ static void exit_notify(struct task_stru
 
 	write_unlock_irq(&tasklist_lock);
 
-	tracehook_report_death(tsk, signal, cookie, group_dead);
+	tracehook_report_death(tsk, signal, group_dead);
 
 	/* If the process is dead, release it - nobody will wait for it */
 	if (signal == DEATH_REAP)
--- xxx/kernel/utrace.c~KILL_COOKIE	2009-03-03 18:11:47.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-03 20:46:09.000000000 +0100
@@ -1675,9 +1675,9 @@ void utrace_report_exit(long *exit_code)
  * For this reason, utrace_release_task checks for the event bits that get
  * us here, and delays its cleanup for us to do.
  */
-void utrace_report_death(struct task_struct *task, struct utrace *utrace,
-			 bool group_dead, int signal)
+void utrace_report_death(struct task_struct *task, bool group_dead, int signal)
 {
+	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
 	BUG_ON(!task->exit_state);


From oleg at redhat.com  Tue Mar  3 22:09:43 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 3 Mar 2009 23:09:43 +0100
Subject: [PATCH] get_utrace_lock: kill the bogus engine->kref.refcount check
Message-ID: <20090303220943.GA24533@redhat.com>

When engine->kref.refcount becomes zero, engine is freed. No rcu, no
other delays. This means that if we see .refcount < 1 we already have
a bug: we are reading the freed (and perhaps unmapped) memory.

Perhaps it makes sense to use BUG_ON() but "return -EINVAL" just hides
the problem and looks misleading, kill this check.

Also remove the comment, the comment above get_utrace_lock() explains
that the caller has to hold a ref on the engine.

Signed-off-by: Oleg Nesterov <oleg at redhat.com>

--- xxx/kernel/utrace.c~WRONG_REFCNT_CK	2009-03-03 20:46:09.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-03 22:30:05.000000000 +0100
@@ -479,14 +479,6 @@ static struct utrace *get_utrace_lock(st
 {
 	struct utrace *utrace;
 
-	/*
-	 * You must hold a ref to be making a call.  A call from within
-	 * a report_* callback in @target might only have the ref for
-	 * being attached, not a second one of its own.
-	 */
-	if (unlikely(atomic_read(&engine->kref.refcount) < 1))
-		return ERR_PTR(-EINVAL);
-
 	rcu_read_lock();
 
 	/*


From roland at redhat.com  Tue Mar  3 23:06:17 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue,  3 Mar 2009 15:06:17 -0800 (PST)
Subject: [PATCH] get_utrace_lock: kill the bogus engine->kref.refcount
	check
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	3 March 2009 23:09:43 +0100 <20090303220943.GA24533@redhat.com>
References: <20090303220943.GA24533@redhat.com>
Message-ID: <20090303230617.3160AFC3C9@magilla.sf.frob.com>

Ok, applied.  I thought I'd seen that checking style in some other kref
user and was copying its style (which is admittedly a dubious thing, since
the free really has already happened), but I can't now find what I might
have been thinking of.


Thanks,
Roland


From roland at redhat.com  Tue Mar  3 23:08:38 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue,  3 Mar 2009 15:08:38 -0800 (PST)
Subject: [PATCH] tracehooks: kill death_cookie
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	3 March 2009 21:09:07 +0100 <20090303200907.GA19207@redhat.com>
References: <20090303200907.GA19207@redhat.com>
Message-ID: <20090303230838.476AEFC3C9@magilla.sf.frob.com>

I would rather not touch the tracehook interfaces now.  You are indeed
right that the motivation for this had to do with the utrace-indirect code.
As I've said, I do intend to resurrect that code and send it upstream later
on.  We can consider cleanups then.  For now, let's not do anything
preemptively that is likely to introduce a new need to touch non-utrace
code again later.


Thanks,
Roland


From roland at redhat.com  Tue Mar  3 23:14:01 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue,  3 Mar 2009 15:14:01 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Frank Ch. Eigler's message of  Tuesday,
	3 March 2009 10:47:37 -0500
	<y0mljrmk3qe.fsf@ton.toronto.redhat.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<y0mljrmk3qe.fsf@ton.toronto.redhat.com>
Message-ID: <20090303231401.3376CFC3C9@magilla.sf.frob.com>

> > * When we on the team think the utrace patch is ready to post, we need to
> >   do a coordinated post of Frank's ftrace widget.  [...]
> 
> Would you consider simply merging it into your git tree / patch suite?

Sure.  The way to do that is for you to publish a git repository that I can
pull from.  You can clone mine, and then make a new utrace-ftrace branch
forking from the utrace branch.  Tell me (e.g. use git-request-pull in email)
when you have an update.  Then I'll pull from you, and generate a patch for
people.redhat.com/roland/utrace/2.6-current/ as I do for my branches.


Thanks,
Roland


From jkenisto at us.ibm.com  Wed Mar  4 01:15:13 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Tue, 03 Mar 2009 17:15:13 -0800
Subject: instruction-analysis API(s)
In-Reply-To: <49A85902.8000306@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
	<4990B6D4.2020907@redhat.com>  <20090210044230.GB12811@in.ibm.com>
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
	<49A85902.8000306@redhat.com>
Message-ID: <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>

On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
...
> 
> Here is a patch against your code, along with example code for an
> instruction length decoder.
> Curiously, KVM's instruction decoder does not completely
> cover all instructions (especially Jcc/test...).
> I had to refer to the Intel manuals.
> 
> Moreover, even with this patch, the decoder is incomplete.
> - this doesn't cover 3-byte opcodes yet.
> - this doesn't decode sib, displacement and immediate.
> - might have some bugs :-(
> 
> 
> Thank you,

Thanks for your work on this.  Comments below.

Jim

> 
> plain text document attachment (insn_x86.patch)
> Index: insn_x86.h
> ===================================================================
> --- insn_x86.h	(revision 1510)
> +++ insn_x86.h	(working copy)
> @@ -66,6 +66,10 @@
>  	struct insn_field displacement;
>  	struct insn_field immediate;
> 
> +	u8 op_bytes;

I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
first saw "op", I thought "opcode".)  Also, we should clarify that these
are the EFFECTIVE lengths, not the lengths of the immediate and
displacement fields in the instruction.

> +	u8 ad_bytes;
> +	u8 length;
> +
>  	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
>  	const u8 *next_byte;
>  	bool x86_64;
> @@ -75,6 +79,7 @@
>  extern void insn_get_prefixes(struct insn *insn);
>  extern void insn_get_opcode(struct insn *insn);
>  extern void insn_get_modrm(struct insn *insn);
> +extern void insn_get_length(struct insn *insn);
> 
>  #ifdef CONFIG_X86_64
>  extern bool insn_rip_relative(struct insn *insn);
> Index: insn_x86.c
> ===================================================================
> --- insn_x86.c	(revision 1510)
> +++ insn_x86.c	(working copy)
> @@ -17,7 +17,7 @@
>   *
>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>   */
> -
> +#include <linux/module.h>
>  #include <linux/string.h>
>  // #include <asm/insn.h>
>  #include "insn_x86.h"
> @@ -34,6 +34,11 @@
>  	insn->kaddr = kaddr;
>  	insn->next_byte = kaddr;
>  	insn->x86_64 = x86_64;
> +	insn->op_bytes = 4;
> +	if (x86_64)
> +		insn->ad_bytes = 8;
> +	else
> +		insn->ad_bytes = 4;
>  }
>  EXPORT_SYMBOL_GPL(insn_init);
> 
> @@ -79,10 +84,51 @@
>  			break;
>  		prefixes->value |= pfx;
>  	}
> +	if (prefixes->value & X86_PFX_OPNDSZ) {
> +		/* oprand size switches 2/4 */
> +		insn->op_bytes ^= 6;
> +	}
> +	if (prefixes->value & X86_PFX_ADDRSZ) {
> +		/* address size switches 2/4 or 4/8 */
> +#ifdef CONFIG_X86_64
> +		if (insn->x86_64)
> +			insn->op_bytes ^= 12;
> +		else
> +#endif
> +			insn->op_bytes ^= 6;

This seems wrong.  You're checking the address-size prefix, but
adjusting the operand size.
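
For illustration, here is a minimal userspace sketch that keeps the two sizes separate. The names (eff_sizes, struct eff_size, the PFX_* flags) are made up for this example and are not from insn_x86.h. It assumes the same defaults as insn_init() in the patch: operand size 4, address size 8 in 64-bit mode and 4 otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prefix flags for this sketch; not the X86_PFX_* values. */
#define PFX_OPNDSZ 0x1	/* 0x66 operand-size prefix */
#define PFX_ADDRSZ 0x2	/* 0x67 address-size prefix */
#define PFX_REXW   0x4	/* REX prefix with the W bit set */

struct eff_size {
	uint8_t opnd_bytes;	/* effective operand size */
	uint8_t addr_bytes;	/* effective address size */
};

static struct eff_size eff_sizes(unsigned int pfx, bool x86_64)
{
	struct eff_size s;

	s.opnd_bytes = 4;
	s.addr_bytes = x86_64 ? 8 : 4;

	if (pfx & PFX_OPNDSZ)
		s.opnd_bytes = 2;		/* 0x66: 4 -> 2 */
	if (pfx & PFX_ADDRSZ)
		s.addr_bytes = x86_64 ? 4 : 2;	/* 0x67: 8 -> 4 or 4 -> 2 */
	if (x86_64 && (pfx & PFX_REXW))
		s.opnd_bytes = 8;		/* REX.W wins over 0x66 */
	return s;
}
```

With this shape, the address-size prefix never touches opnd_bytes, and the XOR constants go away.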

> +	}
> +#ifdef CONFIG_X86_64
> +	if (prefixes->value & X86_PFX_REXW)
> +		insn->op_bytes = 8;
> +#endif
>  	prefixes->got = true;
>  }
>  EXPORT_SYMBOL_GPL(insn_get_prefixes);
> 
> +static bool __insn_is_stack(struct insn *insn)

It's not entirely clear to me what this function checks.  (A more
precise name might help.)  You have pushes, pops, and calls here, but
you also have some instructions that don't appear to affect the stack at
all.  And other push and pop instructions are missing.

> +{
> +	u8 reg;
> +	if (insn->opcode.nbytes == 2)
> +		return 0;

The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9.

Also, since the return value is bool, I'd prefer to see true/false
rather than 1/0.

> +
> +	switch(insn->opcode1) {
> +	case 0x68:
> +	case 0x6a:
> +	case 0x9c:
> +	case 0x9d:
> +	case 0xc5:

0xc5 = lds.  Why lds?

In general, it'd be nice to add a comment showing the mnemonic next to
each hex value -- e.g.,
	case 0x68: /* push */

> +	case 0xe8:
> +		return 1;
> +	}

Other related instructions: 9a, 1f, 07, 17, 8f.

> +	reg = ((*insn->next_byte) >> 3) & 7;
> +	if ((insn->opcode1 & 0xf0) == 0x50 ||
> +	    (insn->opcode1 == 0x1a && reg == 0) ||

The above line doesn't seem right.  It catches things like
sbb (%rax),%al.

> +	    (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) {

Looks like the interesting reg values are 2 (call), 3 (call), and 6
(push).
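
As a sketch of a more explicit test (the helper name ff_uses_stack is made up for illustration), the function below takes the ModRM byte that follows opcode 0xff and returns true only for /2, /3 and /6. The test in the patch, (reg & 1) == 0 && reg != 0, accepts 2, 4 and 6 instead, and /4 is an indirect jmp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: does opcode 0xff with this ModRM byte push onto the stack? */
static bool ff_uses_stack(uint8_t modrm)
{
	int reg = (modrm >> 3) & 7;	/* ModRM reg field selects the operation */

	switch (reg) {
	case 2:		/* call (near, indirect) */
	case 3:		/* call (far, indirect) */
	case 6:		/* push */
		return true;
	default:
		return false;
	}
}
```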

> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /**
>   * insn_get_opcode - collect opcode(s)
>   * @insn:	&struct insn containing instruction
> @@ -108,6 +154,8 @@
>  		opcode->nbytes = 1;
>  	opcode->value = insn->opcode1;
>  	opcode->got = true;
> +	if (insn->x86_64 && __insn_is_stack(insn))
> +		insn->op_bytes = 8;
>  }
>  EXPORT_SYMBOL_GPL(insn_get_opcode);
> 
> @@ -208,3 +256,115 @@
>  }
>  EXPORT_SYMBOL_GPL(insn_rip_relative);
>  #endif
> +
> +/**
> + *
> + * insn_get_length() - Get the length of instruction
> + * @insn:	&struct insn containing instruction
> + *
> + * If necessary, first collects the instruction up to and including the
> + * ModRM byte.
> + */

As I mentioned in private email, you or I should probably refactor this
into:
- insn_get_sib()
- insn_get_displacement()
- insn_get_immediate()
- insn_get_length()

BTW, I'm going to have to change my definition of insn_field to
accommodate the 8-byte fields that can be found in instructions like
a0-a3 (8-byte displacement) and b8-bf (8-byte immediate).

> +void insn_get_length(struct insn *insn)
> +{
> +	u8 modrm;
> +	u8 mod = 0, reg = 0, rm = 0, sib;
> +	const u8 *next_byte;
> +	if (insn->length)
> +		return;
> +	if (!insn->modrm.got)
> +		insn_get_modrm(insn);
> +	next_byte = insn->next_byte;

This of course assumes that no fields have been fetched beyond the modrm
field.

> +
> +	if (insn->modrm.nbytes) {
> +		modrm = insn->modrm.value;
> +		mod = (modrm & 0xc0) >> 6;
> +		reg = (modrm & 0x38) >> 3;
> +		rm = (modrm & 0x07);

Some comments here would really help -- e.g...
/*
Interpreting the modrm byte:
mod = 00 - no displacement fields (exceptions below)
mod = 01 - 1-byte displacement field
mod = 10 - displacement field is 4 bytes, or 2 bytes if
	address size = 2 (0x67 prefix in 32-bit mode)
mod = 11 - no memory operand

If address size = 2...
mod = 00, r/m = 110 - displacement field is 2 bytes

If address size != 2...
mod != 11, r/m = 100 - SIB byte exists
mod = 00, SIB base field = 101 - displacement field is 4 bytes
mod = 00, r/m = 101 - rip-relative addressing, displacement
	field is 4 bytes
*/
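
The same rules can be written as a small standalone function. This is only a sketch under the assumptions in the comment above. The name modrm_tail_bytes and its parameters are hypothetical, and it only counts the SIB and displacement bytes rather than storing them.

```c
#include <stdint.h>

/*
 * Sketch: number of SIB + displacement bytes that follow a ModRM byte.
 * addr_bytes is the effective address size (2, 4 or 8).  sib is the byte
 * after ModRM; it is only examined when a SIB byte is present.
 */
static int modrm_tail_bytes(uint8_t modrm, uint8_t sib, int addr_bytes)
{
	int mod = (modrm >> 6) & 3;
	int rm = modrm & 7;
	int n = 0;

	if (mod == 3)			/* no memory operand */
		return 0;

	if (addr_bytes == 2) {
		if (mod == 1)
			return 1;	/* disp8 */
		if (mod == 2 || rm == 6)
			return 2;	/* disp16 (mod = 00 needs r/m = 110) */
		return 0;
	}

	if (rm == 4) {			/* SIB byte exists */
		n++;
		if (mod == 0 && (sib & 7) == 5)
			n += 4;		/* SIB base = 101: disp32 */
	}
	if (mod == 1)
		n += 1;			/* disp8 */
	else if (mod == 2)
		n += 4;			/* disp32 */
	else if (rm == 5)
		n += 4;			/* mod = 00, r/m = 101: disp32 */
	return n;
}
```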

> +		if (mod == 3)
> +			goto decode_src;
> +		if (insn->ad_bytes == 2) {
> +			if (mod == 1)
> +				next_byte++;
> +			else if (mod == 2)
> +				next_byte += 2;
> +			else if (rm == 6)
> +				next_byte += 2;
> +		} else {
> +			if (rm == 4) {
> +				sib = *(next_byte++);
> +				insn->sib.value = sib;
> +				insn->sib.nbytes = 1;
> +				insn->sib.got = 1;
> +				if ((sib & 7) == 5 && mod == 0)
> +					next_byte += 4;
> +			}
> +			if (mod == 1)
> +				next_byte++;
> +			else if (mod == 2)
> +				next_byte += 4;
> +			else if (rm == 5)
> +				next_byte += 4;
> +		}
> +	} else if (insn->opcode.nbytes == 1)
> +		if (0xa0 <= insn->opcode1 && insn->opcode1 < 0xa4)

Add comment:
			/* Displacement = entire address - up to 8 bytes */

> +			next_byte += insn->ad_bytes;
> +decode_src:

decode_src is a misnomer.  Here we're decoding the immediate operand
(which is always a source operand, but not the only kind).

> +	if (insn->opcode.nbytes == 1) {
> +		switch (insn->opcode1) {
> +		case 0x05:
> +		case 0x25:

What about (hex) 15, 35, 0d, 1d, 2d?

> +		case 0x3d:
> +		case 0x68: // pushl
> +		case 0x69: // imul
> +		case 0x9a: /* long call */

0x9a (lcall) seems to have 6 bytes following the opcode, disassembled as
2 immediate operands.

> +		case 0xa9: // test
> +		case 0xc7:
> +		case 0xe8:
> +		case 0xe9:
> +		case 0xea: /* long jump */

Similarly, 0xea (ljmp) seems to have 6 bytes following the opcode,
disassembled as 2 immediate operands.

> +		case 0x82: /* Group */

s/82/81/ here.

> +			goto imm_common;
> +		case 0x04:
> +		case 0x24:

What about (hex) 14, 34, 0c, 1c, 2c?

> +		case 0x3c:
> +		case 0x6a: //pushb
> +		case 0x6b: //imul
> +		case 0xa8: //testb
> +		case 0xeb:
> +		case 0xc0:
> +		case 0xc1:
> +		case 0xc6:
> +		case 0x80: /* Group */
> +		case 0x81: /* Group */

s/81/82/ here.

> +		case 0x83: /* Group */
> +			goto immbyte_common;
> +		}
> +		if ((insn->opcode1 & 0xf8) == 0xb8 ||

I don't think this is right.  b8-bf can have 8-byte immediate fields
(with 0x48 prefix).

> +		    (insn->opcode1 == 0xf7 && reg == 0

or reg == 1

> ) ) {
> +imm_common:

Jumping into the middle of an if block is ugly, and not necessary here.

> +			next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
> +		} else if ((insn->opcode1 & 0xf8) == 0xb0 || //
> +			   (insn->opcode1 & 0xf0) == 0x70 || // Jcc
> +			   (insn->opcode1 & 0xf8) == 0xe0 || // loop/in/out
> +			    (insn->opcode1 == 0xf6 && reg == 0)) {
> +immbyte_common:

Jumping into the middle of an if block is ugly, and not necessary here.

> +			next_byte++;
> +		}

0xc8 (enter: imm16 plus imm8) and 0xcd (int: imm8) are odd cases that we
should handle.
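
To pull the immediate-size remarks together, here is a rough sketch covering only the one-byte opcodes discussed above. The function name imm_bytes is hypothetical, and opnd_bytes is assumed to be the effective operand size (2, 4 or 8) already worked out from the prefixes. It is not a complete table.

```c
#include <stdint.h>

/*
 * Sketch: immediate bytes for a few one-byte opcodes.  Opcodes not
 * listed here return 0, which does NOT mean they have no immediate.
 */
static int imm_bytes(uint8_t opcode, int opnd_bytes)
{
	if ((opcode & 0xf8) == 0xb8)	/* mov $imm,%reg */
		return opnd_bytes;	/* full width: 8 bytes with REX.W */

	switch (opcode) {
	case 0x68:	/* push $imm */
	case 0x69:	/* imul $imm,r/m,reg */
	case 0x81:	/* group 1: op $imm,r/m */
	case 0xa9:	/* test $imm,%eax */
	case 0xc7:	/* mov $imm,r/m */
		/* imm16 or imm32; 64-bit operands take a sign-extended imm32 */
		return opnd_bytes == 8 ? 4 : opnd_bytes;
	case 0x9a:	/* lcall (not valid in 64-bit mode) */
	case 0xea:	/* ljmp (not valid in 64-bit mode) */
		/* offset followed by a 2-byte segment selector */
		return (opnd_bytes == 2 ? 2 : 4) + 2;
	case 0xc8:	/* enter $imm16,$imm8 */
		return 3;
	case 0xcd:	/* int $imm8 */
		return 1;
	}
	return 0;
}
```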

> +	} else {
> +		switch (insn->opcode2) {

Add 0x70.

> +		case 0xa4:
> +		case 0xac:
> +		case 0xba:
> +		case 0x0f: // 3dnow
> +		case 0x3a: // ssse3
> +			next_byte++;
> +			break;
> +		default:
> +			if ((insn->opcode2 & 0xf0) == 0x80)
> +				next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
> +		}
> +	}
> +	insn->length = (u8)(next_byte - insn->kaddr);
> +}
> +EXPORT_SYMBOL_GPL(insn_get_length);
> 


From oleg at redhat.com  Wed Mar  4 21:27:35 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 4 Mar 2009 22:27:35 +0100
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: <20090303230838.476AEFC3C9@magilla.sf.frob.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
Message-ID: <20090304212735.GA21703@redhat.com>

On 03/03, Roland McGrath wrote:
>
> I would rather not touch the tracehook interfaces now.  You are indeed
> right that the motivation for this had to do with the utrace-indirect code.
> As I've said, I do intend to resurrect that code and send it upstream later
> on.  We can consider cleanups then.  For now, let's not do anything
> preemptively that is likely to introduce a new need to touch non-utrace
> code again later.

OK, understood, thanks.

A couple of questions...

utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
how can it help? We don't hold any locks, so the target can change its
->exit_state right after the check.

So it looks like we can attach to an EXIT_DEAD target. Is that safe?
The only in-kernel user of utrace is ptrace, and in that case I _think_
we are safe: we should notice that the task is dead later, for
example in get_utrace_lock(), and do UTRACE_DETACH. But in general,
is it OK?

Hmm... utrace_release_task() checks only ->attached, and I can't understand
why it ignores ->attaching. Suppose we are doing PTRACE_ATTACH to
an exiting task: isn't it possible to leak the attached engine?

I don't understand why utrace_release_task() doesn't set ->reap = 1
unconditionally. If it did, we could use this flag instead of
EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock().

Back to utrace_attach_task(),

static inline int utrace_attach_delay(struct task_struct *target)
{
	if (target->flags & PF_STARTING) {
		struct utrace *utrace = task_utrace_struct(current);
		if (!utrace || utrace->cloning != target) {
			yield();
			if (signal_pending(current))
				return -ERESTARTNOINTR;
			return -EAGAIN;

Why does it call yield() before returning the error? This looks
really strange. And what is the point of checking signal_pending()
here?

(btw, "!utrace" above is not possible).

Oleg.


From mhiramat at redhat.com  Thu Mar  5 02:10:08 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Wed, 04 Mar 2009 21:10:08 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	
	<498CA248.2090708@redhat.com>	
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>	
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>	
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>	
	<49A85902.8000306@redhat.com>
	<1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>
Message-ID: <49AF3480.1040804@redhat.com>

Hi Jim,

Jim Keniston wrote:
> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
> ...
>> Here is a patch against your code, along with example code for an
>> instruction length decoder.
>> Curiously, KVM's instruction decoder does not completely
>> cover all instructions (especially Jcc/test...).
>> I had to refer to the Intel manuals.
>>
>> Moreover, even with this patch, the decoder is incomplete.
>> - this doesn't cover 3-byte opcodes yet.
>> - this doesn't decode sib, displacement and immediate.
>> - might have some bugs :-(
>>
>>
>> Thank you,
> 
> Thanks for your work on this.  Comments below.

Thank you very much for the review!

Actually, that code was based on the KVM code, so I also found that many
opcodes were not supported.

> As I mentioned in private email, you or I should probably refactor this
> into:
> - insn_get_sib()
> - insn_get_displacement()
> - insn_get_immediate()
> - insn_get_length()

Agreed, these should be supported.

I would also like to change struct insn as below:

struct insn {
        struct insn_field prefixes;     /* prefixes.value is a bitmap */
        struct insn_field opcode;       /* opcode.bytes[n] == opcode_n */
        struct insn_field modrm;
        struct insn_field sib;
        struct insn_field displacement;
        union {
                struct insn_field immediate;
                struct insn_field moffset1;     /* for 64bit MOV */
                struct insn_field immediate1;   /* for 64bit imm or off16/32 */
        };
        union {
                struct insn_field moffset2;     /* for 64bit MOV */
                struct insn_field immediate2;   /* for 64bit imm or seg16 */
        };

        u8 opnd_bytes;
        u8 addr_bytes;
        u8 length;
        bool x86_64;

        const u8 *kaddr;        /* kernel address of insn (copy) to analyze */
        const u8 *next_byte;
};

opcode2 and opcode3 will be stored in opcode.value with opcode1.
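
One possible packing, purely as an assumption for discussion (the byte order and the helper names opcode_pack and opcode_byte are not settled anywhere in this thread): opcode1 goes in the low byte of the value, opcode2 in the next byte, opcode3 in the one after.

```c
#include <stdint.h>

/*
 * Hypothetical packing: opcode1 in bits 0-7, opcode2 in bits 8-15,
 * opcode3 in bits 16-23.  At most three opcode bytes are packed.
 */
static uint32_t opcode_pack(const uint8_t *op, int nbytes)
{
	uint32_t value = 0;
	int i;

	for (i = 0; i < nbytes && i < 3; i++)
		value |= (uint32_t)op[i] << (8 * i);
	return value;
}

/* Return opcode byte n (0 = opcode1) from a packed value. */
static uint8_t opcode_byte(uint32_t value, int n)
{
	return (value >> (8 * n)) & 0xff;
}
```

Using shifts rather than a union keeps the sketch independent of host byte order.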

I'm updating my code now. Is anyone else also working on it?

Thank you,

> 
> Jim
> 
>> plain text document attachment (insn_x86.patch)
>> Index: insn_x86.h
>> ===================================================================
>> --- insn_x86.h	(revision 1510)
>> +++ insn_x86.h	(working copy)
>> @@ -66,6 +66,10 @@
>>  	struct insn_field displacement;
>>  	struct insn_field immediate;
>>
>> +	u8 op_bytes;
> 
> I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
> first saw "op", I thought "opcode".)  Also, we should clarify that these
> are the EFFECTIVE lengths, not the lengths of the immediate and
> displacement fields in the instruction.
> 
>> +	u8 ad_bytes;
>> +	u8 length;
>> +
>>  	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
>>  	const u8 *next_byte;
>>  	bool x86_64;
>> @@ -75,6 +79,7 @@
>>  extern void insn_get_prefixes(struct insn *insn);
>>  extern void insn_get_opcode(struct insn *insn);
>>  extern void insn_get_modrm(struct insn *insn);
>> +extern void insn_get_length(struct insn *insn);
>>
>>  #ifdef CONFIG_X86_64
>>  extern bool insn_rip_relative(struct insn *insn);
>> Index: insn_x86.c
>> ===================================================================
>> --- insn_x86.c	(revision 1510)
>> +++ insn_x86.c	(working copy)
>> @@ -17,7 +17,7 @@
>>   *
>>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>>   */
>> -
>> +#include <linux/module.h>
>>  #include <linux/string.h>
>>  // #include <asm/insn.h>
>>  #include "insn_x86.h"
>> @@ -34,6 +34,11 @@
>>  	insn->kaddr = kaddr;
>>  	insn->next_byte = kaddr;
>>  	insn->x86_64 = x86_64;
>> +	insn->op_bytes = 4;
>> +	if (x86_64)
>> +		insn->ad_bytes = 8;
>> +	else
>> +		insn->ad_bytes = 4;
>>  }
>>  EXPORT_SYMBOL_GPL(insn_init);
>>
>> @@ -79,10 +84,51 @@
>>  			break;
>>  		prefixes->value |= pfx;
>>  	}
>> +	if (prefixes->value & X86_PFX_OPNDSZ) {
>> +		/* oprand size switches 2/4 */
>> +		insn->op_bytes ^= 6;
>> +	}
>> +	if (prefixes->value & X86_PFX_ADDRSZ) {
>> +		/* address size switches 2/4 or 4/8 */
>> +#ifdef CONFIG_X86_64
>> +		if (insn->x86_64)
>> +			insn->op_bytes ^= 12;
>> +		else
>> +#endif
>> +			insn->op_bytes ^= 6;
> 
> This seems wrong.  You're checking the address-size prefix, but
> adjusting the operand size.
> 
>> +	}
>> +#ifdef CONFIG_X86_64
>> +	if (prefixes->value & X86_PFX_REXW)
>> +		insn->op_bytes = 8;
>> +#endif
>>  	prefixes->got = true;
>>  }
>>  EXPORT_SYMBOL_GPL(insn_get_prefixes);
>>
>> +static bool __insn_is_stack(struct insn *insn)
> 
> It's not entirely clear to me what this function checks.  (A more
> precise name might help.)  You have pushes, pops, and calls here, but
> you also have some instructions that don't appear to affect the stack at
> all.  And other push and pop instructions are missing.
> 
>> +{
>> +	u8 reg;
>> +	if (insn->opcode.nbytes == 2)
>> +		return 0;
> 
> The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9.
> 
> Also, since the return value is bool, I'd prefer to see true/false
> rather than 1/0.
> 
>> +
>> +	switch(insn->opcode1) {
>> +	case 0x68:
>> +	case 0x6a:
>> +	case 0x9c:
>> +	case 0x9d:
>> +	case 0xc5:
> 
> 0xc5 = lds.  Why lds?
> 
> In general, it'd be nice to add a comment showing the mnemonic next to
> each hex value -- e.g.,
> 	case 0x68: /* push */
> 
>> +	case 0xe8:
>> +		return 1;
>> +	}
> 
> Other related instructions: 9a, 1f, 07, 17, 8f.
> 
>> +	reg = ((*insn->next_byte) >> 3) & 7;
>> +	if ((insn->opcode1 & 0xf0) == 0x50 ||
>> +	    (insn->opcode1 == 0x1a && reg == 0) ||
> 
> The above line doesn't seem right.  It catches things like
> sbb (%rax),%al .
> 
>> +	    (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) {
> 
> Looks like the interesting reg values are 2 (call), 3 (call), and 6
> (push).
> 
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>>  /**
>>   * insn_get_opcode - collect opcode(s)
>>   * @insn:	&struct insn containing instruction
>> @@ -108,6 +154,8 @@
>>  		opcode->nbytes = 1;
>>  	opcode->value = insn->opcode1;
>>  	opcode->got = true;
>> +	if (insn->x86_64 && __insn_is_stack(insn))
>> +		insn->op_bytes = 8;
>>  }
>>  EXPORT_SYMBOL_GPL(insn_get_opcode);
>>
>> @@ -208,3 +256,115 @@
>>  }
>>  EXPORT_SYMBOL_GPL(insn_rip_relative);
>>  #endif
>> +
>> +/**
>> + *
>> + * insn_get_length() - Get the length of instruction
>> + * @insn:	&struct insn containing instruction
>> + *
>> + * If necessary, first collects the instruction up to and including the
>> + * ModRM byte.
>> + */
> 
> As I mentioned in private email, you or I should probably refactor this
> into:
> - insn_get_sib()
> - insn_get_displacement()
> - insn_get_immediate()
> - insn_get_length()
> 
> BTW, I'm going to have to change my definition of insn_field to
> accommodate the 8-byte fields that can be found in instructions like
> a0-a3 (8-byte displacement) and b8-bf (8-byte immediate).
> 
>> +void insn_get_length(struct insn *insn)
>> +{
>> +	u8 modrm;
>> +	u8 mod = 0, reg = 0, rm = 0, sib;
>> +	const u8 *next_byte;
>> +	if (insn->length)
>> +		return;
>> +	if (!insn->modrm.got)
>> +		insn_get_modrm(insn);
>> +	next_byte = insn->next_byte;
> 
> This of course assumes that no fields have been fetched beyond the modrm
> field.
> 
>> +
>> +	if (insn->modrm.nbytes) {
>> +		modrm = insn->modrm.value;
>> +		mod = (modrm & 0xc0) >> 6;
>> +		reg = (modrm & 0x38) >> 3;
>> +		rm = (modrm & 0x07);
> 
> Some comments here would really help -- e.g...
> /*
> Interpreting the modrm byte:
> mod = 00 - no displacement fields (exceptions below)
> mod = 01 - 1-byte displacement field
> mod = 10 - displacement field is 4 bytes, or 2 bytes if
> 	address size = 2 (0x67 prefix in 32-bit mode)
> mod = 11 - no memory operand
> 
> If address size = 2...
> mod = 00, r/m = 110 - displacement field is 2 bytes
> 
> If address size != 2...
> mod != 11, r/m = 100 - SIB byte exists
> mod = 00, SIB base field = 101 - displacement field is 4 bytes
> mod = 00, r/m = 101 - rip-relative addressing, displacement
> 	field is 4 bytes
> */
> 
>> +		if (mod == 3)
>> +			goto decode_src;
>> +		if (insn->ad_bytes == 2) {
>> +			if (mod == 1)
>> +				next_byte++;
>> +			else if (mod == 2)
>> +				next_byte += 2;
>> +			else if (rm == 6)
>> +				next_byte += 2;
>> +		} else {
>> +			if (rm == 4) {
>> +				sib = *(next_byte++);
>> +				insn->sib.value = sib;
>> +				insn->sib.nbytes = 1;
>> +				insn->sib.got = 1;
>> +				if ((sib & 7) == 5 && mod == 0)
>> +					next_byte += 4;
>> +			}
>> +			if (mod == 1)
>> +				next_byte++;
>> +			else if (mod == 2)
>> +				next_byte += 4;
>> +			else if (rm == 5)
>> +				next_byte += 4;
>> +		}
>> +	} else if (insn->opcode.nbytes == 1)
>> +		if (0xa0 <= insn->opcode1 && insn->opcode1 < 0xa4)
> 
> Add comment:
> 			/* Displacement = entire address - up to 8 bytes */
> 
>> +			next_byte += insn->ad_bytes;
>> +decode_src:
> 
> decode_src is a misnomer.  Here we're decoding the immediate operand
> (which is always a source operand, but not the only kind).
> 
>> +	if (insn->opcode.nbytes == 1) {
>> +		switch (insn->opcode1) {
>> +		case 0x05:
>> +		case 0x25:
> 
> What about (hex) 15, 35, 01, 0d, 2d?
> 
>> +		case 0x3d:
>> +		case 0x68: // pushl
>> +		case 0x69: // imul
>> +		case 0x9a: /* long call */
> 
> 0x9a (lcall) seems to have 6 bytes following the opcode, disassembled as
> 2 immediate operands.
> 
>> +		case 0xa9: // test
>> +		case 0xc7:
>> +		case 0xe8:
>> +		case 0xe9:
>> +		case 0xea: /* long jump */
> 
> Similarly, 0xea (ljmp) seems to have 6 bytes following the opcode,
> disassembled as 2 immediate operands.
> 
>> +		case 0x82: /* Group */
> 
> s/82/81/ here.
> 
>> +			goto imm_common;
>> +		case 0x04:
>> +		case 0x24:
> 
> What about (hex) 14, 34, 0c, 1c, 2c?
> 
>> +		case 0x3c:
>> +		case 0x6a: //pushb
>> +		case 0x6b: //imul
>> +		case 0xa8: //testb
>> +		case 0xeb:
>> +		case 0xc0:
>> +		case 0xc1:
>> +		case 0xc6:
>> +		case 0x80: /* Group */
>> +		case 0x81: /* Group */
> 
> s/81/82/ here.
> 
>> +		case 0x83: /* Group */
>> +			goto immbyte_common;
>> +		}
>> +		if ((insn->opcode1 & 0xf8) == 0xb8 ||
> 
> I don't think this is right.  b8-bf can have 8-byte immediate fields
> (with 0x48 prefix).
> 
>> +		    (insn->opcode1 == 0xf7 && reg == 0
> 
> or reg == 1
> 
>> ) ) {
>> +imm_common:
> 
> Jumping into the middle of an if block is ugly, and not necessary here.
> 
>> +			next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
>> +		} else if ((insn->opcode1 & 0xf8) == 0xb0 || //
>> +			   (insn->opcode1 & 0xf0) == 0x70 || // Jcc
>> +			   (insn->opcode1 & 0xf8) == 0xe0 || // loop/in/out
>> +			    (insn->opcode1 == 0xf6 && reg == 0)) {
>> +immbyte_common:
> 
> Jumping into the middle of an if block is ugly, and not necessary here.
> 
>> +			next_byte++;
>> +		}
> 
> 0xc8 and 0xcd are weird cases that we should handle .
> 
>> +	} else {
>> +		switch (insn->opcode2) {
> 
> Add 0x70.
> 
>> +		case 0xa4:
>> +		case 0xac:
>> +		case 0xba:
>> +		case 0x0f: // 3dnow
>> +		case 0x3a: // ssse3
>> +			next_byte++;
>> +			break;
>> +		default:
>> +			if ((insn->opcode2 & 0xf0) == 0x80)
>> +				next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
>> +		}
>> +	}
>> +	insn->length = (u8)(next_byte - insn->kaddr);
>> +}
>> +EXPORT_SYMBOL_GPL(insn_get_length);
>>
> 
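
As a rough illustration of the review comments above on immediate sizes,
here is a minimal sketch, not the patch itself.  The helper name
imm_bytes() and its opnd_bytes parameter are hypothetical; opnd_bytes
stands for the effective operand size (2, 4 or 8).  It covers only a few
one-byte opcodes: b8-bf take a full operand-size immediate (8 bytes with
REX.W), most other immediates are capped at 4 bytes, and the far
call/jump opcodes carry an offset plus a 2-byte segment selector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size of the immediate field for a few 1-byte opcodes. */
static unsigned int imm_bytes(uint8_t opcode1, unsigned int opnd_bytes)
{
	assert(opnd_bytes == 2 || opnd_bytes == 4 || opnd_bytes == 8);

	if ((opcode1 & 0xf8) == 0xb8)	/* b8-bf: mov imm, reg (imm64 with REX.W) */
		return opnd_bytes;
	if ((opcode1 & 0xf8) == 0xb0)	/* b0-b7: mov imm8, reg8 */
		return 1;

	switch (opcode1) {
	case 0x05: case 0x0d: case 0x15: case 0x1d:
	case 0x25: case 0x2d: case 0x35: case 0x3d:	/* ALU op, imm to eAX */
	case 0x68:	/* push imm */
	case 0x69:	/* imul imm */
	case 0xa9:	/* test imm */
		/* a 64-bit operand still uses a sign-extended imm32 */
		return opnd_bytes == 8 ? 4 : opnd_bytes;
	case 0x04: case 0x0c: case 0x14: case 0x1c:
	case 0x24: case 0x2c: case 0x34: case 0x3c:	/* ALU op, imm8 to AL */
	case 0x6a:	/* push imm8 */
	case 0x6b:	/* imul imm8 */
	case 0xa8:	/* test imm8 */
	case 0xcd:	/* int imm8 */
		return 1;
	case 0xc8:	/* enter imm16, imm8 */
		return 3;
	case 0x9a:	/* lcall (not valid in 64-bit mode) */
	case 0xea:	/* ljmp (not valid in 64-bit mode) */
		/* offset16/32 followed by a 16-bit segment selector */
		return (opnd_bytes == 2 ? 2 : 4) + 2;
	}
	return 0;	/* no immediate, or opcode not covered by this sketch */
}
```

For example, imm_bytes(0x9a, 4) gives the 6 bytes mentioned in the lcall
comment, while opcodes that take their immediate from a ModRM group
(80-83, c6/c7, f6/f7) are deliberately left out of this sketch.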

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


From roland at redhat.com  Thu Mar  5 20:10:12 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu,  5 Mar 2009 12:10:12 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Ananth N Mavinakayanahalli's message of  Tuesday,
	3 March 2009 13:21:29 +0530 <20090303075129.GD22517@in.ibm.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<20090303075129.GD22517@in.ibm.com>
Message-ID: <20090305201012.A581DFC3BF@magilla.sf.frob.com>

> There is at least one change from the earlier behaviour -- rather than
> utrace_attach_task() retrying by itself on a !parent attach, -EAGAIN is
> returned to the user. That may need changes to the utrace client side.

Oops, that was not intentional.  I've restored the old behavior.

> I've just started with implementing a non-disruptive application core
> dump. Its probably too early to commit, but it could also be a potential
> in-kernel user of utrace. I've just started with quiescing all threads
> but need to plug-in the core generating infrastructure for it. I am looking at
> the possibility of tweaking do_coredump() to reuse it for this while the
> workhorse can just be the binfmt->core_dump() itself. Its still in the
> early prototype stage -- I'll post that when there is something more
> concrete. Ideas/suggestions welcome!

Oh yeah.  I almost started on one of those a while back, and I have
certainly put a lot of thought into the subject that we can discuss later.
It is a bit of a can of worms in that the right long-run way to approach it
will involve a bunch of refactoring.  (That's why I haven't suggested it as
a quick, clean, and self-contained demo of things utrace can do, like
Frank's ftrace widget patch is.  I also just hadn't thought about it in a
while.)  Please start a proper thread about that.


Thanks,
Roland


From roland at redhat.com  Thu Mar  5 20:27:08 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu,  5 Mar 2009 12:27:08 -0800 (PST)
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: Oleg Nesterov's message of  Wednesday,
	4 March 2009 22:27:35 +0100 <20090304212735.GA21703@redhat.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
	<20090304212735.GA21703@redhat.com>
Message-ID: <20090305202708.39216FC3BF@magilla.sf.frob.com>

> utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
> how can it help, we don't hold any locks, target can change its
> ->exit_state right after the check.

Good catch, thanks.  This is a remnant of the utrace-indirect code,
where utrace_first_engine() had an interlock with reap/release_task.
(It's one of the several ways that arrangement is superior IMNSHO.)

> I don't understand why utrace_release_task() doesn't set ->reap = 1
> unconditionally. In that case we could use this flag instead of
> EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock().

That's what I've made it do now.  In the utrace-indirect setup,
it was possible to avoid locks for the common case (nobody attached).

> static inline int utrace_attach_delay(struct task_struct *target)
[...]

This is the same thing Ananth noticed.  It was an unintended holdover from
the utrace-indirect code organization.  It's fixed now.


Thanks,
Roland


From oleg at redhat.com  Thu Mar  5 21:02:01 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 5 Mar 2009 22:02:01 +0100
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: <20090305202708.39216FC3BF@magilla.sf.frob.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
	<20090304212735.GA21703@redhat.com>
	<20090305202708.39216FC3BF@magilla.sf.frob.com>
Message-ID: <20090305210201.GA18181@redhat.com>

On 03/05, Roland McGrath wrote:
>
> > utrace_attach_task() checks ->exit_state == EXIT_DEAD. Why? I mean,
> > how can it help, we don't hold any locks, target can change its
> > ->exit_state right after the check.
>
> Good catch, thanks.  This is a remnant of the utrace-indirect code,
> where utrace_first_engine() had an interlock with reap/release_task.
> (It's one of the several ways that arrangement is superior IMNSHO.)
>
> > I don't understand why utrace_release_task() doesn't set ->reap = 1
> > unconditionally. In that case we could use this flag instead of
> > EXIT_DEAD to verify it is "safe" to attach or get_utrace_lock().
>
> That's what I've made it do now.  In the utrace-indirect setup,
> it was possible to avoid locks for the common case (nobody attached).

Aha, I see the new patches...

What about get_utrace_lock()? Do we really need the EXIT_DEAD check?
This check looks "racy" too.

> > static inline int utrace_attach_delay(struct task_struct *target)
> [...]
>
> This is the same thing Ananth noticed.  It was an unintended holdover from
> the utrace-indirect code organization.  It's fixed now.

Great, but

	utrace_attach_delay:

		if (signal_pending(current))
			return -ERESTARTNOINTR;

If utrace_attach_delay() fails, utrace_attach_task() returns this error.
This is right, but for example, prepare_ptrace_attach() will convert it
to EPERM?

Oleg.


From roland at redhat.com  Thu Mar  5 21:52:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu,  5 Mar 2009 13:52:46 -0800 (PST)
Subject: Q: utrace_attach_task && utrace_release_task
In-Reply-To: Oleg Nesterov's message of  Thursday,
	5 March 2009 22:02:01 +0100 <20090305210201.GA18181@redhat.com>
References: <20090303200907.GA19207@redhat.com>
	<20090303230838.476AEFC3C9@magilla.sf.frob.com>
	<20090304212735.GA21703@redhat.com>
	<20090305202708.39216FC3BF@magilla.sf.frob.com>
	<20090305210201.GA18181@redhat.com>
Message-ID: <20090305215246.21006FC3BF@magilla.sf.frob.com>

> What about get_utrace_lock()? Do we really need the EXIT_DEAD check?
> This check looks "racy" too.

It is not strictly necessary any more, no.  It now serves as an early
unsynchronized check before taking the utrace lock, rather than as a
reliable interlock.  The same is now true of the check at the top of
utrace_attach_task.  I'm not inclined to remove them.  They don't hurt now,
and we'll need them back later to reimplement indirect struct utrace.
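
The distinction drawn here, between an early unsynchronized check and a
reliable interlock, can be sketched in user space.  This is only an
illustration under stated assumptions: struct target, try_attach() and
the toy spinlock are hypothetical stand-ins, not utrace code, and -ESRCH
is just a placeholder error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct target {
	atomic_flag lock;	/* toy spinlock standing in for utrace->lock */
	atomic_bool dead;	/* stands in for exit_state == EXIT_DEAD */
	bool reap;		/* stands in for utrace->reap; only touched under lock */
	int attached;		/* number of attached engines */
};

static void target_init(struct target *t)
{
	atomic_flag_clear(&t->lock);
	atomic_init(&t->dead, false);
	t->reap = false;
	t->attached = 0;
}

static void target_lock(struct target *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void target_unlock(struct target *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static int try_attach(struct target *t)
{
	int ret = 0;

	/*
	 * Early check with no lock held.  The answer may be stale a moment
	 * later, so it only saves work; it guarantees nothing.
	 */
	if (atomic_load_explicit(&t->dead, memory_order_relaxed))
		return -ESRCH;

	target_lock(t);
	if (t->reap)		/* authoritative check, serialized by the lock */
		ret = -ESRCH;
	else
		t->attached++;
	target_unlock(t);
	return ret;
}
```

Dropping the early check would not change correctness in this sketch,
only how often the lock is taken for a target that is already gone.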

> If utrace_attach_delay() fails, utrace_attach_task() returns this error.
> This is right, but for example, prepare_ptrace_attach() will convert it
> to EPERM?

Good catch.  But note that we are not really trying to review the
utrace-ptrace branch right now.


Thanks,
Roland


From jbaron at redhat.com  Thu Mar  5 21:58:38 2009
From: jbaron at redhat.com (Jason Baron)
Date: Thu, 5 Mar 2009 21:58:38 +0000 (UTC)
Subject: [PATCH] Embed struct utrace in task_struct - V2
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
Message-ID: <loom.20090305T215132-467@post.gmane.org>

Roland McGrath <roland <at> redhat.com> writes:

> 
> Hi, Ananth.  Sorry everything has slid so long (again).
> (I have far too many hats and the past month not so many brains!)
> 
> Here is my immediate agenda for utrace hacking:
> 
> * I have incorporated the "embed struct utrace" changes.
> 
>   I did various small bits of reorganization and cosmetic cleanup
>   first to make the actual data structure change a smaller patch.
>   Since things had changed around, I didn't actually use your patch.
>   I just did it over myself, but I think it's nearly the same.
> 
>   After this change, we now need some fresh testing of things like Frank's
>   ftrace widget and stap's utrace-using modes.  (Nothing should have
>   changed from the utrace API perspective.)
> 
>   I've created the new branch "utrace-indirect" with a revert of the
>   change.  I think this is really the better way to organize the data
>   structures, as I've mentioned before.  After we get an initial utrace
>   merged in upstream, I intend to revive this branch and turn it into an
>   incremental patch to (re-)improve the data structures later on.  That's
>   for later; for the time being, the branch will just sit idle.
> 
> * I've renamed "struct utrace_attached_engine" to "struct utrace_engine".
>   This was a cosmetic suggestion in an earlier LKML review, and I could not
>   really find any good reason to keep the longer name.  We all seem to say
>   "a utrace engine" in conversation when talking about this object.
> 
>   I added the UTRACE_API_VERSION macro to ease existing utrace-using code
>   adapting to old/new names.
> 
> * I'll shortly scour the old review comments for more cosmetic things we
>   might change.
> 
> * I would like to have a final "in-team" top-to-bottom review of the main
>   utrace patch before sending to LKML.  i.e. maybe by you, Frank, me, and Oleg.
>   Each pair of eyeballs should:  
>   * make sure all barriers and other kinds of magic have adequate comments
>     explaining why they are there and why they are correct
>   * cite anything else that sticks out and maybe needs more comments
>   * make sure all comments are accurate and understandable
> 

hi,

I've been looking at utrace.patch at:

http://people.redhat.com/roland/utrace/2.6-current/

hopefully, that's the latest one.


Anyway, I'm still looking it over, but one thing that sticks out for me along
these lines is the memory barriers and the usage of utrace->reporting. It seems
that this field is being used to exclude utrace_control while we are in the
middle of a callback. However, there aren't any comments about the memory
barriers and the logic here, so it's hard for me to tell whether it's correct...

thanks,

-Jason


From roland at redhat.com  Thu Mar  5 22:09:22 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu,  5 Mar 2009 14:09:22 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Jason Baron's message of  Thursday, 5 March 2009 21:58:38 +0000
	<loom.20090305T215132-467@post.gmane.org>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<loom.20090305T215132-467@post.gmane.org>
Message-ID: <20090305220922.F2C03FC3BF@magilla.sf.frob.com>

> I've been looking at utrace.patch at:
> 
> http://people.redhat.com/roland/utrace/2.6-current/
> 
> hopefully, that's the latest one.

Yes, it's updated frequently.  The .id files tell you what git commit the
patch corresponds to, so we can be mutually clear in making references.
0ef2243a is the utrace branch head at the moment.

> Anyway, I'm still looking it over, but one thing that sticks out for me along
> these lines is the memory barriers and the usage of utrace->reporting. It seems
> that this field is being used to exclude utrace_control while we are in the
> middle of a callback. However, there aren't any comments about the memory
> barriers and the logic here, so it's hard for me to tell whether it's correct...

For some reason I felt sure I'd put some comments about that in a long time ago.
But indeed I see they are not there.  I'll write some up.  This is exactly
why I need you all doing this review!


Thanks very much,
Roland


From mhiramat at redhat.com  Thu Mar  5 23:01:12 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Thu, 05 Mar 2009 18:01:12 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <49AF3480.1040804@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	
	<498CA248.2090708@redhat.com>	
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>	
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>	
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>	
	<49A85902.8000306@redhat.com>
	<1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>
	<49AF3480.1040804@redhat.com>
Message-ID: <49B059B8.8090702@redhat.com>

Hi Jim and Sriker,

Here is my patch, which I have almost entirely rewritten.

Changelog:
- rewrite the decoding logic based on Intel's manual.
- support insn_get_sib(), insn_get_displacement()
  and insn_get_immediate() too.
- support 3-byte opcodes and 64-bit immediates.
- introduce some bitmaps.

Thank you,

Masami Hiramatsu wrote:
> Hi Jim,
> 
> Jim Keniston wrote:
>> On Fri, 2009-02-27 at 16:20 -0500, Masami Hiramatsu wrote:
>> ...
>>> Here are a patch against your code and example code for an
>>> instruction length decoder.
>>> Curiously, KVM's instruction decoder does not cover all
>>> instructions (especially Jcc/test...).
>>> I had to refer to the Intel manuals.
>>>
>>> Moreover, even with this patch, the decoder is incomplete.
>>> - this doesn't cover 3-byte opcodes yet.
>>> - this doesn't decode the SIB, displacement and immediate.
>>> - might have some bugs :-(
>>>
>>>
>>> Thank you,
>> Thanks for your work on this.  Comments below.
> 
> Thank you very much for review!
> 
> Actually, that code was based on the KVM code, and I also found that many
> opcodes were not supported.
> 
>> As I mentioned in private email, you or I should probably refactor this
>> into:
>> - insn_get_sib()
>> - insn_get_displacement()
>> - insn_get_immediate()
>> - insn_get_length()
> 
> Agreed, these should be supported.
> 
> I would also like to change struct insn as below:
> 
> struct insn {
>         struct insn_field prefixes;     /* prefixes.value is a bitmap */
>         struct insn_field opcode;       /* opcode.bytes[n] == opcode_n */
>         struct insn_field modrm;
>         struct insn_field sib;
>         struct insn_field displacement;
>         union {
>                 struct insn_field immediate;
>                 struct insn_field moffset1;     /* for 64bit MOV */
>                 struct insn_field immediate1;   /* for 64bit imm or off16/32 */
>         };
>         union {
>                 struct insn_field moffset2;     /* for 64bit MOV */
>                 struct insn_field immediate2;   /* for 64bit imm or seg16 */
>         };
> 
>         u8 opnd_bytes;
>         u8 addr_bytes;
>         u8 length;
>         bool x86_64;
> 
>         const u8 *kaddr;        /* kernel address of insn (copy) to analyze */
>         const u8 *next_byte;
> };
> 
> opcode2 and opcode3 will be stored in opcode.value with opcode1.
> 
> I'm updating my code now. Is anyone else also working on it?
> 
> Thank you,
> 
>> Jim
>>
>>> plain text document attachment (insn_x86.patch)
>>> Index: insn_x86.h
>>> ===================================================================
>>> --- insn_x86.h	(revision 1510)
>>> +++ insn_x86.h	(working copy)
>>> @@ -66,6 +66,10 @@
>>>  	struct insn_field displacement;
>>>  	struct insn_field immediate;
>>>
>>> +	u8 op_bytes;
>> I'd probably use opnd_bytes and addr_bytes here, for clarity.  (When I
>> first saw "op", I thought "opcode".)  Also, we should clarify that these
>> are the EFFECTIVE lengths, not the lengths of the immediate and
>> displacement fields in the instruction.
>>
>>> +	u8 ad_bytes;
>>> +	u8 length;
>>> +
>>>  	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
>>>  	const u8 *next_byte;
>>>  	bool x86_64;
>>> @@ -75,6 +79,7 @@
>>>  extern void insn_get_prefixes(struct insn *insn);
>>>  extern void insn_get_opcode(struct insn *insn);
>>>  extern void insn_get_modrm(struct insn *insn);
>>> +extern void insn_get_length(struct insn *insn);
>>>
>>>  #ifdef CONFIG_X86_64
>>>  extern bool insn_rip_relative(struct insn *insn);
>>> Index: insn_x86.c
>>> ===================================================================
>>> --- insn_x86.c	(revision 1510)
>>> +++ insn_x86.c	(working copy)
>>> @@ -17,7 +17,7 @@
>>>   *
>>>   * Copyright (C) IBM Corporation, 2002, 2004, 2009
>>>   */
>>> -
>>> +#include <linux/module.h>
>>>  #include <linux/string.h>
>>>  // #include <asm/insn.h>
>>>  #include "insn_x86.h"
>>> @@ -34,6 +34,11 @@
>>>  	insn->kaddr = kaddr;
>>>  	insn->next_byte = kaddr;
>>>  	insn->x86_64 = x86_64;
>>> +	insn->op_bytes = 4;
>>> +	if (x86_64)
>>> +		insn->ad_bytes = 8;
>>> +	else
>>> +		insn->ad_bytes = 4;
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_init);
>>>
>>> @@ -79,10 +84,51 @@
>>>  			break;
>>>  		prefixes->value |= pfx;
>>>  	}
>>> +	if (prefixes->value & X86_PFX_OPNDSZ) {
>>> +		/* operand size switches 2/4 */
>>> +		insn->op_bytes ^= 6;
>>> +	}
>>> +	if (prefixes->value & X86_PFX_ADDRSZ) {
>>> +		/* address size switches 2/4 or 4/8 */
>>> +#ifdef CONFIG_X86_64
>>> +		if (insn->x86_64)
>>> +			insn->op_bytes ^= 12;
>>> +		else
>>> +#endif
>>> +			insn->op_bytes ^= 6;
>> This seems wrong.  You're checking the address-size prefix, but
>> adjusting the operand size.
>>
>>> +	}
>>> +#ifdef CONFIG_X86_64
>>> +	if (prefixes->value & X86_PFX_REXW)
>>> +		insn->op_bytes = 8;
>>> +#endif
>>>  	prefixes->got = true;
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_get_prefixes);
>>>
>>> +static bool __insn_is_stack(struct insn *insn)
>> It's not entirely clear to me what this function checks.  (A more
>> precise name might help.)  You have pushes, pops, and calls here, but
>> you also have some instructions that don't appear to affect the stack at
>> all.  And other push and pop instructions are missing.
>>
>>> +{
>>> +	u8 reg;
>>> +	if (insn->opcode.nbytes == 2)
>>> +		return 0;
>> The following are 2-byte pushes or pops: 0f-a0, 0f-a1, 0f-a8, and 0f-a9.
>>
>> Also, since the return value is bool, I'd prefer to see true/false
>> rather than 1/0.
>>
>>> +
>>> +	switch(insn->opcode1) {
>>> +	case 0x68:
>>> +	case 0x6a:
>>> +	case 0x9c:
>>> +	case 0x9d:
>>> +	case 0xc5:
>> 0xc5 = lds.  Why lds?
>>
>> In general, it'd be nice to add a comment showing the mnemonic next to
>> each hex value -- e.g.,
>> 	case 0x68: /* push */
>>
>>> +	case 0xe8:
>>> +		return 1;
>>> +	}
>> Other related instructions: 9a, 1f, 07, 17, 8f.
>>
>>> +	reg = ((*insn->next_byte) >> 3) & 7;
>>> +	if ((insn->opcode1 & 0xf0) == 0x50 ||
>>> +	    (insn->opcode1 == 0x1a && reg == 0) ||
>> The above line doesn't seem right.  It catches things like
>> sbb (%rax),%al .
>>
>>> +	    (insn->opcode1 == 0xff && (reg & 1) == 0 && reg != 0)) {
>> Looks like the interesting reg values are 2 (call), 3 (call), and 6
>> (push).
>>
>>> +		return 1;
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>>  /**
>>>   * insn_get_opcode - collect opcode(s)
>>>   * @insn:	&struct insn containing instruction
>>> @@ -108,6 +154,8 @@
>>>  		opcode->nbytes = 1;
>>>  	opcode->value = insn->opcode1;
>>>  	opcode->got = true;
>>> +	if (insn->x86_64 && __insn_is_stack(insn))
>>> +		insn->op_bytes = 8;
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_get_opcode);
>>>
>>> @@ -208,3 +256,115 @@
>>>  }
>>>  EXPORT_SYMBOL_GPL(insn_rip_relative);
>>>  #endif
>>> +
>>> +/**
>>> + *
>>> + * insn_get_length() - Get the length of instruction
>>> + * @insn:	&struct insn containing instruction
>>> + *
>>> + * If necessary, first collects the instruction up to and including the
>>> + * ModRM byte.
>>> + */
>> As I mentioned in private email, you or I should probably refactor this
>> into:
>> - insn_get_sib()
>> - insn_get_displacement()
>> - insn_get_immediate()
>> - insn_get_length()
>>
>> BTW, I'm going to have to change my definition of insn_field to
>> accommodate the 8-byte fields that can be found in instructions like
>> a0-a3 (8-byte displacement) and b8-bf (8-byte immediate).
>>
>>> +void insn_get_length(struct insn *insn)
>>> +{
>>> +	u8 modrm;
>>> +	u8 mod = 0, reg = 0, rm = 0, sib;
>>> +	const u8 *next_byte;
>>> +	if (insn->length)
>>> +		return;
>>> +	if (!insn->modrm.got)
>>> +		insn_get_modrm(insn);
>>> +	next_byte = insn->next_byte;
>> This of course assumes that no fields have been fetched beyond the modrm
>> field.
>>
>>> +
>>> +	if (insn->modrm.nbytes) {
>>> +		modrm = insn->modrm.value;
>>> +		mod = (modrm & 0xc0) >> 6;
>>> +		reg = (modrm & 0x38) >> 3;
>>> +		rm = (modrm & 0x07);
>> Some comments here would really help -- e.g...
>> /*
>> Interpreting the modrm byte:
>> mod = 00 - no displacement fields (exceptions below)
>> mod = 01 - 1-byte displacement field
>> mod = 10 - displacement field is 4 bytes, or 2 bytes if
>> 	address size = 2 (0x67 prefix in 32-bit mode)
>> mod = 11 - no memory operand
>>
>> If address size = 2...
>> mod = 00, r/m = 110 - displacement field is 2 bytes
>>
>> If address size != 2...
>> mod != 11, r/m = 100 - SIB byte exists
>> mod = 00, SIB base field = 101 - displacement field is 4 bytes
>> mod = 00, r/m = 101 - rip-relative addressing, displacement
>> 	field is 4 bytes
>> */
>>
>>> +		if (mod == 3)
>>> +			goto decode_src;
>>> +		if (insn->ad_bytes == 2) {
>>> +			if (mod == 1)
>>> +				next_byte++;
>>> +			else if (mod == 2)
>>> +				next_byte += 2;
>>> +			else if (rm == 6)
>>> +				next_byte += 2;
>>> +		} else {
>>> +			if (rm == 4) {
>>> +				sib = *(next_byte++);
>>> +				insn->sib.value = sib;
>>> +				insn->sib.nbytes = 1;
>>> +				insn->sib.got = 1;
>>> +				if ((sib & 7) == 5 && mod == 0)
>>> +					next_byte += 4;
>>> +			}
>>> +			if (mod == 1)
>>> +				next_byte++;
>>> +			else if (mod == 2)
>>> +				next_byte += 4;
>>> +			else if (rm == 5)
>>> +				next_byte += 4;
>>> +		}
>>> +	} else if (insn->opcode.nbytes == 1)
>>> +		if (0xa0 <= insn->opcode1 && insn->opcode1 < 0xa4)
>> Add comment:
>> 			/* Displacement = entire address - up to 8 bytes */
>>
>>> +			next_byte += insn->ad_bytes;
>>> +decode_src:
>> decode_src is a misnomer.  Here we're decoding the immediate operand
>> (which is always a source operand, but not the only kind).
>>
>>> +	if (insn->opcode.nbytes == 1) {
>>> +		switch (insn->opcode1) {
>>> +		case 0x05:
>>> +		case 0x25:
>> What about (hex) 15, 35, 01, 0d, 2d?
>>
>>> +		case 0x3d:
>>> +		case 0x68: // pushl
>>> +		case 0x69: // imul
>>> +		case 0x9a: /* long call */
>> 0x9a (lcall) seems to have 6 bytes following the opcode, disassembled as
>> 2 immediate operands.
>>
>>> +		case 0xa9: // test
>>> +		case 0xc7:
>>> +		case 0xe8:
>>> +		case 0xe9:
>>> +		case 0xea: /* long jump */
>> Similarly, 0xea (ljmp) seems to have 6 bytes following the opcode,
>> disassembled as 2 immediate operands.
>>
>>> +		case 0x82: /* Group */
>> s/82/81/ here.
>>
>>> +			goto imm_common;
>>> +		case 0x04:
>>> +		case 0x24:
>> What about (hex) 14, 34, 0c, 1c, 2c?
>>
>>> +		case 0x3c:
>>> +		case 0x6a: //pushb
>>> +		case 0x6b: //imul
>>> +		case 0xa8: //testb
>>> +		case 0xeb:
>>> +		case 0xc0:
>>> +		case 0xc1:
>>> +		case 0xc6:
>>> +		case 0x80: /* Group */
>>> +		case 0x81: /* Group */
>> s/81/82/ here.
>>
>>> +		case 0x83: /* Group */
>>> +			goto immbyte_common;
>>> +		}
>>> +		if ((insn->opcode1 & 0xf8) == 0xb8 ||
>> I don't think this is right.  b8-bf can have 8-byte immediate fields
>> (with 0x48 prefix).
>>
>>> +		    (insn->opcode1 == 0xf7 && reg == 0
>> or reg == 1
>>
>>> ) ) {
>>> +imm_common:
>> Jumping into the middle of an if block is ugly, and not necessary here.
>>
>>> +			next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
>>> +		} else if ((insn->opcode1 & 0xf8) == 0xb0 || //
>>> +			   (insn->opcode1 & 0xf0) == 0x70 || // Jcc
>>> +			   (insn->opcode1 & 0xf8) == 0xe0 || // loop/in/out
>>> +			    (insn->opcode1 == 0xf6 && reg == 0)) {
>>> +immbyte_common:
>> Jumping into the middle of an if block is ugly, and not necessary here.
>>
>>> +			next_byte++;
>>> +		}
>> 0xc8 and 0xcd are weird cases that we should handle.
>>
>>> +	} else {
>>> +		switch (insn->opcode2) {
>> Add 0x70.
>>
>>> +		case 0xa4:
>>> +		case 0xac:
>>> +		case 0xba:
>>> +		case 0x0f: // 3dnow
>>> +		case 0x3a: // ssse3
>>> +			next_byte++;
>>> +			break;
>>> +		default:
>>> +			if ((insn->opcode2 & 0xf0) == 0x80)
>>> +				next_byte += (insn->op_bytes == 8) ? 4 : insn->op_bytes;
>>> +		}
>>> +	}
>>> +	insn->length = (u8)(next_byte - insn->kaddr);
>>> +}
>>> +EXPORT_SYMBOL_GPL(insn_get_length);
>>>
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: insn_x86.patch
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090305/76e70f46/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: insndec.c
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090305/76e70f46/attachment.c>
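
The ModRM rules quoted in the review above can be condensed into a small
user-space sketch.  The function name modrm_tail_bytes() is hypothetical;
it returns how many SIB and displacement bytes follow the ModRM byte for
a given effective address size, and it is not the code in the attached
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the SIB and displacement bytes that follow
 * the ModRM byte.  p points at the ModRM byte; p[1] is read only when a
 * SIB byte is present.  addr_bytes is the effective address size.
 */
static unsigned int modrm_tail_bytes(const uint8_t *p, unsigned int addr_bytes)
{
	uint8_t mod = (p[0] & 0xc0) >> 6;
	uint8_t rm = p[0] & 0x07;
	unsigned int n = 0;

	assert(addr_bytes == 2 || addr_bytes == 4 || addr_bytes == 8);

	if (mod == 3)			/* register operand, no memory */
		return 0;

	if (addr_bytes == 2) {		/* 16-bit addressing: no SIB byte */
		if (mod == 1)
			return 1;	/* disp8 */
		if (mod == 2 || rm == 6)
			return 2;	/* disp16 */
		return 0;
	}

	if (rm == 4) {			/* SIB byte present */
		n = 1;
		if (mod == 0 && (p[1] & 0x07) == 5)
			n += 4;		/* base = 101: disp32 */
	}
	if (mod == 1)
		n += 1;			/* disp8 */
	else if (mod == 2)
		n += 4;			/* disp32 */
	else if (rm == 5)
		n += 4;			/* disp32 or rip-relative */
	return n;
}
```

Keeping the SIB test separate from the mod test mirrors the structure of
the quoted code, so the two can be compared line by line.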

From renzo at cs.unibo.it  Fri Mar  6 10:35:44 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 6 Mar 2009 11:35:44 +0100
Subject: [PATCH] UTRACE_STOP race condition?
In-Reply-To: <20090213202925.GE28685@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
	<20090213202925.GE28685@cs.unibo.it>
Message-ID: <20090306103544.GH28098@cs.unibo.it>

Dear Roland, dear utrace developers,

I have updated my patch #1 (it solves the race condition on utrace_stop but 
not the nesting issue) for the latest version of utrace.

renzo

On Fri, Feb 13, 2009 at 09:29:25PM +0100, Renzo Davoli wrote:
> I now have a complete patch that seems to be quite stable.
> At least Kmview has passed the tests without randomly getting stuck because of the race condition.
> 
---
--- kernel/utrace.c.mcgrath	2009-03-05 15:09:57.000000000 +0100
+++ kernel/utrace.c	2009-03-06 11:20:48.000000000 +0100
@@ -369,6 +369,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -378,6 +385,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -399,7 +407,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: is it really necessary to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -625,6 +649,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -641,6 +666,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -891,6 +931,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1110,6 +1154,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;
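
The final check this patch adds to utrace_stop() boils down to the flag
handling below.  This is a simplified user-space sketch under
assumptions: the array of engines, the bit positions and the name
must_stop() are illustrative only, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit positions; the patch derives them from _UTRACE_NEVENTS. */
#define ENGINE_STOP	(1UL << 0)
#define ENGINE_RESUME	(1UL << 1)

struct engine {
	unsigned long flags;
};

/*
 * Hypothetical helper: return true if at least one engine still wants
 * the task stopped.  An engine whose resume request arrived before the
 * task actually stopped has both bits set; both are cleared and that
 * engine no longer holds the task.
 */
static bool must_stop(struct engine *engines, size_t n)
{
	bool stop = false;
	size_t i;

	for (i = 0; i < n; i++) {
		struct engine *e = &engines[i];

		if (!(e->flags & ENGINE_STOP))
			continue;
		if (e->flags & ENGINE_RESUME)
			e->flags &= ~(ENGINE_STOP | ENGINE_RESUME);
		else
			stop = true;
	}
	return stop;
}
```

When must_stop() returns false, the caller would skip blocking
altogether, which corresponds to the early "return false" path in the
patch.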


From renzo at cs.unibo.it  Fri Mar  6 11:03:31 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 6 Mar 2009 12:03:31 +0100
Subject: [PATCH] #2 UTRACE_STOP race condition & nesting
In-Reply-To: <20090214091155.GA3582@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
	<20090213202925.GE28685@cs.unibo.it>
	<20090214091155.GA3582@cs.unibo.it>
Message-ID: <20090306110331.GI28098@cs.unibo.it>

Dear Roland, dear utrace developers,

I have also updated the second patch (which includes the first).
This patch fixes the utrace_stop race condition and
implements a consistent model of tracing engine nesting.

renzo
On Sat, Feb 14, 2009 at 10:11:55AM +0100, Renzo Davoli wrote:
>  
> This is an updated patch. It solves the race condition and gives a quick (a bit dirty)
> solution to issues 3&4.
> 	3- Nesting, is it really useful to run all the reports in a row and
> 	(eventually) stop at the end waiting for all the engines?
> The patch waits for each engine to resume before notifying the next registered engine.
> 	4- report_syscall_entry engines evaluation order should be reversed
> REPORT macros have an extra "reverse" argument. The macros append this string to the
> list_for_each_entry_safe function name. All the macro calls leave this argument empty except
> the one in report_syscall_entry, where it is set to _reverse.
> 
> With this patch it is possible to run nested kmview machines and ptrace works inside
> the virtual machines.
> 
> This patch is "a bit dirty" because the variables and sections of code needed to count and test
> the stopped engines are useless here: a task can be kept stopped by at most one engine at
> a time.
> 
> This patch is a proof of concept to show what I meant in my previous message.
> 
> As for 1&2 (not included in this patch):
> 	1- Virtual Machines may need to change the system call
> This is just to simplify the implementation of architecture-independent virtual machines.
> I have kept the definition of the missing functions in the kmview module code.
> 	2- UTRACE_SYSCALL_ABORT: is it really useful as a return value for
> 	report_syscall_entry?
> It is useless for kmview, as the decision to abort the system call is taken while
> the process is stopped; I am currently setting the syscall number to -1 to skip the syscall.
> 
> For the sake of completeness, there is another way to implement the partial virtual machine
> stuff: introducing another "quiescence" state inside the report upcalls.
> I mean: when utrace calls a report function (say, for example, report_syscall_entry), the function
> in the module puts the process in a stopped state (maybe it sets TASK_TRACED and calls schedule()).
> From utrace's point of view the report function does not return until all the changes in
> the task state have been completed and the decision UTRACE_RESUME/UTRACE_SYSCALL_ABORT has been taken.
> In this way UTRACE_STOP is never used, because the module has to implement another feature
> similar to UTRACE_STOP on its own. So what is UTRACE_STOP for?
> 
> ciao
> 	renzo
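
The "reverse" argument described in point 4 above relies on preprocessor
token pasting.  Here is a minimal user-space sketch of the trick, with
hypothetical macro names (for_each_idx, REPORT_ORDER) that iterate over
an array instead of using the kernel's list_for_each_entry_safe() and
list_for_each_entry_safe_reverse():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two iterators whose names differ only by the "_reverse" suffix. */
#define for_each_idx(i, n)		for ((i) = 0; (i) < (n); (i)++)
#define for_each_idx_reverse(i, n)	for ((i) = (n); (i)-- > 0; )

/*
 * Paste the (possibly empty) suffix onto the iterator name.  An empty
 * first argument selects the forward iterator; "_reverse" selects the
 * backward one.
 */
#define REPORT_ORDER(reverse, i, n)	for_each_idx##reverse(i, n)

/* Hypothetical helper: copy ids[] to out[] in the order engines would run. */
static void visit(const int *ids, size_t n, bool syscall_entry, int *out)
{
	size_t i, k = 0;

	if (syscall_entry) {
		REPORT_ORDER(_reverse, i, n)
			out[k++] = ids[i];
	} else {
		REPORT_ORDER(, i, n)
			out[k++] = ids[i];
	}
}
```

Passing an empty macro argument requires C99 or later; the pasted name is
rescanned afterwards, so the selected iterator macro expands normally.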

---
--- kernel/utrace.c.mcgrath	2009-03-05 15:09:57.000000000 +0100
+++ kernel/utrace.c	2009-03-06 11:49:15.000000000 +0100
@@ -369,6 +369,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -378,6 +385,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -399,7 +407,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: is it really necessary to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -625,6 +649,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -641,6 +666,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -891,6 +931,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1110,6 +1154,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;
@@ -1326,6 +1371,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1347,6 +1393,7 @@
 				spin_lock(&utrace->lock);
 				mark_engine_wants_stop(engine);
 				spin_unlock(&utrace->lock);
+				utrace_stop(task, utrace);
 			}
 		} else if (engine_wants_stop(engine)) {
 			spin_lock(&utrace->lock);
@@ -1401,7 +1448,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1430,25 +1477,25 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...)		      \
+#define REPORT(reverse, task, utrace, report, event, callback, ...)		      \
 	do {								      \
 		start_report(utrace);					      \
-		REPORT_CALLBACKS(task, utrace, report, event, callback,	      \
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback,	      \
 				 (report)->action, engine, current,	      \
 				 ## __VA_ARGS__);  	   		      \
 		finish_report(report, task, utrace);			      \
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	      \
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...)	      \
 	do {								      \
 		struct utrace_engine *engine, *next;			      \
 		const struct utrace_engine_ops *ops;			      \
-		list_for_each_entry_safe(engine, next,			      \
+		list_for_each_entry_safe ## reverse(engine, next,			      \
 					 &utrace->attached, entry) {	      \
 			ops = start_callback(utrace, report, engine, task,    \
 					     event);			      \
 			if (!ops)					      \
 				continue;				      \
-			finish_callback(utrace, report, engine,		      \
+			finish_callback(utrace, report, engine, task,		      \
 					(*ops->callback)(__VA_ARGS__));	      \
 		}							      \
 	} while (0)
@@ -1463,7 +1510,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1478,7 +1525,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse,task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1520,7 +1567,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1536,7 +1583,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(CLONE),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(CLONE),
 	       report_clone, clone_flags, child);
 
 	/*
@@ -1600,7 +1647,7 @@
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 	if (was_stopped && !task_is_stopped(task)) {
@@ -1637,7 +1684,7 @@
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	REPORT(,task, utrace, &report, UTRACE_EVENT(EXIT),
 	       report_exit, orig_code, exit_code);
 
 	if (report.action == UTRACE_STOP)
@@ -1676,7 +1723,7 @@
 	utrace->interrupt = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+	REPORT_CALLBACKS(,task, utrace, &report, UTRACE_EVENT(DEATH),
 			 report_death, engine, task, group_dead, signal);
 
 	spin_lock(&utrace->lock);
@@ -2018,7 +2065,7 @@
 			break;
 		}
 
-		finish_callback(utrace, &report, engine, ret);
+		finish_callback(utrace, &report, engine, task, ret);
 	}
 
 	/*


From ananth at in.ibm.com  Fri Mar  6 15:41:34 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Fri, 6 Mar 2009 21:11:34 +0530
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
Message-ID: <20090306154134.GB15133@in.ibm.com>

Roland,

With the current utrace/master tree, I am seeing that utrace_attach_task()
never returns when invoked from the clone callback. The same module
works fine with prior utrace (rcu as well as with my embed version).

The testcase is simple:
a. attach an engine to attachstop-mt (from the gdb testsuite) _before_ it
   calls pthread_create.
b. Watch for CLONE_THREAD and try to attach a utrace engine to the new
   thread. The utrace_attach_task() call never returns.

If the utrace module is unloaded, the kernel barfs with the following
innocuous information:

BUG: unable to handle kernel paging request at fffffffffffffdff
IP: [<ffffffffa012009a>] 0xffffffffa012009a
PGD 203067 PUD 204067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/pci0000:01/0000:01:01.1/irq
CPU 6 
Modules linked in: <big list>
[last unloaded: utrace_quiesce_threads]
Pid: 6203, comm: attachstop-mt Not tainted 2.6.29-rc7-ut #1 eserver xSeries 366-[88632RA]-
RIP: 0010:[<ffffffffa012009a>]  [<ffffffffa012009a>] 0xffffffffa012009a
RSP: 0018:ffff8801d34ebe10  EFLAGS: 00010246
RAX: fffffffffffffdff RBX: ffff8801f11a36c0 RCX: 00000000c0000100
RDX: 0000000000000000 RSI: ffff8801dd0507f8 RDI: ffff88022daf4500
RBP: 00000000fffffff4 R08: ffff8801d34ea000 R09: ffff88022f2596a0
R10: ffff8800280b1600 R11: 0000000000000018 R12: ffff8801d34f1860
R13: ffff8802210dd300 R14: ffff8801dd07e2c0 R15: 00000000003d0f00
FS:  00007f58c8d286e0(0000) GS:ffff88022f18e5c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffdff CR3: 00000002029bd000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process attachstop-mt (pid: 6203, threadinfo ffff8801d34ea000, task ffff8801d3512440)
Stack:
 00000000003d0f00 ffff8801d34f1860 ffff8802210dd300 ffff8801d3512440
 ffff8801d34ebe70 ffffffffa012028d ffff8801dd050618 ffff8801d35129e0
 ffff8801d35129d8 ffffffff80260480 0000000000000000 ffff8801d34f1860
Call Trace:
 [<ffffffff80260480>] ? utrace_report_clone+0x95/0xfc
 [<ffffffff80239120>] ? do_fork+0x20b/0x2f3
 [<ffffffff804a4035>] ? do_page_fault+0x3c7/0x74e
 [<ffffffff8020c243>] ? stub_clone+0x13/0x20
 [<ffffffff8020bedb>] ? system_call_fastpath+0x16/0x1b
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
RIP  [<ffffffffa012009a>] 0xffffffffa012009a
 RSP <ffff8801d34ebe10>
CR2: fffffffffffffdff
---[ end trace 96bb7eb644ab73a4 ]---

I have verified that the earlier version of utrace works just fine.

In the earlier case, the engine would go directly on to the attached
list if the calling task was the creator of the new thread. This has
changed with the new implementation.

I haven't yet zeroed in on what exact change caused this problem.

Ananth


From fche at redhat.com  Fri Mar  6 15:42:46 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Fri, 6 Mar 2009 10:42:46 -0500
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: <20090303231401.3376CFC3C9@magilla.sf.frob.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<y0mljrmk3qe.fsf@ton.toronto.redhat.com>
	<20090303231401.3376CFC3C9@magilla.sf.frob.com>
Message-ID: <20090306154246.GE32581@redhat.com>

Hi -

On Tue, Mar 03, 2009 at 03:14:01PM -0800, Roland McGrath wrote:
> > > * When we on the team think the utrace patch is ready to post, we need to
> > >   do a coordinated post of Frank's ftrace widget.  [...]
> > 
> > Would you consider simply merging it into your git tree / patch suite?
> 
> Sure.  The way to do that is for you to publish a git repository that I can
> pull from.  [...]

OK:


The following changes since commit 0ef2243aeae481f1c0f1edd23a8259bd20331b00:
  Roland McGrath (1):
        Merge remote branch 'upstream/HEAD' of /home/roland/redhat/linux/2.6/ into utrace

are available in the git repository at:

  http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace

Frank Ch. Eigler (1):
      utrace-based ftrace "process" engine, v2

 include/linux/processtrace.h |   41 +++
 kernel/trace/Kconfig         |    9 +
 kernel/trace/Makefile        |    1 +
 kernel/trace/trace.h         |   30 ++-
 kernel/trace/trace_process.c |  591 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 661 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/processtrace.h
 create mode 100644 kernel/trace/trace_process.c


- FChE


From roland at redhat.com  Fri Mar  6 20:49:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri,  6 Mar 2009 12:49:46 -0800 (PST)
Subject: [PATCH] Embed struct utrace in task_struct - V2
In-Reply-To: Frank Ch. Eigler's message of  Friday,
	6 March 2009 10:42:46 -0500 <20090306154246.GE32581@redhat.com>
References: <20090119132838.GA3542@in.ibm.com>
	<20090119232031.82675FC3C6@magilla.sf.frob.com>
	<20090121062825.GD3251@in.ibm.com>
	<20090223074717.GA3340@in.ibm.com>
	<20090302120754.9A64AFC3C6@magilla.sf.frob.com>
	<y0mljrmk3qe.fsf@ton.toronto.redhat.com>
	<20090303231401.3376CFC3C9@magilla.sf.frob.com>
	<20090306154246.GE32581@redhat.com>
Message-ID: <20090306204946.38DEBFC3BF@magilla.sf.frob.com>

>   http://web.elastic.org/~fche/git/linux-2.6-utrace.git utrace-ftrace
> 
> Frank Ch. Eigler (1):
>       utrace-based ftrace "process" engine, v2

Thanks, Frank.  Your branch is now in my repo and its patch generated in
2.6-current/.  I'll pull periodically, or let me know if my repo lags
behind yours in future.


Thanks,
Roland


From roland at redhat.com  Fri Mar  6 20:52:34 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri,  6 Mar 2009 12:52:34 -0800 (PST)
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
In-Reply-To: Ananth N Mavinakayanahalli's message of  Friday,
	6 March 2009 21:11:34 +0530 <20090306154134.GB15133@in.ibm.com>
References: <20090306154134.GB15133@in.ibm.com>
Message-ID: <20090306205234.0A759FC3BF@magilla.sf.frob.com>

> With the current utrace/master tree, I am seeing that utrace_attach_task()
> never returns when invoked from the clone callback. The same module
> works fine with prior utrace (rcu as well as with my embed version).

I changed the utrace_attach_delay() logic recently.  That is probably it.
Please post your test case.


Thanks,
Roland


From ananth at in.ibm.com  Sat Mar  7 01:44:50 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Sat, 7 Mar 2009 07:14:50 +0530
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com>
References: <20090306154134.GB15133@in.ibm.com>
	<20090306205234.0A759FC3BF@magilla.sf.frob.com>
Message-ID: <20090307014449.GC15133@in.ibm.com>

On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
> > With the current utrace/master tree, I am seeing that utrace_attach_task()
> > never returns when invoked from the clone callback. The same module
> > works fine with prior utrace (rcu as well as with my embed version).
> 
> I changed the utrace_attach_delay() logic recently.  That is probably it.

Right, reverting dd30e86355e fixes the problem.

> Please post your test case.

Here it is -- it does nothing much, really :) I used this module in
conjunction with attachstop-mt, with an engine attached to it before the
pthread_create().

---
#include <linux/module.h>
#include <linux/slab.h>		/* kzalloc(), kfree() */
#include <linux/utrace.h>
#include <linux/err.h>

MODULE_DESCRIPTION("Utrace tests");
MODULE_LICENSE("GPL");

static int target_pid;
module_param_named(pid, target_pid, int, 0);

/* Structure for all threads of a process having the same utrace ops */
struct proc_utrace {
	struct task_struct *tgid_task;

	/* list of task_utrace structs */
	struct list_head list;
	unsigned int num_threads;
};

struct task_utrace {
	struct list_head list;
	struct task_struct *task;
	/* TODO: Get rid of this and use MATCHING_OPS on task? */
	struct utrace_engine *engine;
};

static const struct utrace_engine_ops ut_ops;

static struct task_utrace *get_task_ut(struct task_struct *task,
		struct proc_utrace *proc_ut)
{
	struct task_utrace *task_ut, *temp;

	list_for_each_entry_safe(task_ut, temp, &proc_ut->list, list) {
		if (task_ut->task == task)
			return task_ut;
	}
	return NULL;
}

static int cleanup_proc_ut(struct proc_utrace *proc_ut)
{
	int ret = 0;
	struct task_utrace *task_ut, *temp;

	printk(KERN_INFO "Cleanup_proc_ut\n");
	if (proc_ut == NULL)
		return 0;

	if (list_empty(&proc_ut->list))
		goto out;

	/* walk proc_ut->list and free task_ut */
	list_for_each_entry_safe(task_ut, temp, &proc_ut->list, list) {
		if (task_ut->engine) {
			printk(KERN_INFO "Calling detach for %d\n",
					task_pid_nr(task_ut->task));
			ret = utrace_control(task_ut->task,
					task_ut->engine, UTRACE_DETACH);
			if (ret)
				printk(KERN_INFO "utrace_detach returned %d\n",
						ret);
			printk(KERN_INFO "Detached engine for %d\n",
					task_pid_nr(task_ut->task));
		}
		list_del(&task_ut->list);
		kfree(task_ut);
	}
out:
	kfree(proc_ut);
	return ret;
}

static int setup_task_ut(struct task_struct *t, struct proc_utrace *proc_ut)
{
	struct task_utrace *task_ut;
	int ret = 0;

	if (!t || !proc_ut)
		return -EINVAL;

	printk(KERN_INFO "setup_task_ut: attaching for task %d\n",
			task_pid_nr(t));
	task_ut = kzalloc(sizeof(*task_ut), GFP_KERNEL);
	if (!task_ut)
		return -ENOMEM;

	INIT_LIST_HEAD(&task_ut->list);
	task_ut->task = t;
	list_add_tail(&task_ut->list, &proc_ut->list);

	/*
	 * The utrace engine's *data will point to proc_ut.
	 */
	printk(KERN_INFO "Before utrace_attach_task: %d\n", task_pid_nr(t));
	task_ut->engine = utrace_attach_task(t, UTRACE_ATTACH_CREATE,
			&ut_ops, proc_ut);
	printk(KERN_INFO "After utrace_attach_task: %d, engine = %p\n",
			task_pid_nr(t), task_ut->engine);
	if (IS_ERR(task_ut->engine)) {
		printk(KERN_ERR "utrace_attach_task returned %d\n",
				(int)PTR_ERR(task_ut->engine));
		task_ut->engine = NULL;
		ret = -ESRCH;
		goto out;
	}
	printk(KERN_INFO "utrace_attach_task: SUCCESS! - engine = %p\n",
				task_ut->engine);
	if (utrace_set_events(t, task_ut->engine,
			UTRACE_EVENT(QUIESCE) | UTRACE_EVENT(CLONE) |
			UTRACE_EVENT(EXIT))) {
		ret = -ESRCH;
	}
	proc_ut->num_threads++;
out:
	return ret;
}

static u32 ut_quiesce(enum utrace_resume_action action,
		struct utrace_engine *engine,
		struct task_struct *task, unsigned long event)
{
	printk(KERN_INFO "In quiesce callback: tid = %d\n", task_pid_nr(task));
	return UTRACE_RESUME;
}

/* clone handler -- handle thread spawns and forks */
static u32 ut_clone(enum utrace_resume_action action,
		struct utrace_engine *engine,
		struct task_struct *parent, unsigned long clone_flags,
		struct task_struct *child)
{
	struct proc_utrace *proc_ut = (struct proc_utrace *)engine->data;

	printk(KERN_INFO "In clone callback: parent = %d, child = %d\n",
			task_pid_nr(parent), task_pid_nr(child));
	if (clone_flags & CLONE_THREAD) {
		/* New thread in the same process */
		printk(KERN_INFO "New thread - tid = %d\n", task_pid_nr(child));
		if (setup_task_ut(child, proc_ut)) {
			printk(KERN_INFO "ut_clone - calling cleanup_proc_ut\n");
			cleanup_proc_ut(proc_ut);
			goto out;
		}
	}
out:
	return UTRACE_RESUME;
}

static u32 ut_exit(enum utrace_resume_action action,
		struct utrace_engine *engine, struct task_struct *task,
		long orig_code, long *code)
{
	struct task_utrace *task_ut;
	struct proc_utrace *proc_ut = (struct proc_utrace *)engine->data;

	printk(KERN_INFO "In exit callback: tid = %d\n", task_pid_nr(task));
	/* One task dying */
	task_ut = get_task_ut(task, proc_ut);
	if (task_ut) {
		proc_ut->num_threads--;
		list_del(&task_ut->list);
		kfree(task_ut);

		/* If we are the last task, cleanup! */
		if (unlikely(list_empty(&proc_ut->list))) {
			printk(KERN_INFO "ut_exit - calling cleanup_proc_ut\n");
			cleanup_proc_ut(proc_ut);
		}
	}
	printk(KERN_INFO "Detaching %d\n", task_pid_nr(task));
	return UTRACE_DETACH;
}

static const struct utrace_engine_ops ut_ops =
{
	.report_clone = ut_clone,	/* new thread */
	.report_quiesce = ut_quiesce,
	.report_exit = ut_exit,	/* thread exit */
};

/* Engine attach -- for all threads of the process */
static struct proc_utrace *attach_utrace_engines(struct pid *pid)
{
	int ret = 0;
	struct task_struct *t;
	struct proc_utrace *proc_ut;
	struct task_utrace *task_ut;
	struct utrace_engine *engine;

	if (!pid) {
		ret = -EINVAL;
		goto out;
	}

	/*
	 * We already hold a ref to the pid here
	 */
	engine = utrace_attach_pid(pid, UTRACE_ATTACH_MATCH_OPS,
			&ut_ops, 0);
	if (IS_ERR(engine)) {
		if (PTR_ERR(engine) != -ENOENT) {
			printk(KERN_INFO "Engine already attached?\n");
			/* Propagate the error; ret == 0 would yield ERR_PTR(0). */
			ret = PTR_ERR(engine);
			goto out;
		}
	}

	proc_ut = kzalloc(sizeof(*proc_ut), GFP_KERNEL);
	if (!proc_ut)
		return ERR_PTR(-ENOMEM);
	t = proc_ut->tgid_task = pid_task(pid, PIDTYPE_PID);
	INIT_LIST_HEAD(&proc_ut->list);

	rcu_read_lock();
	do {
		ret = setup_task_ut(t, proc_ut);
		printk(KERN_INFO "setup_task_ut returned %d\n", ret);
		if (ret)
			goto err_task_ut;

		task_ut = get_task_ut(t, proc_ut);
		ret = utrace_control(t, task_ut->engine, UTRACE_STOP);
		if (ret == 0)
			printk(KERN_INFO "Task %d is quiescent\n",
					task_pid_nr(t));
		else if (ret == -EINPROGRESS)
			printk(KERN_INFO "Task %d is on its way to quiesce\n",
				task_pid_nr(t));
		else {
			printk(KERN_ERR "utrace_control returned %d\n", ret);
			goto err_task_ut;
		}

		ret = 0;
		t = next_thread(t);
	} while (t != proc_ut->tgid_task);

	rcu_read_unlock();
	return proc_ut;

err_task_ut:
	rcu_read_unlock();
	printk(KERN_INFO "attach_utrace_engines - calling cleanup_proc_ut\n");
	ret = cleanup_proc_ut(proc_ut);
out:
	return ERR_PTR(ret);
}

static int __init utrace_init(void)
{
	int ret = 0;
	struct proc_utrace *proc_ut = NULL;
	struct pid *pid;

	pid = find_get_pid(target_pid);
	if (pid == NULL) {
		printk(KERN_ERR "Cannot find PID %d\n", target_pid);
		ret = -ESRCH;
		goto out;
	}

	/* attach an engine for each thread */
	proc_ut = attach_utrace_engines(pid);
	if (IS_ERR(proc_ut)) {
		ret = (int)PTR_ERR(proc_ut);
		printk(KERN_ERR "attach_utrace_engines returned %d\n",
				ret);
		goto out;
	}

out:
	put_pid(pid);
	return ret;
}

static void __exit utrace_exit(void)
{
	int ret = 0;
	struct pid *pid;
	struct utrace_engine *engine;
	struct proc_utrace *proc_ut;

	pid = find_get_pid(target_pid);
	if (pid == NULL) {
		printk(KERN_ERR "Cannot find PID %d\n", target_pid);
		return;
	}

	printk(KERN_INFO "In module_exit for pid = %d\n", pid_vnr(pid));
	engine = utrace_attach_pid(pid, UTRACE_ATTACH_MATCH_OPS, &ut_ops, 0);
	if (IS_ERR(engine))
		printk(KERN_ERR "Can't find self: %ld\n", PTR_ERR(engine));
	else if (engine == NULL)
		printk(KERN_ERR "Can't find self: no match\n");
	else {
		printk(KERN_INFO "Trying to detach\n");
		proc_ut = (struct proc_utrace *)engine->data;
		ret = cleanup_proc_ut(proc_ut);
		if (ret)
			printk(KERN_ERR "cleanup_proc_ut returned %d\n", ret);
	}
	put_pid(pid);
}

module_init(utrace_init);
module_exit(utrace_exit);


From ananth at in.ibm.com  Sat Mar  7 02:07:02 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Sat, 7 Mar 2009 07:37:02 +0530
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com>
References: <20090306154134.GB15133@in.ibm.com>
	<20090306205234.0A759FC3BF@magilla.sf.frob.com>
Message-ID: <20090307020702.GD15133@in.ibm.com>

On Fri, Mar 06, 2009 at 12:52:34PM -0800, Roland McGrath wrote:
> > With the current utrace/master tree, I am seeing that utrace_attach_task()
> > never returns when invoked from the clone callback. The same module
> > works fine with prior utrace (rcu as well as with my embed version).
> 
> I changed the utrace_attach_delay() logic recently.  That is probably it.
> Please post your test case.

The issue is that when a task is cloned with CLONE_THREAD or CLONE_PARENT,
target->real_parent == current->real_parent rather than current. So we
keep looping in the do-while.
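
The parent assignment can be modelled in user space to see why a
"real_parent == current" test fails for a new thread. This is only a
sketch: struct task, set_real_parent() and created_by() are made-up names
for illustration, not kernel or utrace code. The flag values are the ones
from <linux/sched.h>.

```c
#include <stddef.h>

#define CLONE_PARENT	0x00008000UL	/* child gets the creator's parent */
#define CLONE_THREAD	0x00010000UL	/* child joins the creator's thread group */

/* Hypothetical, stripped-down stand-in for task_struct. */
struct task {
	const char *name;
	struct task *real_parent;
};

/* Model of the parent selection done when a task is cloned. */
static void set_real_parent(struct task *child, struct task *creator,
			    unsigned long clone_flags)
{
	if (clone_flags & (CLONE_PARENT | CLONE_THREAD))
		child->real_parent = creator->real_parent;	/* sibling */
	else
		child->real_parent = creator;			/* plain fork */
}

/* The kind of "did current create this task?" test that goes wrong. */
static int created_by(const struct task *child, const struct task *creator)
{
	return child->real_parent == creator;
}
```

For a thread created by the group leader, created_by(&thread, &leader) is
false even though the leader really did create it, while for a plain fork
it is true.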

Ananth


From jkenisto at us.ibm.com  Sat Mar  7 07:55:00 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Sat,  7 Mar 2009 02:55:00 -0500
Subject: instruction-analysis API(s)
In-Reply-To: <49B059B8.8090702@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
	<49A85902.8000306@redhat.com>
	<1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>
	<49AF3480.1040804@redhat.com> <49B059B8.8090702@redhat.com>
Message-ID: <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com>

Quoting Masami Hiramatsu <mhiramat at redhat.com>:

> Hi Jim and Sriker,
>
> Here, I have almost completely rewritten my patch.
>
> Changelog:
> - rewrite decoding logic based on Intel's manual.
> - support insn_get_sib(), insn_get_displacement()
>   and insn_get_immediate() too.
> - support 3-byte opcodes and 64-bit immediates.
> - introduce some bitmaps.
>
> Thank you,

Well, I didn't do much of a code review -- it looks like you addressed  
all my concerns -- but as I mentioned on IRC, I hacked together a test  
rig whereby you can disassemble a designated elf file (e.g., vmlinux,  
libc, libm) and then compare insn_get_length()'s results with  
objdump's results.  The comment in distill.awk shows how to use  
objdump, awk, and test_get_len together.
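
The comparison comes down to counting the hex bytes objdump printed for an
instruction and checking that against the decoder's length. The sketch
below is not taken from test_get_len.c; count_hex_bytes() and
lengths_match() are hypothetical helpers that only illustrate the idea.

```c
#include <ctype.h>

/*
 * Count the bytes in an objdump-style hex field such as "48 89 e5 ".
 * Whitespace between byte pairs is skipped.  Returns -1 if anything
 * other than two-digit hex pairs is found.
 */
static int count_hex_bytes(const char *field)
{
	int n = 0;

	while (*field) {
		if (isspace((unsigned char)*field)) {
			field++;
			continue;
		}
		if (!isxdigit((unsigned char)field[0]) ||
		    !isxdigit((unsigned char)field[1]))
			return -1;
		field += 2;
		n++;
	}
	return n;
}

/* Nonzero if the decoder's length agrees with the disassembly. */
static int lengths_match(const char *hex_field, int decoded_len)
{
	return count_hex_bytes(hex_field) == decoded_len;
}
```

A checker built this way would pass the second field of each distilled
line as hex_field and the value from insn_get_length() as decoded_len.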

I also hacked up insn_x86.h and insn_x86.c to work in user space.   
Most of that is accomplished via insn_x86_user.h, but it certainly  
isn't necessary to do it that way.  In particular, __u8, __s8, __u16,  
etc. are versions of u8, s8, u16, etc. that can be used in both kernel  
and user code, so maybe we should switch to those.
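
The contents of insn_x86_user.h are not shown in this message, so the
following is only a guess at the kind of shim that lets kernel-style type
names compile in user space. The typedefs and the disp8_to_s32() helper
are assumptions made for illustration.

```c
#include <stdint.h>

/* Kernel-style short names mapped onto <stdint.h> types (assumed shim). */
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;
typedef uint32_t u32;
typedef int32_t  s32;
typedef uint64_t u64;
typedef int64_t  s64;

/*
 * Example use: widen a raw 8-bit displacement byte to a signed 32-bit
 * value without relying on implementation-defined narrowing casts.
 */
static s32 disp8_to_s32(u8 raw)
{
	return (raw & 0x80) ? (s32)raw - 256 : (s32)raw;
}
```

With a header like this in place the decoder source can keep using u8,
u32 and friends unchanged when built outside the kernel.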

I tested with vmlinux, libc, and libm on both an i686 system and an  
x86_64 system.  I found and fixed a few bugs.  Here are the ones that  
come to mind (all fixed):
- shrd/shld, which we discussed
- missing support for weird nops with modrm bytes (0f 1f ...).
- neglected to include the REX prefix in prefixes.nbytes
- missing static decl in an inline function in insn_x86.h
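
To make the REX accounting fix concrete, here is a stand-alone sketch of
prefix counting in which the REX byte is included in the count. It mirrors
the prefix values used in insn_get_prefixes() but is not the attached
code; is_legacy_prefix(), is_rex_prefix() and count_prefix_bytes() are
names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Segment overrides, operand/address size, lock and rep prefixes. */
static bool is_legacy_prefix(uint8_t b)
{
	switch (b) {
	case 0x26: case 0x2e: case 0x36: case 0x3e:
	case 0x64: case 0x65: case 0x66: case 0x67:
	case 0xf0: case 0xf2: case 0xf3:
		return true;
	default:
		return false;
	}
}

/* 0x40..0x4f is a REX prefix only when decoding 64-bit code. */
static bool is_rex_prefix(uint8_t b, bool x86_64)
{
	return x86_64 && (b & 0xf0) == 0x40;
}

/* Number of prefix bytes at the start of insn, REX included. */
static size_t count_prefix_bytes(const uint8_t *insn, size_t len, bool x86_64)
{
	size_t n = 0;

	while (n < len) {
		if (is_rex_prefix(insn[n], x86_64))
			return n + 1;	/* REX is the last prefix; count it */
		if (!is_legacy_prefix(insn[n]))
			break;
		n++;
	}
	return n;
}
```

For the bytes 48 89 e5 this gives one prefix byte in 64-bit mode and none
in 32-bit mode, where 0x48 is an ordinary opcode.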

There are some other cases where insn_get_length() doesn't match up  
with the disassembly, but I don't consider them bugs:
- 0x9b is an instruction (fwait), but the disassembler treats it as a  
prefix.  For example 9b df ... can be disassembled as
	fstsw ...	// wait, then store status word
or
	fwait		// wait
	fnstsw ...	// store status word without waiting
Perhaps it's relevant to investigate whether a single-step of 9b df  
... would execute just the fwait or the whole fstsw.  Anyway, this  
explains the "failures" of finit and fstsw that I mentioned to you.  I  
also saw this with fstcw and fclex.
- Illegal instruction sequences, such as an x86_64 instruction that  
starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these  
when disassembling data.  I just filtered out (via egrep) instructions  
whose disassembly starts with "rex" or includes "(bad)".

We could address the above by filtering them out in distill.awk or  
test_get_len.c.  I think we're clean otherwise.
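
If the filtering were done in C rather than with egrep, it could look
roughly like this. should_skip() is a hypothetical helper, not part of the
attached test_get_len.c; it encodes the three cases described above
(disassembly containing "(bad)", disassembly starting with "rex", and an
fwait byte that objdump merged with the following instruction).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decide whether a distilled objdump line should be excluded from the
 * length comparison.  bytes/nbytes are the instruction bytes objdump
 * printed; mnemonic is its disassembly text.
 */
static bool should_skip(const uint8_t *bytes, size_t nbytes,
			const char *mnemonic)
{
	/* Data or illegal sequences decoded as instructions. */
	if (strstr(mnemonic, "(bad)"))
		return true;
	/* A stray REX byte that objdump prints on its own. */
	if (strncmp(mnemonic, "rex", 3) == 0)
		return true;
	/* 9b xx ...: objdump folded fwait into the next instruction. */
	if (nbytes > 1 && bytes[0] == 0x9b)
		return true;
	return false;
}
```

A lone 9b (plain fwait) is still compared, since both sides agree it is
one byte long.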

There's a little more housecleaning to do -- e.g., adding Hitachi (?)  
copyright to IBM copyright, discarding insn_field_exists() and  
insn_extract_reg(), putting this all in git somewhere.  But not tonight.

Pull all the attached files into a directory and have a go -- e.g.,
$ make
$ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]

Jim

-------------- next part --------------
# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len
# Distills the disassembly as follows:
# - Removes all lines except the disassembled instructions.
# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes
# into a single line.

BEGIN {
	prev_addr = ""
	prev_hex = ""
	prev_mnemonic = ""
}

/^ *[0-9a-f]+:/ {
	if (split($0, field, "\t") < 3) {
		# This is a continuation of the same insn.
		prev_hex = prev_hex field[2]
	} else {
		if (prev_addr != "")
			printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic
		prev_addr = field[1]
		prev_hex = field[2]
		prev_mnemonic = field[3]
	}
}

END {
	if (prev_addr != "")
		printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic
}
-------------- next part --------------
/*
 * x86 instruction analysis
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2002, 2004, 2009
 */
#ifdef __KERNEL__
#include <linux/module.h>
#include <linux/string.h>
#else
#include <string.h>
#endif
// #include <asm/insn.h>
#include "insn_x86.h"

MODULE_LICENSE("GPL"); // for test

/**
 * insn_init() - initialize struct insn
 * @insn:	&struct insn to be initialized
 * @kaddr:	address (in kernel memory) of instruction (or copy thereof)
 * @x86_64:	true for 64-bit kernel or 64-bit app
 */
void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64)
{
	memset(insn, 0, sizeof(*insn));
	insn->kaddr = kaddr;
	insn->next_byte = kaddr;
	insn->x86_64 = x86_64;
	insn->opnd_bytes = 4;
	if (x86_64)
		insn->addr_bytes = 8;
	else
		insn->addr_bytes = 4;
}
EXPORT_SYMBOL_GPL(insn_init);

/**
 * insn_get_prefixes - scan x86 instruction prefix bytes
 * @insn:	&struct insn containing instruction
 *
 * Populates the @insn->prefixes bitmap, and updates @insn->next_byte
 * to point to the (first) opcode.  No effect if @insn->prefixes.got
 * is already true.
 */
void insn_get_prefixes(struct insn *insn)
{
	u32 pfx;
	struct insn_field *prefixes = &insn->prefixes;
	if (prefixes->got)
		return;
	for (;; insn->next_byte++, prefixes->nbytes++) {
		u8 b = *(insn->next_byte);
#ifdef CONFIG_X86_64
		if ((b & 0xf0) == 0x40 && insn->x86_64) {
			prefixes->value |= X86_PFX_REX;
			prefixes->value |= (b & 0x0f) * X86_PFX_REX_BASE;
			/* REX prefix is always last. */
			insn->next_byte++;
			prefixes->nbytes++;
			break;
		}
#endif
		switch (b) {
		case 0x26:	pfx = X86_PFX_ES;	break;
		case 0x2E:	pfx = X86_PFX_CS;	break;
		case 0x36:	pfx = X86_PFX_SS;	break;
		case 0x3E:	pfx = X86_PFX_DS;	break;
		case 0x64:	pfx = X86_PFX_FS;	break;
		case 0x65:	pfx = X86_PFX_GS;	break;
		case 0x66:	pfx = X86_PFX_OPNDSZ;	break;
		case 0x67:	pfx = X86_PFX_ADDRSZ;	break;
		case 0xF0:	pfx = X86_PFX_LOCK;	break;
		case 0xF2:	pfx = X86_PFX_REPNE;	break;
		case 0xF3:	pfx = X86_PFX_REPE;	break;
		default:	pfx = 0x0;		break;
		}
		if (!pfx)
			break;
		prefixes->value |= pfx;
	}
	if (prefixes->value & X86_PFX_OPNDSZ) {
		/* operand size switches 2/4 */
		insn->opnd_bytes ^= 6;
	}
	if (prefixes->value & X86_PFX_ADDRSZ) {
		/* address size switches 2/4 or 4/8 */
#ifdef CONFIG_X86_64
		if (insn->x86_64)
			insn->addr_bytes ^= 12;
		else
#endif
			insn->addr_bytes ^= 6;
	}
#ifdef CONFIG_X86_64
	if (prefixes->value & X86_PFX_REXW)
		insn->opnd_bytes = 8;
#endif
	prefixes->got = true;
}
EXPORT_SYMBOL_GPL(insn_get_prefixes);

/**
 * insn_get_opcode - collect opcode(s)
 * @insn:	&struct insn containing instruction
 *
 * Populates @insn->opcode1 (and @insn->opcode2/@insn->opcode3, if it's a
 * 2-byte or 3-byte opcode) and updates @insn->next_byte to point past the
 * opcode byte(s).
 * If necessary, first collects any preceding (prefix) bytes.
 * Sets @insn->opcode.value = opcode1.  No effect if @insn->opcode.got
 * is already true.
 */
void insn_get_opcode(struct insn *insn)
{
	struct insn_field *opcode = &insn->opcode;
	if (opcode->got)
		return;
	if (!insn->prefixes.got)
		insn_get_prefixes(insn);
	OPCODE1(insn) = *insn->next_byte++;
	if (OPCODE1(insn) == 0x0f) {
		OPCODE2(insn) = *insn->next_byte++;
		if (OPCODE2(insn) == 0x38 || OPCODE2(insn) == 0x3a) {
			OPCODE3(insn) = *insn->next_byte++;
			opcode->nbytes = 3;
		} else
			opcode->nbytes = 2;
	} else
		opcode->nbytes = 1;
	opcode->got = true;
}
EXPORT_SYMBOL_GPL(insn_get_opcode);

const u32 onebyte_has_modrm[256 / 32] = {
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
	/*      -----------------------------------------------         */
	W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 0f */
	W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 1f */
	W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 2f */
	W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 3f */
	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */
	W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */
	W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 6f */
	W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 7f */
	W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 8f */
	W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */
	W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* af */
	W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */
	W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
	W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
	W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* ef */
	W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* ff */
	/*      -----------------------------------------------         */
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
};

const u32 twobyte_has_modrm[256 / 32] = {
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
	/*      -----------------------------------------------         */
	W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
	W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 1f */
	W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
	W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
	W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
	W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
	W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
	W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
	W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
	W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
	W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
	W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
	W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
	W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
	/*      -----------------------------------------------         */
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
};

#ifdef CONFIG_X86_64
const u32 onebyte_force_64[256 / 32] = {
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
	/*      -----------------------------------------------         */
	W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 0f */
	W(0x10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
	W(0x20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 2f */
	W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */
	W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
	W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0) | /* 6f */
	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 7f */
	W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) | /* 8f */
	W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */
	W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* af */
	W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */
	W(0xc0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0) | /* cf */
	W(0xd0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */
	W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* ef */
	W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)   /* ff */
	/*      -----------------------------------------------         */
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
};

/* opcodes whose operand size is forced to, or defaults to, 64 bits */
static bool __operand_64(struct insn *insn)
{
	u8 reg = MODRM_REG(insn);
	if (insn->opcode.nbytes == 1) {
		if (test_bit(OPCODE1(insn),
			     (const unsigned long*) onebyte_force_64) ||
		    (OPCODE1(insn) == 0xff && 
		     (reg == 2 || reg == 4 || reg == 6)))
			return true;
	}
	return false;
}
#endif

/**
 * insn_get_modrm - collect ModRM byte, if any
 * @insn:	&struct insn containing instruction
 *
 * Populates @insn->modrm and updates @insn->next_byte to point past the
 * ModRM byte, if any.  If necessary, first collects the preceding bytes
 * (prefixes and opcode(s)).  No effect if @insn->modrm.got is already true.
 */
void insn_get_modrm(struct insn *insn)
{
	struct insn_field *modrm = &insn->modrm;
	if (modrm->got)
		return;
	if (!insn->opcode.got)
		insn_get_opcode(insn);
	switch (insn->opcode.nbytes) {
	case 1:
		modrm->nbytes = test_bit(OPCODE1(insn),
				(const unsigned long*) onebyte_has_modrm);
		break;
	case 2:
		modrm->nbytes = test_bit(OPCODE2(insn),
				(const unsigned long*) twobyte_has_modrm);
		break;
	case 3:
		/* Three-byte opcodes always have a ModRM byte */
		modrm->nbytes = 1;
		break;
	}
	if (modrm->nbytes)
		modrm->value = *(insn->next_byte++);

#ifdef CONFIG_X86_64
	if (insn->x86_64 && __operand_64(insn))
		insn->opnd_bytes = 8;
#endif
	modrm->got = true;
}
EXPORT_SYMBOL_GPL(insn_get_modrm);

#ifdef CONFIG_X86_64
/**
 * insn_rip_relative() - Does instruction use RIP-relative addressing mode?
 * @insn:	&struct insn containing instruction
 *
 * If necessary, first collects the instruction up to and including the
 * ModRM byte.  No effect if @insn->x86_64 is false.
 */
bool insn_rip_relative(struct insn *insn)
{
	struct insn_field *modrm = &insn->modrm;

	if (!insn->x86_64)
		return false;
	if (!modrm->got)
		insn_get_modrm(insn);
	/*
	 * For rip-relative instructions, the mod field (top 2 bits)
	 * is zero and the r/m field (bottom 3 bits) is 0x5.
	 */
	return (insn_field_exists(modrm) && (modrm->value & 0xc7) == 0x5);
}
EXPORT_SYMBOL_GPL(insn_rip_relative);
#endif

/**
 * insn_get_sib() - Get the SIB byte of instruction
 * @insn:	&struct insn containing instruction
 *
 * If necessary, first collects the instruction up to and including the
 * ModRM byte.
 */
void insn_get_sib(struct insn *insn)
{
	if (insn->sib.got)
		return;
	if (!insn->modrm.got)
		insn_get_modrm(insn);
	if (insn->modrm.nbytes)
		if (insn->addr_bytes != 2 &&
		    MODRM_MOD(insn) != 3 && MODRM_RM(insn) == 4) {
			insn->sib.value = *(insn->next_byte++);
			insn->sib.nbytes = 1;
		}
	insn->sib.got = true;
}
EXPORT_SYMBOL_GPL(insn_get_sib);

#define get_next(t, insn) \
	({t r; r = *(t *)(insn)->next_byte; (insn)->next_byte += sizeof(t); r;})

/**
 * insn_get_displacement() - Get the displacement of instruction
 * @insn:	&struct insn containing instruction
 *
 * If necessary, first collects the instruction up to and including the
 * SIB byte.
 * The displacement value is sign-extended.
 */
void insn_get_displacement(struct insn *insn)
{
	u8 mod;
	if (insn->displacement.got)
		return;
	if (!insn->sib.got)
		insn_get_sib(insn);
	if (insn->modrm.nbytes) {
		/*
		 * Interpreting the modrm byte:
		 * mod = 00 - no displacement fields (exceptions below)
		 * mod = 01 - 1-byte displacement field
		 * mod = 10 - displacement field is 4 bytes, or 2 bytes if
		 * 	address size = 2 (0x67 prefix in 32-bit mode)
		 * mod = 11 - no memory operand
		 *
		 * If address size = 2...
		 * mod = 00, r/m = 110 - displacement field is 2 bytes
		 *
		 * If address size != 2...
		 * mod != 11, r/m = 100 - SIB byte exists
		 * mod = 00, SIB base = 101 - displacement field is 4 bytes
		 * mod = 00, r/m = 101 - rip-relative addressing, displacement
		 * 	field is 4 bytes
		 */
		mod = MODRM_MOD(insn);
		if (mod == 3)
			goto out;
		if (mod == 1) {
			insn->displacement.value = *((s8 *)insn->next_byte++);
			insn->displacement.nbytes = 1;
		} else if (insn->addr_bytes == 2) {
			if ((mod == 0 && MODRM_RM(insn) == 6) || mod == 2) {
				insn->displacement.value = get_next(s16, insn);
				insn->displacement.nbytes = 2;
			}
		} else {
			if ((mod == 0 && MODRM_RM(insn) == 5) || mod == 2 ||
			    (mod == 0 && SIB_BASE(insn) == 5)) {
				insn->displacement.value = get_next(s32, insn);
				insn->displacement.nbytes = 4;
			}
		}
	}
out:
	insn->displacement.got = true;
}
EXPORT_SYMBOL_GPL(insn_get_displacement);

const u32 onebyte_has_immb[256 / 32] = {
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
	/*      -----------------------------------------------         */
	W(0x00, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) | /* 0f */
	W(0x10, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) , /* 1f */
	W(0x20, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) | /* 2f */
	W(0x30, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) , /* 3f */
	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */
	W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */
	W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0) | /* 6f */
	W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 7f */
	W(0x80, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
	W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */
	W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0) | /* af */
	W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */
	W(0xc0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* cf */
	W(0xd0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */
	W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0) | /* ef */
	W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)   /* ff */
	/*      -----------------------------------------------         */
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
};

const u32 onebyte_has_imm[256 / 32] = {
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
	/*      -----------------------------------------------         */
	W(0x00, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* 0f */
	W(0x10, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) , /* 1f */
	W(0x20, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) | /* 2f */
	W(0x30, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) , /* 3f */
	W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 4f */
	W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 5f */
	W(0x60, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* 6f */
	W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 7f */
	W(0x80, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
	W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 9f */
	W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0) | /* af */
	W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* bf */
	W(0xc0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
	W(0xd0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* df */
	W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* ef */
	W(0xf0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)   /* ff */
	/*      -----------------------------------------------         */
	/*      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f          */
};

/* Decode moffset16/32/64 */
static void __get_moffset(struct insn *insn)
{
	switch (insn->addr_bytes) {
	case 2:
		insn->moffset1.value = get_next(s16, insn);
		insn->moffset1.nbytes = 2;
		break;
	case 4:
		insn->moffset1.value = get_next(s32, insn);
		insn->moffset1.nbytes = 4;
		break;
	case 8:
		insn->moffset1.value = get_next(s32, insn);
		insn->moffset1.nbytes = 4;
		insn->moffset2.value = get_next(s32, insn);
		insn->moffset2.nbytes = 4;
		break;
	}
	insn->moffset1.got = insn->moffset2.got = true;
}

/* Decode imm(Iz) */
static void __get_imm(struct insn *insn)
{
	switch (insn->opnd_bytes) {
	case 2:
		insn->immediate.value = get_next(s16, insn);
		insn->immediate.nbytes = 2;
		break;
	case 4:
	case 8:
		insn->immediate.value = get_next(s32, insn);
		insn->immediate.nbytes = 4;
		break;
	}
}

/* Decode imm64(Iv) */
static void __get_imm64(struct insn *insn)
{
	switch (insn->opnd_bytes) {
	case 2:
		insn->immediate1.value = get_next(s16, insn);
		insn->immediate1.nbytes = 2;
		break;
	case 4:
		insn->immediate1.value = get_next(s32, insn);
		insn->immediate1.nbytes = 4;
		break;
	case 8:
		insn->immediate1.value = get_next(s32, insn);
		insn->immediate1.nbytes = 4;
		insn->immediate2.value = get_next(s32, insn);
		insn->immediate2.nbytes = 4;
		break;
	}
	insn->immediate1.got = insn->immediate2.got = true;
}

/* Decode ptr16:16/32(AP) */
static void __get_immptr(struct insn *insn)
{
	switch (insn->opnd_bytes) {
	case 2:
		insn->immediate1.value = get_next(s16, insn);
		insn->immediate1.nbytes = 2;
		break;
	case 4:
		insn->immediate1.value = get_next(s32, insn);
		insn->immediate1.nbytes = 4;
		break;
	case 8:
		/* ptr16:64 is not supported (no segment) */
		WARN_ON(1);
		return;
	}
	insn->immediate2.value = get_next(u16, insn);
	insn->immediate2.nbytes = 2;
	insn->immediate1.got = insn->immediate2.got = true;
}

/**
 * insn_get_immediate() - Get the immediates of instruction
 * @insn:	&struct insn containing instruction
 *
 * If necessary, first collects the instruction up to and including the
 * displacement bytes.
 * Most immediates are sign-extended.  The unsigned value can be obtained
 * by masking with ((1ULL << (nbytes * 8)) - 1).
 */
void insn_get_immediate(struct insn *insn)
{
	u8 opcode;
	if (insn->immediate.got)
		return;
	if (!insn->displacement.got)
		insn_get_displacement(insn);
	if (insn->opcode.nbytes == 1) {
		opcode = OPCODE1(insn);
		if (opcode >= 0xa0 && opcode <= 0xa3) { /* direct moffset mov */
			__get_moffset(insn);
		} else if (test_bit(opcode,
				    (const unsigned long *)onebyte_has_immb) ||
			   (opcode == 0xf6 && MODRM_REG(insn) == 0)) {
			insn->immediate.value = get_next(s8, insn);
			insn->immediate.nbytes = 1;
		} else if (test_bit(opcode,
				    (const unsigned long *)onebyte_has_imm) ||
			   (opcode == 0xf7 && MODRM_REG(insn) == 0)) {
			__get_imm(insn);
		} else if (0xb8 <= opcode && opcode <= 0xbf /* mov immv */) {
			__get_imm64(insn);
		} else if (opcode == 0xea /* jmp far seg:offs */) {
			__get_immptr(insn);
		} else if (opcode == 0xc2 /* retn immw */ ||
			   opcode == 0xca /* retf immw */) {
			insn->immediate.value = get_next(u16, insn);
			insn->immediate.nbytes = 2;
		} else if (opcode == 0xc8 /* enter immw, immb */) {
			insn->immediate1.value = get_next(u16, insn);
			insn->immediate1.nbytes = 2;
			insn->immediate2.value = get_next(u8, insn);
			insn->immediate2.nbytes = 1;
		}
	} else if (insn->opcode.nbytes == 2) {
		opcode = OPCODE2(insn);
		if ((opcode & 0xf0) == 0x80 /* Jcc imm32 */) {
			__get_imm(insn);
		} else
			switch(opcode) {
			case 0x70: /* pshuf* %1, %2, immb */
			case 0x71: /* Group12 %1, immb */
			case 0x72: /* Group13 %1, immb */
			case 0x73: /* Group14 %1, immb */
			case 0xa4: /* shld %1, %2, immb */
			case 0xac: /* shrd %1, %2, immb */
			case 0xba: /* Group8 %1, immb */
			case 0xc2: /* cmpps %1, %2, immb */
			case 0xc4: /* pinsw %1, %2, immb */
			case 0xc5: /* pextrw %1, %2, immb */
			case 0xc6: /* shufps/d %1, %2, immb */
				insn->immediate.value = get_next(u8, insn);
				insn->immediate.nbytes = 1;
				break;
			default:
				break;
			}
	} else if (OPCODE3(insn) == 0x0f /* palignr %1, %2, immb */) {
		insn->immediate.value = get_next(u8, insn);
		insn->immediate.nbytes = 1;
	}
	insn->immediate.got = true;
}
EXPORT_SYMBOL_GPL(insn_get_immediate);

/**
 * insn_get_length() - Get the length of instruction
 * @insn:	&struct insn containing instruction
 *
 * If necessary, first collects the instruction up to and including the
 * immediate bytes.
 */
void insn_get_length(struct insn *insn)
{
	if (insn->length)
		return;
	if (!insn->immediate.got)
		insn_get_immediate(insn);
	insn->length = (u8)((unsigned long)insn->next_byte 
			    - (unsigned long)insn->kaddr);
}
EXPORT_SYMBOL_GPL(insn_get_length);
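
The displacement rules listed in the comment inside insn_get_displacement() can be restated as a small pure function. The following is only a sketch for user-space experimentation: disp_bytes() and its parameters are names made up here, not part of the posted patch, and it works out SIB presence from the ModRM byte itself instead of reading a struct insn.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only (hypothetical helper, not in the patch): number of
 * displacement bytes implied by a ModRM byte, an optional SIB byte
 * and the effective address size in bytes (2, 4 or 8).
 */
static int disp_bytes(uint8_t modrm, uint8_t sib, int addr_bytes)
{
	uint8_t mod = (modrm & 0xc0) >> 6;
	uint8_t rm = modrm & 0x07;
	/* A SIB byte exists only for non-16-bit addressing with r/m = 100 */
	int has_sib = addr_bytes != 2 && mod != 3 && rm == 4;

	assert(addr_bytes == 2 || addr_bytes == 4 || addr_bytes == 8);

	if (mod == 3)		/* register operand, no memory reference */
		return 0;
	if (mod == 1)		/* disp8 */
		return 1;
	if (addr_bytes == 2)	/* mod is 0 or 2 here */
		return (mod == 2 || rm == 6) ? 2 : 0;
	if (mod == 2)		/* disp32 */
		return 4;
	/* mod == 0: disp32 only for r/m = 101 or SIB base = 101 */
	if (rm == 5 || (has_sib && (sib & 0x07) == 5))
		return 4;
	return 0;
}
```

For instance, disp_bytes(0x05, 0, 8) covers the rip-relative case (mod = 00, r/m = 101) and yields a 4-byte displacement. Unlike the posted code, this sketch looks at the SIB base only when a SIB byte is actually present.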
-------------- next part --------------
#ifndef _ASM_X86_INSN_H
#define _ASM_X86_INSN_H
/*
 * x86 instruction analysis
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2009
 */

#ifdef __KERNEL__
#include <linux/types.h>
#else
#include "insn_x86_user.h"
#endif

#undef W
#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) |   \
	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) |   \
	  (b8##UL << 0x8)|(b9##UL << 0x9)|(ba##UL << 0xa)|(bb##UL << 0xb) |   \
	  (bc##UL << 0xc)|(bd##UL << 0xd)|(be##UL << 0xe)|(bf##UL << 0xf))    \
	 << (row % 32))

/* legacy instruction prefixes */
#define X86_PFX_OPNDSZ	0x1	/* 0x66 */
#define X86_PFX_ADDRSZ	0x2	/* 0x67 */
#define X86_PFX_CS	0x4	/* 0x2E */
#define X86_PFX_DS	0x8	/* 0x3E */
#define X86_PFX_ES	0x10	/* 0x26 */
#define X86_PFX_FS	0x20	/* 0x64 */
#define X86_PFX_GS	0x40	/* 0x65 */
#define X86_PFX_SS	0x80	/* 0x36 */
#define X86_PFX_LOCK	0x100	/* 0xF0 */
#define X86_PFX_REPE	0x200	/* 0xF3 */
#define X86_PFX_REPNE	0x400	/* 0xF2 */
/* REX prefix */
#define X86_PFX_REX	0x800	/* 0x4X */
/* REX prefix dissected */
#define X86_PFX_REX_BASE 0x1000
#define X86_PFX_REXB	0x1000	/* 0x41 bit */
#define X86_PFX_REXX	0x2000	/* 0x42 bit */
#define X86_PFX_REXR	0x4000	/* 0x44 bit */
#define X86_PFX_REXW	0x8000	/* 0x48 bit */

struct insn_field {
	union {
		s32 value;
		u8 bytes[4];
	};
	bool got;	/* true if we've run insn_get_xxx() for this field */
	u8 nbytes;
};

struct insn {
	struct insn_field prefixes;	/* prefixes.value is a bitmap */
	struct insn_field opcode;	/*
					 * opcode.bytes[0]: opcode1
					 * opcode.bytes[1]: opcode2
					 * opcode.bytes[2]: opcode3
					 */
	struct insn_field modrm;
	struct insn_field sib;
	struct insn_field displacement;
	union {
		struct insn_field immediate;
		struct insn_field moffset1;	/* for 64bit MOV */
		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
	};
	union {
		struct insn_field moffset2;	/* for 64bit MOV */
		struct insn_field immediate2;	/* for 64bit imm or seg16 */
	};

	u8 opnd_bytes;
	u8 addr_bytes;
	u8 length;
	bool x86_64;

	const u8 *kaddr;	/* kernel address of insn (copy) to analyze */
	const u8 *next_byte;
};

#define OPCODE1(insn) ((insn)->opcode.bytes[0])
#define OPCODE2(insn) ((insn)->opcode.bytes[1])
#define OPCODE3(insn) ((insn)->opcode.bytes[2])

#define MODRM_MOD(insn) (((insn)->modrm.value & 0xc0) >> 6)
#define MODRM_REG(insn) (((insn)->modrm.value & 0x38) >> 3)
#define MODRM_RM(insn) ((insn)->modrm.value & 0x07)

#define SIB_SCALE(insn) (((insn)->sib.value & 0xc0) >> 6)
#define SIB_INDEX(insn) (((insn)->sib.value & 0x38) >> 3)
#define SIB_BASE(insn) ((insn)->sib.value & 0x07)

#define MOFFSET64(insn)	(((u64)((insn)->moffset2.value) << 32) | \
			  (u32)((insn)->moffset1.value))

#define IMMEDIATE64(insn)	(((u64)((insn)->immediate2.value) << 32) | \
				  (u32)((insn)->immediate1.value))

extern void insn_init(struct insn *insn, const u8 *kaddr, bool x86_64);
extern void insn_get_prefixes(struct insn *insn);
extern void insn_get_opcode(struct insn *insn);
extern void insn_get_modrm(struct insn *insn);
extern void insn_get_sib(struct insn *insn);
extern void insn_get_displacement(struct insn *insn);
extern void insn_get_immediate(struct insn *insn);
extern void insn_get_length(struct insn *insn);

#ifdef CONFIG_X86_64
extern bool insn_rip_relative(struct insn *insn);
#else
static inline bool insn_rip_relative(struct insn *insn)
{
	return false;
}
#endif

static inline bool insn_field_exists(const struct insn_field *field)
{
	return (field->nbytes > 0);
}

static inline u8 insn_extract_reg(int modrm)
{
	return (modrm >> 3) & 0x7;
}

#endif /* _ASM_X86_INSN_H */
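
The MODRM_MOD/MODRM_REG/MODRM_RM macros and the rip-relative test in insn_rip_relative() both rely on the fixed 2/3/3-bit layout of the ModRM byte. A minimal stand-alone sketch of that layout follows; struct modrm_fields, make_modrm(), split_modrm() and modrm_is_rip_relative() are illustrative names assumed here, not identifiers from the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three ModRM sub-fields. */
struct modrm_fields {
	uint8_t mod;	/* bits 7-6 */
	uint8_t reg;	/* bits 5-3 */
	uint8_t rm;	/* bits 2-0 */
};

/* Build a ModRM byte from its fields (inverse of split_modrm()). */
static uint8_t make_modrm(uint8_t mod, uint8_t reg, uint8_t rm)
{
	assert(mod < 4 && reg < 8 && rm < 8);
	return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Same masks and shifts as MODRM_MOD(), MODRM_REG() and MODRM_RM(). */
static struct modrm_fields split_modrm(uint8_t modrm)
{
	struct modrm_fields f = {
		.mod = (modrm & 0xc0) >> 6,
		.reg = (modrm & 0x38) >> 3,
		.rm  = modrm & 0x07,
	};
	return f;
}

/* In 64-bit mode, mod = 00 with r/m = 101 selects rip-relative addressing. */
static bool modrm_is_rip_relative(uint8_t modrm, bool x86_64)
{
	return x86_64 && (modrm & 0xc7) == 0x05;
}
```

The 0xc7 mask keeps mod and r/m and drops reg, so any reg value combined with mod = 00 and r/m = 101 is reported as rip-relative.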
-------------- next part --------------
#ifndef __INSN_X86_USER_H
#define __INSN_X86_USER_H

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2009
 */

#ifdef __x86_64__
#define CONFIG_X86_64
#else
#define CONFIG_X86_32
#endif
typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;
typedef unsigned long long u64;

typedef signed char s8;
typedef short s16;
typedef int s32;
typedef long long s64;

typedef enum bool { false, true } bool;

/* any harmless file-scope decl */
#define NOP_DECL struct __nop
#define EXPORT_SYMBOL_GPL(symbol) NOP_DECL
#define MODULE_LICENSE(gpl) NOP_DECL

#define WARN_ON(cond) do{}while(0)

#define BITS_PER_LONG (8*sizeof(long))
/* from arch/x86/include/asm/bitops.h */
static inline int test_bit(int nr, const volatile unsigned long *addr)
{
	return ((1UL << (nr % BITS_PER_LONG)) &
		(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
}

#endif /* __INSN_X86_USER_H */
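
The opcode tables are 256-bit bitmaps stored as eight 32-bit words, with each W() row contributing sixteen flags to the low or high half of a word. The sketch below shows the same packing and lookup in plain C. pack_row() and opcode_flag() are hypothetical helpers, and the lookup indexes the 32-bit words directly, whereas the posted code casts the table to unsigned long and calls test_bit().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack sixteen 0/1 flags for opcodes row..row+15 into one half of a
 * 32-bit word, in the spirit of the W() macro (row is 0x00, 0x10, ...).
 */
static uint32_t pack_row(unsigned row, const uint8_t flags[16])
{
	uint32_t bits = 0;
	int i;

	for (i = 0; i < 16; i++) {
		assert(flags[i] <= 1);
		bits |= (uint32_t)flags[i] << i;
	}
	/* rows 0x00, 0x20, ... land in bits 0-15; rows 0x10, 0x30, ... in 16-31 */
	return bits << (row % 32);
}

/* Return 1 if the bit for this opcode is set in a 256-bit table. */
static int opcode_flag(const uint32_t table[8], uint8_t opcode)
{
	return (table[opcode / 32] >> (opcode % 32)) & 1;
}
```

Two consecutive rows are OR-ed into one table word, which is why the tables alternate `|` and `,` at the end of each W() line.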
-------------- next part --------------
test_get_len: test_get_len.c insn_x86.c insn_x86.h insn_x86_user.h
	$(CC) -g test_get_len.c insn_x86.c -o test_get_len

clean:
	rm -f *.o

clobber: clean
	rm -f test_get_len
-------------- next part --------------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "insn_x86.h"

/*
 * Test of instruction analysis in general and insn_get_length() in
 * particular.  See if insn_get_length() and the disassembler agree
 * on the length of each instruction in an elf disassembly.
 *
 * usage: test_get_len [x86_64] < distilled_disassembly
 */

const char *prog;

static void usage(void)
{
	fprintf(stderr, "usage: %s [x86_64] < distilled_disassembly\n", prog);
	exit(1);
}

static void malformed_line(const char *line, int line_nr)
{
	fprintf(stderr, "%s: malformed line %d:\n%s", prog, line_nr, line);
	exit(3);
}

int main(int argc, char **argv)
{
	char line[200];
	unsigned char insn_buf[16];
	struct insn insn;
	bool x86_64 = false;
	int errors = 0, insns = 0;
#define MAX_ERRORS 10

	prog = argv[0];
	if (argc == 2) {
		if (!strcmp(argv[1], "x86_64"))
			x86_64 = true;
		else
			usage();
	} else if (argc > 2)
		usage();

	while (fgets(line, 200, stdin)) {
		char copy[200], *s, *tab1, *tab2;
		int nb = 0;
		unsigned b;

		insns++;
		memset(insn_buf, 0, 16);
		strcpy(copy, line);
		tab1 = strchr(copy, '\t');
		if (!tab1)
			malformed_line(line, insns);
		s = tab1 + 1;
		s += strspn(s, " ");
		tab2 = strchr(s, '\t');
		if (!tab2)
			malformed_line(line, insns);
		*tab2 = '\0';  // so characters beyond tab2 aren't examined
		while (s < tab2) {
			if (sscanf(s, "%x", &b) == 1) {
				insn_buf[nb++] = (unsigned char) b;
				s += 3;
			} else
				break;
		}
		
		insn_init(&insn, insn_buf, x86_64);
		insn_get_length(&insn);
		if (insn.length != nb) {
			fprintf(stderr, "%s", line);
			fprintf(stderr, "objdump says %d bytes, but "
				"insn_get_length() says %d\n", nb, insn.length);
			if (++errors > MAX_ERRORS) {
				fprintf(stderr, "Stopping after %d errors "
					"and %d instructions.\n",
					MAX_ERRORS, insns);
				exit(2);
			}
		}
	}
	exit(0);
}
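
The byte-extraction step of the test program (take the field between the first two tabs of a disassembly line and read it as hex bytes) can be isolated into a helper that also bounds the output buffer. This is a sketch under assumptions: parse_insn_bytes() is a made-up name, and the input is assumed to look like objdump output, i.e. "address:<TAB>hex bytes<TAB>mnemonic".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the instruction bytes found between the
 * first and second tab of one line into buf (at most cap bytes).
 * Returns the number of bytes stored, or -1 if the line does not
 * contain two tabs.
 */
static int parse_insn_bytes(const char *line, unsigned char *buf, size_t cap)
{
	const char *s, *end;
	int nb = 0;

	assert(line != NULL && buf != NULL);

	s = strchr(line, '\t');
	if (!s)
		return -1;
	s++;
	end = strchr(s, '\t');
	if (!end)
		return -1;

	while (s < end && (size_t)nb < cap) {
		char *stop;
		unsigned long b;

		while (s < end && *s == ' ')
			s++;
		if (s >= end || !isxdigit((unsigned char)*s))
			break;
		/* strtoul() stops at the first non-hex character */
		b = strtoul(s, &stop, 16);
		if (b > 0xff)
			break;
		buf[nb++] = (unsigned char)b;
		s = stop;
	}
	return nb;
}
```

The returned count plays the role of nb in the test program, i.e. the length that objdump reports and that insn_get_length() is compared against.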

From ananth at in.ibm.com  Sat Mar  7 11:57:35 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Sat, 7 Mar 2009 17:27:35 +0530
Subject: [PATCH] Fix utrace_attach_delay() to work correctly with cloned
	threads
In-Reply-To: <20090306205234.0A759FC3BF@magilla.sf.frob.com>
References: <20090306154134.GB15133@in.ibm.com>
	<20090306205234.0A759FC3BF@magilla.sf.frob.com>
Message-ID: <20090307115735.GE15133@in.ibm.com>

On a CLONE_THREAD, target->real_parent == current->real_parent and not
current. New threads would loop forever here.

Fix utrace_attach_delay() to work correctly with new threads.

Signed-off-by: Ananth N Mavinakayanahalli <ananth at in.ibm.com>
---
 kernel/utrace.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: utrace-6mar/kernel/utrace.c
===================================================================
--- utrace-6mar.orig/kernel/utrace.c
+++ utrace-6mar/kernel/utrace.c
@@ -123,12 +123,15 @@ static inline bool exclude_utrace(struct
  */
 static inline int utrace_attach_delay(struct task_struct *target)
 {
-	if ((target->flags & PF_STARTING) && target->real_parent != current)
-		do {
-			schedule_timeout_interruptible(1);
-			if (signal_pending(current))
-				return -ERESTARTNOINTR;
-		} while (target->flags & PF_STARTING);
+	if ((target->flags & PF_STARTING) && target->real_parent != current) {
+		if (target->real_parent != current->real_parent) {
+			do {
+				schedule_timeout_interruptible(1);
+				if (signal_pending(current))
+					return -ERESTARTNOINTR;
+			} while (target->flags & PF_STARTING);
+		}
+	}
 
 	return 0;
 }


From oleg at redhat.com  Sat Mar  7 23:03:30 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Sun, 8 Mar 2009 00:03:30 +0100
Subject: [PATCH] Fix utrace_attach_delay() to work correctly with
	cloned threads
Message-ID: <20090307230330.GA26139@redhat.com>

Ananth N Mavinakayanahalli wrote:
>
> --- utrace-6mar.orig/kernel/utrace.c
> +++ utrace-6mar/kernel/utrace.c
> @@ -123,12 +123,15 @@ static inline bool exclude_utrace(struct
>   */
>  static inline int utrace_attach_delay(struct task_struct *target)
>  {
> -	if ((target->flags & PF_STARTING) && target->real_parent != current)
> -		do {
> -			schedule_timeout_interruptible(1);
> -			if (signal_pending(current))
> -				return -ERESTARTNOINTR;
> -		} while (target->flags & PF_STARTING);
> +	if ((target->flags & PF_STARTING) && target->real_parent != current) {
> +		if (target->real_parent != current->real_parent) {

But target->real_parent == current->real_parent doesn't mean current
is the creator? It is possible that current's ->real_parent does fork().
And even with CLONE_THREAD, this doesn't mean we are the creator, but the
comment says "The creator gets the first chance to attach".
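
A small user-space model may make the objection concrete. This is only a sketch: struct task, waits_original() and waits_patched() are made-up names that mirror the two conditions in the diff, and the PF_STARTING value is a placeholder rather than something to rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_STARTING 0x2	/* placeholder value for this model only */

/* Minimal stand-in for the two task_struct fields the check looks at. */
struct task {
	unsigned int flags;
	const struct task *real_parent;
};

/* Condition under which the original utrace_attach_delay() loops. */
static bool waits_original(const struct task *target,
			   const struct task *current)
{
	assert(target != NULL && current != NULL);
	return (target->flags & PF_STARTING) &&
	       target->real_parent != current;
}

/* Condition under which the patched version loops. */
static bool waits_patched(const struct task *target,
			  const struct task *current)
{
	return waits_original(target, current) &&
	       target->real_parent != current->real_parent;
}
```

With a parent P, current C (a child of P) and a starting target T whose real_parent is P, waits_original() is true and waits_patched() is false. That holds both when C created T with CLONE_THREAD and when P itself forked T, since the two cases look identical to the patched check.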

Perhaps we can introduce a new UTRACE_ATTACH_XXX flag, to be used
when utrace_attach_task() is called from ->report_clone(), and
then something like

	--- kernel/utrace.c
	+++ kernel/utrace.c
	@@ -130,12 +130,11 @@ static inline bool exclude_utrace(struct
	  */
	 static inline int utrace_attach_delay(struct task_struct *target)
	 {
	-	if ((target->flags & PF_STARTING) && target->real_parent != current)
	-		do {
	-			schedule_timeout_interruptible(1);
	-			if (signal_pending(current))
	-				return -ERESTARTNOINTR;
	-		} while (target->flags & PF_STARTING);
	+	while (unlikely(target->flags & PF_STARTING)) {
	+		schedule_timeout_interruptible(1);
	+		if (signal_pending(current))
	+			return -ERESTARTNOINTR;
	+	}
	 
		return 0;
	 }
	@@ -267,7 +266,8 @@ struct utrace_engine *utrace_attach_task
		engine->ops = ops;
		engine->data = data;
	 
	-	ret = utrace_attach_delay(target);
	+	if (!(flags & UTRACE_ATTACH_XXX))
	+		ret = utrace_attach_delay(target);
		if (likely(!ret))
			ret = utrace_add_engine(target, utrace, engine,
						flags, ops, data);

When ->report_clone() is called, current is always the creator.

Yes, this is ugly, I agree.


We can also add "struct task_struct *creator" to "struct utrace". It would
be set by tracehook_finish_clone/utrace_init_task, and cleared by the
tracehook_report_clone() path. In that case we do not need PF_STARTING.
But this bloats task_struct...

Oleg.


From ananth at in.ibm.com  Sun Mar  8 14:53:54 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Sun, 8 Mar 2009 20:23:54 +0530
Subject: [PATCH] Fix utrace_attach_delay() to work correctly with
	cloned threads
In-Reply-To: <20090307230330.GA26139@redhat.com>
References: <20090307230330.GA26139@redhat.com>
Message-ID: <20090308145354.GA4600@in.ibm.com>

On Sun, Mar 08, 2009 at 12:03:30AM +0100, Oleg Nesterov wrote:
> Ananth N Mavinakayanahalli wrote:

...
 
> We can also add "struct task_struct *creator" to "struct utrace". It would
> be set by tracehook_finish_clone/utrace_init_task, and cleared by the
> tracehook_report_clone() path. In that case we do not need PF_STARTING.
> But this bloats task_struct...

But just by one pointer size. Perhaps reverting commit dd30e86355 would
suffice?

Ananth


From roland at redhat.com  Mon Mar  9 18:23:51 2009
From: roland at redhat.com (Roland McGrath)
Date: Mon,  9 Mar 2009 11:23:51 -0700 (PDT)
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
In-Reply-To: Ananth N Mavinakayanahalli's message of  Saturday,
	7 March 2009 07:37:02 +0530 <20090307020702.GD15133@in.ibm.com>
References: <20090306154134.GB15133@in.ibm.com>
	<20090306205234.0A759FC3BF@magilla.sf.frob.com>
	<20090307020702.GD15133@in.ibm.com>
Message-ID: <20090309182351.1FA6FFC3C7@magilla.sf.frob.com>

> The issue is that target->real_parent == current->real_parent and not
> current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the
> do-while.

Oops!  I knew it felt too easy to remove the utrace->cloning field.  If a
little cleverness sufficed then I would have done it that way in the first
place.  I've restored the old mechanism.


Thanks,
Roland


From ananth at in.ibm.com  Tue Mar 10 10:59:22 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 10 Mar 2009 16:29:22 +0530
Subject: [BUG] utrace_attach_task() never returns when called from the
	report_clone callback
In-Reply-To: <20090309182351.1FA6FFC3C7@magilla.sf.frob.com>
References: <20090306154134.GB15133@in.ibm.com>
	<20090306205234.0A759FC3BF@magilla.sf.frob.com>
	<20090307020702.GD15133@in.ibm.com>
	<20090309182351.1FA6FFC3C7@magilla.sf.frob.com>
Message-ID: <20090310105922.GF4600@in.ibm.com>

On Mon, Mar 09, 2009 at 11:23:51AM -0700, Roland McGrath wrote:
> > The issue is that target->real_parent == current->real_parent and not
> > current on a CLONE_THREAD|CLONE_PARENT. So we keep looping in the
> > do-while.
> 
> Oops!  I knew it felt too easy to remove the utrace->cloning field.  If a
> little cleverness sufficed then I would have done it that way in the first
> place.  I've restored the old mechanism.

Thanks! The interface now works as expected.

Ananth


From vfalico at redhat.com  Tue Mar 10 16:33:51 2009
From: vfalico at redhat.com (Veaceslav Falico)
Date: Tue, 10 Mar 2009 17:33:51 +0100
Subject: [PATCH] utrace_add_engine: add missing 'else' after 'if
 (utrace->reap)'
Message-ID: <1236702831.8714.33.camel@darkmag.usersys.redhat.com>

utrace_add_engine() is missing an 'else' after the check for whether
utrace_release_task() was already called, so an engine can be added to a
utrace that is already being reaped.

Signed-off-by: Veaceslav Falico <vfalico at redhat.com>
---
diff --git a/kernel/utrace.c b/kernel/utrace.c
index 906145e..8fc1867 100644
--- a/kernel/utrace.c
+++ b/kernel/utrace.c
@@ -153,7 +153,7 @@ static int utrace_add_engine(struct task_struct *target,
 		 * Already entered utrace_release_task(), cannot attach now.
 		 */
 		ret = -ESRCH;
-	} if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
+	} else if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
 	    unlikely(matching_engine(utrace, flags, ops, data))) {
 		ret = -EEXIST;
 	} else {
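
A minimal user-space sketch of the failure mode the patch addresses. It is
not the kernel code: struct fake_utrace, add_engine_buggy() and
add_engine_fixed() are hypothetical stand-ins that only mirror the shape of
the if/else chain in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few bits of state the chain looks at. */
struct fake_utrace {
	bool reap;		/* release has already started */
	bool has_match;		/* a matching engine is already attached */
	int nr_engines;		/* engines currently on the list */
};

/* Buggy shape: the second 'if' starts a new, independent chain. */
static int add_engine_buggy(struct fake_utrace *u, bool exclusive)
{
	int ret;

	assert(u != NULL);
	if (u->reap) {
		ret = -ESRCH;
	} if (exclusive && u->has_match) {
		ret = -EEXIST;
	} else {
		u->nr_engines++;	/* reached even when u->reap is set */
		ret = 0;		/* and -ESRCH is overwritten */
	}
	return ret;
}

/* Fixed shape: one chain, so the reap case never falls through. */
static int add_engine_fixed(struct fake_utrace *u, bool exclusive)
{
	int ret;

	assert(u != NULL);
	if (u->reap) {
		ret = -ESRCH;
	} else if (exclusive && u->has_match) {
		ret = -EEXIST;
	} else {
		u->nr_engines++;
		ret = 0;
	}
	return ret;
}
```

With reap set and no exclusive conflict, the buggy variant reports success
and adds the engine, while the fixed variant returns -ESRCH and leaves the
list alone.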


From oleg at redhat.com  Tue Mar 10 16:45:36 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 10 Mar 2009 17:45:36 +0100
Subject: [PATCH] utrace_stop: trivial, kill the unnecessary assignment
Message-ID: <20090310164536.GA32196@redhat.com>

Kill the unneeded "killed = false", the next line overwrites "killed".

Signed-off-by: Oleg Nesterov <oleg at redhat.com>

--- xxx/kernel/utrace.c~DEAD_LINE	2009-03-09 21:41:04.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-10 17:42:02.000000000 +0100
@@ -440,7 +440,6 @@ static bool utrace_stop(struct task_stru
 	 */
 	try_to_freeze();
 
-	killed = false;
 	killed = finish_utrace_stop(task, utrace);
 
 	/*


From oleg at redhat.com  Tue Mar 10 18:23:27 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 10 Mar 2009 19:23:27 +0100
Subject: Q: REPORT_CALLBACKS()->list_for_each_entry_safe() - why _safe?
Message-ID: <20090310182327.GA3826@redhat.com>

REPORT_CALLBACKS/utrace_resume/etc use list_for_each_entry_safe().

Why can't we just use list_for_each_entry()?

Perhaps I misread utrace.c, but I can't see how engine can be unlinked
under us. Afaics, nobody except us (finish_report->utrace_reset) can
unlink the detached engines, even if we race with UTRACE_DETACH. And
we can't race with utrace_release_task().

No?

OTOH. If I am wrong, and UTRACE_DETACH can unlink _any_ engine from
->attached list while we are doing list_for_each_entry_safe(), then
we can crash, and I can't see how _safe can help.
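
For reference, a small user-space sketch of what a "_safe"-style walk does
and does not protect against. The list type and helpers here (struct item,
reap_detached(), count_items()) are hypothetical and only imitate the
list.h idiom; the point is that caching the next pointer tolerates unlinking
the current entry, nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly-linked list, imitating list.h. */
struct item {
	struct item *next, *prev;
	bool detached;
};

static void item_list_init(struct item *head)
{
	head->next = head->prev = head;
	head->detached = false;
}

static void item_add_tail(struct item *it, struct item *head)
{
	it->prev = head->prev;
	it->next = head;
	head->prev->next = it;
	head->prev = it;
}

static void item_del(struct item *it)
{
	it->prev->next = it->next;
	it->next->prev = it->prev;
	it->next = it->prev = NULL;	/* poison the unlinked entry */
}

/*
 * "_safe"-style walk: 'next' is read before the body runs, so the
 * body may unlink 'pos' itself.  It gives no protection if some
 * other entry (in particular 'next') is unlinked meanwhile.
 */
static int reap_detached(struct item *head)
{
	struct item *pos, *next;
	int removed = 0;

	assert(head != NULL);
	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		if (pos->detached) {
			item_del(pos);
			removed++;
		}
	}
	return removed;
}

/* Plain walk: sufficient when nothing is unlinked during the pass. */
static int count_items(const struct item *head)
{
	const struct item *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```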

Confused.

Oleg.


From mhiramat at redhat.com  Tue Mar 10 19:57:11 2009
From: mhiramat at redhat.com (Masami Hiramatsu)
Date: Tue, 10 Mar 2009 15:57:11 -0400
Subject: instruction-analysis API(s)
In-Reply-To: <20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>	<498CA248.2090708@redhat.com>	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>	<4990B6D4.2020907@redhat.com>
	<20090210044230.GB12811@in.ibm.com>	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>	<49A85902.8000306@redhat.com>	<1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>	<49AF3480.1040804@redhat.com>
	<49B059B8.8090702@redhat.com>
	<20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com>
Message-ID: <49B6C617.1090602@redhat.com>

Hi Jim,

Jim Keniston wrote:
> Quoting Masami Hiramatsu <mhiramat at redhat.com>:
> 
>> Hi Jim and Sriker,
>>
>> Here, I almost rewrote my patch.
>>
>> Changelog:
>> - rewrite decoding logic based on Intel's manual.
>> - support insn_get_sib(), insn_get_displacement()
>>   and insn_get_immediate() too.
>> - support 3 bytes opcode and 64bit immediate.
>> - introduce some bitmaps.
>>
>> Thank you,
> 
> Well, I didn't do much of a code review -- it looks like you addressed
> all my concerns -- but as I mentioned on IRC, I hacked together a test
> rig whereby you can disassemble a designated elf file (e.g., vmlinux,
> libc, libm) and then compare insn_get_length()'s results with objdump's
> results.  The comment in distill.awk shows how to use objdump, awk, and
> test_get_len together.

Thank you for the review and testing!

> I also hacked up insn_x86.h and insn_x86.c to work in user space.  Most
> of that is accomplished via insn_x86_user.h, but it certainly isn't
> necessary to do it that way.  In particular, __u8, __s8, __u16, etc. are
> versions of u8, s8, u16, etc. that can be used in both kernel and user
> code, so maybe we should switch to those.
> 
> I tested with vmlinux, libc, and libm on both an i686 system and an
> x86_64 system.  I found and fixed a few bugs.  Here are the ones that
> come to mind (all fixed):
> - shrd/shld, which we discussed
> - missing support for weird nops with modrm bytes (0f 1f ...).
> - neglected to include the REX prefix in prefixes.nbytes
> - missing static decl in an inline function in insn_x86.h

Thank you for fixing it.
BTW, it might also have to support vm86 mode (especially for user code).

> There are some other cases where insn_get_length() doesn't match up with
> the disassembly, but I don't consider them bugs:
> - 0x9b is an instruction (fwait), but the disassembler treats it as a
> prefix.  For example 9b df ... can be disassembled as
>     fstsw ...    // wait, then store status word
> or
>     fwait        // wait
>     fnstsw ...    // store status word without waiting
> Perhaps it's relevant to investigate whether a single-step of 9b df ...
> would execute just the fwait or the whole fstsw.  Anyway, this explains
> the "failures" of finit and fstsw that I mentioned to you.  I also saw
> this with fstcw and fclex.

FYI, there is a single wait/fwait instruction described in the Intel Software
Developer's Manual, vol. 2B, p. 399.

> - Illegal instruction sequences, such as an x86_64 instruction that
> starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these
> when disassembling data.  I just filtered out (via egrep) instructions
> whose disassembly starts with "rex" or includes "(bad)".

Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
if we find such invalid opcodes. E.g. the kernel's BUG() macro adds
some raw data after ud2; in that case, those raw bytes might
be decoded as an illegal instruction.

> We could address the above by filtering them out in distill.awk or
> test_get_len.c.  I think we're clean otherwise.
> 
> There's a little more housecleaning to do -- e.g., adding Hitachi (?)
> copyright to IBM copyright, discarding insn_field_exists() and
> insn_extract_reg(), putting this all in git somewhere.  But not tonight.
> 
> Pull all the attached files into a directory and have a go -- e.g.,
> $ make
> $ objdump -d vmlinux | awk -f distill.awk | ./test_get_len [x86_64]
> 
> Jim
> 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat at redhat.com


From roland at redhat.com  Tue Mar 10 21:18:31 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 10 Mar 2009 14:18:31 -0700 (PDT)
Subject: [PATCH] utrace_add_engine: add missing 'else' after 'if
 (utrace->reap)'
In-Reply-To: Veaceslav Falico's message of  Tuesday,
	10 March 2009 17:33:51 +0100
	<1236702831.8714.33.camel@darkmag.usersys.redhat.com>
References: <1236702831.8714.33.camel@darkmag.usersys.redhat.com>
Message-ID: <20090310211831.2FE22FC3B6@magilla.sf.frob.com>

Good catch!  Applied.


Thanks,
Roland


From oleg at redhat.com  Tue Mar 10 21:22:47 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 10 Mar 2009 22:22:47 +0100
Subject: Q: ->attaching && REPORT_CALLBACKS()
Message-ID: <20090310212247.GA12258@redhat.com>

Despite the fat comment in utrace_add_engine() I can't really understand
the meaning of ->attaching list.

The comment:

	* When target == current, it would be safe just to call
	* splice_attaching() right here.  But if we're inside a
	* callback,

just to clarify, "inside a callback" means inside utrace_report_xxx(),
not only inside utrace_engine_ops->report_xxx(), right?

	            that would mean the new engine also gets
	* notified about the event that precipitated its own
	* creation.

engine->flags == 0, so it should not be notified until the caller
does utrace_set_events() later, right?

	             This is not what the user wants.

It is not clear to me why the user doesn't want this.

I understand this as follows. If we add the new engine to the ->attached
list, and if the target is inside a callback, the target can later race
with (say) utrace_set_events(). The target can see "engine->flags & event"
and call start_callback/finish_callback before utrace_set_events() completes.

Is this correct?

I guess no. Because the "race" above can happen even if we use ->attaching.
utrace_add_engine() can happen after we already entered utrace_report_xxx(),
but before it does start_report().

Could you clarify?


Another question. In any case I don't understand why we really need
two lists.

Let's suppose we implement the new trivial helper,

	list_for_each_entry_xxx(pos, head, tail, member)

it stops when "pos" reaches "tail", not "head". Then REPORT_CALLBACKS()
can just read "tail = utrace->attached->prev" (under ->lock, or
utrace_add_engine() can use list_add_rcu) before list_for_each_entry_xxx.

This way we can kill ->attaching, no?
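
A single-threaded user-space sketch of the bounded traversal idea, under
stated assumptions: the list type and the names node_add_tail() and
walk_to_tail() are hypothetical, and no locking or memory ordering is
modelled. It only illustrates that a walk bounded by a tail snapshot does
not visit entries appended during the pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly-linked list, imitating list.h. */
struct node {
	struct node *next, *prev;
	int seen;
};

static void node_list_init(struct node *head)
{
	head->next = head->prev = head;
	head->seen = 0;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->seen = 0;
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Visit entries from head->next up to and including 'tail', where
 * 'tail' is a snapshot of head->prev taken before the pass.  'late',
 * if non-NULL, is appended by the first "callback" to imitate an
 * attach made from inside the pass.  Returns the number visited.
 */
static int walk_to_tail(struct node *head, struct node *tail,
			struct node *late)
{
	struct node *pos;
	int visited = 0;

	assert(head != NULL && tail != NULL);
	if (tail == head)	/* list was empty at snapshot time */
		return 0;
	for (pos = head->next; ; pos = pos->next) {
		pos->seen = 1;
		visited++;
		if (late) {
			node_add_tail(late, head);
			late = NULL;
		}
		if (pos == tail)
			break;
	}
	return visited;
}
```

The sketch assumes no entry is unlinked while the walk is in progress; with
concurrent writers the snapshot and the appends would need real
synchronization.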

Oleg.


From roland at redhat.com  Tue Mar 10 21:57:57 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 10 Mar 2009 14:57:57 -0700 (PDT)
Subject: Q: REPORT_CALLBACKS()->list_for_each_entry_safe() - why _safe?
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	10 March 2009 19:23:27 +0100 <20090310182327.GA3826@redhat.com>
References: <20090310182327.GA3826@redhat.com>
Message-ID: <20090310215757.1D3BCFC3B6@magilla.sf.frob.com>

You are right.  I think that in some past version of the code, some utrace
calls made on current from inside a callback could change the list.  But
now it's only possible in utrace_reset, so the list can never change from a
callback.  I changed the code.


Thanks,
Roland


From roland at redhat.com  Wed Mar 11 00:11:37 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 10 Mar 2009 17:11:37 -0700 (PDT)
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	10 March 2009 22:22:47 +0100 <20090310212247.GA12258@redhat.com>
References: <20090310212247.GA12258@redhat.com>
Message-ID: <20090311001137.2D625FC3B6@magilla.sf.frob.com>

> The comment:
> 
> 	* When target == current, it would be safe just to call
> 	* splice_attaching() right here.  But if we're inside a
> 	* callback,
> 
> just to clarify, "inside a callback" means inside utrace_report_xxx(),
> not only inside utrace_engine_ops->report_xxx(), right?

Certainly what I mean when I say "a callback" is one of the functions whose
pointer lives in struct utrace_engine_ops.  But I don't see how the
distinction you make could even be meaningful here.  A utrace_attach_task()
call "inside utrace_report_foo()" could only possibly mean one made by a
->report_foo() function utrace_report_foo() calls, since obviously there
are no hard-wired utrace_attach_task() calls in utrace.c itself.

> 	            that would mean the new engine also gets
> 	* notified about the event that precipitated its own
> 	* creation.
> 
> engine->flags == 0, so it should not be notified until the caller
> does utrace_set_events() later, right?

Right.  The case in question is a callback doing:

	new_engine = utrace_attach_task(current, ...);
	utrace_set_events(new_engine, <includes this event>);

Then new_engine would get the "this event" callback at the end of the very
same reporting loop containing its creation.  (This is what happened before
I changed the code as the comment describes.)

> 	             This is not what the user wants.
> 
> It is not clear to me why the user doesn't want this.

Jim Keniston is the user who doesn't want this.
https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html

> I understand this as follows. If we add the new engine to the ->attached
> list, and if the target is inside a callback, the target can later race
> with (say) utrace_set_events(). The target can see "engine->flags & event"
> and call start_callback/finish_callback before utrace_set_events() completes.

It's not a race question.  There are no guarantees for such races.  (That
is, utrace_set_events() calls on target!=current make no guarantee about
reporting an event that might already have started.  Only if the target was
already stopped by your engine when you made the call can you be sure that
no such event can be in progress.)

The scenario we are talking about here is fully synchronous.  The target
itself is inside a callback, calling utrace_set_events() on itself.

> Another question. In any case I don't understand why we really need
> two lists.

We want that in the common case a reporting pass takes no locks.  The
->attached list is never touched when the target is not quiescent.  (The
target uses the lock to synchronize when it transitions between being
quiescent and not.)  Once you believe the quiescence logic, this makes
it easy to be confident about the unlocked use of that list in reporting
passes.  It's used in totally vanilla ways, and modified in totally
vanilla ways.
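
To make the two-list idea concrete, here is a simplified user-space sketch.
It is not utrace.c: struct tracee, attach(), report_pass() and the
lock_taken counter are hypothetical, and the "lock" is only counted, not
implemented. New engines are queued on 'attaching'; a reporting pass splices
them onto 'attached' only when there is something queued, and otherwise
walks 'attached' without touching the lock.

```c
#include <assert.h>
#include <stddef.h>

struct engine {
	struct engine *next;
	int calls;		/* reports delivered to this engine */
};

struct tracee {
	struct engine *attached, **attached_tail;
	struct engine *attaching, **attaching_tail;
	int lock_taken;		/* counts simulated lock acquisitions */
};

static void tracee_init(struct tracee *t)
{
	t->attached = NULL;
	t->attached_tail = &t->attached;
	t->attaching = NULL;
	t->attaching_tail = &t->attaching;
	t->lock_taken = 0;
}

/* Attach always queues on 'attaching' (under the lock). */
static void attach(struct tracee *t, struct engine *e)
{
	t->lock_taken++;
	e->next = NULL;
	e->calls = 0;
	*t->attaching_tail = e;
	t->attaching_tail = &e->next;
}

/* Move everything queued on 'attaching' to the end of 'attached'. */
static void splice_attaching(struct tracee *t)
{
	assert(t->attaching != NULL);
	t->lock_taken++;
	*t->attached_tail = t->attaching;
	t->attached_tail = t->attaching_tail;
	t->attaching = NULL;
	t->attaching_tail = &t->attaching;
}

/*
 * One reporting pass.  'spawn', if non-NULL, is attached by the
 * first callback, imitating an attach from inside the pass.
 */
static void report_pass(struct tracee *t, struct engine *spawn)
{
	struct engine *e;

	if (t->attaching)	/* uncommon path: needs the lock */
		splice_attaching(t);
	for (e = t->attached; e; e = e->next) {
		e->calls++;
		if (spawn) {
			attach(t, spawn);
			spawn = NULL;
		}
	}
}
```

An engine attached from inside a pass sits on 'attaching' until the next
pass, so it is not reported the event that created it, and a pass with
nothing queued takes no lock at all.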

> Let's suppose we implement the new trivial helper,
> 
> 	list_for_each_entry_xxx(pos, head, tail, member)
> 
> it stops when "pos" reaches "tail", not "head". Then REPORT_CALLBACKS()
> can just read "tail = utrace->attached->prev" (under ->lock, or
> utrace_add_engine() can use list_add_rcu) before list_for_each_entry_xxx.
> 
> This way we can kill ->attaching, no?

This is a lot like what the old utrace code did before the introduction
of the two lists (when the engine struct was managed using RCU).  This
is just an optimization over what we have now.  It saves the ->attaching
space (i.e. two words in struct utrace), the splice_attaching() logic
(pretty cheap), and the sometimes-superfluous resume report after an
attach.  The cost for this is some very touchy fine-grained complexity
in convincing ourselves (and reviewers) that the list traversal and
modification is always correct.

I've already implied that anything taking any locks for every vanilla
reporting pass is a non-starter.  I'm asserting preemptive optimization
here because it's the case that is most important to optimize.  The
overhead of a reporting pass applies to situations like every system
call entry, with an engine callback that quickly filters out the vast
majority of calls (i.e. "if (regs->foo != __NR_bar) return;" or
something about that cheap).  So we think about the reporting pass
overhead as something that might be done a million times a second, and
accordingly think carefully about that hot path.  In contrast, we are
talking here about optimizing attach, and saving a couple of words of
data structure space that will already be cache-hot.

We are not actually following any RCU rules at all, so to use
list_add_tail_rcu would really just mean that we are relying on our own
fancy special list mutation scheme and proving/documenting that it is
correct.  It just happens to have the same implementation details as
list_add_tail_rcu, and we must either copy those innards and document
why they are right in our uses, or document how the list_add_tail_rcu
innards happen to match what is right for our uses and keep track of any
future implementation changes in rculist.h that might diverge from what
we rely on.  That proof and documentation entails hairy logic about SMP
ordering and memory barriers and so forth.  Frankly, that all seems like
much more touchy hair than the utrace-indirect logic (for less benefit),
and we've already decided to avoid that for the first cut because LKML
reviewers found it too hairy to contemplate.

Next, consider that e.g. Renzo Davoli has proposed reversing the engine
order used for certain reporting passes (syscall entry vs exit having
inverse order).  (I'm not discussing the merits of that change, it's
just an example.)  Right now, a change like that would be a simple
choice about the desirable API, with no implementation complexity to
worry about at all, just s/list_for_each/&_reverse/.  In the same vein,
we contemplate for the future having engines on a priority list or some
other means to add new engines somewhere other than at the end of the
list.  When we've resolved what interfaces we want for that, it will be
straightforward to implement whatever it is using normal list.h calls
(or even to use a different kind of list data structure).  Many choices
like those are likely to conflict with what clever SMP-safe list magic
we can do now if we start on that sort of optimization now.

I could easily be quite wrong about the performance trade-offs of a lock
vs splice_attaching() and cache effects, etc.  But before we get to
worrying about that performance in great detail, the complexity argument
stands.  This is an optimization to consider later on, both after the
upstream review has accepted the simpler code into the kernel to begin
with, and after we have gotten a more mature set of uses of the API and
refined the details of the API semantics on merits broader than such
micro-optimization.


Thanks,
Roland


From jkenisto at us.ibm.com  Wed Mar 11 19:44:07 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Wed, 11 Mar 2009 12:44:07 -0700
Subject: instruction-analysis API(s)
In-Reply-To: <49B6C617.1090602@redhat.com>
References: <1233793136.3652.4.camel@dyn9047018139.beaverton.ibm.com>
	<498CA248.2090708@redhat.com>
	<1233964738.3706.78.camel@dyn9047018139.beaverton.ibm.com>
	<4990B6D4.2020907@redhat.com> <20090210044230.GB12811@in.ibm.com>
	<1235591628.3629.20.camel@dyn9047018139.beaverton.ibm.com>
	<49A85902.8000306@redhat.com>
	<1236129313.5331.72.camel@dyn9047022094.beaverton.ibm.com>
	<49AF3480.1040804@redhat.com> <49B059B8.8090702@redhat.com>
	<20090307025500.gm2ahmuwgsoo0444@imap.linux.ibm.com>
	<49B6C617.1090602@redhat.com>
Message-ID: <1236800647.4965.46.camel@dyn9047018139.beaverton.ibm.com>

On Tue, 2009-03-10 at 15:57 -0400, Masami Hiramatsu wrote:
> Hi Jim,
...
> > 
> > I tested with vmlinux, libc, and libm on both an i686 system and an
> > x86_64 system.
...
> 
> Thank you for fixing it.
> BTW, it might also have to support vm86 mode (especially for user code).

I have a vague idea of what vm86 mode is, but I don't really understand
what the implications are for instruction analysis or probing.  My
understanding is that its use is rare (e.g., for DOS emulators), so it
hasn't been a requirement for uprobes so far.

> 
> > There are some other cases where insn_get_length() doesn't match up with
> > the disassembly, but I don't consider them bugs:
> > - 0x9b is an instruction (fwait), but the disassembler treats it as a
> > prefix.  For example 9b df ... can be disassembled as
> >     fstsw ...    // wait, then store status word
> > or
> >     fwait        // wait
> >     fnstsw ...    // store status word without waiting
> > Perhaps it's relevant to investigate whether a single-step of 9b df ...
> > would execute just the fwait or the whole fstsw.  Anyway, this explains
> > the "failures" of finit and fstsw that I mentioned to you.  I also saw
> > this with fstcw and fclex.
> 
> FYI, there is a single wait/fwait instruction described in the Intel Software
> Developer's Manual, vol. 2B, p. 399.

Yes, I tried probing an fclex instruction -- which is really fwait +
fnclex -- and the single-step stopped after the fwait.  So our
instruction analysis is correct.  (Of course, I had to adjust uprobes
not to reject the 0x9b opcode -- need to check that in.  PR 5273 is
about this sort of thing.)
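
As an illustration only, a toy length decoder for the byte patterns in this
exchange. toy_insn_length() is a hypothetical helper, not insn_get_length();
it recognizes just fwait (9b), fnstsw %ax (df e0) and fnclex (db e2), and
treats 9b as a one-byte instruction rather than as a prefix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy decoder covering only the encodings discussed here.
 * Returns the instruction length, or 0 for anything it does
 * not recognize.
 */
static size_t toy_insn_length(const unsigned char *p, size_t avail)
{
	assert(p != NULL);
	if (avail >= 1 && p[0] == 0x9b)
		return 1;	/* fwait: standalone one-byte instruction */
	if (avail >= 2 && p[0] == 0xdf && p[1] == 0xe0)
		return 2;	/* fnstsw %ax */
	if (avail >= 2 && p[0] == 0xdb && p[1] == 0xe2)
		return 2;	/* fnclex */
	return 0;
}
```

Decoding 9b db e2 this way yields a 1-byte fwait followed by a 2-byte
fnclex, which matches the single-step behaviour described above, whereas a
disassembler would show the three bytes as one fclex.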

> 
> > - Illegal instruction sequences, such as an x86_64 instruction that
> > starts with 0x40, or a misplaced 0x65 prefix.  Typically, we see these
> > when disassembling data.  I just filtered out (via egrep) instructions
> > whose disassembly starts with "rex" or includes "(bad)".
> 
> Sure, I think insn_* should return -EINVAL or set insn.invalid = 1
> if we find such invalid opcodes. E.g. the kernel's BUG() macro adds
> some raw data after ud2; in that case, those raw bytes might
> be decoded as an illegal instruction.

It could be useful to provide a function to determine whether the byte
sequence is a valid instruction, but I don't think we should make that
check by default.  Here are some reasons:

1. It costs execution time.  For some instructions, you have to examine
the prefixes and/or modrm byte as well as the opcode(s).

2. It takes time to code it 100% right.  In particular, mistakenly
rejecting a valid instruction can be a nuisance.

3. Intel and AMD may not completely agree on which instructions are
valid in which modes.  I've always consulted the AMD manuals, since
they're online and appear complete, but I'm not really sure whether what
they say applies without exception to (say) Pentium and EM64T.

4. kprobes and uprobes have gotten along fine without such a test.
(Uprobes's test is far from complete, and deliberately screens out some
valid instructions, such as sysenter, that we suspect may produce weird
results when single-stepped.)  The assumption is that the address
provided points to the first byte of a valid instruction.  Since on x86,
most random byte sequences look like some kind of valid instruction,
catching obviously invalid sequences wouldn't buy us very much.


Jim


From oleg at redhat.com  Wed Mar 11 22:24:01 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 11 Mar 2009 23:24:01 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
Message-ID: <20090311222401.GA13512@redhat.com>

I'd like to ask you to clarify what utrace->stopped means...

My understanding is: if we see ->stopped == true under utrace->lock, then
the target can do nothing "interesting" from the utrace's pov. The target
should take utrace->lock at least once. Either in finish_utrace_stop(), or,
if ->stopped was set by do_signal_stop() path, the target will call
tracehook_get_signal()->utrace_get_signal(). So we can assume the target
is "quiescent" and we can do, for example, UTRACE_DETACH safely.

Is this correct?


But utrace_report_jctl() doesn't look right to me,

	spin_lock(&utrace->lock);
	utrace->stopped = 0;
	utrace->report = 0;
	spin_unlock(&utrace->lock);

I must admit, I don't understand the comment above, but obviously this is
right, we should clear ->stopped. If nothing else, REPORT()->start_report()
won't be happy if ->stopped.

But ->stopped can be restored right after we clear it! Yes, utrace_do_stop()
and utrace_set_events() set ->stopped == 1 only if ->utrace_flags has no JCTL,
and since we are here we must have JCTL.

But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be
already removed from ->utrace_flags, exactly because ->stopped was true.

No?

This leads to another minor question: how is it possible to enter
utrace_report_jctl() with ->stopped == 1? I think the only possibility is
that it was previously set by another call to utrace_report_jctl(), see below.


	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);

	if (was_stopped && !task_is_stopped(task)) {
		/*
		 * The event report hooks could have blocked, though
		 * it should have been briefly.  Make sure we're in
		 * TASK_STOPPED state again to block properly, unless
		 * we've just come back out of job control stop.
		 */

Yes. Even a plain kmalloc() can change ->state to TASK_RUNNING,

		spin_lock_irq(&task->sighand->siglock);
		if (task->signal->flags & SIGNAL_STOP_STOPPED)
			__set_current_state(TASK_STOPPED);

SIGNAL_STOP_STOPPED is not reliable: it is possible that a group stop is in
progress and not finished yet. But ->group_stop_count is not reliable
either. It is possible that we received SIGCONT and then another SIGSTOP. If
another thread has already dequeued this SIGSTOP and initiated the new group
stop, we can't just set TASK_STOPPED, we must participate in the
->group_stop_count accounting.


	if (task_is_stopped(current)) {
		/*
		 * While in TASK_STOPPED, we can be considered safely
		 * stopped by utrace_do_stop() only once we set this.
		 */
		spin_lock(&utrace->lock);
		utrace->stopped = 1;
		spin_unlock(&utrace->lock);

I think this is correct, but it is not easy to understand. SIGCONT may
come right after the task_is_stopped() check, so this _looks_ racy.

But, nobody should clear ->utrace_flags without calling utrace_wakeup()
which clears ->stopped too. This means that the target can't escape
from get_signal_to_deliver() with the ->stopped == 1. And in fact,
we could check was_stopped instead of task_is_stopped().

Is my understanding correct?


But! can't we miss utrace_wakeup() ?

Let's suppose the debugger D attaches the single engine E to the target T.

	D does utrace_set_events(JCTL).

	T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
	utrace_report_jctl().

	D does utrace_set_events(events => 0), this clears E->flags.

	T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers
	and calls utrace_reset(). It sets ->utrace_flags = 0.

	T checks task_is_stopped(), sets ->stopped = 1.

Now, when T is woken by SIGCONT, it returns to user-space bypassing all utrace
hooks, and runs with ->stopped == 1. This doesn't look right. Say, D can do
utrace_set_events(ANY) and then T hits start_report()->BUG_ON(utrace->stopped).

Could you clarify?

Oleg.


From oleg at redhat.com  Thu Mar 12 00:15:21 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 01:15:21 +0100
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: <20090311001137.2D625FC3B6@magilla.sf.frob.com>
References: <20090310212247.GA12258@redhat.com>
	<20090311001137.2D625FC3B6@magilla.sf.frob.com>
Message-ID: <20090312001521.GA16303@redhat.com>

On 03/10, Roland McGrath wrote:
>
> > The comment:
> >
> > 	* When target == current, it would be safe just to call
> > 	* splice_attaching() right here.  But if we're inside a
> > 	* callback,
> >
> > just to clarify, "inside a callback" means inside utrace_report_xxx(),
> > not only inside utrace_engine_ops->report_xxx(), right?
>
> Certainly what I mean when I say "a callback" is one of the functions whose
> pointer lives in struct utrace_engine_ops.  But I don't see how the
> distinction you make could even be meaningful here.

Yes, I wasn't clear.

> A utrace_attach_task()
> call "inside utrace_report_foo()" could only possibly mean one made by a
> ->report_foo() function utrace_report_foo() calls, since obviously there
> are no hard-wired utrace_attach_task() calls in utrace.c itself.

But not vice versa. I misunderstood the comment as saying that the new engine
should not be notified if it is attached by another task while the target
is inside a callback.

I was confused by "When target == current" part of the comment, please
see below.

> > 	             This is not what the user wants.
> >
> > It is not clear to me why the user doesn't want this.
>
> Jim Keniston is the user who doesn't want this.
> https://www.redhat.com/archives/utrace-devel/2008-December/msg00051.html

Still can't understand... If (say) ->report_exec() attaches the new
engine to the same task and does utrace_set_events(EXEC), then it looks
logical the new engine gets the notification too. But OK, I agree, either
way is correct, and perhaps the current behaviour is more intuitive.

But this means that "When target == current it would be safe just to call
splice_attaching() right here" part of the comment is not right, no?
Except for report_reap() target == current.

> > I understand this as follows. If we add the new engine to the ->attached
> > list, and if the target is inside a callback, the target can later race
> > with (say) utrace_set_events(). The target can see "engine->flags & event"
> > and call start_callback/finish_callback before utrace_set_events() completes.
>
> It's not a race question.  There are no guarantees for such races.  (That
> is, utrace_set_events() calls on target!=current make no guarantee about
> reporting an event that might already have started.  Only if the target was
> already stopped by your engine when you made the call can you be sure that
> no such event can be in progress.)
>
> The scenario we are talking about here is fully synchronous.  The target
> itself is inside a callback, calling utrace_set_events() on itself.

Yes, yes, I see. But I meant another case. Suppose that the debugger D
attaches to T and does

	engine = utrace_attach_task(T, ...);
	utrace_set_events(T, engine, XXX);

It is possible that ->report_xxx() is called before utrace_set_events()
completes. But afaics currently this is not a problem.

> > Another question. In any case I don't understand why we really need
> > two lists.
>
> [... big snip ...]

Thanks for your explanations!

And, in any case,

> This is an optimization to consider later on

Yes, yes, sure. I didn't mean we should do this change right now even
_if_ it is good, and I didn't mean I think it is necessarily good ;)

Oleg.


From oleg at redhat.com  Thu Mar 12 00:28:59 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 01:28:59 +0100
Subject: [PATCH] utrace_tracer_task: s/list_for_each_safe/list_for_each_entry
In-Reply-To: <20090310215757.1D3BCFC3B6@magilla.sf.frob.com>
References: <20090310182327.GA3826@redhat.com>
	<20090310215757.1D3BCFC3B6@magilla.sf.frob.com>
Message-ID: <20090312002859.GA20725@redhat.com>

utrace_tracer_task() can use list_for_each_entry() too.

Signed-off-by: Oleg Nesterov <oleg at redhat.com>

--- xxx/kernel/utrace.c~TRACER_TASK	2009-03-12 01:18:38.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-12 01:21:05.000000000 +0100
@@ -2317,15 +2317,12 @@ EXPORT_SYMBOL_GPL(task_user_regset_view)
  */
 struct task_struct *utrace_tracer_task(struct task_struct *target)
 {
-	struct list_head *pos, *next;
 	struct utrace_engine *engine;
 	const struct utrace_engine_ops *ops;
 	struct task_struct *tracer = NULL;
 	struct utrace *utrace = task_utrace_struct(target);
 
-	list_for_each_safe(pos, next, &utrace->attached) {
-		engine = list_entry(pos, struct utrace_engine,
-				    entry);
+	list_for_each_entry(engine, &utrace->attached, entry) {
 		ops = rcu_dereference(engine->ops);
 		if (ops->tracer_task) {
 			tracer = (*ops->tracer_task)(engine, target);


From roland at redhat.com  Thu Mar 12 05:12:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 11 Mar 2009 22:12:46 -0700 (PDT)
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: Oleg Nesterov's message of  Thursday,
	12 March 2009 01:15:21 +0100 <20090312001521.GA16303@redhat.com>
References: <20090310212247.GA12258@redhat.com>
	<20090311001137.2D625FC3B6@magilla.sf.frob.com>
	<20090312001521.GA16303@redhat.com>
Message-ID: <20090312051246.B50E5FC3B6@magilla.sf.frob.com>

> But not vice versa. I misunderstood the comment as if the new engine
> should not be notified if it is attached by another task while target
> is inside callback.

That is indeed what happens in that case.  But that one is not a
specific "should not", it's just what happens to be true given what we
say about the "asynchronous" attach case in general.  That is, that an
"asynchronous" attach + set_events makes no guarantees about how
instantly you start to get event reports.  It might be as long as the
time it takes to get back to user mode from wherever the thread is
now, or the time it takes it to process an interrupt and then get back
to user mode.  It's like you did "thread->events |= events" but there
has not been any kind of memory barrier--it might see it or might not,
until you do something affirmative to make sure (i.e. put it through
UTRACE_STOP, or else get some other callback you're sure happens after
your utrace_set_events call).

For this purpose an "asynchronous attach" means one by a third task (not
the thread itself or the creator during its report_clone), and done when
that third task did not already have some engine that completed a UTRACE_STOP.

This applies even if it is literally synchronous, i.e. if a callback
arranged for the third task to do the attach and set_events and then
blocked waiting for the third task to report its success, we'd call this an
"asynchronous attach" because it didn't synchronize using UTRACE_STOP.

> Still can't understand... If (say) ->report_exec() attaches the new
> engine to the same task and does utrace_set_events(EXEC), then it looks
> logical the new engine gets the notification too. But OK, I agree, either
> way is correct, and perhaps the current behaviour is more intuitive.

As you can see in the cited thread, that's what I thought too.  Jim
convinced me that the (new) current behavior is more useful.  The most
important thing to me is that it's clearly specified one way or the
other for the synchronous case.  It's obviously straightforward to do:

	report_exec(engine, ...)
	{
		new_engine = utrace_attach_task(current, &new_ops);
		utrace_set_events(current, new_engine, UTRACE_EVENT(EXEC));
		new_ops.report_exec(new_engine, ...);
	}

if you want one of your own callback functions to get another call
there.  OTOH, it's much more cumbersome to make the report_exec
callback used by your new engine keep flags and whatnot to distinguish
the first exec event that preceded that engine's setup from the next
one (which is what the new engine is really there to respond to).

Jim's use seems fairly representative of situations where this might
come up.  He's concerned with the EXEC event as the "old address space
is gone, new one is here" event.  It's also the "my name changed"
event that may be triggering a new tracing setup.  The former use just
wants report_exec to do "wipe out our state and go away" stuff.  The
latter use might want to set up a new incarnation of that sort of
tracing setup--a new engine whose report_exec callback does clean up.
It's obvious how the new engine getting its "clean up now" callback
immediately as a consequence of where the call to set it up came from
is not helpful.  I'm sure this sort of scenario will not be unique
either to Jim's work or to EXEC callbacks in particular.
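
The synchronous-attach behaviour described above can be sketched as a
small userspace model.  This is only an illustration under stated
assumptions, not kernel code: every model_* name is hypothetical, the
lists are plain arrays, and the model assumes that new engines are queued
on ->attaching and spliced onto ->attached at the start of the next
report.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ENGINES 8

/* Hypothetical stand-in for struct utrace_engine. */
struct model_engine {
	int wants_exec;			/* asked for the EXEC event */
	int exec_calls;			/* times its report_exec model ran */
	struct model_engine *spawn;	/* engine to attach from the callback, or NULL */
};

/* Hypothetical stand-in for struct utrace: two lists, as in the thread. */
struct model_utrace {
	struct model_engine *attached[MODEL_MAX_ENGINES];
	size_t nattached;
	struct model_engine *attaching[MODEL_MAX_ENGINES];
	size_t nattaching;
};

/* New engines always land on ->attaching, never directly on ->attached. */
static void model_attach(struct model_utrace *u, struct model_engine *e)
{
	assert(u->nattaching < MODEL_MAX_ENGINES);
	u->attaching[u->nattaching++] = e;
}

/* Move everything queued on ->attaching to the tail of ->attached. */
static void model_splice_attaching(struct model_utrace *u)
{
	for (size_t i = 0; i < u->nattaching; i++) {
		assert(u->nattached < MODEL_MAX_ENGINES);
		u->attached[u->nattached++] = u->attaching[i];
	}
	u->nattaching = 0;
}

/* One EXEC report: splice first, then call back only engines already attached. */
static void model_report_exec(struct model_utrace *u)
{
	model_splice_attaching(u);
	for (size_t i = 0; i < u->nattached; i++) {
		struct model_engine *e = u->attached[i];

		if (!e->wants_exec)
			continue;
		e->exec_calls++;
		if (e->spawn) {
			/* Synchronous attach + set_events from inside the callback. */
			e->spawn->wants_exec = 1;
			model_attach(u, e->spawn);
			e->spawn = NULL;
		}
	}
}
```

In this model an engine attached from inside a callback sits on
->attaching until the next report, so it misses the exec event that
triggered its creation and sees only the following one.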

> But this means that "When target == current it would be safe just to call
> splice_attaching() right here" part of the comment is not right, no?
> Except for report_reap() target == current.

It would be "safe", meaning it doesn't have race problems like the
target != current case does for touching ->attached here.  That's what
the comment says (and that's what the code used to do).  The reason we
don't do it (any more) is the explicit choice for API semantics, not
any implementation reason (in the implementation it is indeed an
obvious optimization if you understand the code).  That's why
the comment is there.

> Yes, yes, I see. But I meant another case. Suppose that the debugger D
> attaches to T and does
> 
> 	engine = utrace_attach_task(T, ...);
> 	utrace_set_events(T, engine, XXX);
> 
> It is possible that ->report_xxx() is called before utrace_set_events()
> completes. But afaics currently this is not a problem.

As far as the API guarantees are concerned, there is no "completes".
When you call utrace_set_events, it becomes possible your callbacks
get made.  The return value (a failure return, not -EINPROGRESS) can
say that you are now sure no callback was made or will be.  But when
you called, you wanted it to be possible.  If you didn't, then you
should have made sure it was fully stopped via UTRACE_STOP before you
called utrace_set_events.


Thanks,
Roland


From roland at redhat.com  Thu Mar 12 07:36:52 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 12 Mar 2009 00:36:52 -0700 (PDT)
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: Oleg Nesterov's message of  Wednesday,
	11 March 2009 23:24:01 +0100 <20090311222401.GA13512@redhat.com>
References: <20090311222401.GA13512@redhat.com>
Message-ID: <20090312073652.75811FC3B6@magilla.sf.frob.com>

> I'd like to ask you to clarify what utrace->stopped means...

I'm very glad you are looking into this area!

> My understanding is: if we see ->stopped == true under utrace->lock, then
> the target can do nothing "interesting" from the utrace's pov. The target
> should take utrace->lock at least once. Either in finish_utrace_stop(), or,
> if ->stopped was set by do_signal_stop() path, the target will call
> tracehook_get_signal()->utrace_get_signal(). So we can assume the target
> is "quiescent" and we can do, for example, UTRACE_DETACH safely.

Correct.

> But utrace_report_jctl() doesn't look right to me,
> 
> 	spin_lock(&utrace->lock);
> 	utrace->stopped = 0;
> 	utrace->report = 0;
> 	spin_unlock(&utrace->lock);
> 
> I must admit, I don't understand the comment above, but obviously this is
> right, we should clear ->stopped. If nothing else, REPORT()->start_report()
> won't be happy if ->stopped.

The comment mentions "utrace being removed", which is a bit of old text
referring to an indirect struct utrace.  Aside from that, please tell me
what is not clear about that comment.

> But ->stopped can be restored right after we clear it! Yes, utrace_do_stop()
> and utrace_set_events() set ->stopped == 1 only if ->utrace_flags has no JCTL,
> and since we are here we must have JCTL.

That's indeed the logic intended to prevent ->stopped being set again here.

> But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be
> already removed from ->utrace_flags, exactly because ->stopped was true.

I don't follow this.  JCTL is never "removed" from ->utrace_flags, except
as all event bits are, by utrace_reset().

> This leads to another minor question, how is it possible to enter
> utrace_report_jctl() with ->stopped == 1 ? I think the only possibility
> it was previously set by another call to utrace_report_jctl(), see below.

There are two ways to enter utrace_report_jctl with ->stopped set.

1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then.
   Now utrace_report_jctl is called for the CLD_CONTINUED case, and
   ->stopped remains set.

2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
   already in TASK_STOPPED (and really stopped, or at least got past
   tracehook_notify_jctl before JCTL was set).  It sets ->stopped before
   adding JCTL to ->utrace_flags, so that utrace_control() will consider
   the target stopped.

> SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop is in
> progress and is not finished yet.

SIGNAL_STOP_STOPPED should be reliable, as far as it goes.  It will only be
set if the group stop is complete.  If then a SIGCONT+stop signal come,
SIGCONT will clear SIGNAL_STOP_STOPPED before the stop signal starts
another group stop.  (We have no bad old PTRACE_CONT implementation to
conflict with here.)

> But ->group_stop_count is not reliable either. It is possible that we
> received SIGCONT and then another SIGSTOP. If another thread has already
> dequeued this SIGSTOP and initiated the new group stop, we can't just set
> TASK_STOPPED, we must participate in the ->group_stop_count accounting.

It's worse than that!  If we came out of TASK_STOPPED, we did it implicitly
and without holding the siglock.  

We participated in group_stop_count accounting for the first stop before we
got here.  If we stayed in TASK_STOPPED throughout the callbacks, then that
bookkeeping is still correct.

If the initiation of the new group stop happened while we were in
TASK_STOPPED, we were omitted from the count but we should stop again.  
In that case we should stop either if SIGNAL_STOP_STOPPED is set or if
group_stop_count > 0.  Since we weren't counted, if group_stop_count==0
then SIGNAL_STOP_STOPPED will be set (again).

If that initiation happened while a callback (e.g.) blocked in kmalloc or
after (i.e. we were not in TASK_STOPPED), we were included in that count.
In that case we need to decrement group_stop_count and stop again, but
possibly also need to call do_notify_parent_cldstop again if it was 1.  For
that we'd do the right thing just by returning in TASK_RUNNING.  We'll just
come right back around in get_signal_to_deliver and handle group_stop_count
normally.

The trouble is that we have no way to distinguish these two cases, i.e. to
know whether or not we were counted in group_stop_count.  Am I missing a
way?  (The one piece of information we are not using is the @notify
argument: it tells us whether we were the thread responsible for setting
SIGNAL_STOP_STOPPED just before we got here.  But I don't see how that helps.)

I think the bottom line is that we can't ever allow any transition to or
from TASK_STOPPED when we don't hold the siglock.  Every such transition
must hold that lock to manage group_stop_count and SIGNAL_STOP_STOPPED.

That suggests we must preemptively go back to TASK_RUNNING before making
the callbacks, just in case they would do the transition.  We'd take the
siglock and manage the bookkeeping.  But I'm not sure yet how best to do
that.  I'm not sure if we can safely clear SIGNAL_STOP_STOPPED momentarily
after it's been set.  This all happens before do_notify_parent_cldstop is
called, which avoids a whole can of worms about do_wait() I was starting to
worry about.  Hmm.  Seems like there should be something we can do using
group_stop_count and/or checking the SIGNAL_CLD_* bits to notice a SIGCONT
having come in.

> 
> 	if (task_is_stopped(current)) {
> 		/*
> 		 * While in TASK_STOPPED, we can be considered safely
> 		 * stopped by utrace_do_stop() only once we set this.
> 		 */
> 		spin_lock(&utrace->lock);
> 		utrace->stopped = 1;
> 		spin_unlock(&utrace->lock);
> 
> I think this is correct, but it is not easy to understand. SIGCONT may
> come right after the task_is_stopped() check, so this _looks_ racy.

Right, all that matters is that we are always on a path that goes back
through utrace_get_signal() before doing anything else utrace thinks about.

> But, nobody should clear ->utrace_flags without calling utrace_wakeup()
> which clears ->stopped too. 

Right.

> This means that the target can't escape
> from get_signal_to_deliver() with the ->stopped == 1. 

Right, that is the core invariant of all ->stopped logic.

> And in fact, we could check was_stopped instead of task_is_stopped().

Right.  If we were resumed rather than actually stopping now, then
->stopped will be cleared shortly anyway.  Since we have one test or the
other here anyway, the fresh test is a free way to optimize out the lock
and set when it happens to be that case.  (Not that it matters to optimize
this case, but it's free.)

> Is my understanding correct?

I think so.

> But! can't we miss utrace_wakeup() ?

I think you've found something (though not quite the scenario you describe).

> Let's suppose the debugger D attaches the single engine E to the target T.
> 
> 	D does utrace_set_events(JCTL).
> 
> 	T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
> 	utrace_report_jctl().
> 
> 	D does utrace_set_events(events => 0), this clears E->flags.
> 
> 	T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers
> 	and calls utrace_reset(). It sets ->utrace_flags = 0.

Nope:

			flags |= engine->flags | UTRACE_EVENT(REAP);

If there are any engines left on the list, ->utrace_flags is never zero.

So, change your scenario to:

	D does utrace_control(UTRACE_DETACH).

and then this will happen.

> 	T checks task_is_stopped(), sets ->stopped = 1.

Right.  In the utrace-indirect code, this was even worse!  The dangling
utrace pointer was invalid and should not have been used at all (it should
have fetched the new one under RCU).

> Now, when T is woken by SIGCONT, it returns to user-space bypassing all utrace
> hooks, and runs with ->stopped == 1. This doesn't look right. Say, D can do
> utrace_set_events(ANY) and then T hits start_report()->BUG_ON(utrace->stopped).

Right.  I think it's made safe with:

	if (task_is_stopped(task) &&
	    (task->utrace_flags & UTRACE_EVENT(JCTL))) {

In fact, just task->utrace_flags != 0 would be safe.  But only if JCTL is
set do we actually need to set ->stopped here.  (Otherwise, it will get set
later by utrace_do_stop or utrace_set_events.)
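
A minimal userspace sketch of the difference that extra test makes, under
the assumption that the last engine has been detached (->utrace_flags is
0) while the task is in TASK_STOPPED.  The model_* names and the bit
value are hypothetical; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EVENT_JCTL (1UL << 0)	/* hypothetical bit value */

struct model_task {
	bool task_stopped;		/* task_is_stopped() would be true */
	unsigned long utrace_flags;	/* 0 once the last engine is detached */
	bool utrace_stopped;		/* models utrace->stopped */
};

/* Tail of utrace_report_jctl() with only the task_is_stopped() test. */
static void model_jctl_tail_old(struct model_task *t)
{
	if (t->task_stopped)
		t->utrace_stopped = true;
}

/* Same tail with the extra JCTL test suggested above. */
static void model_jctl_tail_new(struct model_task *t)
{
	if (t->task_stopped && (t->utrace_flags & MODEL_EVENT_JCTL))
		t->utrace_stopped = true;
}
```

With utrace_flags == 0 the old tail leaves a stale ->stopped behind,
while the new tail leaves it clear; with JCTL still set both behave the
same.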


Thanks,
Roland


From renzo at cs.unibo.it  Thu Mar 12 13:13:03 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Thu, 12 Mar 2009 14:13:03 +0100
Subject: [PATCH 1/2] UTRACE_STOP race condition (updated)
Message-ID: <20090312131303.GA25801@cs.unibo.it>

Dear Roland, dear utrace developers,

I have updated my patch #1 (it solves the race condition on utrace_stop but
not the nesting issue) for the latest version of utrace.

I am trying to keep the patches up to date by downloading, compiling and
testing the fixes every week or so...
Things would be easier if these patches could be merged into the mainline ;-)

renzo
----
diff -Naur linux-2.6.29-rc7-git5-utrace/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace/kernel/utrace.c	2009-03-12 11:00:09.000000000 +0100
+++ linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c	2009-03-12 11:05:50.000000000 +0100
@@ -376,6 +376,13 @@
 	return killed;
 }
 
+static void mark_engine_wants_stop(struct utrace_engine *engine);
+static void clear_engine_wants_stop(struct utrace_engine *engine);
+static bool engine_wants_stop(struct utrace_engine *engine);
+static void mark_engine_wants_resume(struct utrace_engine *engine);
+static void clear_engine_wants_resume(struct utrace_engine *engine);
+static bool engine_wants_resume(struct utrace_engine *engine);
+
 /*
  * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
  * @task == current, @utrace == current->utrace, which is not locked.
@@ -385,6 +392,7 @@
 static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
 {
 	bool killed;
+	struct utrace_engine *engine, *next;
 
 	/*
 	 * @utrace->stopped is the flag that says we are safely
@@ -406,7 +414,23 @@
 		return true;
 	}
 
-	utrace->stopped = 1;
+	/* final check: it is really needed to stop? */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
+			else
+				utrace->stopped = 1;
+		}
+	}
+	if (unlikely(!utrace->stopped)) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return false;
+	}
+
 	__set_current_state(TASK_TRACED);
 
 	/*
@@ -632,6 +656,7 @@
  * to record whether the engine is keeping the target thread stopped.
  */
 #define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+#define ENGINE_RESUME		(1UL << (_UTRACE_NEVENTS+1))
 
 static void mark_engine_wants_stop(struct utrace_engine *engine)
 {
@@ -648,6 +673,21 @@
 	return (engine->flags & ENGINE_STOP) != 0;
 }
 
+static void mark_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_RESUME;
+}
+
+static void clear_engine_wants_resume(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_RESUME;
+}
+
+static bool engine_wants_resume(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_RESUME) != 0;
+}
+
 /**
  * utrace_set_events - choose which event reports a tracing engine gets
  * @target:		thread to affect
@@ -906,6 +946,10 @@
 			list_move(&engine->entry, &detached);
 		} else {
 			flags |= engine->flags | UTRACE_EVENT(REAP);
+			if (engine_wants_resume(engine)) {
+				clear_engine_wants_stop(engine);
+				clear_engine_wants_resume(engine);
+			}
 			wake = wake && !engine_wants_stop(engine);
 		}
 	}
@@ -1133,6 +1177,7 @@
 		 * There might not be another report before it just
 		 * resumes, so make sure single-step is not left set.
 		 */
+		mark_engine_wants_resume(engine);
 		if (likely(resume))
 			user_disable_single_step(target);
 		break;


From renzo at cs.unibo.it  Thu Mar 12 13:13:30 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Thu, 12 Mar 2009 14:13:30 +0100
Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)
Message-ID: <20090312131330.GB25801@cs.unibo.it>

Dear Roland, dear utrace developers,

I have also updated the second patch. Please note that this patch
must now be applied after the first one.
This patch implements a consistent nesting model for utrace machines.
(There is a full description in the messages I sent on Feb. 14 and Mar. 6)

renzo
---
diff -Naur linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c
--- linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c	2009-03-12 11:05:50.000000000 +0100
+++ linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c	2009-03-12 13:37:27.000000000 +0100
@@ -1405,6 +1405,7 @@
 static bool finish_callback(struct utrace *utrace,
 			    struct utrace_report *report,
 			    struct utrace_engine *engine,
+			    struct task_struct *task,
 			    u32 ret)
 {
 	enum utrace_resume_action action = utrace_resume_action(ret);
@@ -1426,6 +1427,7 @@
 				spin_lock(&utrace->lock);
 				mark_engine_wants_stop(engine);
 				spin_unlock(&utrace->lock);
+				utrace_stop(task, utrace);
 			}
 		} else if (engine_wants_stop(engine)) {
 			spin_lock(&utrace->lock);
@@ -1492,7 +1494,7 @@
 	ops = engine->ops;
 
 	if (want & UTRACE_EVENT(QUIESCE)) {
-		if (finish_callback(utrace, report, engine,
+		if (finish_callback(utrace, report, engine, task,
 				    (*ops->report_quiesce)(report->action,
 							   engine, task,
 							   event)))
@@ -1526,24 +1528,24 @@
  * @callback is the name of the member in the ops vector, and remaining
  * args are the extras it takes after the standard three args.
  */
-#define REPORT(task, utrace, report, event, callback, ...)		      \
+#define REPORT(reverse, task, utrace, report, event, callback, ...)		      \
 	do {								      \
 		start_report(utrace);					      \
-		REPORT_CALLBACKS(task, utrace, report, event, callback,	      \
+		REPORT_CALLBACKS(reverse, task, utrace, report, event, callback,	      \
 				 (report)->action, engine, current,	      \
 				 ## __VA_ARGS__);  	   		      \
 		finish_report(report, task, utrace);			      \
 	} while (0)
-#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	      \
+#define REPORT_CALLBACKS(reverse, task, utrace, report, event, callback, ...)	      \
 	do {								      \
 		struct utrace_engine *engine;				      \
 		const struct utrace_engine_ops *ops;			      \
-		list_for_each_entry(engine, &utrace->attached, entry) {	      \
+		list_for_each_entry ## reverse(engine, &utrace->attached, entry) {	      \
 			ops = start_callback(utrace, report, engine, task,    \
 					     event);			      \
 			if (!ops)					      \
 				continue;				      \
-			finish_callback(utrace, report, engine,		      \
+			finish_callback(utrace, report, engine, task,		      \
 					(*ops->callback)(__VA_ARGS__));	      \
 		}							      \
 	} while (0)
@@ -1558,7 +1560,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(EXEC),
 	       report_exec, fmt, bprm, regs);
 }
 
@@ -1573,7 +1575,7 @@
 	INIT_REPORT(report);
 
 	start_report(utrace);
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+	REPORT_CALLBACKS(_reverse, task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
 			 report_syscall_entry, report.result | report.action,
 			 engine, current, regs);
 	finish_report(&report, task, utrace);
@@ -1615,7 +1617,7 @@
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
 	       report_syscall_exit, regs);
 }
 
@@ -1640,7 +1642,7 @@
 	start_report(utrace);
 	utrace->cloning = child;
 
-	REPORT_CALLBACKS(task, utrace, &report,
+	REPORT_CALLBACKS(, task, utrace, &report,
 			 UTRACE_EVENT(CLONE), report_clone,
 			 report.action, engine, task, clone_flags, child);
 
@@ -1708,7 +1710,7 @@
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 	if (was_stopped && !task_is_stopped(task)) {
@@ -1745,7 +1747,7 @@
 	INIT_REPORT(report);
 	long orig_code = *exit_code;
 
-	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	REPORT(, task, utrace, &report, UTRACE_EVENT(EXIT),
 	       report_exit, orig_code, exit_code);
 
 	if (report.action == UTRACE_STOP)
@@ -1784,7 +1786,7 @@
 	utrace->interrupt = 0;
 	spin_unlock(&utrace->lock);
 
-	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+	REPORT_CALLBACKS(, task, utrace, &report, UTRACE_EVENT(DEATH),
 			 report_death, engine, task, group_dead, signal);
 
 	spin_lock(&utrace->lock);
@@ -2129,7 +2131,7 @@
 			break;
 		}
 
-		finish_callback(utrace, &report, engine, ret);
+		finish_callback(utrace, &report, engine, task, ret);
 	}
 
 	/*


From oleg at redhat.com  Thu Mar 12 17:21:28 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 18:21:28 +0100
Subject: [PATCH 1/2] UTRACE_STOP race condition (updated)
In-Reply-To: <20090312131303.GA25801@cs.unibo.it>
References: <20090312131303.GA25801@cs.unibo.it>
Message-ID: <20090312172128.GA26657@redhat.com>

Hi Renzo,

This patch needs Roland's review, but I'd like to participate...

On 03/12, Renzo Davoli wrote:
>
> I have updated my patch #1 (it solves the race condition on utrace_stop but
> not the nesting issue) for the latest version of utrace.
>
> I am trying to keep the patches up to date by downloading, compiling and
> testing the fixes every week or so...
> Things would be easier if these patches could be merged into the mainline ;-)

I think it would be better if you described the problem in the changelog.
It is not convenient to dig through the archives to understand which problem
this patch fixes.

I can't really comment on this change because I don't understand what the
intended behaviour of utrace_control(UTRACE_RESUME) is. Perhaps the caller
should wait until the target is stopped? The comment says:

	 case UTRACE_RESUME:
		* This and all other cases imply resuming if stopped.

It doesn't explain what we should do if it is not stopped yet.

>  static bool utrace_stop(struct task_struct *task, struct utrace *utrace)
>  {
>  	bool killed;
> +	struct utrace_engine *engine, *next;
>
>  	/*
>  	 * @utrace->stopped is the flag that says we are safely
> @@ -406,7 +414,23 @@
>  		return true;
>  	}
>
> -	utrace->stopped = 1;
> +	/* final check: it is really needed to stop? */
> +	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {

I think we can do this earlier, before taking ->siglock

> +		if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) {

Do we need "!= &utrace_detached_ops" check? mark_engine_detached() removes
ENGINE_STOP from ->flags.

> +                       if (engine_wants_resume(engine)) {
> +                               clear_engine_wants_stop(engine);
> +                               clear_engine_wants_resume(engine);
> +                       }

I'm afraid _wants_resume() adds another problem. Let's suppose we do

	utrace_control(UTRACE_RESUME);
	utrace_control(UTRACE_STOP);

UTRACE_STOP doesn't do clear_engine_wants_resume(), so the stop request can be lost.

And. Let's suppose we call utrace_control(UTRACE_RESUME), and later
report_xxx() returns UTRACE_STOP. Again, this stop request can be lost.
This doesn't look consistent.
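
A minimal userspace sketch of the first scenario, assuming a single
engine and modelling only the two flag bits.  The model_* names and bit
positions are hypothetical; the logic mirrors my reading of the quoted
hunks and is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ENGINE_STOP	(1UL << 0)	/* hypothetical bit positions */
#define MODEL_ENGINE_RESUME	(1UL << 1)

struct model_engine {
	unsigned long flags;
};

/* utrace_control(UTRACE_STOP), as far as the engine flags go. */
static void model_control_stop(struct model_engine *e)
{
	e->flags |= MODEL_ENGINE_STOP;
}

/* utrace_control(UTRACE_RESUME) with the patch: clear stop, mark resume. */
static void model_control_resume(struct model_engine *e)
{
	e->flags &= ~MODEL_ENGINE_STOP;
	e->flags |= MODEL_ENGINE_RESUME;
}

/*
 * The "final check" the patch adds to utrace_stop(), for one engine.
 * Returns true if the target should really stop.
 */
static bool model_final_check(struct model_engine *e)
{
	if (!(e->flags & MODEL_ENGINE_STOP))
		return false;
	if (e->flags & MODEL_ENGINE_RESUME) {
		/* A stale resume mark cancels the newer stop request. */
		e->flags &= ~(MODEL_ENGINE_STOP | MODEL_ENGINE_RESUME);
		return false;
	}
	return true;
}
```

Calling model_control_resume() and then model_control_stop() leaves both
bits set, so model_final_check() reports "do not stop" and the later stop
request disappears.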


Do we really need _wants_resume()? Note that utrace_control(UTRACE_RESUME)
does clear_engine_wants_stop(). Yes, we can race with finish_callback()
when ->report_xxx() returns UTRACE_STOP. But, perhaps, in that
case the caller of utrace_control(UTRACE_RESUME) should take care of
the synchronization with its own callbacks? Something like:

	make_sure_my_callback_wont_return_UTRACE_STOP();
	utrace_barrier();
	utrace_control(UTRACE_RESUME);

This way utrace_stop() can just check engine_wants_stop().

Oleg.


From oleg at redhat.com  Thu Mar 12 17:35:32 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 18:35:32 +0100
Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)
In-Reply-To: <20090312131330.GB25801@cs.unibo.it>
References: <20090312131330.GB25801@cs.unibo.it>
Message-ID: <20090312173532.GB26657@redhat.com>

On 03/12, Renzo Davoli wrote:
>
> I have also updated the second patch. Please note that this patch
> must now be applied after the first one.
> This patch implements a consistent nesting model for utrace machines.
> (There is a full description in the messages I sent on Feb. 14 and Mar. 6)

This patch does 2 completely different things. I think you should
make separate patches.

Again, we need Roland's opinion, but could you explain why it would
be better to use _reverse in utrace_report_syscall_entry() ?

As for the other change,

> --- linux-2.6.29-rc7-git5-utrace-p1/kernel/utrace.c	2009-03-12 11:05:50.000000000 +0100
> +++ linux-2.6.29-rc7-git5-utrace-p2/kernel/utrace.c	2009-03-12 13:37:27.000000000 +0100
> @@ -1405,6 +1405,7 @@
>  static bool finish_callback(struct utrace *utrace,
>  			    struct utrace_report *report,
>  			    struct utrace_engine *engine,
> +			    struct task_struct *task,
>  			    u32 ret)
>  {
>  	enum utrace_resume_action action = utrace_resume_action(ret);
> @@ -1426,6 +1427,7 @@
>  				spin_lock(&utrace->lock);
>  				mark_engine_wants_stop(engine);
>  				spin_unlock(&utrace->lock);
> +				utrace_stop(task, utrace);

I don't think this is safe. If we do utrace_stop() here, the next engine
can be detached before we return (UTRACE_DETACH assumes it is safe to
unlink the engine when the target is stopped). This means we can't
continue list_for_each_entry(engine, &utrace->attached, entry) after
return from finish_callback().

Oleg.


From oleg at redhat.com  Thu Mar 12 19:07:38 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 20:07:38 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: <20090312073652.75811FC3B6@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
Message-ID: <20090312190738.GA3529@redhat.com>

Roland, I left some parts of your message unanswered because I need to think
more about them...

On 03/12, Roland McGrath wrote:
>
> > But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be
> > already removed from ->utrace_flags, exactly because ->stopped was true.
>
> I don't follow this.  JCTL is never "removed" from ->utrace_flags, except
> as all event bits are, by utrace_reset().

Yep. And utrace_reset() can be called because ->stopped == 1.

Let me explain. Again, let's suppose D attaches engine E to the target T.

T enters utrace_report_jctl() with ->stopped == 1.

D calls utrace_set_events(events => 0), this removes JCTL from E->flags.

D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this
calls utrace_reset() and removes JCTL from T->utrace_flags.

T takes utrace->lock, clears ->stopped, and drops the lock.

D does utrace_control(UTRACE_STOP). This calls utrace_do_stop() which
sees task_is_stopped() && !JCTL, so it sets ->stopped = true.

T calls REPORT() and start_report() hits the (correct) BUG_ON(stopped).

No?
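
The interleaving above can be replayed as a sequential userspace model.
This is a sketch only: the model_* names and bit values are hypothetical,
locking is ignored, and each helper keeps just the one effect that the
corresponding step relies on.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EVENT_REAP (1UL << 0)	/* hypothetical bit values */
#define MODEL_EVENT_JCTL (1UL << 1)

struct model_state {
	bool task_stopped;		/* T is in TASK_STOPPED */
	bool utrace_stopped;		/* utrace->stopped */
	unsigned long engine_flags;	/* E->flags */
	unsigned long utrace_flags;	/* T->utrace_flags */
};

/* D: utrace_set_events() sets E->flags and only ever adds to ->utrace_flags. */
static void model_d_set_events(struct model_state *s, unsigned long events)
{
	s->engine_flags = events;
	s->utrace_flags |= events;
}

/* D: utrace_control(UTRACE_RESUME); with ->stopped set, the flags are recomputed. */
static void model_d_resume(struct model_state *s)
{
	if (s->utrace_stopped)
		s->utrace_flags = s->engine_flags | MODEL_EVENT_REAP;
}

/* T: utrace_report_jctl() clears ->stopped under the lock. */
static void model_t_clear_stopped(struct model_state *s)
{
	s->utrace_stopped = false;
}

/* D: utrace_control(UTRACE_STOP) -> utrace_do_stop(). */
static void model_d_stop(struct model_state *s)
{
	if (s->task_stopped && !(s->utrace_flags & MODEL_EVENT_JCTL))
		s->utrace_stopped = true;
}

/* T: start_report() has BUG_ON(utrace->stopped); say whether it would fire. */
static bool model_t_start_report_bugs(const struct model_state *s)
{
	return s->utrace_stopped;
}
```

Running the steps in the order given ends with ->stopped set when T
reaches start_report(); leaving out the set_events(0) step keeps JCTL in
->utrace_flags, and the model then never sets ->stopped again.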

> > This leads to another minor question, how is it possible to enter
> > utrace_report_jctl() with ->stopped == 1 ? I think the only possibility
> > it was previously set by another call to utrace_report_jctl(), see below.
>
> There are two ways to enter utrace_report_jctl with ->stopped set.
>
> 1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then.
>    Now utrace_report_jctl is called for the CLD_CONTINUED case, and
>    ->stopped remains set.

this is covered by my guess above,

> 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
>    already in TASK_STOPPED (and really stopped, or at least got past
>    tracehook_notify_jctl before JCTL was set).  It sets ->stopped before
>    adding JCTL to ->utrace_flags,

Yes, thanks. I missed this.

> > SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop is in
> > progress and is not finished yet.
>
> SIGNAL_STOP_STOPPED should be reliable, as far as it goes.  It will only be
> set if the group stop is complete.

Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set?
This doesn't mean we don't need __set_current_state(TASK_STOPPED); it is
possible that the group-stop is in progress and ->group_stop_count != 0.

> > But! can't we miss utrace_wakeup() ?
>
> I think you've found something (though not quite the scenario you describe).
>
> > Let's suppose the debugger D attaches the single engine E to the target T.
> >
> > 	D does utrace_set_events(JCTL).
> >
> > 	T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
> > 	utrace_report_jctl().
> >
> > 	D does utrace_set_events(events => 0), this clears E->flags.
> >
> > 	T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers
> > 	and calls utrace_reset(). It sets ->utrace_flags = 0.
>
> Nope:
>
> 			flags |= engine->flags | UTRACE_EVENT(REAP);

Ah, thanks. Can't understand how I didn't notice this, I checked the
code several times ;)

But as you pointed out,

> So, change your scenario to:
>
> 	D does utrace_control(UTRACE_DETACH).
>
> and then this will happen.

Yes.

Oleg.


From oleg at redhat.com  Thu Mar 12 19:50:21 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 20:50:21 +0100
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: <20090312073652.75811FC3B6@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
Message-ID: <20090312195021.GB3529@redhat.com>

On 03/12, Roland McGrath wrote:
>
> > 	T calls REPORT(), nobody needs JCTL, finish_report() sees !->takers
> > 	and calls utrace_reset(). It sets ->utrace_flags = 0.
>
> Nope:
>
> 			flags |= engine->flags | UTRACE_EVENT(REAP);

Hmm. But this leads to another question: why does utrace_reset() set
UTRACE_EVENT(REAP) ?

This looks like: make sure ->utrace_flags is never 0 unless we detach
all engines. Perhaps because sometimes, say tracehook_notify_resume(),
we just check task_utrace_flags() != 0 ?

Imho, this needs a comment. Or I missed something obvious.


Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME.
But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ?
Confused.

Oleg.


From oleg at redhat.com  Thu Mar 12 20:36:09 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 21:36:09 +0100
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: <20090312195021.GB3529@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
Message-ID: <20090312203609.GC3529@redhat.com>

I'm afraid I wasn't clear again,

On 03/12, Oleg Nesterov wrote:
>
> Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME.
> But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ?
> Confused.

Perhaps this is not a problem per se. But let's suppose we call, say,
utrace_control(UTRACE_STOP) later. utrace_do_stop() sees ->report == 1
and doesn't call set_notify_resume(). But TIF_NOTIFY_RESUME was already
cleared by do_notify_resume().

And again, utrace_control(UTRACE_STOP) does not set ->utrace_flags != 0
itself. But even if we called utrace_set_events(XXX) before, without
set_notify_resume() we have to wait for that XXX event, this doesn't
look right.
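
The sequence above can be sketched as a small user-space model. This is
only an illustration under simplifying assumptions: struct toy_task, the
toy_* helpers and the TOY_EVENT_REAP bit value are invented here and are
not kernel code, and ->interrupt and all locking are ignored. The `fixed`
parameter models keeping ->utrace_flags non-zero from attach time onward.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EVENT_REAP 0x1UL	/* hypothetical bit value */

/* Toy stand-ins for the three pieces of state discussed above. */
struct toy_task {
	unsigned long utrace_flags;	/* models task->utrace_flags */
	bool report;			/* models utrace->report */
	bool tif_notify_resume;		/* models TIF_NOTIFY_RESUME */
};

/* Models utrace_add_engine(): sets ->report and TIF_NOTIFY_RESUME. */
static void toy_add_engine(struct toy_task *t, bool fixed)
{
	if (fixed)
		t->utrace_flags |= TOY_EVENT_REAP;
	t->report = true;
	t->tif_notify_resume = true;
}

/* Models do_notify_resume() -> tracehook_notify_resume(). */
static void toy_notify_resume(struct toy_task *t)
{
	t->tif_notify_resume = false;	/* cleared unconditionally */
	if (t->utrace_flags != 0)
		t->report = false;	/* the utrace reporting pass ran */
}

/* Models the "neither stopped nor already reporting" path of
 * utrace_do_stop(). */
static void toy_do_stop(struct toy_task *t)
{
	if (!t->report) {
		t->report = true;
		t->tif_notify_resume = true;
	}
}

/* The invariant under discussion: ->report matches TIF_NOTIFY_RESUME. */
static bool toy_invariant_holds(const struct toy_task *t)
{
	return t->report == t->tif_notify_resume;
}

/* Attach, return to user mode once, then request UTRACE_STOP.
 * Returns whether the stop request left TIF_NOTIFY_RESUME set. */
static bool toy_stop_is_noticed(bool fixed)
{
	struct toy_task t = { 0, false, false };

	toy_add_engine(&t, fixed);
	assert(toy_invariant_holds(&t));
	toy_notify_resume(&t);
	toy_do_stop(&t);
	return t.tif_notify_resume;
}
```

In this model toy_stop_is_noticed(false) returns false, because the stale
->report makes the stop request skip set_notify_resume(), while
toy_stop_is_noticed(true) returns true.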

Oleg.


From oleg at redhat.com  Thu Mar 12 21:40:37 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Thu, 12 Mar 2009 22:40:37 +0100
Subject: Q: ->attaching && REPORT_CALLBACKS()
In-Reply-To: <20090312051246.B50E5FC3B6@magilla.sf.frob.com>
References: <20090310212247.GA12258@redhat.com>
	<20090311001137.2D625FC3B6@magilla.sf.frob.com>
	<20090312001521.GA16303@redhat.com>
	<20090312051246.B50E5FC3B6@magilla.sf.frob.com>
Message-ID: <20090312214037.GA10462@redhat.com>

On 03/11, Roland McGrath wrote:
>
> > But not vice versa. I misunderstood the comment as if the new engine
> > should not be notified if it is attached by another task while target
> > is inside callback.
>
> That is indeed what happens in that case.  But that one is not a
> specific "should not", it's just what happens to be true given what we
> say about the "asynchronous" attach case in general.  That is, that an
> "asynchronous" attach + set_events makes no guarantees about how
> instantly you start to get event reports.

Yes, yes, I understand. In short: I greatly misinterpreted the comment.

> > Still can't understand... If (say) ->report_exec() attaches the new
> > engine to the same task and does utrace_set_events(EXEC), then it looks
> > logical the new engine gets the notification too. But OK, I agree, either
> > way is correct, and perhaps the current behaviour is more intuitive.
>
> As you can see in the cited thread, that's what I thought too.  Jim
> convinced me that the (new) current behavior is more useful.
> ...
> Jim's use seems fairly representative of situations where this might
> come up.  He's concerned with the EXEC event as the "old address space
> is gone, new one is here" event.  It's also the "my name changed"
> event that may be triggering a new tracing setup.

Aha, thanks!

> > But this means that "When target == current it would be safe just to call
> > splice_attaching() right here" part of the comment is not right, no?
> > Except for report_reap() target == current.
>
> It would be "safe", meaning it doesn't have race problems like the
> target != current case does for touching ->attached here.

Yes, I see. Again, I confused "safe" with "not what the user wants".

> > Yes, yes, I see. But I meant another case. Suppose that the debugger D
> > attaches to T and does
> >
> > 	engine = utrace_attach_task(T, ...);
> > 	utrace_set_events(T, engine, XXX);
> >
> > It is possible that ->report_xxx() is called before utrace_set_events()
> > completes. But afaics currently this is not a problem.
>
> As far as the API guarantees are concerned, there is no "completes".
> When you call utrace_set_events, it becomes possible your callbacks
> get made.

Yes sure.

Thanks!

Oleg.


From roland at redhat.com  Thu Mar 12 22:40:55 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 12 Mar 2009 15:40:55 -0700 (PDT)
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: Oleg Nesterov's message of  Thursday,
	12 March 2009 20:07:38 +0100 <20090312190738.GA3529@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
Message-ID: <20090312224055.BA71CFC3B6@magilla.sf.frob.com>

> Yep. And utrace_reset() can be called because ->stopped == 1.

Right.

> Let me explain. Again, let's suppose D attaches engine E to the target T.
> 
> T enters utrace_report_jctl() with ->stopped == 1.
> 
> D calls utrace_set_events(events => 0), this removes JCTL from E->flags.
> 
> D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this
> calls utrace_reset() and removes JCTL from T->utrace_flags.

Right.  In the utrace-indirect code this would have reset the utrace
pointer too.

> T takes utrace->lock, clears ->stopped, and drops the lock.

In the utrace-indirect code, this part would have been harmless even in the
race case where it happened (the more likely case being that task->utrace
was cleared already before utrace_report_jctl looked at it).  (That code
just had the dangling utrace pointer issue I noticed yesterday, at the end
of the function.)

But, yes, this is a problem.  I think this ought to cover it:

@@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what)
 	 * longer considered stopped while we run callbacks.
 	 */
 	spin_lock(&utrace->lock);
+	/*
+	 * Now that we have the lock, check in case utrace_reset() has
+	 * just now cleared UTRACE_EVENT(JCTL) while it considered us
+	 * safely stopped.  In that case, we should not touch ->stopped
+	 * and have nothing else to do.
+	 */
+	if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) {
+		spin_unlock(&utrace->lock);
+		return;
+	}
 	utrace->stopped = 0;
 	utrace->report = 0;
 	spin_unlock(&utrace->lock);

> > 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
> >    already in TASK_STOPPED (and really stopped, or at least got past
> >    tracehook_notify_jctl before JCTL was set).  It sets ->stopped before
> >    adding JCTL to ->utrace_flags,
> 
> Yes, thanks. I missed this.

I feel I should also point out the case where exit_signals() calls
tracehook_notify_jctl, because I just noticed it.  I don't think that path
existed the last time I thought seriously about the utrace_report_jctl
logic.  (This is not a #3 in that list, but in general is another path we
need to keep in mind here.)

> Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set?
> This doesn't mean we don't need  __set_current_state(TASK_STOPPED), it is
> possible that the group-stop is in progress and ->group_stop_count != 0.

Right.


Thanks,
Roland


From roland at redhat.com  Thu Mar 12 23:16:07 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 12 Mar 2009 16:16:07 -0700 (PDT)
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: Oleg Nesterov's message of  Thursday,
	12 March 2009 20:50:21 +0100 <20090312195021.GB3529@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
Message-ID: <20090312231607.7F9E5FC3B6@magilla.sf.frob.com>

> Hmm. But this leads to another question: why does utrace_reset() set
> UTRACE_EVENT(REAP) ?
> 
> This looks as: make sure ->utrace_flags is never 0 unless we detach
> all engines. Perhaps because sometimes, say tracehook_notify_resume(),
> we just check task_utrace_flags() != 0 ?

Right, it's an invariant that utrace_flags != 0 if there is any utrace
stuff to do.  It just fits logically too.  The utrace_flags bits mean "need
to call into utrace", so UTRACE_EVENT(REAP) means that we need to call
utrace_release_task.
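
A minimal sketch of that invariant, assuming a simplified array of engines
instead of the real list. The struct toy_engine type, the TOY_EVENT_* bit
values and toy_reset_flags() are hypothetical names made up for this
illustration; only the "flags |= engine->flags | REAP" step is taken from
the code quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EVENT_REAP	(1UL << 0)	/* hypothetical bit values */
#define TOY_EVENT_JCTL	(1UL << 1)

/* Toy engine: its event mask and whether it was marked detached. */
struct toy_engine {
	unsigned long flags;
	bool detached;
};

/* Recompute the per-task flags the way utrace_reset() is described:
 * detached engines are skipped, and every remaining engine adds REAP
 * in addition to whatever events it asked for. */
static unsigned long toy_reset_flags(const struct toy_engine *engines,
				     size_t n)
{
	unsigned long flags = 0;
	size_t i;

	assert(engines != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (engines[i].detached)
			continue;
		flags |= engines[i].flags | TOY_EVENT_REAP;
	}
	return flags;
}
```

With one attached engine whose mask was cleared to 0, the result is still
TOY_EVENT_REAP, so the task-wide flags stay non-zero. Only when every
engine is detached does the result drop to 0, which matches the corrected
UTRACE_DETACH scenario.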

> Imho, this needs a comment. Or I missed something obvious.

Sure, better comments are always good.  How's this?

@@ -899,6 +899,10 @@ static void utrace_reset(struct task_struct *task, struct utrace *utrace,
 	 * of the interests of the remaining tracing engines.
 	 * For any engine marked detached, remove it from the list.
 	 * We'll collect them on the detached list.
+	 *
+	 * Any engine that's not detached implies tracking the REAP event,
+	 * whether or not that engine wants a report_reap callback.  Any
+	 * engine requires attention from utrace_release_task().
 	 */
 	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
 		if (engine->ops == &utrace_detached_ops) {

> Oh. utrace_attach_task()->utrace_add_engine() sets ->report + TIF_NOTIFY_RESUME.
> But tracehook_notify_resume() does nothing because ->utrace_flags == 0 ?

The logic (in the utrace_add_engine comment) is to have ->report just to
make sure splice_attaching() precedes the next reporting pass (start_report).
It doesn't actually care about TIF_NOTIFY_RESUME (i.e. how soon the report
happens), but just wants to keep the invariant that ->report matches
TIF_NOTIFY_RESUME.  But as you point out, this invariant will be violated
later if tracehook_notify_resume() sees ->utrace_flags == 0.

> Perhaps this is not problem per se. But let's suppose we call, say,
> utrace_control(UTRACE_STOP) later. utrace_do_stop() sees ->report == 1
> and doesn't call set_notify_resume(). But TIF_NOTIFY_RESUME was already
> cleared by do_notify_resume().

Right.

So I think we need this:

@@ -181,7 +181,13 @@ static int utrace_add_engine(struct task_struct *target,
 		 * also set.  Otherwise utrace_control() or utrace_do_stop()
 		 * might skip setting TIF_NOTIFY_RESUME upon seeing ->report
 		 * already set, and we'd miss a necessary callback.
+		 *
+		 * In case we had no engines before, make sure that
+		 * utrace_flags is not zero when tracehook_notify_resume()
+		 * checks.  That would bypass utrace reporting clearing
+		 * TIF_NOTIFY_RESUME, and thus violate the same invariant.
 		 */
+		target->utrace_flags |= UTRACE_EVENT(REAP);
 		list_add_tail(&engine->entry, &utrace->attaching);
 		utrace->report = 1;
 		set_notify_resume(target);

Does that need a barrier pair here and in tracehook_notify_resume()?


Thanks,
Roland


From renzo at cs.unibo.it  Fri Mar 13 06:36:17 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Fri, 13 Mar 2009 07:36:17 +0100
Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)
In-Reply-To: <20090312173532.GB26657@redhat.com>
References: <20090312131330.GB25801@cs.unibo.it>
	<20090312173532.GB26657@redhat.com>
Message-ID: <20090313063616.GA11403@cs.unibo.it>

> Again, we need Roland's opinion, but could you explain why it would
> be better to use _reverse in utrace_report_syscall_entry() ?

I refer to this posting:
http://www.mail-archive.com/utrace-devel@redhat.com/msg00579.html

Item #4 explains why it is *needed* to reverse the order in utrace_report_syscall_entry
to have a consistent implementation of nested virtualization.

> I don't think this is safe. If we do utrace_stop() here, the next engine
> can be detached before we return (UTRACE_DETACH assumes it is safe to
> unlink the engine when the target is stopped). This means we can't
> continue list_for_each_entry(engine, &utrace->attached, entry) after
> return from finish_callback().

Maybe this is not the best patch, maybe we can solve the problem in a
better way.
The point is explained in #3 in the same posting cited above.

When a report function of an engine returns UTRACE_STOP, it means (may mean)
that it wants to change the status of the process before resuming it.
VM monitors often change the status, sometimes debugger users want to set
some variables too.

IMHO, utrace should stop it *before* calling the report function of the
next engine; otherwise we need to set up another structure to synchronize
the engines (which may not even know about each other).
If there is a tracer/debugger among the engines, it is not even possible to
know which snapshot it gets: the state before or after the modification made
by the VM monitor.

With these patches it is possible to run nested virtual machines based
on utrace; it is also possible to strace (use ptrace) processes running
inside a VM.

	renzo


From oleg at redhat.com  Fri Mar 13 21:59:12 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Fri, 13 Mar 2009 22:59:12 +0100
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: <20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
	<20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
Message-ID: <20090313215912.GA1856@redhat.com>

On 03/12, Roland McGrath wrote:
>
> So I think we need this:
>
> @@ -181,7 +181,13 @@ static int utrace_add_engine(struct task_struct *target,
>  		 * also set.  Otherwise utrace_control() or utrace_do_stop()
>  		 * might skip setting TIF_NOTIFY_RESUME upon seeing ->report
>  		 * already set, and we'd miss a necessary callback.
> +		 *
> +		 * In case we had no engines before, make sure that
> +		 * utrace_flags is not zero when tracehook_notify_resume()
> +		 * checks.  That would bypass utrace reporting clearing
> +		 * TIF_NOTIFY_RESUME, and thus violate the same invariant.
>  		 */
> +		target->utrace_flags |= UTRACE_EVENT(REAP);
>  		list_add_tail(&engine->entry, &utrace->attaching);
>  		utrace->report = 1;
>  		set_notify_resume(target);

Agreed.

> Does that need a barrier pair here and in

No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(),

> tracehook_notify_resume()?

Ah. I think you are right, and I think it needs the barrier even without
this change. Say, UTRACE_REPORT does:

	utrace->report = 1;
	set_notify_resume();

Without mb() there is no guarantee that utrace_resume() will notice and
clear ->report.

smp_mb__after_clear_bit() is enough, but in that case perhaps it is better
to modify the arch dependent do_notify_resume().
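
The ordering requirement can be sketched in user space with C11 atomics.
This is only an analogy under stated assumptions: toy_thread_flags,
toy_report and both helpers are invented names, the sequentially
consistent fetch-or stands in for test_and_set_tsk_thread_flag(), and the
explicit fence stands in for smp_mb__after_clear_bit(). It is not the
kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_TIF_NOTIFY_RESUME 0x1u	/* hypothetical bit value */

static atomic_uint toy_thread_flags;	/* models the thread flags word */
static atomic_bool toy_report;		/* models utrace->report */

/* Requesting side, modelling UTRACE_REPORT: publish ->report first,
 * then set the flag with a fully ordered read-modify-write. */
static void toy_request_report(void)
{
	atomic_store_explicit(&toy_report, true, memory_order_relaxed);
	atomic_fetch_or(&toy_thread_flags, TOY_TIF_NOTIFY_RESUME);
}

/* Target side, modelling do_notify_resume() followed by utrace_resume().
 * Returns true if a pending report request was seen and consumed. */
static bool toy_notify_resume(void)
{
	unsigned int old;

	/* The clear itself carries no ordering, like clear_bit(). */
	old = atomic_fetch_and_explicit(&toy_thread_flags,
					~TOY_TIF_NOTIFY_RESUME,
					memory_order_relaxed);
	assert(old & TOY_TIF_NOTIFY_RESUME);	/* called only when set */

	/* Without this fence the read of toy_report below could be
	 * satisfied before the clear, and a concurrent request could
	 * then have its flag wiped without ->report being noticed. */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_exchange(&toy_report, false);
}
```

With the fence in place there are two outcomes for a racing request:
either the target sees toy_report set, or the request's fetch-or lands
after the clear and the flag stays set for the next pass.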


A couple of minor nits, but please remember I often misread the comments.

> Sure, better comments are always good.  How's this?
>
> @@ -899,6 +899,10 @@ static void utrace_reset(struct task_struct *task, struct utrace *utrace,
>  	 * of the interests of the remaining tracing engines.
>  	 * For any engine marked detached, remove it from the list.
>  	 * We'll collect them on the detached list.
> +	 *
> +	 * Any engine that's not detached implies tracking the REAP event,
> +	 * whether or not that engine wants a report_reap callback.  Any
> +	 * engine requires attention from utrace_release_task().
>  	 */
>  	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {

This looks misleading, utrace_release_task() is called unconditionally, and
we could use any unused bit afaics (REAP only makes sense for engine->flags,
we never check ->utrace_flags & REAP). Also, whatever reason we have to keep
->utrace_flags != 0, the same reason applies to ->utrace_flags |= XXX in
utrace_add_engine().

utrace_reset() also does

	if (task->exit_state) {
		flags &= DEAD_FLAGS_MASK;

The comment about DEAD_FLAGS_MASK

	/*
	 * Only these flags matter any more for a dead task (exit_state set).
	 * We use this mask on flags installed in ->utrace_flags after
	 * exit_notify (and possibly utrace_report_death) has run.

Looks a bit confusing to me. Unless exit_notify() calls utrace_report_death()
we don't change ->utrace_flags.

	 * This ensures that utrace_release_task knows positively that
	 * utrace_report_death will not run later.
	 */

Yes. But this means we could do "flags &= ~DEATH_EVENTS" instead. This is
subjective of course, but looks cleaner to me.

Note also that utrace_reset() is the only user of DEAD_FLAGS_MASK and
LIVE_FLAGS_MASK	has no users.

Also, it would be better imho to change tracehook_report_death() to use
DEATH_EVENTS too, it is always good when grep can find the usage.

Oleg.


From oleg at redhat.com  Fri Mar 13 23:33:00 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Sat, 14 Mar 2009 00:33:00 +0100
Subject: utrace_set_events/utrace_control && death/reap checks
Message-ID: <20090313233300.GA14605@redhat.com>

utrace_set_events:

	(utrace->death && ((old_flags & ~events) & DEATH_EVENTS))

"((old_flags & ~events) & DEATH_EVENTS)" means the caller tries to
clear DEATH/QUIESCE. Why is this not allowed? And why is it disallowed
_only_ when the target runs utrace_report_death()->REPORT()?

I think this line can be just killed. I guess the intent was to
prevent utrace_release_task() from doing utrace_reap() in parallel
with utrace_report_death(), but note that utrace_set_events() can
never "shrink" ->utrace_flags, it only sets new bits.

The next line looks strange too, don't we need

	(utrace->reap && ((events & ~old_flags) & UTRACE_EVENT(REAP)))

?
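
To make the two mask expressions concrete, here is a small sketch. The
TOY_* bit values and helper names are made up for illustration and the
real UTRACE_EVENT() encoding differs. The second check uses the form
proposed just above, not necessarily what the current code does.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EVENT_REAP		(1UL << 0)	/* hypothetical bit values */
#define TOY_EVENT_DEATH		(1UL << 1)
#define TOY_EVENT_QUIESCE	(1UL << 2)
#define TOY_DEATH_EVENTS	(TOY_EVENT_DEATH | TOY_EVENT_QUIESCE)

/* Bits that were set in old_flags and are dropped by the new mask. */
static unsigned long toy_cleared_bits(unsigned long old_flags,
				      unsigned long events)
{
	return old_flags & ~events;
}

/* Bits that the new mask adds on top of old_flags. */
static unsigned long toy_added_bits(unsigned long old_flags,
				    unsigned long events)
{
	return events & ~old_flags;
}

/* Combines both checks: refuse to clear DEATH/QUIESCE while the death
 * report is running, and refuse to newly request REAP once reaping has
 * started. */
static bool toy_set_events_rejects(unsigned long old_flags,
				   unsigned long events,
				   bool death, bool reap)
{
	unsigned long cleared = toy_cleared_bits(old_flags, events);
	unsigned long added = toy_added_bits(old_flags, events);

	assert((cleared & added) == 0);	/* a bit cannot be both */
	return (death && (cleared & TOY_DEATH_EVENTS)) ||
	       (reap && (added & TOY_EVENT_REAP));
}
```

For example, with old_flags = DEATH|QUIESCE and events = QUIESCE, the
cleared set is DEATH, so the first check fires only when `death` is true.
Asking for REAP on top of the old mask puts REAP in the added set, which
is what the second check looks at.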


And I don't understand why we need utrace->death at all. Apart from
utrace_set_events (which I think doesn't need it), it is only used by
utrace_control(UTRACE_DETACH). But I can't see how we can race with
utrace_report_death(). If it can be called, we have DEATH_EVENTS bits
set. But in that case utrace_do_stop() can't succeed, so UTRACE_DETACH
can only do mark_engine_wants_stop() but not utrace_reset().

IOW, could you explain why the patch below is wrong? (and why can't
we kill ->death then).

Oleg.

--- kernel/utrace.c
+++ kernel/utrace.c
@@ -1072,27 +1072,10 @@ int utrace_control(struct task_struct *t
 		/*
 		 * You can't do anything to a dead task but detach it.
 		 * If release_task() has been called, you can't do that.
-		 *
-		 * On the exit path, DEATH and QUIESCE event bits are
-		 * set only before utrace_report_death() has taken the
-		 * lock.  At that point, the death report will come
-		 * soon, so disallow detach until it's done.  This
-		 * prevents us from racing with it detaching itself.
 		 */
-		if (action != UTRACE_DETACH ||
-		    unlikely(utrace->reap)) {
+		if (action != UTRACE_DETACH || unlikely(utrace->reap)) {
 			spin_unlock(&utrace->lock);
 			return -ESRCH;
-		} else if (unlikely(target->utrace_flags & DEATH_EVENTS) ||
-			   unlikely(utrace->death)) {
-			/*
-			 * We have already started the death report, or
-			 * are about to very soon.  We can't prevent
-			 * the report_death and report_reap callbacks,
-			 * so tell the caller they will happen.
-			 */
-			spin_unlock(&utrace->lock);
-			return -EALREADY;
 		}
 	}
 

From oleg at redhat.com  Sat Mar 14 00:14:20 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Sat, 14 Mar 2009 01:14:20 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: <20090312224055.BA71CFC3B6@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
Message-ID: <20090314001420.GA15677@redhat.com>

On 03/12, Roland McGrath wrote:
>
> > Yep. And utrace_reset() can be called because ->stopped == 1.
>
> Right.
>
> > Let me explain. Again, let's suppose D attaches engine E to the target T.
> >
> > T enters utrace_report_jctl() with ->stopped == 1.
> >
> > D calls utrace_set_events(events => 0), this removes JCTL from E->flags.
> >
> > D calls, say, utrace_control(UTRACE_RESUME). Since ->stopped == 1, this
> > calls utrace_reset() and removes JCTL from T->utrace_flags.
>
> Right.  In the utrace-indirect code this would have reset the utrace
> pointer too.
>
> > T takes utrace->lock, clears ->stopped, and drops the lock.
>
> In the utrace-indirect code, this part would have been harmless even in the
> race case where it happened (the more likely case being that task->utrace
> was cleared already before utrace_report_jctl looked at it).  (That code
> just had the dangling utrace pointer issue I noticed yesterday, at the end
> of the function.)
>
> But, yes, this is a problem.  I think this ought to cover it:
>
> @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what)
>  	 * longer considered stopped while we run callbacks.
>  	 */
>  	spin_lock(&utrace->lock);
> +	/*
> +	 * Now that we have the lock, check in case utrace_reset() has
> +	 * just now cleared UTRACE_EVENT(JCTL) while it considered us
> +	 * safely stopped.  In that case, we should not touch ->stopped
> +	 * and have nothing else to do.
> +	 */
> +	if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) {
> +		spin_unlock(&utrace->lock);
> +		return;

I don't think this can help, even if we clear ->stopped before return.
It is still possible to set ->stopped after that, and since we don't
have JCTL we return from get_signal_to_deliver() bypassing tracehook
calls.

From the previous message:
>
> That suggests we must preemptively go back to TASK_RUNNING before making
> the callbacks, just in case they would do the transition.
> ...

I thought about this too. But this is not easy and not nice.

Roland, I _seem_ to have a vague idea, will return tomorrow.

Oleg.


From oleg at redhat.com  Sun Mar 15 22:33:00 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Sun, 15 Mar 2009 23:33:00 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: <20090314001420.GA15677@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
Message-ID: <20090315223300.GA10526@redhat.com>

On 03/14, Oleg Nesterov wrote:
>
> On 03/12, Roland McGrath wrote:
> >
> > But, yes, this is a problem.  I think this ought to cover it:
> >
> > @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what)
> >  	 * longer considered stopped while we run callbacks.
> >  	 */
> >  	spin_lock(&utrace->lock);
> > +	/*
> > +	 * Now that we have the lock, check in case utrace_reset() has
> > +	 * just now cleared UTRACE_EVENT(JCTL) while it considered us
> > +	 * safely stopped.  In that case, we should not touch ->stopped
> > +	 * and have nothing else to do.
> > +	 */
> > +	if (unlikely(!(task->utrace_flags & UTRACE_EVENT(JCTL)))) {
> > +		spin_unlock(&utrace->lock);
> > +		return;
>
> I don't think this can help, even if we clear ->stopped before return.
> It is still possible to set ->stopped after that, and since we don't
> have JCTL we return from get_signal_to_deliver() bypassing tracehook
> calls.

I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL.

OK, let's look at utrace_do_stop:

	if (task_is_stopped(target) &&
	    !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
		utrace->stopped = 1;
		return true;
	}

This doesn't look correct. We don't hold ->siglock, the task can be
SIGCONT'ed and return from get_signal_to_deliver(), and then we set
->stopped. Or I missed something again?

Then we re-do this (well, almost) check under ->siglock,

	} else if (task_is_stopped(target)) {
		if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
			utrace->stopped = stopped = true;
	}

But this is not nice. Let's suppose the task is already stopped, we do
UTRACE_ATTACH + utrace_set_events(JCTL).

Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't
even set ->report. Yes, we can't set ->stopped if JCTL, we can race with
utrace_report_jctl() which does REPORT().


BTW, afaics utrace_report_jctl() has another bug,

	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);

I think it should do

	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
	       report_jctl, what, notify);

instead.

> Roland, I _seem_ to have a vague idea, will return tomorrow.

Well, this idea is not very nice. But see the draft patches below.

With the first patch, we call utrace_report_jctl() before we actually
stop. do_signal_stop() can fail then, but I think this is OK, we can
pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
and with this patch we always call ->report_jctl with notify == 0. Just for
discussion.

--- xxx/include/linux/utrace.h~JCTL	2009-03-03 20:43:43.000000000 +0100
+++ xxx/include/linux/utrace.h	2009-03-15 21:55:45.000000000 +0100
@@ -102,7 +102,7 @@ void utrace_report_exit(long *exit_code)
 	__attribute__((weak));
 void utrace_report_death(struct task_struct *, struct utrace *, bool, int)
 	__attribute__((weak));
-void utrace_report_jctl(int notify, int type)
+bool utrace_report_jctl(bool sig_locked, int what)
 	__attribute__((weak));
 void utrace_report_exec(struct linux_binfmt *, struct linux_binprm *,
 			struct pt_regs *regs)
--- xxx/include/linux/tracehook.h~JCTL	2009-03-03 20:40:57.000000000 +0100
+++ xxx/include/linux/tracehook.h	2009-03-15 22:02:05.000000000 +0100
@@ -521,11 +521,11 @@ static inline int tracehook_get_signal(s
  *
  * Called with no locks held.
  */
-static inline int tracehook_notify_jctl(int notify, int why)
+static inline bool tracehook_notify_jctl(bool sig_locked, int why)
 {
 	if (task_utrace_flags(current) & UTRACE_EVENT(JCTL))
-		utrace_report_jctl(notify, why);
-	return notify || (current->ptrace & PT_PTRACED);
+		return utrace_report_jctl(sig_locked, why);
+	return true;
 }
 
 #define DEATH_REAP			-1
--- xxx/kernel/utrace.c~JCTL	2009-03-12 01:21:05.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-15 22:59:36.000000000 +0100
@@ -1637,12 +1637,14 @@ void utrace_finish_vfork(struct task_str
 /*
  * Called iff UTRACE_EVENT(JCTL) flag is set.
  */
-void utrace_report_jctl(int notify, int what)
+bool utrace_report_jctl(bool sig_locked, int what)
 {
 	struct task_struct *task = current;
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
-	bool was_stopped = task_is_stopped(task);
+
+	if (sig_locked)
+		spin_unlock_irq(&task->sighand->siglock);
 
 	/*
 	 * We get here with CLD_STOPPED when we've just entered
@@ -1664,30 +1662,12 @@ void utrace_report_jctl(int notify, int 
 	spin_unlock(&utrace->lock);
 
 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
-	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
+	       report_jctl, what, 0);
 
-	if (was_stopped && !task_is_stopped(task)) {
-		/*
-		 * The event report hooks could have blocked, though
-		 * it should have been briefly.  Make sure we're in
-		 * TASK_STOPPED state again to block properly, unless
-		 * we've just come back out of job control stop.
-		 */
+	if (sig_locked)
 		spin_lock_irq(&task->sighand->siglock);
-		if (task->signal->flags & SIGNAL_STOP_STOPPED)
-			__set_current_state(TASK_STOPPED);
-		spin_unlock_irq(&task->sighand->siglock);
-	}
 
-	if (task_is_stopped(current)) {
-		/*
-		 * While in TASK_STOPPED, we can be considered safely
-		 * stopped by utrace_do_stop() only once we set this.
-		 */
-		spin_lock(&utrace->lock);
-		utrace->stopped = 1;
-		spin_unlock(&utrace->lock);
-	}
+	return task->signal->group_stop_count != 0;
 }
 
 /*
--- xxx/kernel/signal.c~JCTL	2009-03-03 18:11:47.000000000 +0100
+++ xxx/kernel/signal.c	2009-03-15 22:07:30.000000000 +0100
@@ -1641,7 +1641,7 @@ finish_stop(int stop_count)
 	 * a group stop in progress and we are the last to stop,
 	 * report to the parent.  When ptraced, every thread reports itself.
 	 */
-	if (tracehook_notify_jctl(stop_count == 0, CLD_STOPPED)) {
+	if (stop_count == 0) {
 		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, CLD_STOPPED);
 		read_unlock(&tasklist_lock);
@@ -1785,8 +1785,7 @@ relock:
 		signal->flags &= ~SIGNAL_CLD_MASK;
 		spin_unlock_irq(&sighand->siglock);
 
-		if (unlikely(!tracehook_notify_jctl(1, why)))
-			goto relock;
+		tracehook_notify_jctl(false, why);
 
 		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current->group_leader, why);
@@ -1798,6 +1797,7 @@ relock:
 		struct k_sigaction *ka;
 
 		if (unlikely(signal->group_stop_count > 0) &&
+		    tracehook_notify_jctl(true, CLD_STOPPED) &&
 		    do_signal_stop(0))
 			goto relock;
 
@@ -1872,6 +1872,7 @@ relock:
 				if (is_current_pgrp_orphaned())
 					goto relock;
 
+				tracehook_notify_jctl(false, CLD_STOPPED);
 				spin_lock_irq(&sighand->siglock);
 			}
 
@@ -1953,7 +1954,8 @@ void exit_signals(struct task_struct *ts
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-	if (unlikely(group_stop) && tracehook_notify_jctl(1, CLD_STOPPED)) {
+	if (unlikely(group_stop)) {
+		tracehook_notify_jctl(false, CLD_STOPPED);
 		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(tsk, CLD_STOPPED);
 		read_unlock(&tasklist_lock);
-------------------------------------------------------------------------------

Now we can change utrace_do_stop(), no need to check JCTL any longer,

--- xxx/kernel/utrace.c~STOP	2009-03-15 22:59:36.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-15 23:29:19.000000000 +0100
@@ -794,20 +794,6 @@ static bool utrace_do_stop(struct task_s
 {
 	bool stopped;
 
-	/*
-	 * If it will call utrace_report_jctl() but has not gotten
-	 * through it yet, then don't consider it quiescent yet.
-	 * utrace_report_jctl() will take @utrace->lock and
-	 * set @utrace->stopped itself once it finishes.  After that,
-	 * it is considered quiescent; when it wakes up, it will go
-	 * through utrace_get_signal() before doing anything else.
-	 */
-	if (task_is_stopped(target) &&
-	    !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
-		utrace->stopped = 1;
-		return true;
-	}
-
 	stopped = false;
 	spin_lock_irq(&target->sighand->siglock);
 	if (unlikely(target->exit_state)) {
@@ -819,8 +805,7 @@ static bool utrace_do_stop(struct task_s
 		if (!(target->utrace_flags & DEATH_EVENTS))
 			utrace->stopped = stopped = true;
 	} else if (task_is_stopped(target)) {
-		if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
-			utrace->stopped = stopped = true;
+		utrace->stopped = stopped = true;
 	} else if (!utrace->report && !utrace->interrupt) {
 		utrace->report = 1;
 		set_notify_resume(target);
-------------------------------------------------------------------------------

Again, this is not complete and likely buggy. But what do you think?

Oleg.


From roland at redhat.com  Mon Mar 16 01:14:01 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 18:14:01 -0700 (PDT)
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: Oleg Nesterov's message of  Sunday,
	15 March 2009 23:33:00 +0100 <20090315223300.GA10526@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
Message-ID: <20090316011401.8EAE7FC3AB@magilla.sf.frob.com>

> I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL.

Right, that is key.

> OK, let's look at utrace_do_stop:
> 
> 	if (task_is_stopped(target) &&
> 	    !(target->utrace_flags & UTRACE_EVENT(JCTL))) {
> 		utrace->stopped = 1;
> 		return true;
> 	}
> 
> This doesn't look correct. We don't hold ->siglock, the task can be
> SIGCONT'ed and return from get_signal_to_deliver(), and then we set
> ->stopped. Or I missed something again?

I think you're right.  The logic there was supposed to be, "TASK_STOPPED
means it will get into utrace_get_signal()."  That much is true, but
nothing inside utrace_get_signal() actually synchronizes with this to make
that matter.

All this check does is try to optimize the TASK_STOPPED case not to take
the siglock.  That doesn't seem worth much, so we can just drop it.

> Then we re-do this (well, almost) check under ->siglock,
> 
> 	} else if (task_is_stopped(target)) {
> 		if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
> 			utrace->stopped = stopped = true;
> 	}
> 
> But this is not nice. Let's suppose the task is already stopped, we do
> UTRACE_ATTACH + utrace_set_events(JCTL).

This is exactly why utrace_set_events() sets ->stopped preemptively for
that case.

> Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't
> even set ->report. Yes, we can't set ->stopped if JCTL, we can race with
> utrace_report_jctl() which does REPORT().

Setting JCTL while in TASK_STOPPED made it set ->stopped, so
utrace_control() succeeds without calling utrace_do_stop().

> BTW, afaics utrace_report_jctl() has another bug,
> 
> 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
> 	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
> 
> I think it should do
> 
> 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
> 	       report_jctl, what, notify);
> 
> instead.

There is a bug, but your fix changes a key API choice.
I've put in this fix:

-	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
+	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED,
+	       notify ? what : 0);

There are two things a tracer might be tracking: state or events.
The "state" is whether the thread is in job control stop or is running.
The "events" are the SIGCHLD notifications that the thread tries to post to
its parent.

The @type argument shows the state we will be in after the callback.
If the state changes, there will be another callback.  That's what a
state-tracking tracer needs, e.g. to keep a little light on the screen red
while the thread is stopped and green while it's running.

The @notify argument shows what SIGCHLD the parent sees (if it were
dequeuing all possible SIGCHLD postings as quickly as they come).  That's
what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs
are expected in the parent.

Your change to @type would break state-trackers in the case where
tracehook_notify_jctl() is called from get_signal_to_deliver() with
CLD_STOPPED.
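
To illustrate the split between @type and @notify, here is a minimal
user-space sketch.  It is only a model, not utrace code: the enum, the
struct and model_report_jctl() are hypothetical names, and the only thing
taken from the discussion above is that @type carries the state after the
callback while @notify is zero when no SIGCHLD is posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CLD_* codes; the values are arbitrary. */
enum model_cld {
	MODEL_CLD_NONE = 0,	/* no SIGCHLD is posted */
	MODEL_CLD_STOPPED,
	MODEL_CLD_CONTINUED,
};

/* Hypothetical tracer that follows both the state and the events. */
struct model_tracer {
	bool light_is_red;	/* state: true while the thread is stopped */
	int sigchld_expected;	/* events: SIGCHLDs the parent should see */
};

/*
 * Model of a report_jctl callback.  @type is the state the thread will
 * be in after the callback; @notify is the SIGCHLD code posted to the
 * parent, or MODEL_CLD_NONE when nothing is posted.
 */
static void model_report_jctl(struct model_tracer *t,
			      enum model_cld type, enum model_cld notify)
{
	assert(type == MODEL_CLD_STOPPED || type == MODEL_CLD_CONTINUED);

	/* A state tracker looks only at @type. */
	t->light_is_red = (type == MODEL_CLD_STOPPED);

	/* An event tracker looks only at @notify. */
	if (notify != MODEL_CLD_NONE)
		t->sigchld_expected++;
}
```

In this model a stop that posts no notification still turns the light red,
which is the case a change to @type would lose.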

> With the first patch, we call utrace_report_jctl() before we actually
> stop. do_signal_stop() can fail then, but I think this is OK, we can
> pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
> and with this patch we always call ->report_jctl with notify == 0. Just for
> discussion.

I think I sort of understand the intent of your patch.  If we change the
calling convention for tracehook_notify_jctl, I think we want to preserve
the aspect that the hook decides about sending the notification.  That's
how the ptrace quirks can be reimplemented differently later without
changing the tracehook layer again.  Also, we certainly don't want one
tracehook call with two different locking conditions.

It seems right in principle to do the reporting before we change ->state,
given that we have to allow for it changing during the callbacks.  And
indeed, that avoids the JCTL special case mess entirely.


Thanks,
Roland


From roland at redhat.com  Mon Mar 16 01:55:41 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 18:55:41 -0700 (PDT)
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: Oleg Nesterov's message of  Friday,
	13 March 2009 22:59:12 +0100 <20090313215912.GA1856@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
	<20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
	<20090313215912.GA1856@redhat.com>
Message-ID: <20090316015541.11C33FC3AB@magilla.sf.frob.com>

> > +		 *
> > +		 * In case we had no engines before, make sure that
> > +		 * utrace_flags is not zero when tracehook_notify_resume()
> > +		 * checks.  That would bypass utrace reporting clearing
> > +		 * TIF_NOTIFY_RESUME, and thus violate the same invariant.
> >  		 */
> > +		target->utrace_flags |= UTRACE_EVENT(REAP);
> >  		list_add_tail(&engine->entry, &utrace->attaching);
> >  		utrace->report = 1;
> >  		set_notify_resume(target);
> 
> Agreed.

I put that in.

> > Does that need a barrier pair here and in
> 
> No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(),

Ah, ok.

> > tracehook_notify_resume()?
> 
> Ah. I think you are right, and I think it needs the barrier even without
> this change. Say, UTRACE_REPORT does:
> 
> 	utrace->report = 1;
> 	set_notify_resume();
> 
> Without mb() there is no guarantee that utrace_resume() will notice and
> clear ->report.

Wait, what?  You just said that set_notify_resume() already implies an mb().

> smp_mb__after_clear_bit() is enough, but in that case perhaps it is better
> to modify the arch dependent do_notify_resume().

I don't follow this.  But we don't want a solution that requires changing
arch code.  Why can't tracehook_notify_resume() do whatever is required?

> > +	 *
> > +	 * Any engine that's not detached implies tracking the REAP event,
> > +	 * whether or not that engine wants a report_reap callback.  Any
> > +	 * engine requires attention from utrace_release_task().
> >  	 */
> >  	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
> 
> This looks misleading, utrace_release_task() is called unconditionally, and
> we could use any unused bit afaics (REAP only makes sense for engine->flags,
> we never check ->utrace_flags & REAP). 

It's true that any bit at all would do, but REAP is one that makes some
sense logically and also one that is implicitly reserved in utrace_flags
already (without having to reserve another one from engine.flags).  

It's true that utrace_release_task() is called unconditionally now, but it
might not always be so.  It seems like a very intuitive and useful
invariant that utrace_flags==0 means "utrace totally empty".  

It's unconditional now because the previous code tested the indirect
pointer rather than flags (for reasons we can no longer be very sure of).
If we can convince ourselves about the interlocks, then it would be better
to have it test utrace_flags and not call into utrace.c for the common case
(nor take the utrace lock).
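
The invariant can be pictured with a small user-space model.  This is a
sketch under assumptions, not the utrace implementation: the bit values
and model_task_flags() are hypothetical, and the only rule borrowed from
the thread is that every attached engine implies the REAP bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit assignments; the real UTRACE_EVENT() values differ. */
#define MODEL_EVENT_REAP	(1UL << 0)
#define MODEL_EVENT_DEATH	(1UL << 1)
#define MODEL_EVENT_JCTL	(1UL << 2)

/*
 * Recompute the per-task flags from the attached engines' flags.
 * Every attached engine contributes REAP whether or not it asked
 * for a report_reap callback, so the result is zero only when no
 * engine is attached.
 */
static unsigned long model_task_flags(const unsigned long *engine_flags,
				      size_t nr_engines)
{
	unsigned long flags = 0;
	size_t i;

	for (i = 0; i < nr_engines; i++)
		flags |= engine_flags[i] | MODEL_EVENT_REAP;

	/* The invariant under discussion: empty means exactly zero. */
	assert((flags == 0) == (nr_engines == 0));
	return flags;
}
```

With that rule, a cheap test of the flags word is enough to tell whether
there is anything for utrace to do, even for an engine with an empty
event mask.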

> Also, whatever reason we have to keep ->utrace_flags != 0, the same
> reason applies to ->utrace_flags |= XXX in utrace_add_engine().

Hence the change we agreed to above.

> utrace_reset() also does
> 
> 	if (task->exit_state) {
> 		flags &= DEAD_FLAGS_MASK;
> 
> The comment about DEAD_FLAGS_MASK
> 
> 	/*
> 	 * Only these flags matter any more for a dead task (exit_state set).
> 	 * We use this mask on flags installed in ->utrace_flags after
> 	 * exit_notify (and possibly utrace_report_death) has run.

I think these macros are from when reap did a quiesce callback in a
previous incarnation of the API.  It doesn't make much sense to use
the macro for just UTRACE_EVENT(REAP) now.

> Looks a bit confusing to me. Unless exit_notify() calls utrace_report_death()
> we don't change ->utrace_flags.

If it doesn't call utrace_report_death(), that means DEATH_EVENTS were not
in ->utrace_flags.

> Yes. But this means we could do "flags &= ~DEATH_EVENTS" instead. This is
> subjective of course, but looks more clean to me.
> 
> Note also that utrace_reset() is the only user of DEAD_FLAGS_MASK and
> LIVE_FLAGS_MASK	has no users.

I got rid of those macros and replaced the comment with this:

 	if (task->exit_state) {
+		/*
+		 * Once it's already dead, we never install any flags
+		 * except REAP.  When ->exit_state is set and events
+		 * like DEATH are not set, then they never can be set.
+		 * This ensures that utrace_release_task() knows
+		 * positively that utrace_report_death() can never run.
+		 */
 		BUG_ON(utrace->death);
-		flags &= DEAD_FLAGS_MASK;
+		flags &= UTRACE_EVENT(REAP);
 		wake = false;

I think it makes sense to use this mask because what we are specifically
concerned with here is that utrace_release_task() is the one and only
utrace entry point that the task might take hereafter.

> Also, it would be better imho to change tracehook_report_death() to use
> DEATH_EVENTS too, it is always good when grep can find the usage.

I made _UTRACE_DEATH_EVENTS that common macro.


Thanks,
Roland


From roland at redhat.com  Mon Mar 16 02:34:21 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 19:34:21 -0700 (PDT)
Subject: utrace_set_events/utrace_control && death/reap checks
In-Reply-To: Oleg Nesterov's message of  Saturday,
	14 March 2009 00:33:00 +0100 <20090313233300.GA14605@redhat.com>
References: <20090313233300.GA14605@redhat.com>
Message-ID: <20090316023421.C6136FC3AB@magilla.sf.frob.com>

> utrace_set_events:
> 
> 	(utrace->death && ((old_flags & ~events) & DEATH_EVENTS))
> 
> "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to
> clear DEATH/QUIESCE. Why this is not allowed? And why this is not
> allowed _only_ when the target runs utrace_report_death()->REPORT()?

This is specifically documented for -EALREADY, and in the DocBook section
"Interlock with final callbacks".  The idea is this:

For most utrace events, you don't know whether you'll get some callbacks.
It could be, the task got SIGKILL first thing after you attached, and it
will never report anything.  That is fine for the most part.  But for the
lifetime events it becomes a real burden on the users of the API.  They
have to manage their data structures, and so they have to know reliably
when they can and can't get what callbacks.

So, the utrace_set_events rules try to ensure that the caller knows for
sure whether it will or won't get a callback when the task dies and/or is
reaped.  You can clear DEATH/QUIESCE, and be sure from the return value
that it is now impossible that there is a report_death/report_quiesce
callback racing with you because the guy just got a SIGKILL.  If you can't
be sure of that, then you do know for sure that your callback is being made
right now or very soon.

> I think this line can be just killed. I guess the intent was to
> prevent utrace_release_task() from doing utrace_reap() in parallel
> with utrace_report_death(), but note that utrace_set_events() can
> never "shrinks" ->utrace_flags, it only sets new bits.

It's not ->utrace_flags that matters here, it's engine->flags.

That is one of the intents, but not the only one.  It's just as important
that the user of the API can rely on the ordering of its callbacks wrt its
utrace_set_events/utrace_control calls as that it can rely on the ordering
of its death and reap callbacks.

> The next line looks strange too, don't we need
> 
> 	(utrace->reap && ((events & ~old_flags) & UTRACE_EVENT(REAP)))
> 
> ?

get_utrace_lock() already returned -ESRCH if it was in EXIT_DEAD, so this
is probably moot.

> And I don't understand why do we need utrace->death at all. Apart from
> utrace_set_events (which I think doesn't need it), it is only used by
> utrace_control(UTRACE_DETACH). But I can't see how can we race with
> utrace_report_death(). If it can be called, we have DEATH_EVENTS bits
> set. But in that case utrace_do_stop() can't succeed, so UTRACE_DETACH
> can only do mark_engine_wants_stop() but not utrace_reset().

It is used by utrace_set_events and utrace_control for the same purpose.
Those calls must know for sure that report_death cannot happen, or else
that it will (or it's already happening).

Many tracers only keep track until death.  For them, the simple thing is to
have report_death clean up their data structures and return UTRACE_DETACH.
But then they also want to do asynchronous detach.  So they can do
utrace_set_events or utrace_control as the synchronizing step of
asynchronous tear-down.  If it returns 0, then report_death will not run and
it is safe to destroy data structures the callback code would use.  If it
returns -EALREADY, then report_death will shortly be called and we can rely
on our callback code to take care of the data structures before it returns
UTRACE_DETACH.
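
The tear-down pattern can be sketched as follows.  This is a user-space
model with hypothetical names (model_engine, model_clear_events() and so
on); the real synchronizing step is utrace_set_events() or
utrace_control(), and the single bool here merely stands in for whatever
the kernel uses to know that report_death is underway.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_data {
	int placeholder;	/* whatever the tracer keeps per thread */
};

struct model_engine {
	struct model_data *data;
	bool death_in_progress;	/* stands in for the kernel's knowledge */
};

/* Stand-in for the synchronizing call that clears the death events. */
static int model_clear_events(struct model_engine *e)
{
	return e->death_in_progress ? -EALREADY : 0;
}

/* Stand-in for the engine's report_death callback. */
static void model_report_death(struct model_engine *e)
{
	free(e->data);
	e->data = NULL;
}

/*
 * Asynchronous tear-down.  Returns true if the caller freed the data,
 * false if that job is left to the report_death callback.
 */
static bool model_async_teardown(struct model_engine *e)
{
	int ret = model_clear_events(e);

	if (ret == 0) {
		/* report_death can no longer run: free it here. */
		free(e->data);
		e->data = NULL;
		return true;
	}

	/* report_death is running or about to run: it owns the data. */
	assert(ret == -EALREADY);
	return false;
}
```

Either way the data is freed exactly once, which is the point of having
the return value be reliable.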


Thanks,
Roland


From roland at redhat.com  Mon Mar 16 02:48:22 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 19:48:22 -0700 (PDT)
Subject: UTRACE_STOP race condition?
In-Reply-To: Renzo Davoli's message of  Wednesday,
	11 February 2009 10:59:46 +0100
	<20090211095946.GA2597@cs.unibo.it>
References: <20090211095946.GA2597@cs.unibo.it>
Message-ID: <20090316024822.23585FC3AB@magilla.sf.frob.com>

Thanks very much for the feedback, Renzo.  You seem to be about the only
person to thoroughly exercise this part of the API so far.  I'm sure it can
use some refinement.

> please help me. Either I have not understood the meaning of UTRACE_STOP
> or it is completely useless due to a race condition.

I'm confident it can be a little bit of each. ;-)

> There are always two entities in a utrace interaction: the traced
> process and the tracing module.

There are lots of ways to slice things into a notion like 'entity'.
Let's be precise in what we're specifically discussing right now.
The question at hand is about synchronization between two threads:
a traced task and a control task.

> When a traced event occurs in the traced process the correspondent 
> report function gets called in the module.

Your engine's callback function is run by the traced task, yes.

> If the report function returns UTRACE_STOP the traced process stays in a
> quiescent state and the module wakes it up by a 
> utrace_control(...,UTRACE_RESUME) call *later*.

A control task (i.e. whatever other task) can make this call at some time, yes.

> If the module wakes the traced process too quickly, utrace has not yet put
> it into a "stopped" state, therefore UTRACE_RESUME gets lost.
> As a consequence, the execution is blocked.
> 
> IMHO, given the current utrace code, there is no way to set up some kind
> of synchronization in the module to prevent this error.

I understand what scenario you mean.  The rest of your message talks about
implementation details of utrace internals.  Frankly I find this confusing
and distracting from the API discussion.  I've gone to some pains to
explicitly document what all the API guarantees and requirements are (and
aren't), in the kerneldoc and docbook text.  I would like us to discuss the
problems for writing tracing engines in terms of the documented API
constraints and guarantees.

The API documentation says what the contract is between the kernel and the
module writer.  If that specification is ambiguous, we'll first fix the
descriptions to be clear.  If what it specifies needs to change into a
better contract for module writers, we'll decide what new contract to agree
on.  Finally, if the utrace implementation does not do what it says, then
we'll fix the implementation.  Your postings have thrown all this together,
which does not work for me.

Please start a separate thread about each separate issue, such as callback
order among engines.  I understand your motivation for all these things is
tied together, but they are separate subjects to address individually.

In commit 3a9f4c87, I made a change/clarification to the API documentation
for utrace_barrier() and a corresponding fix to the implementation.  What
this does that was missing before is that utrace_barrier() does not
consider your engine's callback to be complete until your callback's return
value has been processed.  That means that if utrace_barrier() returned 0
and then you call utrace_control(UTRACE_RESUME), the UTRACE_STOP return
value of your prior callback is definitely before the UTRACE_RESUME of your
asynchronous control call.
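
The ordering guarantee can be shown with a toy model.  This is a
single-threaded user-space sketch, not the utrace implementation: every
name in it is hypothetical, and model_barrier() finishes the pending work
itself where the real utrace_barrier() would wait for the traced thread.

```c
#include <assert.h>
#include <stdbool.h>

struct model_task {
	bool stop_pending;	/* callback returned "stop", not yet processed */
	bool stopped;
};

/* Traced side: the core acts on the callback's return value. */
static void model_process_return(struct model_task *t)
{
	if (t->stop_pending) {
		t->stop_pending = false;
		t->stopped = true;
	}
}

/* Control side: stands in for utrace_control(UTRACE_RESUME). */
static void model_resume(struct model_task *t)
{
	t->stopped = false;
}

/*
 * Control side: stands in for utrace_barrier().  The real call waits
 * for the traced thread; this single-threaded model simply finishes
 * the pending work itself, then reports success.
 */
static int model_barrier(struct model_task *t)
{
	model_process_return(t);
	assert(!t->stop_pending);
	return 0;
}
```

In the model, resuming before the return value is processed leaves the
task stopped afterwards, while barrier-then-resume leaves it running.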

Please address your concerns on the synchronization issue with respect to
the documented API guarantee now made by this utrace_barrier() behavior.


Thanks,
Roland


From roland at redhat.com  Mon Mar 16 04:22:58 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 21:22:58 -0700 (PDT)
Subject: [PATCH] utrace_tracer_task:
	s/list_for_each_safe/list_for_each_entry
In-Reply-To: Oleg Nesterov's message of  Thursday,
	12 March 2009 01:28:59 +0100 <20090312002859.GA20725@redhat.com>
References: <20090310182327.GA3826@redhat.com>
	<20090310215757.1D3BCFC3B6@magilla.sf.frob.com>
	<20090312002859.GA20725@redhat.com>
Message-ID: <20090316042258.96EC0FC3AB@magilla.sf.frob.com>

> utrace_tracer_task() can use list_for_each_entry() too.

Yes, but ... I'm reminded that this function is its own can of worms.  
It's called by other threads, without any synchronization, so it cannot
safely use utrace->attached unlocked the way reporting passes do.

The tracer_task and unsafe_exec hooks are there mainly for ptrace.  
I've decided to punt these utrace hooks for now.  When we get to doing a
cleaned-up ptrace on utrace (or some other facility that brings in the need
for the unsafe_exec hook), we can figure out how to cleanly and safely
support some utrace API feature for that.


Thanks,
Roland


From roland at redhat.com  Mon Mar 16 03:59:32 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 15 Mar 2009 20:59:32 -0700 (PDT)
Subject: [PATCH 2/2] UTRACE_STOP: nesting engine management (updated)
In-Reply-To: Renzo Davoli's message of  Friday,
	13 March 2009 07:36:17 +0100 <20090313063616.GA11403@cs.unibo.it>
References: <20090312131330.GB25801@cs.unibo.it>
	<20090312173532.GB26657@redhat.com>
	<20090313063616.GA11403@cs.unibo.it>
Message-ID: <20090316035932.1CF73FC3AB@magilla.sf.frob.com>

> When a report function of an engine returns UTRACE_STOP, it means (may mean)
> that it wants to change the status of the process before resuming it.
> VM monitors often change the status, sometimes debugger users want to set
> some variables too.

Yes.  In ideal cases, it can decide up front quickly what it wants to do,
and change the user state right in the callback without stopping.  But when
it needs another agent to decide what to do, it uses UTRACE_STOP.

> IMHO, utrace should stop it *before* calling the report function of the 
> next engine, 

No, we'll never want to do it this way.  One engine doesn't get to
arbitrarily delay the reporting to other engines of the thread's events.
This is both an efficiency point and a robustness point.  It's important to
remember that utrace is about the primitive events: the user thread had an
event ... the user thread is about to run again.  The high-level notion of
"what did the other engine do?" is built from examining the state at these
events, and knowing about the delays that other engines are imposing via
UTRACE_STOP.

> otherwise we need to set up another structure to synchronize
> the engines (that may even be unknown one to the other).
> If there is a tracer/debugger among the engines, it is not even possible to know
> which snapshot it gets, after or before the modification created by the VM
> monitor?

This is where the broader discussion of callback order comes in.

When a previous engine has decided to use UTRACE_STOP, your callback's
@action argument reflects this.  You know that another engine is going to
do something asynchronous before it lets the user thread run.  If your own
engine doesn't especially want it stopped now but wants to see what it
looks like when other engines are done fiddling with it, then you can use
UTRACE_REPORT.  That ensures that you'll get a report_quiesce callback
after those other engines have done their thing.
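
As a rough sketch of that choice, in user-space C with hypothetical names:
the enum below is not the real utrace_resume_action, and the rule that the
most restrictive request wins is an assumption of this model.

```c
#include <assert.h>

/*
 * Hypothetical resume actions, ordered from most to least restrictive.
 * The names and values are for this sketch only.
 */
enum model_action {
	MODEL_STOP,
	MODEL_REPORT,
	MODEL_RESUME,
};

/* Assumed combining rule: the most restrictive request wins. */
static enum model_action model_combine(enum model_action a,
				       enum model_action b)
{
	return a < b ? a : b;
}

/*
 * Callback of an engine that does not want a stop itself, but wants
 * another look once a stopping engine has finished its changes.
 * @action is the result accumulated from the engines that ran before.
 */
static enum model_action model_observer_callback(enum model_action action)
{
	assert(action <= MODEL_RESUME);

	if (action == MODEL_STOP)
		return MODEL_REPORT;	/* ask for a later callback */
	return MODEL_RESUME;		/* nothing to wait for */
}
```

The observer never delays the thread on its own account; it only asks to
be called again when somebody else has already asked for a stop.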


Thanks,
Roland


From fche at redhat.com  Mon Mar 16 22:18:00 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 16 Mar 2009 18:18:00 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090316214526.GA15119@Krystal>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
Message-ID: <20090316221800.GE12974@redhat.com>

Hi -


On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote:

> [...]
> > As far as I know, utrace supports multiple trace-engines on a process.
> > Since ptrace is just an engine of utrace, you can add another engine on utrace.
> > 
> > utrace-+-ptrace_engine---owner_process
> >        |
> >        +-systemtap_module
> >        |
> >        +-ftrace_plugin

Right.  In this way, utrace is simply a multiplexing intermediary.


> > Here, Frank had posted an example of utrace->ftrace engine.
> > http://lkml.org/lkml/2009/1/27/294
> > 
> > And here is the latest his patch(which seems to support syscall tracing...)
> > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124
> > 
> 
> Reminder : we are looking at system-wide tracing here. Here are some
> comments about the current utrace implementation.
> 
> Looking at include/linux/utrace.h from the tree
> 
> 17  * A tracing engine starts by calling utrace_attach_task() or
> 18  * utrace_attach_pid() on the chosen thread, passing in a set of hooks
> 19  * (&struct utrace_engine_ops), and some associated data.  This produces a
> 20  * &struct utrace_engine, which is the handle used for all other
> 21  * operations.  An attached engine has its ops vector, its data, and an
> 22  * event mask controlled by utrace_set_events().
> 
> So if the system has, say 3000 threads, then we have 3000 struct
> utrace_engine created ? I wonder what effect this could have on
> cachelines if this is used to trace hot paths like system call
> entry/exit. Have you benchmarked this kind of scenario under tbench ?

It has not been a problem, since utrace_engines are designed to be
lightweight.  Starting or stopping a systemtap script of the form

    probe process.syscall {}

appears to have no noticeable impact on a tbench suite.


> 24  * For each event bit that is set, that engine will get the
> 25  * appropriate ops->report_*() callback when the event occurs.  The
> 26  * &struct utrace_engine_ops need not provide callbacks for an event
> 27  * unless the engine sets one of the associated event bits.
> 
> Looking at utrace_set_events(), we seem to be limited to 32 events on a
> 32-bits architectures because it uses a bitmask ? Isn't it a bit small?

There are only a few types of thread events that involve different
classes of treatment, or different degrees of freedom in terms of
interference with the uninstrumented fast path of the threads.

For example, it does not make sense to have different flag bits for
different system calls, since choosing to trace *any* system call
involves taking the thread off of the fast path with the TIF_ flag.
Once it's off the fast path, it doesn't matter whether the utrace core
or some client performs syscall discrimination, so it is left to the
client.
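
A small user-space sketch of that division of labour follows.  The names
are hypothetical and the bit numbering is not the real UTRACE_EVENT()
layout; it only shows one mask bit per class of event, with the choice of
which system call matters made by the client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event numbering; the real UTRACE_EVENT() bits differ. */
enum model_event {
	MODEL_SYSCALL_ENTRY,
	MODEL_SYSCALL_EXIT,
	MODEL_NR_EVENTS,
};
#define MODEL_EVENT(ev)	(1UL << (ev))

struct model_client {
	unsigned long event_mask;	/* one bit per class of event */
	long wanted_syscall;		/* client-side filter, hypothetical */
	int hits;
};

/* Core side: one bit decides whether the client is called at all. */
static bool model_core_reports(const struct model_client *c,
			       enum model_event ev)
{
	assert(ev < MODEL_NR_EVENTS);
	return (c->event_mask & MODEL_EVENT(ev)) != 0;
}

/* Client side: telling one system call from another happens here. */
static void model_syscall_entry(struct model_client *c, long nr)
{
	if (!model_core_reports(c, MODEL_SYSCALL_ENTRY))
		return;
	if (nr == c->wanted_syscall)
		c->hits++;
}
```

Two bits cover every system call in this model, so the width of the mask
limits the number of event classes, not the number of things a client can
tell apart.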


> 682 /**
> 683  * utrace_set_events_pid - choose which event reports a tracing engine gets
> 684  * @pid:                thread to affect
> 685  * @engine:             attached engine to affect
> 686  * @eventmask:          new event mask
> 687  *
> 688  * This is the same as utrace_set_events(), but takes a &struct pid
> 689  * pointer rather than a &struct task_struct pointer.  The caller must
> 690  * hold a ref on @pid, but does not need to worry about the task
> 691  * staying valid.  If it's been reaped so that @pid points nowhere,
> 692  * then this call returns -%ESRCH.
> 
> 
> Comments like "but does not need to worry about the task staying valid"
> does not make me feel safe and comfortable at all, could you explain
> how you can assume that dereferencing an "invalid" pointer will return
> NULL ?

(We're doing a final round of "internal" (pre-LKML) reviews of the
utrace implementation right now on utrace-devel at redhat.com, where such
comments get fastest attention from the experts.)

For this particular issue, the utrace documentation file explains the
liveness rules for the various pointers that can be fed to or received
from utrace functions.  This is not about "feeling" safe, it's about
what the mechanism is deliberately designed to permit.


> About the utrace_attach_task() :
> 
> 244         if (unlikely(target->flags & PF_KTHREAD))
> 245                 /*
> 246                  * Silly kernel, utrace is for users!
> 247                  */
> 248                 return ERR_PTR(-EPERM);
> 
> So we cannot trace kernel threads ?

I'm not quite sure about all the reasons for this, but I believe that
kernel threads don't tend to engage in job control / signal /
system-call activities the same way as normal user threads do.


> 118 /*
> 119  * Called without locks, when we might be the first utrace engine to attach.
> 120  * If this is a newborn thread and we are not the creator, we have to wait
> 121  * for it.  The creator gets the first chance to attach.  The PF_STARTING
> 122  * flag is cleared after its report_clone hook has had a chance to run.
> 123  */
> 124 static inline int utrace_attach_delay(struct task_struct *target)
> 125 {
> 126         if ((target->flags & PF_STARTING) && target->real_parent != current)
> 127                 do {
> 128                         schedule_timeout_interruptible(1);
> 129                         if (signal_pending(current))
> 130                                 return -ERESTARTNOINTR;
> 131                 } while (target->flags & PF_STARTING);
> 132
> 133         return 0;
> 134 }
> 
> Why do we absolutely have to poll until the thread has started ?

(I don't know off the top of my head - Roland?)


> utrace_add_engine()
>   set_notify_resume(target);
> 
> ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> that it is set asynchronously with the execution of the target thread
> (as I do with my TIF_KERNEL_TRACE thread flag).
> 
> However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in 
> entry_64.S
> 
> int_signal:
> and
> retint_signal:
> 
> code paths. However, if there is no syscall tracing to do upon syscall
> entry, the thread flags are not re-read at syscall exit and you will
> miss the syscall exit returning from your target thread if this thread
> was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in
> some subtle way I did not figure out ? BTW re-reading the TIF flags from
> the thread_info at syscall exit on the fast path is out of question
> because it considerably degrades the kernel performances. entry_*.S is
> a very, very critical path.

(I don't know off the top of my head - Roland?)


- FChE


From fweisbec at gmail.com  Mon Mar 16 23:46:58 2009
From: fweisbec at gmail.com (Frederic Weisbecker)
Date: Tue, 17 Mar 2009 00:46:58 +0100
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090316221800.GE12974@redhat.com>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090316221800.GE12974@redhat.com>
Message-ID: <20090316234657.GC6150@nowhere>

On Mon, Mar 16, 2009 at 06:18:00PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> 
> On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote:
> 
> > [...]
> > > As far as I know, utrace supports multiple trace-engines on a process.
> > > Since ptrace is just an engine of utrace, you can add another engine on utrace.
> > > 
> > > utrace-+-ptrace_engine---owner_process
> > >        |
> > >        +-systemtap_module
> > >        |
> > >        +-ftrace_plugin
> 
> Right.  In this way, utrace is simply a multiplexing intermediary.
> 
> 
> > > Here, Frank had posted an example of utrace->ftrace engine.
> > > http://lkml.org/lkml/2009/1/27/294
> > > 
> > > And here is the latest his patch(which seems to support syscall tracing...)
> > > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124
> > > 
> > 
> > Reminder : we are looking at system-wide tracing here. Here are some
> > comments about the current utrace implementation.
> > 
> > Looking at include/linux/utrace.h from the tree
> > 
> > 17  * A tracing engine starts by calling utrace_attach_task() or
> > 18  * utrace_attach_pid() on the chosen thread, passing in a set of hooks
> > 19  * (&struct utrace_engine_ops), and some associated data.  This produces a
> > 20  * &struct utrace_engine, which is the handle used for all other
> > 21  * operations.  An attached engine has its ops vector, its data, and an
> > 22  * event mask controlled by utrace_set_events().
> > 
> > So if the system has, say 3000 threads, then we have 3000 struct
> > utrace_engine created ? I wonder what effect this could have on
> > cachelines if this is used to trace hot paths like system call
> > entry/exit. Have you benchmarked this kind of scenario under tbench ?
> 
> It has not been a problem, since utrace_engines are designed to be
> lightweight.  Starting or stopping a systemtap script of the form
> 
>     probe process.syscall {}
> 
> appears to have no noticeable impact on a tbench suite.
> 
> 
> > 24  * For each event bit that is set, that engine will get the
> > 25  * appropriate ops->report_*() callback when the event occurs.  The
> > 26  * &struct utrace_engine_ops need not provide callbacks for an event
> > 27  * unless the engine sets one of the associated event bits.
> > 
> > Looking at utrace_set_events(), we seem to be limited to 32 events on a
> > 32-bits architectures because it uses a bitmask ? Isn't it a bit small?
> 
> There are only a few types of thread events that involve different
> classes of treatment, or different degrees of freedom in terms of
> interference with the uninstrumented fast path of the threads.
> 
> For example, it does not make sense to have different flag bits for
> different system calls, since choosing to trace *any* system call
> involves taking the thread off of the fast path with the TIF_ flag.
> Once it's off the fast path, it doesn't matter whether the utrace core
> or some client performs syscall discrimination, so it is left to the
> client.
> 
> 
> > 682 /**
> > 683  * utrace_set_events_pid - choose which event reports a tracing engine gets
> > 684  * @pid:                thread to affect
> > 685  * @engine:             attached engine to affect
> > 686  * @eventmask:          new event mask
> > 687  *
> > 688  * This is the same as utrace_set_events(), but takes a &struct pid
> > 689  * pointer rather than a &struct task_struct pointer.  The caller must
> > 690  * hold a ref on @pid, but does not need to worry about the task
> > 691  * staying valid.  If it's been reaped so that @pid points nowhere,
> > 692  * then this call returns -%ESRCH.
> > 
> > 
> > Comments like "but does not need to worry about the task staying valid"
> > do not make me feel safe and comfortable at all. Could you explain
> > how you can assume that dereferencing an "invalid" pointer will return
> > NULL?
> 
> (We're doing a final round of "internal" (pre-LKML) reviews of the
> utrace implementation right now on utrace-devel at redhat.com, where such
> comments get fastest attention from the experts.)
> 
> For this particular issue, the utrace documentation file explains the
> liveness rules for the various pointers that can be fed to or received
> from utrace functions.  This is not about "feeling" safe, it's about
> what the mechanism is deliberately designed to permit.
> 
> 
> > About the utrace_attach_task() :
> > 
> > 244         if (unlikely(target->flags & PF_KTHREAD))
> > 245                 /*
> > 246                  * Silly kernel, utrace is for users!
> > 247                  */
> > 248                 return ERR_PTR(-EPERM);
> > 
> > So we cannot trace kernel threads ?
> 
> I'm not quite sure about all the reasons for this, but I believe that
> kernel threads don't tend to engage in job control / signal /
> system-call activities the same way as normal user threads do.
> 


Some of them use some syscalls, but that doesn't involve a user/kernel switch.
So it's not traceable by hooking syscall_entry/exit or using tracehooks.
It would require specific hooks on the sys_* functions for that.

So this check is right (writing to each thread info seems somewhat costly, so
it's better if it is avoided, as it is here).

Frederic.

 
> > 118 /*
> > 119  * Called without locks, when we might be the first utrace engine to attach.
> > 120  * If this is a newborn thread and we are not the creator, we have to wait
> > 121  * for it.  The creator gets the first chance to attach.  The PF_STARTING
> > 122  * flag is cleared after its report_clone hook has had a chance to run.
> > 123  */
> > 124 static inline int utrace_attach_delay(struct task_struct *target)
> > 125 {
> > 126         if ((target->flags & PF_STARTING) && target->real_parent != current)
> > 127                 do {
> > 128                         schedule_timeout_interruptible(1);
> > 129                         if (signal_pending(current))
> > 130                                 return -ERESTARTNOINTR;
> > 131                 } while (target->flags & PF_STARTING);
> > 132
> > 133         return 0;
> > 134 }
> > 
> > Why do we absolutely have to poll until the thread has started ?
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> > utrace_add_engine()
> >   set_notify_resume(target);
> > 
> > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> > that it is set asynchronously with the execution of the target thread
> > (as I do with my TIF_KERNEL_TRACE thread flag).
> > 
> > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in 
> > entry_64.S
> > 
> > int_signal:
> > and
> > retint_signal:
> > 
> > code paths. However, if there is no syscall tracing to do upon syscall
> > entry, the thread flags are not re-read at syscall exit and you will
> > miss the syscall exit returning from your target thread if this thread
> > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in
> > some subtle way I did not figure out ? BTW re-reading the TIF flags from
> > the thread_info at syscall exit on the fast path is out of question
> > because it considerably degrades the kernel performances. entry_*.S is
> > a very, very critical path.
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> - FChE
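
The event-mask answer quoted above (a few event classes in the core, with
per-syscall discrimination left to the client) can be illustrated with a small
userspace sketch. Everything in it is hypothetical: the EV_* names, the
struct engine layout and the helper functions are invented for illustration
and are not the utrace API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event classes; the real utrace event names and values differ. */
enum {
	EV_SYSCALL_ENTRY = 1u << 0,
	EV_SYSCALL_EXIT  = 1u << 1,
	EV_EXEC          = 1u << 2,
};

/* Hypothetical engine: one mask of event classes plus a client-side filter. */
struct engine {
	unsigned long eventmask;  /* classes of event this engine asked for */
	long wanted_syscall;      /* client filter; negative means "any" */
	unsigned int hits;        /* how many reports the client accepted */
};

/* Core side: a single bit test per event class, nothing per syscall. */
bool engine_wants(const struct engine *e, unsigned long event)
{
	return (e->eventmask & event) != 0;
}

/* Client side: the callback itself narrows down to one syscall number. */
void report_syscall_entry(struct engine *e, long nr)
{
	if (e->wanted_syscall < 0 || e->wanted_syscall == nr)
		e->hits++;
}

/* Once the thread is off the fast path, the core only checks the class bit. */
void dispatch_syscall_entry(struct engine *e, long nr)
{
	if (engine_wants(e, EV_SYSCALL_ENTRY))
		report_syscall_entry(e, nr);
}

/* Usage: only syscall number 3 is counted, and only while the bit is set. */
void demo_event_mask(void)
{
	struct engine e = { EV_SYSCALL_ENTRY, 3, 0 };

	dispatch_syscall_entry(&e, 3);
	dispatch_syscall_entry(&e, 4);
	assert(e.hits == 1);

	e.eventmask = 0;          /* engine no longer wants syscall entry */
	dispatch_syscall_entry(&e, 3);
	assert(e.hits == 1);
}
```

The point of the split is that a per-syscall bit in the mask would buy
nothing: the cost is paid when the class bit takes the thread off the fast
path, not when the callback compares a syscall number.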


From oleg at redhat.com  Tue Mar 17 01:21:43 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 17 Mar 2009 02:21:43 +0100
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: <20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
Message-ID: <20090317012143.GA17780@redhat.com>

On 03/15, Roland McGrath wrote:
>
> > Then we re-do this (well, almost) check under ->siglock,
> >
> > 	} else if (task_is_stopped(target)) {
> > 		if (!(target->utrace_flags & UTRACE_EVENT(JCTL)))
> > 			utrace->stopped = stopped = true;
> > 	}
> >
> > But this is not nice. Let's suppose the task is already stopped, we do
> > UTRACE_ATTACH + utrace_set_events(JCTL).
>
> This is exactly why utrace_set_events() sets ->stopped preemptively for
> that case.

Yes, thanks. I saw this code in utrace_set_events(), but then forgot.

> > 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
> > 	       report_jctl, what, notify);
> >
> > instead.
>
> There is a bug, but your fix changes a key API choice.
> I've put in this fix:
>
> -	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
> +	       report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED,
> +	       notify ? what : 0);
>
> The @type argument shows the state we will be in after the callback.
> If the state changes, there will be another callback.  That's what a
> state-tracking tracer needs, e.g. to keep a little light on the screen red
> while the thread is stopped and green while it's running.
>
> The @notify argument shows what SIGCHLD the parent sees (if it were
> dequeuing all possible SIGCHLD postings as quickly as they come).  That's
> what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs
> are expected in the parent.

I see, thanks.

> > With the first patch, we call utrace_report_jctl() before we actually
> > stop. do_signal_stop() can fail then, but I think this is OK, we can
> > pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
> > and with this patch we always call ->report_jctl with notify == 0. Just for
> > discussion.
>
> I think I sort of understand the intent of your patch.  If we change the
> calling convention for tracehook_notify_jctl, I think we want to preserve
> the aspect that the hook decides about sending the notification.  That's
> how the ptrace quirks can be reimplemented differently later without
> changing the tracehook layer again.  Also, we certainly don't want one
> tracehook call with two different locking conditions.

Agreed, "bool sig_locked" is awful. But we can avoid it. The real problem
is how to figure out the correct "notify" argument. I'll try to think more,
but I am not sure I will find the clean solution :(

Just in case. We can introduce PF_SIGCONTED flag and change
prepare_signal(SIGCONT) and signal_wake_up(resume => 1) to set this flag.
Since the task never changes its ->flags in finish_stop() path, it is safe
to do this under ->siglock. This way utrace_report_jctl() can drop
TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring
the ->state. But yes sure, this is very, very ugly.

Oleg.


From oleg at redhat.com  Tue Mar 17 01:34:22 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 17 Mar 2009 02:34:22 +0100
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: <20090316015541.11C33FC3AB@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
	<20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
	<20090313215912.GA1856@redhat.com>
	<20090316015541.11C33FC3AB@magilla.sf.frob.com>
Message-ID: <20090317013422.GB17780@redhat.com>

On 03/15, Roland McGrath wrote:
>
>
> > > Does that need a barrier pair here and in
> >
> > No, set_notify_resume()->test_and_set_tsk_thread_flag() implies mb(),
>
> Ah, ok.
>
> > > tracehook_notify_resume()?
> >
> > Ah. I think you are right, and I think it needs the barrier even without
> > this change. Say, UTRACE_REPORT does:
> >
> > 	utrace->report = 1;
> > 	set_notify_resume();
> >
> > Without mb() there is no guarantee that utrace_resume() will notice and
> > clear ->report.
>
> Wait, what?  You just said that set_notify_resume() already implies an mb().

Yes, but the other side lacks a barrier. UTRACE_REPORT does

	utrace->report = 1;
	wmb(); // actually mb, but wmb is enough
	set _TIF_NOTIFY_RESUME;

do_notify_resume()->utrace_resume()->start_report() path does

	if (_TIF_NOTIFY_RESUME)
		// !!! we need rmb in between !!!
		if (utrace->report)
			...

and it can miss ->report.

> But we don't want a solution that requires changing
> arch code.

Yes, agreed.

Oleg.
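
The pairing described in this message can be sketched in userspace C11, as an
analogy only. The names report_flag, notify_resume, request_report() and
resume_path() are hypothetical stand-ins for utrace->report, the
TIF_NOTIFY_RESUME bit and the two code paths; the C11 fences stand in for
the kernel's wmb()/rmb() and are not the kernel primitives themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for utrace->report and the TIF_NOTIFY_RESUME bit. */
atomic_int report_flag;
atomic_bool notify_resume;

/* Writer side: publish the data first, then raise the flag. */
void request_report(void)
{
	atomic_store_explicit(&report_flag, 1, memory_order_relaxed);
	/* Plays the role of wmb(): the store above is ordered before the flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&notify_resume, true, memory_order_relaxed);
}

/* Reader side: returns true if a requested report was seen and consumed. */
bool resume_path(void)
{
	/* Test and clear the flag, as the resume path does with the TIF bit. */
	if (!atomic_exchange_explicit(&notify_resume, false,
				      memory_order_relaxed))
		return false;
	/*
	 * Plays the role of rmb(): without it, the load below could be
	 * satisfied before the flag load and miss the writer's store.
	 */
	atomic_thread_fence(memory_order_acquire);
	if (atomic_load_explicit(&report_flag, memory_order_relaxed)) {
		atomic_store_explicit(&report_flag, 0, memory_order_relaxed);
		return true;
	}
	return false;
}

/* Single-threaded usage; it shows the protocol, not the race itself. */
void demo_report_handoff(void)
{
	assert(!resume_path());   /* nothing requested yet */
	request_report();
	assert(resume_path());    /* flag and data both observed */
	assert(!resume_path());   /* flag was consumed */
}
```

A barrier on the writer side alone orders only the writer's stores; the
reader needs its own barrier between the two loads for the pair to work.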


From oleg at redhat.com  Tue Mar 17 02:33:48 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 17 Mar 2009 03:33:48 +0100
Subject: utrace_set_events/utrace_control && death/reap checks
In-Reply-To: <20090316023421.C6136FC3AB@magilla.sf.frob.com>
References: <20090313233300.GA14605@redhat.com>
	<20090316023421.C6136FC3AB@magilla.sf.frob.com>
Message-ID: <20090317023348.GC17780@redhat.com>

On 03/15, Roland McGrath wrote:
>
> > utrace_set_events:
> >
> > 	(utrace->death && ((old_flags & ~events) & DEATH_EVENTS))
> >
> > "(old_flags & ~events) & DEATH_EVENTS)" means the caller tries to
> > clear DEATH/QUIESCE. Why is this not allowed? And why is this not
> > allowed _only_ when the target runs utrace_report_death()->REPORT()?
>
> This is specifically documented for -EALREADY, and in the DocBook section
> "Interlock with final callbacks".  The idea is this:

Aha, I didn't know.

> > And I don't understand why do we need utrace->death at all.
> ...
> > it is only used by
> > utrace_control(UTRACE_DETACH).
> ...
> that it will (or it's already happening).
>
> utrace_control as the synchronizing step of
> asynchronous tear-down.  If it returns 0, then report_death will not and it
> is safe to destroy data structures the callback code would use.

Yes, with your explanation above this is clear.


But can't we simplify this check a little bit?

	utrace_control:

		else if (unlikely(target->utrace_flags & DEATH_EVENTS) ||
			   unlikely(utrace->death)) {
			return -EALREADY;

can't we just do

		else if (unlikely(utrace->death)) {
			return -EALREADY;

I guess I missed something, but I can't understand why we need to
check ->utrace_flags. We are going to call mark_engine_detached()
below which clears engine->flags, and we hold utrace->lock.

If utrace_flags & DEATH_EVENTS is true, the subsequent
utrace_report_death() must see engine->flags == 0 (it takes
utrace->lock before REPORT_CALLBACKS), so it won't call any
callback. Yes, it can play with engine itself, but this should
be safe because "struct utrace" has a reference to attached
engine.

No?

Oleg.


From oleg at redhat.com  Tue Mar 17 05:24:42 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Tue, 17 Mar 2009 06:24:42 +0100
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090316214526.GA15119@Krystal>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
Message-ID: <20090317052442.GA32674@redhat.com>

On 03/16, Mathieu Desnoyers wrote:
>
> utrace_add_engine()
>   set_notify_resume(target);
>
> ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> that it is set asynchronously with the execution of the target thread
> (as I do with my TIF_KERNEL_TRACE thread flag).
>
> However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in
> entry_64.S
>
> int_signal:
> and
> retint_signal:
>
> code paths. However, if there is no syscall tracing to do upon syscall
> entry, the thread flags are not re-read at syscall exit and you will
> miss the syscall exit returning from your target thread if this thread
> was blocked while you set its TIF_NOTIFY_RESUME.

Afaics, TIF_NOTIFY_RESUME is not needed to trace syscall entry/exit.
If engine wants the syscall tracing, utrace_set_events(UTRACE_SYSCALL_xxx)
sets TIF_SYSCALL_TRACE. And syscall_trace_enter/syscall_trace_leave call
tracehook_report_syscall_xxx().

Oleg.


From mathieu.desnoyers at polymtl.ca  Tue Mar 17 16:00:29 2009
From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers)
Date: Tue, 17 Mar 2009 12:00:29 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090317052442.GA32674@redhat.com>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090317052442.GA32674@redhat.com>
Message-ID: <20090317160029.GD10092@Krystal>

* Oleg Nesterov (oleg at redhat.com) wrote:
> On 03/16, Mathieu Desnoyers wrote:
> >
> > utrace_add_engine()
> >   set_notify_resume(target);
> >
> > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> > that it is set asynchronously with the execution of the target thread
> > (as I do with my TIF_KERNEL_TRACE thread flag).
> >
> > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in
> > entry_64.S
> >
> > int_signal:
> > and
> > retint_signal:
> >
> > code paths. However, if there is no syscall tracing to do upon syscall
> > entry, the thread flags are not re-read at syscall exit and you will
> > miss the syscall exit returning from your target thread if this thread
> > was blocked while you set its TIF_NOTIFY_RESUME.
> 
> Afaics, TIF_NOTIFY_RESUME is not needed to trace syscall entry/exit.
> If engine wants the syscall tracing, utrace_set_events(UTRACE_SYSCALL_xxx)
> sets TIF_SYSCALL_TRACE. And syscall_trace_enter/syscall_trace_leave call
> tracehook_report_syscall_xxx().
> 
> Oleg.

I recall that TIF_SYSCALL_TRACE also suffers from the same problem as
TIF_NOTIFY_RESUME if set asynchronously with the target thread's
execution at least on x86_64 and arm. Do you take care to stop the
target thread in utrace_set_events ?

Mathieu

> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


From roland at redhat.com  Wed Mar 18 08:37:40 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 18 Mar 2009 01:37:40 -0700 (PDT)
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	17 March 2009 02:34:22 +0100 <20090317013422.GB17780@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
	<20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
	<20090313215912.GA1856@redhat.com>
	<20090316015541.11C33FC3AB@magilla.sf.frob.com>
	<20090317013422.GB17780@redhat.com>
Message-ID: <20090318083740.9EB39FC3AB@magilla.sf.frob.com>

> > Wait, what?  You just said that set_notify_resume() already implies an mb().
> 
> Yes, but the other side lacks a barrier. UTRACE_REPORT does
> 
> 	utrace->report = 1;
> 	wmb(); // actually mb, but wmb is enough
> 	set _TIF_NOTIFY_RESUME;
> 
> do_notify_resume()->utrace_resume()->start_report() path does
> 
> 	if (_TIF_NOTIFY_RESUME)
> 		// !!! we need rmb in between !!!
> 		if (utrace->report)
> 			...
> 
> and it can miss ->report.

I see.  We have a similar problem for (the first) attach, too, right?
utrace_add_engine does:

	utrace_flags |= UTRACE_EVENT(REAP);
 	utrace->report = 1;
	wmb(); // actually mb, but wmb is enough
	set _TIF_NOTIFY_RESUME;

do_notify_resume()->tracehook_notify_resume() path does:

 	if (_TIF_NOTIFY_RESUME)
 		// !!! we need rmb in between !!!
 		if (utrace_flags != 0)
 			utrace_resume()

This is what I put in (4d8a6fd6):

--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -616,6 +616,12 @@ static inline void set_notify_resume(struct task_struct *task)
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
 	struct task_struct *task = current;
+	/*
+	 * This pairs with the barrier implicit in set_notify_resume().
+	 * It ensures that we read the nonzero utrace_flags set before
+	 * set_notify_resume() was called by utrace setup.
+	 */
+	smp_rmb();
 	if (task_utrace_flags(task))
 		utrace_resume(task, regs);
 }


Thanks,
Roland


From roland at redhat.com  Wed Mar 18 08:52:14 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 18 Mar 2009 01:52:14 -0700 (PDT)
Subject: utrace_set_events/utrace_control && death/reap checks
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	17 March 2009 03:33:48 +0100 <20090317023348.GC17780@redhat.com>
References: <20090313233300.GA14605@redhat.com>
	<20090316023421.C6136FC3AB@magilla.sf.frob.com>
	<20090317023348.GC17780@redhat.com>
Message-ID: <20090318085214.8961CFC3AB@magilla.sf.frob.com>

> But can't we simplify this check a little bit?
> 
> 	utrace_control:
> 
> 		else if (unlikely(target->utrace_flags & DEATH_EVENTS) ||
> 			   unlikely(utrace->death)) {
> 			return -EALREADY;
> 
> can't we just do
> 
> 		else if (unlikely(utrace->death)) {
> 			return -EALREADY;

Yes, it's sufficient.  I've changed it.


Thanks,
Roland


From roland at redhat.com  Wed Mar 18 11:07:58 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 18 Mar 2009 04:07:58 -0700 (PDT)
Subject: Q: utrace->stopped && utrace_report_jctl()
In-Reply-To: Oleg Nesterov's message of  Tuesday,
	17 March 2009 02:21:43 +0100 <20090317012143.GA17780@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
	<20090317012143.GA17780@redhat.com>
Message-ID: <20090318110758.C7654FC3AB@magilla.sf.frob.com>

> Agreed, "bool sig_locked" is awful. But we can avoid it. The real problem
> is how to figure out the correct "notify" argument. I'll try to think more,
> but I am not sure I will find the clean solution :(

It does not seem hard if we move tracehook_notify_jctl inside siglock.

> Just in case. We can introduce PF_SIGCONTED flag and change
> prepare_signal(SIGCONT) and signal_wake_up(resume => 1) to set this flag.
> Since the task never changes its ->flags in finish_stop() path, it is safe
> to do this under ->siglock. This way utrace_report_jctl() can drop
> TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring
> the ->state. But yes sure, this is very, very ugly.

Very!  No need for this at all.

It's OK to change the tracehook definition.  I did this on the new git
branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it.

This drops all the JCTL bit kludgery and utrace_report_jctl just backs out
of TASK_STOPPED before dropping the siglock in the first place.  I think
the bookkeeping covers all the angles, but please check it in the new code.

Also please verify if you think all ->stopped bookkeeping is bulletproof
now.  I fiddled utrace_get_signal() a little because I wasn't quite sure
that all possible paths there after TASK_STOPPED were resetting it.

With that, please tell me if the current code fixes all the issues (not
just this one) you've noticed or what I've still missed.  I want to post it
to LKML in the next day or two so it has aired before the 2.6.30 merge
window.  If we've covered things that would hold up review and initial
merge now, many follow-on changes will probably go in easily as we have them.


Thanks,
Roland


From oleg at redhat.com  Wed Mar 18 18:15:12 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 18 Mar 2009 19:15:12 +0100
Subject: [PATCH] simplify do_signal_stop() && utrace_report_jctl()
	interaction
In-Reply-To: <20090318110758.C7654FC3AB@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
	<20090317012143.GA17780@redhat.com>
	<20090318110758.C7654FC3AB@magilla.sf.frob.com>
Message-ID: <20090318181512.GA697@redhat.com>

On 03/18, Roland McGrath wrote:
>
> It's OK to change the tracehook definition.  I did this on the new git
> branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it.

Roland, I think it is better to change the tracehook definition more; please
see below.

> This drops all the JCTL bit kludgery and utrace_report_jctl just backs out
> of TASK_STOPPED before dropping the siglock in the first place.  I think
> the bookkeeping covers all the angles, but please check it in the new code.

Heh. I was thinking about the very similar change. But I have problems
with tracehook_notify_jctl().

Please find the patch below, on top of your changes. At the cost of
one additional ->group_stop_count != 0 in do_signal_stop(), we can
avoid playing with state/group_stop_count/flags twice.

But, with or without this patch we have a small problem: we can wrongly
send SIGCHLD twice. I'll write a separate email.

> Also please verify if you think all ->stopped bookkeeping is bulletproof
> now.  I fiddled utrace_get_signal() a little because I wasn't quite sure
> that all possible paths there after TASK_STOPPED were resetting it.

Will do. I didn't study the signal part of utrace yet.

> I want to post it
> to LKML in the next day or two so it has aired before the 2.6.30 merge
> window.

Yes! I think it should be posted really soon.

BTW. exit_signals() calls tracehook_notify_jctl(why => CLD_STOPPED),
could you confirm this is right?

-------------------------------------------------------------------------
[PATCH] simplify do_signal_stop() && utrace_report_jctl() interaction

do_signal_stop() can call utrace_report_jctl() before decrementing
->group_stop_count and setting TASK_STOPPED/SIGNAL_STOP_STOPPED.
This allows us to greatly simplify utrace_report_jctl() and avoid playing
with group-stop bookkeeping twice.

Signed-off-by: Oleg Nesterov <oleg at redhat.com>

 signal.c |   29 +++++++++++------------------
 utrace.c |   20 --------------------
 2 files changed, 11 insertions(+), 38 deletions(-)

--- xxx/kernel/signal.c~JCTL_SIMPLIFY	2009-03-18 14:50:06.000000000 +0100
+++ xxx/kernel/signal.c	2009-03-18 18:20:35.000000000 +0100
@@ -1638,16 +1638,9 @@ void ptrace_notify(int exit_code)
 static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
-	int stop_count;
 	int notify;
 
-	if (sig->group_stop_count > 0) {
-		/*
-		 * There is a group stop in progress.  We don't need to
-		 * start another one.
-		 */
-		stop_count = --sig->group_stop_count;
-	} else {
+	if (!sig->group_stop_count) {
 		struct task_struct *t;
 
 		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
@@ -1659,7 +1652,7 @@ static int do_signal_stop(int signr)
 		 */
 		sig->group_exit_code = signr;
 
-		stop_count = 0;
+		sig->group_stop_count = 1;
 		for (t = next_thread(current); t != current; t = next_thread(t))
 			/*
 			 * Setting state to TASK_STOPPED for a group
@@ -1668,25 +1661,25 @@ static int do_signal_stop(int signr)
 			 */
 			if (!(t->flags & PF_EXITING) &&
 			    !task_is_stopped_or_traced(t)) {
-				stop_count++;
+				sig->group_stop_count++;
 				signal_wake_up(t, 0);
 			}
-		sig->group_stop_count = stop_count;
 	}
 
-	if (stop_count == 0)
-		sig->flags = SIGNAL_STOP_STOPPED;
-	current->exit_code = sig->group_exit_code;
-	__set_current_state(TASK_STOPPED);
-
 	/*
 	 * If there are no other threads in the group, or if there is
 	 * a group stop in progress and we are the last to stop,
 	 * report to the parent.  When ptraced, every thread reports itself.
 	 */
-	notify = tracehook_notify_jctl(stop_count == 0 ? CLD_STOPPED : 0,
-				       CLD_STOPPED);
+	notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
+	notify = tracehook_notify_jctl(notify, CLD_STOPPED);
 
+	if (sig->group_stop_count) {
+		if (!--sig->group_stop_count)
+			sig->flags = SIGNAL_STOP_STOPPED;
+		current->exit_code = sig->group_exit_code;
+		__set_current_state(TASK_STOPPED);
+	}
 	spin_unlock_irq(&current->sighand->siglock);
 
 	if (notify) {
--- xxx/kernel/utrace.c~JCTL_SIMPLIFY	2009-03-18 14:50:06.000000000 +0100
+++ xxx/kernel/utrace.c	2009-03-18 18:23:01.000000000 +0100
@@ -1618,18 +1618,7 @@ void utrace_report_jctl(int notify, int 
 	struct task_struct *task = current;
 	struct utrace *utrace = task_utrace_struct(task);
 	INIT_REPORT(report);
-	bool stop = task_is_stopped(task);
 
-	/*
-	 * We have to come out of TASK_STOPPED in case the event report
-	 * hooks might block.  Since we held the siglock throughout, it's
-	 * as if we were never in TASK_STOPPED yet at all.
-	 */
-	if (stop) {
-		__set_current_state(TASK_RUNNING);
-		task->signal->flags &= ~SIGNAL_STOP_STOPPED;
-		++task->signal->group_stop_count;
-	}
 	spin_unlock_irq(&task->sighand->siglock);
 
 	/*
@@ -1654,16 +1643,7 @@ void utrace_report_jctl(int notify, int 
 	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
 	       report_jctl, what, notify);
 
-	/*
-	 * Retake the lock, and go back into TASK_STOPPED
-	 * unless the stop was just cleared.
-	 */
 	spin_lock_irq(&task->sighand->siglock);
-	if (stop && task->signal->group_stop_count > 0) {
-		__set_current_state(TASK_STOPPED);
-		if (--task->signal->group_stop_count == 0)
-			task->signal->flags |= SIGNAL_STOP_STOPPED;
-	}
 }
 
 /*


From oleg at redhat.com  Wed Mar 18 18:22:45 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 18 Mar 2009 19:22:45 +0100
Subject: Q: utrace_reset() && UTRACE_EVENT(REAP)
In-Reply-To: <20090318083740.9EB39FC3AB@magilla.sf.frob.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312195021.GB3529@redhat.com>
	<20090312231607.7F9E5FC3B6@magilla.sf.frob.com>
	<20090313215912.GA1856@redhat.com>
	<20090316015541.11C33FC3AB@magilla.sf.frob.com>
	<20090317013422.GB17780@redhat.com>
	<20090318083740.9EB39FC3AB@magilla.sf.frob.com>
Message-ID: <20090318182245.GB697@redhat.com>

On 03/18, Roland McGrath wrote:
>
> > Yes, but the other side lacks a barrier. UTRACE_REPORT does
> >
> > 	utrace->report = 1;
> > 	wmb(); // actually mb, but wmb is enough
> > 	set _TIF_NOTIFY_RESUME;
> >
> > do_notify_resume()->utrace_resume()->start_report() path does
> >
> > 	if (_TIF_NOTIFY_RESUME)
> > 		// !!! we need rmb in between !!!
> > 		if (utrace->report)
> > 			...
> >
> > and it can miss ->report.
>
> I see.  We have a similar problem for (the first) attach, too, right?
> utrace_add_engine does:

Yes sure. I just meant the barrier was needed even before you changed
utrace_add_engine() to set ->report.

> --- a/include/linux/tracehook.h
> +++ b/include/linux/tracehook.h
> @@ -616,6 +616,12 @@ static inline void set_notify_resume(struct task_struct *task)
>  static inline void tracehook_notify_resume(struct pt_regs *regs)
>  {
>  	struct task_struct *task = current;
> +	/*
> +	 * This pairs with the barrier implicit in set_notify_resume().
> +	 * It ensures that we read the nonzero utrace_flags set before
> +	 * set_notify_resume() was called by utrace setup.
> +	 */
> +	smp_rmb();

smp_mb__after_clear_bit() is enough, but I agree, smp_rmb() is more
understandable.

Oleg.


From oleg at redhat.com  Wed Mar 18 19:49:41 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Wed, 18 Mar 2009 20:49:41 +0100
Subject: PATCH? tracehook_notify_jctl && SIGCONT
In-Reply-To: <20090318181512.GA697@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
	<20090317012143.GA17780@redhat.com>
	<20090318110758.C7654FC3AB@magilla.sf.frob.com>
	<20090318181512.GA697@redhat.com>
Message-ID: <20090318194941.GA7563@redhat.com>

On 03/18, Oleg Nesterov wrote:
>
> On 03/18, Roland McGrath wrote:
> >
> > It's OK to change the tracehook definition.  I did this on the new git
> > branch tracehook, then utrace branch commit 7b0be6e4 merges that and uses it.
>
> Roland, I think it is better to change the tracehook definition more; please
> see below.

The problem is that, since utrace_report_jctl() drops ->siglock,
tracehook_notify_jctl() can return a false positive. This is easy
to fix, but then we have to check PT_PTRACED twice, which is not good.

Suppose we have 2 threads, T1 and T2, T1 has JCTL in ->utrace_flags.

T2 dequeues SIGSTOP, calls do_signal_stop(), and sleeps in TASK_STOPPED.

T1 calls do_signal_stop(). ->group_stop_count == 1, so it does
notify = tracehook_notify_jctl(notify => CLD_STOPPED), this means
that notify always becomes CLD_STOPPED.

When tracehook_notify_jctl()->utrace_report_jctl() drops siglock,
SIGCONT comes, notices ->group_stop_count != 0, and adds SIGNAL_CLD_STOPPED
to signal flags.

Now we send SIGCHLD with si_code = CLD_STOPPED twice. By T1 from
do_signal_stop(), and by T1 or T2 from get_signal_to_deliver() which
checks SIGNAL_CLD_MASK.
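
The T1/T2 scenario above can be replayed with a toy model, as a sketch under
stated assumptions. struct sig_model and all the function names are
hypothetical; the model always lets SIGCONT arrive inside the hook's unlocked
window, tracks only CLD_STOPPED notifications, and ignores ptrace.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; every name here is hypothetical, not a kernel definition. */
struct sig_model {
	int group_stop_count;  /* threads still expected to stop */
	bool stop_stopped;     /* models SIGNAL_STOP_STOPPED */
	bool cld_pending;      /* models SIGNAL_CLD_STOPPED left by SIGCONT */
	int sigchld_sent;      /* CLD_STOPPED notifications seen by the parent */
};

/* SIGCONT arriving while the jctl hook has dropped the siglock. */
void sigcont_arrives(struct sig_model *s)
{
	if (!s->stop_stopped && s->group_stop_count)
		s->cld_pending = true;   /* stop was in progress: report it later */
	s->stop_stopped = false;
	s->group_stop_count = 0;
}

/* T1, the last thread to stop; decide_early picks when notify is computed. */
void last_thread_stops(struct sig_model *s, bool decide_early)
{
	/* Early: decided before the hook runs, from a count that can go stale. */
	bool notify = decide_early && s->group_stop_count == 1;

	sigcont_arrives(s);          /* the race window inside the hook */

	if (s->group_stop_count && --s->group_stop_count == 0) {
		s->stop_stopped = true;
		if (!decide_early)
			notify = true;       /* late: decided after the hook returned */
	}
	if (notify)
		s->sigchld_sent++;       /* models do_notify_parent_cldstop() */
}

/* Later, signal delivery reports whatever SIGCONT left pending. */
void deliver_pending(struct sig_model *s)
{
	if (s->cld_pending) {
		s->cld_pending = false;
		s->sigchld_sent++;
	}
}

/* Returns how many CLD_STOPPED notifications the parent receives. */
int run_model(bool decide_early)
{
	struct sig_model s = { .group_stop_count = 1 };

	last_thread_stops(&s, decide_early);
	deliver_pending(&s);
	return s.sigchld_sent;
}

void demo_double_notify(void)
{
	assert(run_model(true) == 2);   /* stale early decision: reported twice */
	assert(run_model(false) == 1);  /* decision after the hook: reported once */
}
```

In this model the duplicate comes entirely from computing the notification
before the lock is dropped; deciding after the hook returns leaves only the
report that SIGCONT queued.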

I'd suggest something like the patch below. At least for now.

Oleg.

--- xxx/include/linux/tracehook.h~JCTL_NOTIFY	2009-03-18 14:50:05.000000000 +0100
+++ xxx/include/linux/tracehook.h	2009-03-18 20:18:54.000000000 +0100
@@ -520,11 +520,10 @@ static inline int tracehook_get_signal(s
  *
  * Called with the siglock held.
  */
-static inline int tracehook_notify_jctl(int notify, int why)
+static inline void tracehook_notify_jctl(int notify, int why)
 {
 	if (task_utrace_flags(current) & UTRACE_EVENT(JCTL))
 		utrace_report_jctl(notify, why);
-	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
 }
 
 #define DEATH_REAP			-1
--- xxx/kernel/signal.c~JCTL_NOTIFY	2009-03-18 18:20:35.000000000 +0100
+++ xxx/kernel/signal.c	2009-03-18 20:28:39.000000000 +0100
@@ -1671,18 +1671,21 @@ static int do_signal_stop(int signr)
 	 * a group stop in progress and we are the last to stop,
 	 * report to the parent.  When ptraced, every thread reports itself.
 	 */
-	notify = sig->group_stop_count == 1 ? CLD_STOPPED : 0;
-	notify = tracehook_notify_jctl(notify, CLD_STOPPED);
+	tracehook_notify_jctl(sig->group_stop_count == 1 ? CLD_STOPPED : 0,
+				CLD_STOPPED);
 
+	notify = 0;
 	if (sig->group_stop_count) {
-		if (!--sig->group_stop_count)
+		if (!--sig->group_stop_count) {
 			sig->flags = SIGNAL_STOP_STOPPED;
+			notify = 1;
+		}
 		current->exit_code = sig->group_exit_code;
 		__set_current_state(TASK_STOPPED);
 	}
 	spin_unlock_irq(&current->sighand->siglock);
 
-	if (notify) {
+	if (notify || (current->ptrace & PT_PTRACED)) {
 		read_lock(&tasklist_lock);
 		do_notify_parent_cldstop(current, notify);
 		read_unlock(&tasklist_lock);
@@ -1765,14 +1768,12 @@ relock:
 				? CLD_CONTINUED : CLD_STOPPED;
 		signal->flags &= ~SIGNAL_CLD_MASK;
 
-		why = tracehook_notify_jctl(why, CLD_CONTINUED);
+		tracehook_notify_jctl(why, CLD_CONTINUED);
 		spin_unlock_irq(&sighand->siglock);
 
-		if (why) {
-			read_lock(&tasklist_lock);
-			do_notify_parent_cldstop(current->group_leader, why);
-			read_unlock(&tasklist_lock);
-		}
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current->group_leader, why);
+		read_unlock(&tasklist_lock);
 		goto relock;
 	}
 
@@ -1930,7 +1931,8 @@ void exit_signals(struct task_struct *ts
 	if (unlikely(tsk->signal->group_stop_count) &&
 			!--tsk->signal->group_stop_count) {
 		tsk->signal->flags = SIGNAL_STOP_STOPPED;
-		group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+		tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
+		group_stop = 1;
 	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);


From roland at redhat.com  Thu Mar 19 07:43:16 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 19 Mar 2009 00:43:16 -0700 (PDT)
Subject: [PATCH] simplify do_signal_stop() && utrace_report_jctl()
	interaction
In-Reply-To: Oleg Nesterov's message of  Wednesday,
	18 March 2009 19:15:12 +0100 <20090318181512.GA697@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
	<20090317012143.GA17780@redhat.com>
	<20090318110758.C7654FC3AB@magilla.sf.frob.com>
	<20090318181512.GA697@redhat.com>
Message-ID: <20090319074316.B68D8FC3AB@magilla.sf.frob.com>

> Roland, I think it better to change tracehook definition more, please
> see below.

I don't really object to this in principle.  But this touches signal.c a
lot more in less obviously-trivial ways than my tracehook patch.  That is
more of an issue at the outset than some extra fiddling in the utrace code.
I think we should consider this for a later clean-up after merging.

> BTW. exit_signals() calls tracehook_notify_jctl(why => CLD_STOPPED),
> could you confirm this is right?

Yes, it's right.  I considered passing CLD_EXITED here to distinguish this
odd case, but that would make the vanilla tracehook_notify_jctl()
definition less simple.  Instead, we put the onus on a ->report_jctl hook
to check for PF_EXITING to tell if it's really going to stop.
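
To make the PF_EXITING point concrete, here is a minimal userspace sketch
of the check a ->report_jctl hook would make.  Everything in it is a
stand-in: struct model_task, model_jctl_will_stop() and the MODEL_*
constants are hypothetical names for illustration, not kernel or utrace
interfaces, and the numeric values only mirror the usual Linux ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's si_code values (assumed to mirror Linux). */
enum { MODEL_CLD_STOPPED = 5, MODEL_CLD_CONTINUED = 6 };

/* Stand-in for PF_EXITING in task_struct.flags. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, cut-down view of a task: only the flags word matters here. */
struct model_task {
	unsigned int flags;
};

/*
 * Decide whether a jctl report with the given 'why' means the task is
 * really about to stop.  exit_signals() also reports CLD_STOPPED, but
 * there the task is dying rather than stopping, which shows up as
 * PF_EXITING being set.
 */
static bool model_jctl_will_stop(const struct model_task *t, int why)
{
	assert(why == MODEL_CLD_STOPPED || why == MODEL_CLD_CONTINUED);

	if (why != MODEL_CLD_STOPPED)
		return false;		/* waking up, not stopping */
	return !(t->flags & MODEL_PF_EXITING);
}
```

A hook that cares about the difference would branch on the result; a hook
that only counts group stops can ignore it.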


Thanks,
Roland


From roland at redhat.com  Thu Mar 19 07:47:50 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 19 Mar 2009 00:47:50 -0700 (PDT)
Subject: PATCH? tracehook_notify_jctl && SIGCONT
In-Reply-To: Oleg Nesterov's message of  Wednesday,
	18 March 2009 20:49:41 +0100 <20090318194941.GA7563@redhat.com>
References: <20090311222401.GA13512@redhat.com>
	<20090312073652.75811FC3B6@magilla.sf.frob.com>
	<20090312190738.GA3529@redhat.com>
	<20090312224055.BA71CFC3B6@magilla.sf.frob.com>
	<20090314001420.GA15677@redhat.com>
	<20090315223300.GA10526@redhat.com>
	<20090316011401.8EAE7FC3AB@magilla.sf.frob.com>
	<20090317012143.GA17780@redhat.com>
	<20090318110758.C7654FC3AB@magilla.sf.frob.com>
	<20090318181512.GA697@redhat.com>
	<20090318194941.GA7563@redhat.com>
Message-ID: <20090319074750.4EB4EFC3AB@magilla.sf.frob.com>

> Now we send SIGCHLD with si_code = CLD_STOPPED twice. By T1 from
> do_signal_stop(), and by T1 or T2 from get_signal_to_deliver() which
> checks SIGNAL_CLD_MASK.

Yes, I considered this problem.  It's just not so big a deal as to worry
overmuch about this corner case in the first version.  What seems to me
the obvious and straightforward way to address it is to give
utrace_report_jctl() a return value that tracehook_notify_jctl() uses.
Then we can omit a notification that has been superseded.
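
As a rough illustration of that idea (not the eventual implementation),
the following userspace sketch gives the hook a return value and lets the
caller use it.  All names here (sketch_group, sketch_report_jctl,
sketch_notify_jctl, cld_pending) are hypothetical, the ptrace case is left
out for brevity, and the "SIGCONT got in while siglock was dropped"
condition is reduced to a single flag.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's si_code values (assumed to mirror Linux). */
enum { MODEL_CLD_STOPPED = 5, MODEL_CLD_CONTINUED = 6 };

/* Hypothetical per-group state: set once SIGCONT has queued its own report. */
struct sketch_group {
	int cld_pending;
};

/* Hypothetical hook type: takes the pending report, returns what to send. */
typedef int (*sketch_report_fn)(int notify, int why, struct sketch_group *grp);

/*
 * Sketch of a reporting hook with a return value.  In the real code the
 * hook may drop siglock; if SIGCONT arrived meanwhile and queued its own
 * report, the one we were about to send is stale and is suppressed.
 */
static int sketch_report_jctl(int notify, int why, struct sketch_group *grp)
{
	(void)why;			/* unused in this reduced model */
	if (notify && grp->cld_pending)
		return 0;		/* superseded: send nothing */
	return notify;
}

/* Sketch of the caller: whatever the hook returns is what gets sent. */
static int sketch_notify_jctl(int notify, int why, sketch_report_fn hook,
			      struct sketch_group *grp)
{
	assert(why == MODEL_CLD_STOPPED || why == MODEL_CLD_CONTINUED);

	if (hook)
		notify = hook(notify, why, grp);
	return notify;
}
```

With no hook installed the value passes through unchanged, so the
untraced path behaves as before.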

Your patch does not seem very straightforward to me.  Moreover, you moved
some ptrace magic out of the tracehook function back into core signals code.
That is going in the wrong direction and we won't have any of that.


Thanks,
Roland


From roland at redhat.com  Thu Mar 19 10:34:34 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 19 Mar 2009 03:34:34 -0700 (PDT)
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: Mathieu Desnoyers's message of  Tuesday,
	17 March 2009 12:00:29 -0400 <20090317160029.GD10092@Krystal>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090317052442.GA32674@redhat.com>
	<20090317160029.GD10092@Krystal>
Message-ID: <20090319103434.CBE69FC3AB@magilla.sf.frob.com>

The utrace API itself is not a good fit for global tracing, since its
purpose is tracing and control of individual user threads.  There is
no reason to allocate its per-task data structures when you are going
to treat all tasks the same anyway.  The points that I think are being
missed are about the possibilities of overloading TIF_SYSCALL_TRACE.

It's true that ptrace uses TIF_SYSCALL_TRACE as a flag for whether you are
in the middle of a PTRACE_SYSCALL, so it can be confused by setting it for 
other purposes on a task that is also ptrace'd (but not with PTRACE_SYSCALL).
Until we are able to do away with these parts of the old ptrace innards,
you can't overload TIF_SYSCALL_TRACE without perturbing ptrace behavior.

The utrace code does not have this problem.  It keeps its own state bits,
so for it, TIF_SYSCALL_TRACE means exactly "the task will call
tracehook_report_syscall_*" and no more.  To use TIF_SYSCALL_TRACE for
another purpose, just set it on all the tasks you like (and/or set it on
new tasks in fork.c) and add your code (tracepoints, whatever) to
tracehook_report_syscall_* alongside the calls there into utrace.  There is
exactly one place in utrace code that clears TIF_SYSCALL_TRACE, and you
just add "&& !global_syscall_tracing_enabled" to the condition there.  You
don't need to bother clearing TIF_SYSCALL_TRACE on any task when you're
done.  If your "global_syscall_tracing_enabled" (or whatever it is) is
clear, each task will lazily fall into utrace at its next syscall
entry/exit and then utrace will reset TIF_SYSCALL_TRACE when it finds no
reason left to have it on.

I'm not really going to delve into utrace internals in this thread.  Please
raise those questions in review of the utrace patches when current code is
actually posted, where they belong.  Here I'll just mention the relevant
things that relate to the underlying issue you raised about synchronization.
As thoroughly documented, utrace_set_events() is a quick, asynchronous call
that itself makes no guarantees about how quickly a running task will start
to report the newly-requested events.  For purposes relevant here, it just
sets TIF_SYSCALL_TRACE and nothing else.  In utrace, if you want synchronous
assurance that a task misses no events you ask for, then you must first use
utrace_control (et al) to stop it and synchronize.  That is not something
that makes much sense at all for a "flip on global tracing" operation, which
is not generally especially synchronous with anything else.  If you want
best effort that a task will pick up newly-requested events Real Soon Now,
you can use utrace_control with just UTRACE_REPORT.  For purposes relevant
here, this just does set_notify_resume().  That will send an IPI if the task
is running, and then it will start noticing before it returns to user mode.
So:
	set_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
	set_notify_resume(task);
is what I would expect you to do on each task if you want to quickly get it
to start hitting tracehook_report_syscall_*.  (I'm a bit dubious that there
is really any need to speed it up with set_notify_resume, but that's just me.)

Finally, some broader points about TIF_SYSCALL_TRACE that I think
have been overlooked.  The key special feature of TIF_SYSCALL_TRACE
is that it gets you to a place where full user_regset access is
available.  Debuggers need this to read (and write) the full user
register state arbitrarily, which they also need to do user
backtraces and the like.  If you do not need user_regset to work,
then you don't need to be on this (slowest) code path.

If you are only interested in reading syscall arguments and results
(or even in changing syscall results in exit tracing) then you do
not need user_regset and you do not need to take the slowest syscall
path.  (If you are doing backtraces but already rely on full kernel
stack unwinding to do it, you also do not need user_regset.)  From
anywhere inside the kernel, you can use the asm/syscall.h calls to
read syscall args, whichever entry path the task took.

The other mechanism to hook into every syscall entry/exit is
TIF_SYSCALL_AUDIT.  On some machines (like x86), this takes a third,
"warm" code path that is faster than the TIF_SYSCALL_TRACE path
(though obviously still off the fastest direct code path).  It can
be faster precisely because it doesn't need to allow for user_regset
access, nor for modification of syscall arguments in entry tracing.
For normal read-only tracing of just the actual syscall details,
it has all you need.

Unfortunately the arch code all looks like:

	if (unlikely(current->audit_context))
		 audit_syscall_{entry,exit}(...);

So we need to change that to:

	if (unlikely(test_thread_flag(TIF_SYSCALL_AUDIT)))
		 audit_syscall_{entry,exit}(...);

But that is pretty easy to get right, even doing it blind on arches
you can't test.  It is far better than adding new asm hackery for each arch
that's almost identical to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT (and
finding out that some are fresh out of TIF bits in the range that
their asm code can handle).

TIF_SYSCALL_AUDIT is only set when allocating audit_context, and its
paths already have !context tests so overloading is harmless today.
(Whereas with TIF_SYSCALL_TRACE, you have to wait for later ptrace
cleanups or write off using ptrace simultaneously.)

Then you can do the lazy disable in audit_syscall_{entry,exit} with:

	if (unlikely(!context)) {
		if (unlikely(!global_syscall_tracing_enabled))
			clear_thread_flag(TIF_SYSCALL_AUDIT);
		return;
	}

Plus add there your tracepoint or whatnot.
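
To show how the pieces fit together, here is a small userspace model of
the flag test plus the lazy disable.  It is only a sketch under stated
assumptions: struct model_task and model_syscall_entry() are hypothetical,
a plain bool stands in for TIF_SYSCALL_AUDIT, and a counter stands in for
the tracepoint.  Only global_syscall_tracing_enabled is a name taken from
the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Global switch, named as in the message above. */
static bool global_syscall_tracing_enabled;

/* Hypothetical, cut-down task: just what the model needs. */
struct model_task {
	bool tif_syscall_audit;	/* stands in for TIF_SYSCALL_AUDIT */
	void *audit_context;	/* NULL when real auditing is off */
	unsigned long traced;	/* syscalls seen by the global tracer */
};

/* Model of one syscall entry taking the "warm" audit path. */
static void model_syscall_entry(struct model_task *t)
{
	assert(t != NULL);

	if (!t->tif_syscall_audit)
		return;			/* fast path: nothing hooked */

	if (global_syscall_tracing_enabled)
		t->traced++;		/* the "tracepoint or whatnot" */

	if (!t->audit_context) {
		/* Lazy disable: nobody needs the flag any more. */
		if (!global_syscall_tracing_enabled)
			t->tif_syscall_audit = false;
		return;
	}

	/* Real audit work would go here when audit_context is set. */
}
```

Turning global tracing off needs no walk over all tasks: each task clears
its own flag the next time it enters a syscall, and a task with a real
audit_context keeps the flag set.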

Unless you really plan to use user_regset in your tracepoints,
I think this is a better plan for global syscall tracing than either
fiddling with TIF_SYSCALL_TRACE or adding new arch asm requirements.
(IMHO, the latter is the worst idea on the table.)


Thanks,
Roland


From roland at redhat.com  Sat Mar 21 01:39:46 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri, 20 Mar 2009 18:39:46 -0700 (PDT)
Subject: [PATCH 0/3] utrace
Message-ID: <20090321013946.890F4FC3AB@magilla.sf.frob.com>

utrace is a new kernel-side API for kernel modules, intended to make it
tractable to work on novel ways to trace and debug user-mode tasks.

These patches apply to the current Linus tree (v2.6.29-rc8-241-g65c2449).
The first two should apply fine on the -tip tree as well, and we will be
glad to rebase the set to whichever tree.  Frank has another version of the
ftrace patch (3/3) that works for -tip.  The utrace patches don't touch
anything unless you set a new kconfig option (still marked EXPERIMENTAL),
and so are quite safe in that regard.

utrace cannot be enabled without CONFIG_HAVE_ARCH_TRACEHOOK and the arch
details it indicates.  If your arch does not have it yet, its maintainers
will have to work on that.  The details are in the comments in arch/Kconfig.

The first patch makes a small update to one of the tracehook.h interfaces
that we needed for utrace.  It moves code a little but does not change any
of the logic in the existing code.

The second patch adds the utrace kernel API (if CONFIG_UTRACE=y is set).
There is no change at all without the config option, and with it there is
no effect on anything at all until a kernel module using the utrace API is
loaded.  There is detailed documentation on the API in DocBook form.

The third patch is an ftrace widget based on utrace, by Frank Eigler.
Frank will follow up on any issues about that patch.


Thanks,
Roland


From roland at redhat.com  Sat Mar 21 01:41:00 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri, 20 Mar 2009 18:41:00 -0700 (PDT)
Subject: [PATCH 1/3] signals: tracehook_notify_jctl change
In-Reply-To: Roland McGrath's message of  Friday, 20 March 2009 18:39:46 -0700
	<20090321013946.890F4FC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
Message-ID: <20090321014100.5C4A9FC3AB@magilla.sf.frob.com>

This changes tracehook_notify_jctl() so it's called with the siglock held,
and changes its argument and return value definition.  These clean-ups make
it a better fit for what new tracing hooks need to check.
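
For readers who find the new return convention easier to follow as code,
here is a userspace model of it.  This is a sketch for illustration only:
model_notify_jctl() and the MODEL_* constants are hypothetical, and the
'ptraced' parameter stands in for the (current->ptrace & PT_PTRACED) test.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's si_code values (assumed to mirror Linux). */
enum { MODEL_CLD_STOPPED = 5, MODEL_CLD_CONTINUED = 6 };

/*
 * Model of the new contract: 'notify' is zero or the si_code the core
 * code would ordinarily send; 'why' says what is happening to this
 * thread.  The result is the si_code for SIGCHLD, or zero for no signal.
 */
static int model_notify_jctl(int notify, int why, bool ptraced)
{
	assert(why == MODEL_CLD_STOPPED || why == MODEL_CLD_CONTINUED);

	if (notify)
		return notify;	/* the core code already wants a SIGCHLD */
	if (ptraced)
		return why;	/* when ptraced, every thread reports itself */
	return 0;		/* not last to stop and not traced: no signal */
}
```

The caller then passes the result straight to do_notify_parent_cldstop()
when it is nonzero, instead of hard-coding CLD_STOPPED at each call site.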

Tracing needs the siglock here, held from the time TASK_STOPPED was set,
to avoid potential SIGCONT races if it wants to allow any blocking in its
tracing hooks.

This also folds the finish_stop() function into its caller do_signal_stop().
The function is short, called only once and only unconditionally.  It aids
readability to fold it in.

Signed-off-by: Roland McGrath <roland at redhat.com>
---
 include/linux/tracehook.h |   25 ++++++++++------
 kernel/signal.c           |   69 +++++++++++++++++++++++----------------------
 2 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 6186a78..b622498 100644  
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -1,7 +1,7 @@
 /*
  * Tracing hooks
  *
- * Copyright (C) 2008 Red Hat, Inc.  All rights reserved.
+ * Copyright (C) 2008-2009 Red Hat, Inc.  All rights reserved.
  *
  * This copyrighted material is made available to anyone wishing to use,
  * modify, copy, or redistribute it subject to the terms and conditions
@@ -469,22 +469,29 @@ static inline int tracehook_get_signal(s
 
 /**
  * tracehook_notify_jctl - report about job control stop/continue
- * @notify:		nonzero if this is the last thread in the group to stop
+ * @notify:		zero, %CLD_STOPPED or %CLD_CONTINUED
  * @why:		%CLD_STOPPED or %CLD_CONTINUED
  *
  * This is called when we might call do_notify_parent_cldstop().
- * It's called when about to stop for job control; we are already in
- * %TASK_STOPPED state, about to call schedule().  It's also called when
- * a delayed %CLD_STOPPED or %CLD_CONTINUED report is ready to be made.
  *
- * Return nonzero to generate a %SIGCHLD with @why, which is
- * normal if @notify is nonzero.
+ * @notify is zero if we would not ordinarily send a %SIGCHLD,
+ * or is the %CLD_STOPPED or %CLD_CONTINUED .si_code for %SIGCHLD.
  *
- * Called with no locks held.
+ * @why is %CLD_STOPPED when about to stop for job control;
+ * we are already in %TASK_STOPPED state, about to call schedule().
+ * It might also be that we have just exited (check %PF_EXITING),
+ * but need to report that a group-wide stop is complete.
+ *
+ * @why is %CLD_CONTINUED when waking up after job control stop and
+ * ready to make a delayed @notify report.
+ *
+ * Return the %CLD_* value for %SIGCHLD, or zero to generate no signal.
+ *
+ * Called with the siglock held.
  */
 static inline int tracehook_notify_jctl(int notify, int why)
 {
-	return notify || (current->ptrace & PT_PTRACED);
+	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
 }
 
 #define DEATH_REAP			-1
diff --git a/kernel/signal.c b/kernel/signal.c
index 2a74fe8..9a0d98f 100644  
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -691,7 +691,7 @@ static int prepare_signal(int sig, struc
 
 		if (why) {
 			/*
-			 * The first thread which returns from finish_stop()
+			 * The first thread which returns from do_signal_stop()
 			 * will take ->siglock, notice SIGNAL_CLD_MASK, and
 			 * notify its parent. See get_signal_to_deliver().
 			 */
@@ -1629,29 +1629,6 @@ void ptrace_notify(int exit_code)
 	spin_unlock_irq(&current->sighand->siglock);
 }
 
-static void
-finish_stop(int stop_count)
-{
-	/*
-	 * If there are no other threads in the group, or if there is
-	 * a group stop in progress and we are the last to stop,
-	 * report to the parent.  When ptraced, every thread reports itself.
-	 */
-	if (tracehook_notify_jctl(stop_count == 0, CLD_STOPPED)) {
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current, CLD_STOPPED);
-		read_unlock(&tasklist_lock);
-	}
-
-	do {
-		schedule();
-	} while (try_to_freeze());
-	/*
-	 * Now we don't run again until continued.
-	 */
-	current->exit_code = 0;
-}
-
 /*
  * This performs the stopping for SIGSTOP and other stop signals.
  * We have to stop all threads in the thread group.
@@ -1662,6 +1639,7 @@ static int do_signal_stop(int signr)
 {
 	struct signal_struct *sig = current->signal;
 	int stop_count;
+	int notify;
 
 	if (sig->group_stop_count > 0) {
 		/*
@@ -1701,8 +1679,30 @@ static int do_signal_stop(int signr)
 	current->exit_code = sig->group_exit_code;
 	__set_current_state(TASK_STOPPED);
 
+	/*
+	 * If there are no other threads in the group, or if there is
+	 * a group stop in progress and we are the last to stop,
+	 * report to the parent.  When ptraced, every thread reports itself.
+	 */
+	notify = tracehook_notify_jctl(stop_count == 0 ? CLD_STOPPED : 0,
+				       CLD_STOPPED);
+
 	spin_unlock_irq(&current->sighand->siglock);
-	finish_stop(stop_count);
+
+	if (notify) {
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current, notify);
+		read_unlock(&tasklist_lock);
+	}
+
+	do {
+		schedule();
+	} while (try_to_freeze());
+	/*
+	 * Now we don't run again until continued.
+	 */
+	current->exit_code = 0;
+
 	return 1;
 }
 
@@ -1771,14 +1771,15 @@ relock:
 		int why = (signal->flags & SIGNAL_STOP_CONTINUED)
 				? CLD_CONTINUED : CLD_STOPPED;
 		signal->flags &= ~SIGNAL_CLD_MASK;
-		spin_unlock_irq(&sighand->siglock);
 
-		if (unlikely(!tracehook_notify_jctl(1, why)))
-			goto relock;
+		why = tracehook_notify_jctl(why, CLD_CONTINUED);
+		spin_unlock_irq(&sighand->siglock);
 
-		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(current->group_leader, why);
-		read_unlock(&tasklist_lock);
+		if (why) {
+			read_lock(&tasklist_lock);
+			do_notify_parent_cldstop(current->group_leader, why);
+			read_unlock(&tasklist_lock);
+		}
 		goto relock;
 	}
 
@@ -1936,14 +1937,14 @@ void exit_signals(struct task_struct *ts
 	if (unlikely(tsk->signal->group_stop_count) &&
 			!--tsk->signal->group_stop_count) {
 		tsk->signal->flags = SIGNAL_STOP_STOPPED;
-		group_stop = 1;
+		group_stop = tracehook_notify_jctl(CLD_STOPPED, CLD_STOPPED);
 	}
 out:
 	spin_unlock_irq(&tsk->sighand->siglock);
 
-	if (unlikely(group_stop) && tracehook_notify_jctl(1, CLD_STOPPED)) {
+	if (unlikely(group_stop)) {
 		read_lock(&tasklist_lock);
-		do_notify_parent_cldstop(tsk, CLD_STOPPED);
+		do_notify_parent_cldstop(tsk, group_stop);
 		read_unlock(&tasklist_lock);
 	}
 }


From roland at redhat.com  Sat Mar 21 01:41:40 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri, 20 Mar 2009 18:41:40 -0700 (PDT)
Subject: [PATCH 2/3] utrace core
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
Message-ID: <20090321014140.AA4F5FC3AB@magilla.sf.frob.com>

This adds the utrace facility, a new modular interface in the kernel for
implementing user thread tracing and debugging.  This fits on top of the
tracehook_* layer, so the new code is well-isolated.

The new interface is in <linux/utrace.h> and the DocBook utrace book
describes it.  It allows for multiple separate tracing engines to work in
parallel without interfering with each other.  Higher-level tracing
facilities can be implemented as loadable kernel modules using this layer.

The new facility is made optional under CONFIG_UTRACE.
When this is not enabled, no new code is added.
It can only be enabled on machines that have all the
prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK.

In this initial version, utrace and ptrace do not play together at all.
If ptrace is attached to a thread, the attach calls in the utrace kernel
API return -EBUSY.  If utrace is attached to a thread, the PTRACE_ATTACH
or PTRACE_TRACEME request will return EBUSY to userland.  The old ptrace
code is otherwise unchanged and nothing using ptrace should be affected
by this patch as long as utrace is not used at the same time.  In the
future we can clean up the ptrace implementation and rework it to use
the utrace API.

Signed-off-by: Roland McGrath <roland at redhat.com>
---
 Documentation/DocBook/Makefile    |    2 +-
 Documentation/DocBook/utrace.tmpl |  571 +++++++++
 fs/proc/array.c                   |    3 +
 include/linux/init_task.h         |    1 +
 include/linux/sched.h             |    6 +
 include/linux/tracehook.h         |   50 +-
 include/linux/utrace.h            |  692 +++++++++++
 include/linux/utrace_struct.h     |   58 +
 init/Kconfig                      |    9 +
 kernel/Makefile                   |    1 +
 kernel/ptrace.c                   |   18 +-
 kernel/utrace.c                   | 2348 +++++++++++++++++++++++++++++++++++++
 12 files changed, 3756 insertions(+), 3 deletions(-)

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 1462ed8..f5da1b4 100644  
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -9,7 +9,7 @@
 DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
 	    kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
 	    procfs-guide.xml writing_usb_driver.xml networking.xml \
-	    kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
+	    kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml utrace.xml \
 	    gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
 	    genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
 	    mac80211.xml debugobjects.xml sh.xml regulator.xml
diff --git a/Documentation/DocBook/utrace.tmpl b/Documentation/DocBook/utrace.tmpl
new file mode 100644
index ...b802c55 100644  
--- /dev/null
+++ b/Documentation/DocBook/utrace.tmpl
@@ -0,0 +1,571 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+
+<book id="utrace">
+  <bookinfo>
+    <title>The utrace User Debugging Infrastructure</title>
+  </bookinfo>
+
+  <toc></toc>
+
+  <chapter id="concepts"><title>utrace concepts</title>
+
+  <sect1 id="intro"><title>Introduction</title>
+
+  <para>
+    <application>utrace</application> is infrastructure code for tracing
+    and controlling user threads.  This is the foundation for writing
+    tracing engines, which can be loadable kernel modules.
+  </para>
+
+  <para>
+    The basic actors in <application>utrace</application> are the thread
+    and the tracing engine.  A tracing engine is some body of code that
+    calls into the <filename>&lt;linux/utrace.h&gt;</filename>
+    interfaces, represented by a <structname>struct
+    utrace_engine_ops</structname>.  (Usually it's a kernel module,
+    though the legacy <function>ptrace</function> support is a tracing
+    engine that is not in a kernel module.)  The interface operates on
+    individual threads (<structname>struct task_struct</structname>).
+    If an engine wants to treat several threads as a group, that is up
+    to its higher-level code.
+  </para>
+
+  <para>
+    Tracing begins by attaching an engine to a thread, using
+    <function>utrace_attach_task</function> or
+    <function>utrace_attach_pid</function>.  If successful, it returns a
+    pointer that is the handle used in all other calls.
+  </para>
+
+  </sect1>
+
+  <sect1 id="callbacks"><title>Events and Callbacks</title>
+
+  <para>
+    An attached engine does nothing by default.  An engine makes something
+    happen by requesting callbacks via <function>utrace_set_events</function>
+    and poking the thread with <function>utrace_control</function>.
+    The synchronization issues related to these two calls
+    are discussed further below in <xref linkend="teardown"/>.
+  </para>
+
+  <para>
+    Events are specified using the macro
+    <constant>UTRACE_EVENT(<replaceable>type</replaceable>)</constant>.
+    Each event type is associated with a callback in <structname>struct
+    utrace_engine_ops</structname>.  A tracing engine can leave unused
+    callbacks <constant>NULL</constant>.  The only callbacks required
+    are those used by the event flags it sets.
+  </para>
+
+  <para>
+    Many engines can be attached to each thread.  When a thread has an
+    event, each engine gets a callback if it has set the event flag for
+    that event type.  Engines are called in the order they attached.
+    Engines that attach after the event has occurred do not get callbacks
+    for that event.  This includes any new engines just attached by an
+    existing engine's callback function.  Once the sequence of callbacks
+    for that one event has completed, such new engines are then eligible in
+    the next sequence that starts when there is another event.
+  </para>
+
+  <para>
+    Event reporting callbacks have details particular to the event type,
+    but are all called in similar environments and have the same
+    constraints.  Callbacks are made from safe points, where no locks
+    are held, no special resources are pinned (usually), and the
+    user-mode state of the thread is accessible.  So, callback code has
+    a pretty free hand.  But to be a good citizen, callback code should
+    never block for long periods.  It is fine to block in
+    <function>kmalloc</function> and the like, but never wait for i/o or
+    for user mode to do something.  If you need the thread to wait, use
+    <constant>UTRACE_STOP</constant> and return from the callback
+    quickly.  When your i/o finishes or whatever, you can use
+    <function>utrace_control</function> to resume the thread.
+  </para>
+
+  </sect1>
+
+  <sect1 id="safely"><title>Stopping Safely</title>
+
+  <sect2 id="well-behaved"><title>Writing well-behaved callbacks</title>
+
+  <para>
+    Well-behaved callbacks are important to maintain two essential
+    properties of the interface.  The first of these is that unrelated
+    tracing engines should not interfere with each other.  If your engine's
+    event callback does not return quickly, then another engine won't get
+    the event notification in a timely manner.  The second important
+    property is that tracing should be as noninvasive as possible to the
+    normal operation of the system overall and of the traced thread in
+    particular.  That is, attached tracing engines should not perturb a
+    thread's behavior, except to the extent that changing its user-visible
+    state is explicitly what you want to do.  (Obviously some perturbation
+    is unavoidable, primarily timing changes, ranging from small delays due
+    to the overhead of tracing, to arbitrary pauses in user code execution
+    when a user stops a thread with a debugger for examination.)  Even when
+    you explicitly want the perturbation of making the traced thread block,
+    just blocking directly in your callback has more unwanted effects.  For
+    example, the <constant>CLONE</constant> event callbacks are called when
+    the new child thread has been created but not yet started running; the
+    child can never be scheduled until the <constant>CLONE</constant>
+    tracing callbacks return.  (This allows engines tracing the parent to
+    attach to the child.)  If a <constant>CLONE</constant> event callback
+    blocks the parent thread, it also prevents the child thread from
+    running (even to process a <constant>SIGKILL</constant>).  If what you
+    want is to make both the parent and child block, then use
+    <function>utrace_attach_task</function> on the child and then use
+    <constant>UTRACE_STOP</constant> on both threads.  A more crucial
+    problem with blocking in callbacks is that it can prevent
+    <constant>SIGKILL</constant> from working.  A thread that is blocking
+    due to <constant>UTRACE_STOP</constant> will still wake up and die
+    immediately when sent a <constant>SIGKILL</constant>, as all threads
+    should.  Relying on the <application>utrace</application>
+    infrastructure rather than on private synchronization calls in event
+    callbacks is an important way to help keep tracing robustly
+    noninvasive.
+  </para>
+
+  </sect2>
+
+  <sect2 id="UTRACE_STOP"><title>Using <constant>UTRACE_STOP</constant></title>
+
+  <para>
+    To control another thread and access its state, you must first stop
+    it with <constant>UTRACE_STOP</constant>.  This means that it is
+    stopped and won't start running again while we access it.  When a
+    thread is not already stopped, <function>utrace_control</function>
+    returns <constant>-EINPROGRESS</constant> and an engine must wait
+    for an event callback when the thread is ready to stop.  The thread
+    may be running on another CPU or may be blocked.  When it is ready
+    to be examined, it will make callbacks to engines that set the
+    <constant>UTRACE_EVENT(QUIESCE)</constant> event bit.  To wake up an
+    interruptible wait, use <constant>UTRACE_INTERRUPT</constant>.
+  </para>
+
+  <para>
+    As long as some engine has used <constant>UTRACE_STOP</constant> and
+    not called <function>utrace_control</function> to resume the thread,
+    the thread will remain stopped.  <constant>SIGKILL</constant>
+    will wake it up, but it will not run user code.  When the stop is
+    cleared with <function>utrace_control</function> or a callback
+    return value, the thread starts running again.
+    (See also <xref linkend="teardown"/>.)
+  </para>
+
+  </sect2>
+
+  </sect1>
+
+  <sect1 id="teardown"><title>Tear-down Races</title>
+
+  <sect2 id="SIGKILL"><title>Primacy of <constant>SIGKILL</constant></title>
+  <para>
+    Ordinarily synchronization issues for tracing engines are kept fairly
+    straightforward by using <constant>UTRACE_STOP</constant>.  You ask a
+    thread to stop, and then once it makes the
+    <function>report_quiesce</function> callback it cannot do anything else
+    that would result in another callback, until you let it with a
+    <function>utrace_control</function> call.  This simple arrangement
+    avoids complex and error-prone code in each one of a tracing engine's
+    event callbacks to keep them serialized with the engine's other
+    operations done on that thread from another thread of control.
+    However, giving tracing engines complete power to keep a traced thread
+    stuck in place runs afoul of a more important kind of simplicity that
+    the kernel overall guarantees: nothing can prevent or delay
+    <constant>SIGKILL</constant> from making a thread die and release its
+    resources.  To preserve this important property,
+    <constant>SIGKILL</constant> is a special case: it can break
+    <constant>UTRACE_STOP</constant> as nothing else normally can.  This
+    includes both explicit <constant>SIGKILL</constant> signals and the
+    implicit <constant>SIGKILL</constant> sent to each other thread in the
+    same thread group by a thread doing an exec, or processing a fatal
+    signal, or making an <function>exit_group</function> system call.  A
+    tracing engine can prevent a thread from beginning the exit or exec or
+    dying by signal (other than <constant>SIGKILL</constant>) if it is
+    attached to that thread, but once the operation begins, no tracing
+    engine can prevent or delay all other threads in the same thread group
+    dying.
+  </para>
+  </sect2>
+
+  <sect2 id="reap"><title>Final callbacks</title>
+  <para>
+    The <function>report_reap</function> callback is always the final event
+    in the life cycle of a traced thread.  Tracing engines can use this as
+    the trigger to clean up their own data structures.  The
+    <function>report_death</function> callback is always the penultimate
+    event a tracing engine might see; it's seen unless the thread was
+    already in the midst of dying when the engine attached.  Many tracing
+    engines will have no interest in when a parent reaps a dead process,
+    and nothing they want to do with a zombie thread once it dies; for
+    them, the <function>report_death</function> callback is the natural
+    place to clean up data structures and detach.  To facilitate writing
+    such engines robustly, given the asynchrony of
+    <constant>SIGKILL</constant>, and without error-prone manual
+    implementation of synchronization schemes, the
+    <application>utrace</application> infrastructure provides some special
+    guarantees about the <function>report_death</function> and
+    <function>report_reap</function> callbacks.  It still takes some care
+    to be sure your tracing engine is robust to tear-down races, but these
+    rules make it reasonably straightforward and concise to handle a lot of
+    corner cases correctly.
+  </para>
+  </sect2>
+
+  <sect2 id="refcount"><title>Engine and task pointers</title>
+  <para>
+    The first sort of guarantee concerns the core data structures
+    themselves.  <structname>struct utrace_engine</structname> is
+    a reference-counted data structure.  While you hold a reference, an
+    engine pointer will always stay valid so that you can safely pass it to
+    any <application>utrace</application> call.  Each call to
+    <function>utrace_attach_task</function> or
+    <function>utrace_attach_pid</function> returns an engine pointer with a
+    reference belonging to the caller.  You own that reference until you
+    drop it using <function>utrace_engine_put</function>.  There is an
+    implicit reference on the engine while it is attached.  So if you drop
+    your only reference, and then use
+    <function>utrace_attach_task</function> without
+    <constant>UTRACE_ATTACH_CREATE</constant> to look up that same engine,
+    you will get the same pointer with a new reference to replace the one
+    you dropped, just like calling <function>utrace_engine_get</function>.
+    When an engine has been detached, either explicitly with
+    <constant>UTRACE_DETACH</constant> or implicitly after
+    <function>report_reap</function>, then any references you hold are all
+    that keep the old engine pointer alive.
+  </para>
+
+  <para>
+    There is nothing a kernel module can do to keep a <structname>struct
+    task_struct</structname> alive outside of
+    <function>rcu_read_lock</function>.  When the task dies and is reaped
+    by its parent (or itself), that structure can be freed so that any
+    dangling pointers you have stored become invalid.
+    <application>utrace</application> will not prevent this, but it can
+    help you detect it safely.  By definition, a task that has been reaped
+    has had all its engines detached.  All
+    <application>utrace</application> calls can be safely called on a
+    detached engine if the caller holds a reference on that engine pointer,
+    even if the task pointer passed in the call is invalid.  All calls
+    return <constant>-ESRCH</constant> for a detached engine, which tells
+    you that the task pointer you passed could be invalid now.  Since
+    <function>utrace_control</function> and
+    <function>utrace_set_events</function> do not block, you can call them
+    inside an <function>rcu_read_lock</function> section; if they do not
+    return <constant>-ESRCH</constant>, you can be sure the task pointer is
+    still valid until <function>rcu_read_unlock</function>.  The
+    infrastructure never holds task references of its own.  Though neither
+    <function>rcu_read_lock</function> nor any other lock is held while
+    making a callback, it's always guaranteed that the <structname>struct
+    task_struct</structname> and the <structname>struct
+    utrace_engine</structname> passed as arguments remain valid
+    until the callback function returns.
+  </para>
+
+  <para>
+    The common means for safely holding task pointers that is available to
+    kernel modules is to use <structname>struct pid</structname>, which
+    permits <function>put_pid</function> from kernel modules.  When using
+    that, the calls <function>utrace_attach_pid</function>,
+    <function>utrace_control_pid</function>,
+    <function>utrace_set_events_pid</function>, and
+    <function>utrace_barrier_pid</function> are available.
+  </para>
+  </sect2>
+
+  <sect2 id="reap-after-death">
+    <title>
+      Serialization of <constant>DEATH</constant> and <constant>REAP</constant>
+    </title>
+    <para>
+      The second guarantee is the serialization of
+      <constant>DEATH</constant> and <constant>REAP</constant> event
+      callbacks for a given thread.  The actual reaping by the parent
+      (<function>release_task</function> call) can occur simultaneously
+      while the thread is still doing the final steps of dying, including
+      the <function>report_death</function> callback.  If a tracing engine
+      has requested both <constant>DEATH</constant> and
+      <constant>REAP</constant> event reports, it's guaranteed that the
+      <function>report_reap</function> callback will not be made until
+      after the <function>report_death</function> callback has returned.
+      If the <function>report_death</function> callback itself detaches
+      from the thread, then the <function>report_reap</function> callback
+      will never be made.  Thus it is safe for a
+      <function>report_death</function> callback to clean up data
+      structures and detach.
+    </para>
+  </sect2>
+
+  <sect2 id="interlock"><title>Interlock with final callbacks</title>
+  <para>
+    The final sort of guarantee is that a tracing engine will know for sure
+    whether or not the <function>report_death</function> and/or
+    <function>report_reap</function> callbacks will be made for a certain
+    thread.  These tear-down races are disambiguated by the error return
+    values of <function>utrace_set_events</function> and
+    <function>utrace_control</function>.  Normally
+    <function>utrace_control</function> called with
+    <constant>UTRACE_DETACH</constant> returns zero, and this means that no
+    more callbacks will be made.  If the thread is in the midst of dying,
+    it returns <constant>-EALREADY</constant> to indicate that the
+    <function>report_death</function> callback may already be in progress;
+    when you get this error, you know that any cleanup your
+    <function>report_death</function> callback does is about to happen or
+    has just happened--note that if the <function>report_death</function>
+    callback does not detach, the engine remains attached until the thread
+    gets reaped.  If the thread is in the midst of being reaped,
+    <function>utrace_control</function> returns <constant>-ESRCH</constant>
+    to indicate that the <function>report_reap</function> callback may
+    already be in progress; this means the engine is implicitly detached
+    when the callback completes.  This makes it possible for a tracing
+    engine that has decided asynchronously to detach from a thread to
+    safely clean up its data structures, knowing that no
+    <function>report_death</function> or <function>report_reap</function>
+    callback will try to do the same.  <function>utrace_control</function>
+    returns <constant>-ESRCH</constant> when the <structname>struct
+    utrace_engine</structname> has already been detached, but is
+    still a valid pointer because of its reference count.  A tracing engine
+    can use this to safely synchronize its own independent multiple threads
+    of control with each other and with its event callbacks that detach.
+  </para>
+
+  <para>
+    In the same vein, <function>utrace_set_events</function> normally
+    returns zero; if the target thread was stopped before the call, then
+    after a successful call, only the event callbacks requested in the new
+    flags will be made.  It fails with <constant>-EALREADY</constant> if
+    you try to clear <constant>UTRACE_EVENT(DEATH)</constant> when the
+    <function>report_death</function> callback may already have begun, if
+    you try to clear <constant>UTRACE_EVENT(REAP)</constant> when the
+    <function>report_reap</function> callback may already have begun, or if
+    you try to newly set <constant>UTRACE_EVENT(DEATH)</constant> or
+    <constant>UTRACE_EVENT(QUIESCE)</constant> when the target is already
+    dead or dying.  Like <function>utrace_control</function>, it returns
+    <constant>-ESRCH</constant> when the engine has already been detached
+    (including forcible detach on reaping).  This lets the tracing engine
+    know for sure which event callbacks it will or won't see after
+    <function>utrace_set_events</function> has returned.  By checking for
+    errors, it can know whether to clean up its data structures immediately
+    or to let its callbacks do the work.
+  </para>
+  </sect2>
+
+  <sect2 id="barrier"><title>Using <function>utrace_barrier</function></title>
+  <para>
+    When a thread is safely stopped, calling
+    <function>utrace_control</function> with <constant>UTRACE_DETACH</constant>
+    or calling <function>utrace_set_events</function> to disable some events
+    ensures synchronously that your engine won't get any more of the callbacks
+    that have been disabled (none at all when detaching).  But these can also
+    be used while the thread is not stopped, when it might be simultaneously
+    making a callback to your engine.  For this situation, these calls return
+    <constant>-EINPROGRESS</constant> when it's possible a callback is in
+    progress.  If you are not prepared to have your old callbacks still run,
+    then you can synchronize to be sure all the old callbacks are finished,
+    using <function>utrace_barrier</function>.  This is necessary if the
+    kernel module containing your callback code is going to be unloaded.
+  </para>
+  <para>
+    After using <constant>UTRACE_DETACH</constant> once, further calls to
+    <function>utrace_control</function> with the same engine pointer will
+    return <constant>-ESRCH</constant>.  In contrast, after getting
+    <constant>-EINPROGRESS</constant> from
+    <function>utrace_set_events</function>, you can call
+    <function>utrace_set_events</function> again later; if it returns zero,
+    you know the old callbacks have finished.
+  </para>
+  <para>
+    Unlike all other calls, <function>utrace_barrier</function> (and
+    <function>utrace_barrier_pid</function>) will accept any engine pointer you
+    hold a reference on, even if <constant>UTRACE_DETACH</constant> has already
+    been used.  After any <function>utrace_control</function> or
+    <function>utrace_set_events</function> call (these do not block), you can
+    call <function>utrace_barrier</function> to block until callbacks have
+    finished.  This returns <constant>-ESRCH</constant> only if the engine is
+    completely detached (finished all callbacks).  Otherwise it waits
+    until the thread is definitely not in the midst of a callback to this
+    engine and then returns zero, but can return
+    <constant>-ERESTARTSYS</constant> if its wait is interrupted.
+  </para>
+  </sect2>
+
+</sect1>
+
+</chapter>
+
+<chapter id="core"><title>utrace core API</title>
+
+<para>
+  The utrace API is declared in <filename>&lt;linux/utrace.h&gt;</filename>.
+</para>
+
+!Iinclude/linux/utrace.h
+!Ekernel/utrace.c
+
+</chapter>
+
+<chapter id="machine"><title>Machine State</title>
+
+<para>
+  The <function>task_current_syscall</function> function can be used on any
+  valid <structname>struct task_struct</structname> at any time, and does
+  not even require that <function>utrace_attach_task</function> was used at all.
+</para>
+
+<para>
+  The other ways to access the registers and other machine-dependent state of
+  a task can only be used on a task that is at a known safe point.  The safe
+  points are all the places where <function>utrace_set_events</function> can
+  request callbacks (except for the <constant>DEATH</constant> and
+  <constant>REAP</constant> events).  So at any event callback, it is safe to
+  examine <varname>current</varname>.
+</para>
+
+<para>
+  One task can examine another only after a callback in the target task that
+  returns <constant>UTRACE_STOP</constant> so that task will not return to user
+  mode after the safe point.  This guarantees that the task will not resume
+  until the same engine uses <function>utrace_control</function>, unless the
+  task dies suddenly.  To examine safely, one must use a pair of calls to
+  <function>utrace_prepare_examine</function> and
+  <function>utrace_finish_examine</function> surrounding the calls to
+  <structname>struct user_regset</structname> functions or direct examination
+  of task data structures.  <function>utrace_prepare_examine</function> returns
+  an error if the task is not properly stopped and not dead.  After a
+  successful examination, the paired <function>utrace_finish_examine</function>
+  call returns an error if the task ever woke up during the examination.  If
+  so, any data gathered may be scrambled and should be discarded.  This means
+  there was a spurious wake-up (which should not happen), or a sudden death.
+</para>
+
+<sect1 id="regset"><title><structname>struct user_regset</structname></title>
+
+<para>
+  The <structname>struct user_regset</structname> API
+  is declared in <filename>&lt;linux/regset.h&gt;</filename>.
+</para>
+
+!Finclude/linux/regset.h
+
+</sect1>
+
+<sect1 id="task_current_syscall">
+  <title>System Call Information</title>
+
+<para>
+  This function is declared in <filename>&lt;linux/ptrace.h&gt;</filename>.
+</para>
+
+!Elib/syscall.c
+
+</sect1>
+
+<sect1 id="syscall"><title>System Call Tracing</title>
+
+<para>
+  The arch API for system call information is declared in
+  <filename>&lt;asm/syscall.h&gt;</filename>.
+  Each of these calls can be used only at system call entry tracing,
+  or only at system call exit and the subsequent safe points
+  before returning to user mode.
+  "At system call entry tracing" means either during a
+  <structfield>report_syscall_entry</structfield> callback,
+  or any time after that callback has returned <constant>UTRACE_STOP</constant>.
+</para>
+
+!Finclude/asm-generic/syscall.h
+
+</sect1>
+
+</chapter>
+
+<chapter id="internals"><title>Kernel Internals</title>
+
+<para>
+  This chapter covers the interface to the tracing infrastructure
+  from the core of the kernel and the architecture-specific code.
+  This is for maintainers of the kernel and arch code, and not relevant
+  to using the tracing facilities described in preceding chapters.
+</para>
+
+<sect1 id="tracehook"><title>Core Calls In</title>
+
+<para>
+  These calls are declared in <filename>&lt;linux/tracehook.h&gt;</filename>.
+  The core kernel calls these functions at various important places.
+</para>
+
+!Finclude/linux/tracehook.h
+
+</sect1>
+
+<sect1 id="arch"><title>Architecture Calls Out</title>
+
+<para>
+  An arch that has done all these things sets
+  <constant>CONFIG_HAVE_ARCH_TRACEHOOK</constant>.
+  This is required to enable the <application>utrace</application> code.
+</para>
+
+<sect2 id="arch-ptrace"><title><filename>&lt;asm/ptrace.h&gt;</filename></title>
+
+<para>
+  An arch defines these in <filename>&lt;asm/ptrace.h&gt;</filename>
+  if it supports hardware single-step or block-step features.
+</para>
+
+!Finclude/linux/ptrace.h arch_has_single_step arch_has_block_step
+!Finclude/linux/ptrace.h user_enable_single_step user_enable_block_step
+!Finclude/linux/ptrace.h user_disable_single_step
+
+</sect2>
+
+<sect2 id="arch-syscall">
+  <title><filename>&lt;asm/syscall.h&gt;</filename></title>
+
+  <para>
+    An arch provides <filename>&lt;asm/syscall.h&gt;</filename> that
+    defines these as inlines, or declares them as exported functions.
+    These interfaces are described in <xref linkend="syscall"/>.
+  </para>
+
+</sect2>
+
+<sect2 id="arch-tracehook">
+  <title><filename>&lt;linux/tracehook.h&gt;</filename></title>
+
+  <para>
+    An arch must define <constant>TIF_NOTIFY_RESUME</constant>
+    and <constant>TIF_SYSCALL_TRACE</constant>
+    in its <filename>&lt;asm/thread_info.h&gt;</filename>.
+    The arch code must call the following functions, all declared
+    in <filename>&lt;linux/tracehook.h&gt;</filename> and
+    described in <xref linkend="tracehook"/>:
+
+    <itemizedlist>
+      <listitem>
+	<para><function>tracehook_notify_resume</function></para>
+      </listitem>
+      <listitem>
+	<para><function>tracehook_report_syscall_entry</function></para>
+      </listitem>
+      <listitem>
+	<para><function>tracehook_report_syscall_exit</function></para>
+      </listitem>
+      <listitem>
+	<para><function>tracehook_signal_handler</function></para>
+      </listitem>
+    </itemizedlist>
+
+  </para>
+
+</sect2>
+
+</sect1>
+
+</chapter>
+
+</book>
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 7e4877d..0c683ed 100644  
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -81,6 +81,7 @@
 #include <linux/seq_file.h>
 #include <linux/pid_namespace.h>
 #include <linux/tracehook.h>
+#include <linux/utrace.h>
 
 #include <asm/pgtable.h>
 #include <asm/processor.h>
@@ -187,6 +188,8 @@ static inline void task_state(struct seq
 		cred->uid, cred->euid, cred->suid, cred->fsuid,
 		cred->gid, cred->egid, cred->sgid, cred->fsgid);
 
+	task_utrace_proc_status(m, p);
+
 	task_lock(p);
 	if (p->files)
 		fdt = files_fdtable(p->files);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index e752d97..39eebc8 100644  
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -181,6 +181,7 @@ extern struct cred init_cred;
 		[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),		\
 	},								\
 	.dirties = INIT_PROP_LOCAL_SINGLE(dirties),			\
+	INIT_UTRACE(tsk)						\
 	INIT_IDS							\
 	INIT_TRACE_IRQFLAGS						\
 	INIT_LOCKDEP							\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 011db2f..786ef2d 100644  
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -59,6 +59,7 @@ struct sched_param {
 #include <linux/errno.h>
 #include <linux/nodemask.h>
 #include <linux/mm_types.h>
+#include <linux/utrace_struct.h>
 
 #include <asm/system.h>
 #include <asm/page.h>
@@ -1287,6 +1288,11 @@ struct task_struct {
 #endif
 	seccomp_t seccomp;
 
+#ifdef CONFIG_UTRACE
+	struct utrace utrace;
+	unsigned long utrace_flags;
+#endif
+
 /* Thread group tracking */
    	u32 parent_exec_id;
    	u32 self_exec_id;
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index b622498..6ff7277 100644  
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -49,6 +49,7 @@
 #include <linux/sched.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
+#include <linux/utrace.h>
 struct linux_binprm;
 
 /**
@@ -63,6 +64,8 @@ struct linux_binprm;
  */
 static inline int tracehook_expect_breakpoints(struct task_struct *task)
 {
+	if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_CORE)))
+		return 1;
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
 
@@ -111,6 +114,9 @@ static inline void ptrace_report_syscall
 static inline __must_check int tracehook_report_syscall_entry(
 	struct pt_regs *regs)
 {
+	if ((task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_ENTRY)) &&
+	    utrace_report_syscall_entry(regs))
+		return 1;
 	ptrace_report_syscall(regs);
 	return 0;
 }
@@ -134,6 +140,8 @@ static inline __must_check int tracehook
  */
 static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
 {
+	if (task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_EXIT))
+		utrace_report_syscall_exit(regs);
 	ptrace_report_syscall(regs);
 }
 
@@ -194,6 +202,8 @@ static inline void tracehook_report_exec
 					 struct linux_binprm *bprm,
 					 struct pt_regs *regs)
 {
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXEC)))
+		utrace_report_exec(fmt, bprm, regs);
 	if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) &&
 	    unlikely(task_ptrace(current) & PT_PTRACED))
 		send_sig(SIGTRAP, current, 0);
@@ -211,6 +221,8 @@ static inline void tracehook_report_exec
  */
 static inline void tracehook_report_exit(long *exit_code)
 {
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXIT)))
+		utrace_report_exit(exit_code);
 	ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
 }
 
@@ -254,6 +266,7 @@ static inline int tracehook_prepare_clon
 static inline void tracehook_finish_clone(struct task_struct *child,
 					  unsigned long clone_flags, int trace)
 {
+	utrace_init_task(child);
 	ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace);
 }
 
@@ -280,6 +293,8 @@ static inline void tracehook_report_clon
 					  unsigned long clone_flags,
 					  pid_t pid, struct task_struct *child)
 {
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE)))
+		utrace_report_clone(clone_flags, child);
 	if (unlikely(trace) || unlikely(clone_flags & CLONE_PTRACE)) {
 		/*
 		 * The child starts up with an immediate SIGSTOP.
@@ -311,6 +326,9 @@ static inline void tracehook_report_clon
 						   pid_t pid,
 						   struct task_struct *child)
 {
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE)) &&
+	    (clone_flags & CLONE_VFORK))
+		utrace_finish_vfork(current);
 	if (unlikely(trace))
 		ptrace_event(0, trace, pid);
 }
@@ -345,6 +363,7 @@ static inline void tracehook_report_vfor
  */
 static inline void tracehook_prepare_release_task(struct task_struct *task)
 {
+	utrace_release_task(task);
 }
 
 /**
@@ -359,6 +378,7 @@ static inline void tracehook_prepare_rel
 static inline void tracehook_finish_release_task(struct task_struct *task)
 {
 	ptrace_release_task(task);
+	BUG_ON(task->exit_state != EXIT_DEAD);
 }
 
 /**
@@ -380,6 +400,8 @@ static inline void tracehook_signal_hand
 					    const struct k_sigaction *ka,
 					    struct pt_regs *regs, int stepping)
 {
+	if (task_utrace_flags(current))
+		utrace_signal_handler(current, stepping);
 	if (stepping)
 		ptrace_notify(SIGTRAP);
 }
@@ -400,6 +422,8 @@ static inline int tracehook_consider_ign
 						    int sig,
 						    void __user *handler)
 {
+	if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_IGN)))
+		return 1;
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
 
@@ -421,6 +445,9 @@ static inline int tracehook_consider_fat
 						  int sig,
 						  void __user *handler)
 {
+	if (unlikely(task_utrace_flags(task) & (UTRACE_EVENT(SIGNAL_TERM) |
+						UTRACE_EVENT(SIGNAL_CORE))))
+		return 1;
 	return (task_ptrace(task) & PT_PTRACED) != 0;
 }
 
@@ -435,6 +462,8 @@ static inline int tracehook_consider_fat
  */
 static inline int tracehook_force_sigpending(void)
 {
+	if (unlikely(task_utrace_flags(current)))
+		return utrace_interrupt_pending();
 	return 0;
 }
 
@@ -464,6 +493,8 @@ static inline int tracehook_get_signal(s
 				       siginfo_t *info,
 				       struct k_sigaction *return_ka)
 {
+	if (unlikely(task_utrace_flags(task)))
+		return utrace_get_signal(task, regs, info, return_ka);
 	return 0;
 }
 
@@ -491,6 +522,8 @@ static inline int tracehook_get_signal(s
  */
 static inline int tracehook_notify_jctl(int notify, int why)
 {
+	if (task_utrace_flags(current) & UTRACE_EVENT(JCTL))
+		utrace_report_jctl(notify, why);
 	return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
 }
 
@@ -514,6 +547,8 @@ static inline int tracehook_notify_jctl(
 static inline int tracehook_notify_death(struct task_struct *task,
 					 void **death_cookie, int group_dead)
 {
+	*death_cookie = task_utrace_struct(task);
+
 	if (task->exit_signal == -1)
 		return task->ptrace ? SIGCHLD : DEATH_REAP;
 
@@ -550,6 +585,9 @@ static inline void tracehook_report_deat
 					  int signal, void *death_cookie,
 					  int group_dead)
 {
+	smp_mb();
+	if (task_utrace_flags(task) & _UTRACE_DEATH_EVENTS)
+		utrace_report_death(task, death_cookie, group_dead, signal);
 }
 
 #ifdef TIF_NOTIFY_RESUME
@@ -579,10 +617,20 @@ static inline void set_notify_resume(str
  * asynchronously, this will be called again before we return to
  * user mode.
  *
- * Called without locks.
+ * Called without locks.  However, on some machines this may be
+ * called with interrupts disabled.
  */
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
+	struct task_struct *task = current;
+	/*
+	 * This pairs with the barrier implicit in set_notify_resume().
+	 * It ensures that we read the nonzero utrace_flags set before
+	 * set_notify_resume() was called by utrace setup.
+	 */
+	smp_rmb();
+	if (task_utrace_flags(task))
+		utrace_resume(task, regs);
 }
 #endif	/* TIF_NOTIFY_RESUME */
 
diff --git a/include/linux/utrace.h b/include/linux/utrace.h
new file mode 100644
index ...f46cc0f 100644  
--- /dev/null
+++ b/include/linux/utrace.h
@@ -0,0 +1,692 @@
+/*
+ * utrace infrastructure interface for debugging user processes
+ *
+ * Copyright (C) 2006-2009 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * Red Hat Author: Roland McGrath.
+ *
+ * This interface allows for notification of interesting events in a
+ * thread.  It also mediates access to thread state such as registers.
+ * Multiple unrelated users can be associated with a single thread.
+ * We call each of these a tracing engine.
+ *
+ * A tracing engine starts by calling utrace_attach_task() or
+ * utrace_attach_pid() on the chosen thread, passing in a set of hooks
+ * (&struct utrace_engine_ops), and some associated data.  This produces a
+ * &struct utrace_engine, which is the handle used for all other
+ * operations.  An attached engine has its ops vector, its data, and an
+ * event mask controlled by utrace_set_events().
+ *
+ * For each event bit that is set, that engine will get the
+ * appropriate ops->report_*() callback when the event occurs.  The
+ * &struct utrace_engine_ops need not provide callbacks for an event
+ * unless the engine sets one of the associated event bits.
+ */
+
+#ifndef _LINUX_UTRACE_H
+#define _LINUX_UTRACE_H	1
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+
+struct linux_binprm;
+struct pt_regs;
+struct utrace;
+struct user_regset;
+struct user_regset_view;
+
+/*
+ * Event bits passed to utrace_set_events().
+ * These appear in &struct task_struct.@utrace_flags
+ * and &struct utrace_engine.@flags.
+ */
+enum utrace_events {
+	_UTRACE_EVENT_QUIESCE,	/* Thread is available for examination.  */
+	_UTRACE_EVENT_REAP,  	/* Zombie reaped, no more tracing possible.  */
+	_UTRACE_EVENT_CLONE,	/* Successful clone/fork/vfork just done.  */
+	_UTRACE_EVENT_EXEC,	/* Successful execve just completed.  */
+	_UTRACE_EVENT_EXIT,	/* Thread exit in progress.  */
+	_UTRACE_EVENT_DEATH,	/* Thread has died.  */
+	_UTRACE_EVENT_SYSCALL_ENTRY, /* User entered kernel for system call. */
+	_UTRACE_EVENT_SYSCALL_EXIT, /* Returning to user after system call.  */
+	_UTRACE_EVENT_SIGNAL,	/* Signal delivery will run a user handler.  */
+	_UTRACE_EVENT_SIGNAL_IGN, /* No-op signal to be delivered.  */
+	_UTRACE_EVENT_SIGNAL_STOP, /* Signal delivery will suspend.  */
+	_UTRACE_EVENT_SIGNAL_TERM, /* Signal delivery will terminate.  */
+	_UTRACE_EVENT_SIGNAL_CORE, /* Signal delivery will dump core.  */
+	_UTRACE_EVENT_JCTL,	/* Job control stop or continue completed.  */
+	_UTRACE_NEVENTS
+};
+#define UTRACE_EVENT(type)	(1UL << _UTRACE_EVENT_##type)
+
+/*
+ * All the kinds of signal events.
+ * These all use the @report_signal() callback.
+ */
+#define UTRACE_EVENT_SIGNAL_ALL	(UTRACE_EVENT(SIGNAL) \
+				 | UTRACE_EVENT(SIGNAL_IGN) \
+				 | UTRACE_EVENT(SIGNAL_STOP) \
+				 | UTRACE_EVENT(SIGNAL_TERM) \
+				 | UTRACE_EVENT(SIGNAL_CORE))
+/*
+ * Both kinds of syscall events; these call the @report_syscall_entry()
+ * and @report_syscall_exit() callbacks, respectively.
+ */
+#define UTRACE_EVENT_SYSCALL	\
+	(UTRACE_EVENT(SYSCALL_ENTRY) | UTRACE_EVENT(SYSCALL_EXIT))
+
+/*
+ * The event reports triggered synchronously by task death.
+ */
+#define _UTRACE_DEATH_EVENTS (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE))
+
+/*
+ * Hooks in <linux/tracehook.h> call these entry points to the
+ * utrace dispatch.  They are weak references here only so
+ * tracehook.h doesn't need to #ifndef CONFIG_UTRACE them to
+ * avoid external references in case of unoptimized compilation.
+ */
+bool utrace_interrupt_pending(void)
+	__attribute__((weak));
+void utrace_resume(struct task_struct *, struct pt_regs *)
+	__attribute__((weak));
+int utrace_get_signal(struct task_struct *, struct pt_regs *,
+		      siginfo_t *, struct k_sigaction *)
+	__attribute__((weak));
+void utrace_report_clone(unsigned long, struct task_struct *)
+	__attribute__((weak));
+void utrace_finish_vfork(struct task_struct *)
+	__attribute__((weak));
+void utrace_report_exit(long *exit_code)
+	__attribute__((weak));
+void utrace_report_death(struct task_struct *, struct utrace *, bool, int)
+	__attribute__((weak));
+void utrace_report_jctl(int notify, int type)
+	__attribute__((weak));
+void utrace_report_exec(struct linux_binfmt *, struct linux_binprm *,
+			struct pt_regs *regs)
+	__attribute__((weak));
+bool utrace_report_syscall_entry(struct pt_regs *)
+	__attribute__((weak));
+void utrace_report_syscall_exit(struct pt_regs *)
+	__attribute__((weak));
+void utrace_signal_handler(struct task_struct *, int)
+	__attribute__((weak));
+
+#ifndef CONFIG_UTRACE
+
+/*
+ * <linux/tracehook.h> uses these accessors to avoid #ifdef CONFIG_UTRACE.
+ */
+static inline unsigned long task_utrace_flags(struct task_struct *task)
+{
+	return 0;
+}
+static inline struct utrace *task_utrace_struct(struct task_struct *task)
+{
+	return NULL;
+}
+static inline void utrace_init_task(struct task_struct *child)
+{
+}
+static inline void utrace_release_task(struct task_struct *task)
+{
+}
+
+static inline void task_utrace_proc_status(struct seq_file *m,
+					   struct task_struct *p)
+{
+}
+
+#else  /* CONFIG_UTRACE */
+
+static inline unsigned long task_utrace_flags(struct task_struct *task)
+{
+	return task->utrace_flags;
+}
+
+static inline struct utrace *task_utrace_struct(struct task_struct *task)
+{
+	return &task->utrace;
+}
+
+static inline void utrace_init_task(struct task_struct *task)
+{
+	task->utrace_flags = 0;
+	memset(&task->utrace, 0, sizeof(task->utrace));
+	INIT_LIST_HEAD(&task->utrace.attached);
+	INIT_LIST_HEAD(&task->utrace.attaching);
+	spin_lock_init(&task->utrace.lock);
+}
+
+void utrace_release_task(struct task_struct *);
+void task_utrace_proc_status(struct seq_file *m, struct task_struct *p);
+
+
+/*
+ * Version number of the API defined in this file.  This will change
+ * whenever a tracing engine's code would need some updates to keep
+ * working.  We maintain this here for the benefit of tracing engine code
+ * that is developed concurrently with utrace API improvements before they
+ * are merged into the kernel, making LINUX_VERSION_CODE checks unwieldy.
+ */
+#define UTRACE_API_VERSION	20090302
+
+/**
+ * enum utrace_resume_action - engine's choice of action for a traced task
+ * @UTRACE_STOP:		Stay quiescent after callbacks.
+ * @UTRACE_REPORT:		Make some callback soon.
+ * @UTRACE_INTERRUPT:		Make @report_signal() callback soon.
+ * @UTRACE_SINGLESTEP:		Resume in user mode for one instruction.
+ * @UTRACE_BLOCKSTEP:		Resume in user mode until next branch.
+ * @UTRACE_RESUME:		Resume normally in user mode.
+ * @UTRACE_DETACH:		Detach my engine (implies %UTRACE_RESUME).
+ *
+ * See utrace_control() for detailed descriptions of each action.  This is
+ * encoded in the @action argument and the return value for every callback
+ * with a &u32 return value.
+ *
+ * The order of these is important.  When there is more than one engine,
+ * each supplies its choice and the smallest value prevails.
+ */
+enum utrace_resume_action {
+	UTRACE_STOP,
+	UTRACE_REPORT,
+	UTRACE_INTERRUPT,
+	UTRACE_SINGLESTEP,
+	UTRACE_BLOCKSTEP,
+	UTRACE_RESUME,
+	UTRACE_DETACH
+};
+#define	UTRACE_RESUME_MASK	0x0f
+
+/**
+ * utrace_resume_action - &enum utrace_resume_action from callback action
+ * @action:		&u32 callback @action argument or return value
+ *
+ * This extracts the &enum utrace_resume_action from @action,
+ * which is the @action argument to a &struct utrace_engine_ops
+ * callback or the return value from one.
+ */
+static inline enum utrace_resume_action utrace_resume_action(u32 action)
+{
+	return action & UTRACE_RESUME_MASK;
+}
+
+/**
+ * enum utrace_signal_action - disposition of signal
+ * @UTRACE_SIGNAL_DELIVER:	Deliver according to sigaction.
+ * @UTRACE_SIGNAL_IGN:		Ignore the signal.
+ * @UTRACE_SIGNAL_TERM:		Terminate the process.
+ * @UTRACE_SIGNAL_CORE:		Terminate with core dump.
+ * @UTRACE_SIGNAL_STOP:		Deliver as absolute stop.
+ * @UTRACE_SIGNAL_TSTP:		Deliver as job control stop.
+ * @UTRACE_SIGNAL_REPORT:	Reporting before pending signals.
+ * @UTRACE_SIGNAL_HANDLER:	Reporting after signal handler setup.
+ *
+ * This is encoded in the @action argument and the return value for
+ * a @report_signal() callback.  It says what will happen to the
+ * signal described by the &siginfo_t parameter to the callback.
+ *
+ * The %UTRACE_SIGNAL_REPORT value is used in an @action argument when
+ * a tracing report is being made before dequeuing any pending signal.
+ * If this is immediately after a signal handler has been set up, then
+ * %UTRACE_SIGNAL_HANDLER is used instead.  A @report_signal callback
+ * that uses %UTRACE_SIGNAL_DELIVER|%UTRACE_SINGLESTEP will ensure
+ * it sees a %UTRACE_SIGNAL_HANDLER report.
+ */
+enum utrace_signal_action {
+	UTRACE_SIGNAL_DELIVER	= 0x00,
+	UTRACE_SIGNAL_IGN	= 0x10,
+	UTRACE_SIGNAL_TERM	= 0x20,
+	UTRACE_SIGNAL_CORE	= 0x30,
+	UTRACE_SIGNAL_STOP	= 0x40,
+	UTRACE_SIGNAL_TSTP	= 0x50,
+	UTRACE_SIGNAL_REPORT	= 0x60,
+	UTRACE_SIGNAL_HANDLER	= 0x70
+};
+#define	UTRACE_SIGNAL_MASK	0xf0
+#define UTRACE_SIGNAL_HOLD	0x100 /* Flag, push signal back on queue.  */
+
+/**
+ * utrace_signal_action - &enum utrace_signal_action from callback action
+ * @action:		@report_signal callback @action argument or return value
+ *
+ * This extracts the &enum utrace_signal_action from @action, which
+ * is the @action argument to a @report_signal callback or the
+ * return value from one.
+ */
+static inline enum utrace_signal_action utrace_signal_action(u32 action)
+{
+	return action & UTRACE_SIGNAL_MASK;
+}
+
+/**
+ * enum utrace_syscall_action - disposition of system call attempt
+ * @UTRACE_SYSCALL_RUN:		Run the system call.
+ * @UTRACE_SYSCALL_ABORT:	Don't run the system call.
+ *
+ * This is encoded in the @action argument and the return value for
+ * a @report_syscall_entry callback.
+ */
+enum utrace_syscall_action {
+	UTRACE_SYSCALL_RUN	= 0x00,
+	UTRACE_SYSCALL_ABORT	= 0x10
+};
+#define	UTRACE_SYSCALL_MASK	0xf0
+
+/**
+ * utrace_syscall_action - &enum utrace_syscall_action from callback action
+ * @action:		@report_syscall_entry callback @action or return value
+ *
+ * This extracts the &enum utrace_syscall_action from @action, which
+ * is the @action argument to a @report_syscall_entry callback or the
+ * return value from one.
+ */
+static inline enum utrace_syscall_action utrace_syscall_action(u32 action)
+{
+	return action & UTRACE_SYSCALL_MASK;
+}
+
+/*
+ * Flags for utrace_attach_task() and utrace_attach_pid().
+ */
+#define UTRACE_ATTACH_CREATE		0x0010 /* Attach a new engine.  */
+#define UTRACE_ATTACH_EXCLUSIVE		0x0020 /* Refuse if existing match.  */
+#define UTRACE_ATTACH_MATCH_OPS		0x0001 /* Match engines on ops.  */
+#define UTRACE_ATTACH_MATCH_DATA	0x0002 /* Match engines on data.  */
+#define UTRACE_ATTACH_MATCH_MASK	0x000f
+
+/**
+ * struct utrace_engine - per-engine structure
+ * @ops:	&struct utrace_engine_ops pointer passed to utrace_attach_task()
+ * @data:	engine-private &void * passed to utrace_attach_task()
+ * @flags:	event mask set by utrace_set_events() plus internal flag bits
+ *
+ * The task itself never has to worry about engines detaching while
+ * it's doing event callbacks.  These structures are removed from the
+ * task's active list only when it's stopped, or by the task itself.
+ *
+ * utrace_engine_get() and utrace_engine_put() maintain a reference count.
+ * When it drops to zero, the structure is freed.  One reference is held
+ * implicitly while the engine is attached to its task.
+ */
+struct utrace_engine {
+/* private: */
+	struct kref kref;
+	struct list_head entry;
+
+/* public: */
+	const struct utrace_engine_ops *ops;
+	void *data;
+
+	unsigned long flags;
+};
+
+/**
+ * utrace_engine_get - acquire a reference on a &struct utrace_engine
+ * @engine:	&struct utrace_engine pointer
+ *
+ * You must hold a reference on @engine, and you get another.
+ */
+static inline void utrace_engine_get(struct utrace_engine *engine)
+{
+	kref_get(&engine->kref);
+}
+
+void __utrace_engine_release(struct kref *);
+
+/**
+ * utrace_engine_put - release a reference on a &struct utrace_engine
+ * @engine:	&struct utrace_engine pointer
+ *
+ * You must hold a reference on @engine, and you lose that reference.
+ * If it was the last one, @engine becomes an invalid pointer.
+ */
+static inline void utrace_engine_put(struct utrace_engine *engine)
+{
+	kref_put(&engine->kref, __utrace_engine_release);
+}
+
+/**
+ * struct utrace_engine_ops - tracing engine callbacks
+ *
+ * Each @report_*() callback corresponds to an %UTRACE_EVENT(*) bit.
+ * utrace_set_events() calls on @engine choose which callbacks will be made
+ * to @engine from @task.
+ *
+ * Most callbacks take an @action argument, giving the resume action
+ * chosen by other tracing engines.  All callbacks take an @engine
+ * argument, and a @task argument, which is always equal to @current.
+ * For some calls, @action also includes bits specific to that event
+ * and utrace_resume_action() is used to extract the resume action.
+ * This shows what would happen if @engine wasn't there, or will if
+ * the callback's return value uses %UTRACE_RESUME.  This always
+ * starts as %UTRACE_RESUME when no other tracing is being done on
+ * this task.
+ *
+ * All return values contain &enum utrace_resume_action bits.  For
+ * some calls, other bits specific to that kind of event are added to
+ * the resume action bits with OR.  These are the same bits used in
+ * the @action argument.  The resume action returned by a callback
+ * does not override previous engines' choices, it only says what
+ * @engine wants done.  What @task actually does is the action that's
+ * most constrained among the choices made by all attached engines.
+ * See utrace_control() for more information on the actions.
+ *
+ * When %UTRACE_STOP is used in @report_syscall_entry, then @task
+ * stops before attempting the system call.  In other cases, the
+ * resume action does not take effect until @task is ready to check
+ * for signals and return to user mode.  If there are more callbacks
+ * to be made, the last round of calls determines the final action.
+ * A @report_quiesce callback with @event zero, or a @report_signal
+ * callback, will always be the last one made before @task resumes.
+ * Only %UTRACE_STOP is "sticky"--if @engine returned %UTRACE_STOP
+ * then @task stays stopped unless @engine returns a different action
+ * from a following callback.
+ *
+ * The report_death() and report_reap() callbacks do not take @action
+ * arguments, and only %UTRACE_DETACH is meaningful in the return value
+ * from a report_death() callback.  None of the resume actions applies
+ * to a dead thread.
+ *
+ * All @report_*() hooks are called with no locks held, in a generally
+ * safe environment when we will be returning to user mode soon (or just
+ * entered the kernel).  It is fine to block for memory allocation and
+ * the like, but all hooks are asynchronous and must not block on
+ * external events!  If you want the thread to block, use %UTRACE_STOP
+ * in your hook's return value; then later wake it up with utrace_control().
+ *
+ * @report_quiesce:
+ *	Requested by %UTRACE_EVENT(%QUIESCE).
+ *	This does not indicate any event, but just that @task (the current
+ *	thread) is in a safe place for examination.  This call is made
+ *	before each specific event callback, except for @report_reap.
+ *	The @event argument gives the %UTRACE_EVENT(@which) value for
+ *	the event occurring.  This callback might be made for events @engine
+ *	has not requested, if some other engine is tracing the event;
+ *	calling utrace_set_events() here can request the immediate
+ *	callback for this occurrence of @event.  @event is zero when there
+ *	is no other event, @task is now ready to check for signals and
+ *	return to user mode, and some engine has used %UTRACE_REPORT or
+ *	%UTRACE_INTERRUPT to request this callback.  For this case,
+ *	if @report_signal is not %NULL, the @report_quiesce callback
+ *	may be replaced with a @report_signal callback passing
+ *	%UTRACE_SIGNAL_REPORT in its @action argument, whenever @task is
+ *	entering the signal-check path anyway.
+ *
+ * @report_signal:
+ *	Requested by %UTRACE_EVENT(%SIGNAL_*) or %UTRACE_EVENT(%QUIESCE).
+ *	Use utrace_signal_action() and utrace_resume_action() on @action.
+ *	The signal action is %UTRACE_SIGNAL_REPORT when some engine has
+ *	used %UTRACE_REPORT or %UTRACE_INTERRUPT; the callback can choose
+ *	to stop or to deliver an artificial signal, before pending signals.
+ *	It's %UTRACE_SIGNAL_HANDLER instead when signal handler setup just
+ *	finished (after a previous %UTRACE_SIGNAL_DELIVER return); this
+ *	serves in lieu of any %UTRACE_SIGNAL_REPORT callback requested by
+ *	%UTRACE_REPORT or %UTRACE_INTERRUPT, and is also implicitly
+ *	requested by %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP into the
+ *	signal delivery.  The other signal actions indicate a signal about
+ *	to be delivered; the previous engine's return value sets the signal
+ *	action seen by the following engine's callback.  The @info data
+ *	can be changed at will, including @info->si_signo.  The settings in
+ *	@return_ka determine what %UTRACE_SIGNAL_DELIVER does.  @orig_ka
+ *	is what was in force before other tracing engines intervened, and
+ *	it's %NULL when this report began as %UTRACE_SIGNAL_REPORT or
+ *	%UTRACE_SIGNAL_HANDLER.  For a report without a new signal, @info
+ *	is left uninitialized and must be set completely by an engine that
+ *	chooses to deliver a signal; if there was a previous @report_signal
+ *	callback ending in %UTRACE_STOP and it was just resumed using
+ *	%UTRACE_REPORT or %UTRACE_INTERRUPT, then @info is left unchanged
+ *	from the previous callback.  In this way, the original signal can
+ *	be left in @info while returning %UTRACE_STOP|%UTRACE_SIGNAL_IGN
+ *	and then found again when resuming @task with %UTRACE_INTERRUPT.
+ *	The %UTRACE_SIGNAL_HOLD flag bit can be OR'd into the return value,
+ *	and might be in @action if the previous engine returned it.  This
+ *	flag asks that the signal in @info be pushed back on @task's queue
+ *	so that it will be seen again after whatever action is taken now.
+ *
+ * @report_clone:
+ *	Requested by %UTRACE_EVENT(%CLONE).
+ *	Event reported for parent, before the new task @child might run.
+ *	@clone_flags gives the flags used in the clone system call,
+ *	or equivalent flags for a fork() or vfork() system call.
+ *	This function can use utrace_attach_task() on @child.  It's guaranteed
+ *	that asynchronous utrace_attach_task() calls will be ordered after
+ *	any calls in @report_clone callbacks for the parent.  Thus
+ *	when using %UTRACE_ATTACH_EXCLUSIVE in the asynchronous calls,
+ *	you can be sure that the parent's @report_clone callback has
+ *	already attached to @child or chosen not to.  Passing %UTRACE_STOP
+ *	to utrace_control() on @child here keeps the child stopped before
+ *	it ever runs in user mode; %UTRACE_REPORT or %UTRACE_INTERRUPT
+ *	ensures a callback from @child before it starts in user mode.
+ *
+ * @report_jctl:
+ *	Requested by %UTRACE_EVENT(%JCTL).
+ *	Job control event; @type is %CLD_STOPPED or %CLD_CONTINUED,
+ *	indicating whether we are stopping or resuming now.  If @notify
+ *	is nonzero, @task is the last thread to stop and so will send
+ *	%SIGCHLD to its parent after this callback; @notify reflects
+ *	what the parent's %SIGCHLD has in @si_code, which can sometimes
+ *	be %CLD_STOPPED even when @type is %CLD_CONTINUED.
+ *
+ * @report_exec:
+ *	Requested by %UTRACE_EVENT(%EXEC).
+ *	An execve system call has succeeded and the new program is about to
+ *	start running.  The initial user register state can be tweaked
+ *	directly in @regs.  @fmt and @bprm give the details of this exec.
+ *
+ * @report_syscall_entry:
+ *	Requested by %UTRACE_EVENT(%SYSCALL_ENTRY).
+ *	Thread has entered the kernel to request a system call.
+ *	The user register state can be tweaked directly in @regs.
+ *	The @action argument contains an &enum utrace_syscall_action;
+ *	use utrace_syscall_action() to extract it.  The return value
+ *	overrides the last engine's action for the system call.
+ *	If the final action is %UTRACE_SYSCALL_ABORT, no system call
+ *	is made.  The details of the system call being attempted can
+ *	be fetched here with syscall_get_nr() and syscall_get_arguments().
+ *	The parameter registers can be changed with syscall_set_arguments().
+ *
+ * @report_syscall_exit:
+ *	Requested by %UTRACE_EVENT(%SYSCALL_EXIT).
+ *	Thread is about to leave the kernel after a system call request.
+ *	The user register state can be tweaked directly in @regs.
+ *	The results of the system call attempt can be examined here using
+ *	syscall_get_error() and syscall_get_return_value().  It is safe
+ *	here to call syscall_set_return_value() or syscall_rollback().
+ *
+ * @report_exit:
+ *	Requested by %UTRACE_EVENT(%EXIT).
+ *	Thread is exiting and cannot be prevented from doing so,
+ *	but all its state is still live.  The @code value will be
+ *	the wait result seen by the parent, and can be changed by
+ *	this engine or others.  The @orig_code value is the real
+ *	status, not changed by any tracing engine.  Returning %UTRACE_STOP
+ *	here keeps @task stopped before it cleans up its state and dies,
+ *	so it can be examined by other processes.  When @task is allowed
+ *	to run, it will die and get to the @report_death callback.
+ *
+ * @report_death:
+ *	Requested by %UTRACE_EVENT(%DEATH).
+ *	Thread is really dead now.  It might be reaped by its parent at
+ *	any time, or self-reap immediately.  Though the actual reaping
+ *	may happen in parallel, a report_reap() callback will always be
+ *	ordered after a report_death() callback.
+ *
+ * @report_reap:
+ *	Requested by %UTRACE_EVENT(%REAP).
+ *	Called when someone reaps the dead task (parent, init, or self).
+ *	This means the parent called wait, or else this was a detached
+ *	thread or a process whose parent ignores SIGCHLD.
+ *	No more callbacks are made after this one.
+ *	The engine is always detached.
+ *	There is nothing more a tracing engine can do about this thread.
+ *	After this callback, the @engine pointer will become invalid.
+ *	The @task pointer may become invalid if get_task_struct() hasn't
+ *	been used to keep it alive.
+ *	An engine should always request this callback if it stores the
+ *	@engine pointer or stores any pointer in @engine->data, so it
+ *	can clean up its data structures.
+ *	Unlike other callbacks, this can be called from the parent's context
+ *	rather than from the traced thread itself--it must not delay the
+ *	parent by blocking.
+ */
+struct utrace_engine_ops {
+	u32 (*report_quiesce)(enum utrace_resume_action action,
+			      struct utrace_engine *engine,
+			      struct task_struct *task,
+			      unsigned long event);
+	u32 (*report_signal)(u32 action,
+			     struct utrace_engine *engine,
+			     struct task_struct *task,
+			     struct pt_regs *regs,
+			     siginfo_t *info,
+			     const struct k_sigaction *orig_ka,
+			     struct k_sigaction *return_ka);
+	u32 (*report_clone)(enum utrace_resume_action action,
+			    struct utrace_engine *engine,
+			    struct task_struct *parent,
+			    unsigned long clone_flags,
+			    struct task_struct *child);
+	u32 (*report_jctl)(enum utrace_resume_action action,
+			   struct utrace_engine *engine,
+			   struct task_struct *task,
+			   int type, int notify);
+	u32 (*report_exec)(enum utrace_resume_action action,
+			   struct utrace_engine *engine,
+			   struct task_struct *task,
+			   const struct linux_binfmt *fmt,
+			   const struct linux_binprm *bprm,
+			   struct pt_regs *regs);
+	u32 (*report_syscall_entry)(u32 action,
+				    struct utrace_engine *engine,
+				    struct task_struct *task,
+				    struct pt_regs *regs);
+	u32 (*report_syscall_exit)(enum utrace_resume_action action,
+				   struct utrace_engine *engine,
+				   struct task_struct *task,
+				   struct pt_regs *regs);
+	u32 (*report_exit)(enum utrace_resume_action action,
+			   struct utrace_engine *engine,
+			   struct task_struct *task,
+			   long orig_code, long *code);
+	u32 (*report_death)(struct utrace_engine *engine,
+			    struct task_struct *task,
+			    bool group_dead, int signal);
+	void (*report_reap)(struct utrace_engine *engine,
+			    struct task_struct *task);
+};
+
+/**
+ * struct utrace_examiner - private state for using utrace_prepare_examine()
+ *
+ * The members of &struct utrace_examiner are private to the implementation.
+ * This data type holds the state from a call to utrace_prepare_examine()
+ * to be used by a call to utrace_finish_examine().
+ */
+struct utrace_examiner {
+/* private: */
+	long state;
+	unsigned long ncsw;
+};
+
+/*
+ * These are the exported entry points for tracing engines to use.
+ * See kernel/utrace.c for their kerneldoc comments with interface details.
+ */
+struct utrace_engine *utrace_attach_task(struct task_struct *, int,
+					 const struct utrace_engine_ops *,
+					 void *);
+struct utrace_engine *utrace_attach_pid(struct pid *, int,
+					const struct utrace_engine_ops *,
+					void *);
+int __must_check utrace_control(struct task_struct *,
+				struct utrace_engine *,
+				enum utrace_resume_action);
+int __must_check utrace_set_events(struct task_struct *,
+				   struct utrace_engine *,
+				   unsigned long eventmask);
+int __must_check utrace_barrier(struct task_struct *,
+				struct utrace_engine *);
+int __must_check utrace_prepare_examine(struct task_struct *,
+					struct utrace_engine *,
+					struct utrace_examiner *);
+int __must_check utrace_finish_examine(struct task_struct *,
+				       struct utrace_engine *,
+				       struct utrace_examiner *);
+
+/**
+ * utrace_control_pid - control a thread being traced by a tracing engine
+ * @pid:		thread to affect
+ * @engine:		attached engine to affect
+ * @action:		&enum utrace_resume_action for thread to do
+ *
+ * This is the same as utrace_control(), but takes a &struct pid
+ * pointer rather than a &struct task_struct pointer.  The caller must
+ * hold a ref on @pid, but does not need to worry about the task
+ * staying valid.  If it's been reaped so that @pid points nowhere,
+ * then this call returns -%ESRCH.
+ */
+static inline __must_check int utrace_control_pid(
+	struct pid *pid, struct utrace_engine *engine,
+	enum utrace_resume_action action)
+{
+	/*
+	 * We don't bother with rcu_read_lock() here to protect the
+	 * task_struct pointer, because utrace_control will return
+	 * -ESRCH without looking at that pointer if the engine is
+	 * already detached.  A task_struct pointer can't die before
+	 * all the engines are detached in release_task() first.
+	 */
+	struct task_struct *task = pid_task(pid, PIDTYPE_PID);
+	return unlikely(!task) ? -ESRCH : utrace_control(task, engine, action);
+}
+
+/**
+ * utrace_set_events_pid - choose which event reports a tracing engine gets
+ * @pid:		thread to affect
+ * @engine:		attached engine to affect
+ * @eventmask:		new event mask
+ *
+ * This is the same as utrace_set_events(), but takes a &struct pid
+ * pointer rather than a &struct task_struct pointer.  The caller must
+ * hold a ref on @pid, but does not need to worry about the task
+ * staying valid.  If it's been reaped so that @pid points nowhere,
+ * then this call returns -%ESRCH.
+ */
+static inline __must_check int utrace_set_events_pid(
+	struct pid *pid, struct utrace_engine *engine, unsigned long eventmask)
+{
+	struct task_struct *task = pid_task(pid, PIDTYPE_PID);
+	return unlikely(!task) ? -ESRCH :
+		utrace_set_events(task, engine, eventmask);
+}
+
+/**
+ * utrace_barrier_pid - synchronize with simultaneous tracing callbacks
+ * @pid:		thread to affect
+ * @engine:		engine to affect (can be detached)
+ *
+ * This is the same as utrace_barrier(), but takes a &struct pid
+ * pointer rather than a &struct task_struct pointer.  The caller must
+ * hold a ref on @pid, but does not need to worry about the task
+ * staying valid.  If it's been reaped so that @pid points nowhere,
+ * then this call returns -%ESRCH.
+ */
+static inline __must_check int utrace_barrier_pid(struct pid *pid,
+						  struct utrace_engine *engine)
+{
+	struct task_struct *task = pid_task(pid, PIDTYPE_PID);
+	return unlikely(!task) ? -ESRCH : utrace_barrier(task, engine);
+}
+
+#endif	/* CONFIG_UTRACE */
+
+#endif	/* linux/utrace.h */
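
A note on the action encoding in the header above: the comments say a
callback's u32 return value packs a resume action into the low four
bits (UTRACE_RESUME_MASK), an event-specific action into the next four
(UTRACE_SIGNAL_MASK), plus the UTRACE_SIGNAL_HOLD flag, and that with
several engines "the smallest value prevails".  The following is only a
userspace sketch of that arithmetic, not kernel code.  The enum values
are copied from the hunk above; the demo_* helpers are hypothetical
names made up for illustration and are not part of utrace.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the <linux/utrace.h> hunk above. */
enum utrace_resume_action {
	UTRACE_STOP,
	UTRACE_REPORT,
	UTRACE_INTERRUPT,
	UTRACE_SINGLESTEP,
	UTRACE_BLOCKSTEP,
	UTRACE_RESUME,
	UTRACE_DETACH
};
#define UTRACE_RESUME_MASK	0x0f

enum utrace_signal_action {
	UTRACE_SIGNAL_DELIVER	= 0x00,
	UTRACE_SIGNAL_IGN	= 0x10,
	UTRACE_SIGNAL_TERM	= 0x20,
	UTRACE_SIGNAL_CORE	= 0x30,
	UTRACE_SIGNAL_STOP	= 0x40,
	UTRACE_SIGNAL_TSTP	= 0x50,
	UTRACE_SIGNAL_REPORT	= 0x60,
	UTRACE_SIGNAL_HANDLER	= 0x70
};
#define UTRACE_SIGNAL_MASK	0xf0
#define UTRACE_SIGNAL_HOLD	0x100

/* Hypothetical stand-in for the utrace_resume_action() inline. */
static enum utrace_resume_action demo_resume_of(uint32_t action)
{
	return action & UTRACE_RESUME_MASK;
}

/* Hypothetical stand-in for the utrace_signal_action() inline. */
static enum utrace_signal_action demo_signal_of(uint32_t action)
{
	return action & UTRACE_SIGNAL_MASK;
}

/*
 * Hypothetical helper: combine two engines' resume choices the way the
 * enum's comment describes, i.e. the numerically smallest value wins.
 * The real dispatch loop in kernel/utrace.c may differ in detail.
 */
static enum utrace_resume_action
demo_most_constrained(enum utrace_resume_action a,
		      enum utrace_resume_action b)
{
	return a < b ? a : b;
}
```

For instance, a report_signal callback that returns
UTRACE_SIGNAL_IGN | UTRACE_STOP | UTRACE_SIGNAL_HOLD yields the value
0x110: demo_resume_of() gives UTRACE_STOP, demo_signal_of() gives
UTRACE_SIGNAL_IGN, and the hold bit stays set.  Likewise, if one engine
asks for UTRACE_RESUME and another for UTRACE_SINGLESTEP, the sketch
picks UTRACE_SINGLESTEP.
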
diff --git a/include/linux/utrace_struct.h b/include/linux/utrace_struct.h
new file mode 100644
index ...aba7e09 100644  
--- /dev/null
+++ b/include/linux/utrace_struct.h
@@ -0,0 +1,58 @@
+/*
+ * 'struct utrace' data structure for kernel/utrace.c private use.
+ *
+ * Copyright (C) 2006-2009 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ */
+
+#ifndef _LINUX_UTRACE_STRUCT_H
+#define _LINUX_UTRACE_STRUCT_H	1
+
+#ifdef CONFIG_UTRACE
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+/*
+ * Per-thread structure private to utrace implementation.  This properly
+ * belongs in kernel/utrace.c and its use is entirely private to the code
+ * there.  It is only defined in a header file so that it can be embedded
+ * in the struct task_struct layout.  It is here rather than in utrace.h
+ * to avoid header nesting order issues getting too complex.
+ *
+ */
+struct utrace {
+	struct task_struct *cloning;
+
+	struct list_head attached, attaching;
+	spinlock_t lock;
+
+	struct utrace_engine *reporting;
+
+	unsigned int stopped:1;
+	unsigned int report:1;
+	unsigned int interrupt:1;
+	unsigned int signal_handler:1;
+	unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
+	unsigned int death:1;	/* in utrace_report_death() now */
+	unsigned int reap:1;	/* release_task() has run */
+};
+
+# define INIT_UTRACE(tsk)						      \
+	.utrace_flags = 0,						      \
+	.utrace = {							      \
+		.lock = __SPIN_LOCK_UNLOCKED(tsk.utrace.lock),		      \
+		.attached = LIST_HEAD_INIT(tsk.utrace.attached),	      \
+		.attaching = LIST_HEAD_INIT(tsk.utrace.attaching),	      \
+	},
+
+#else
+
+# define INIT_UTRACE(tsk)	/* Nothing. */
+
+#endif	/* CONFIG_UTRACE */
+
+#endif	/* linux/utrace_struct.h */
diff --git a/init/Kconfig b/init/Kconfig
index 6a5c5fe..4b5ab3e 100644  
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1060,6 +1060,15 @@ config STOP_MACHINE
 	help
 	  Need stop_machine() primitive.
 
+menuconfig UTRACE
+	bool "Infrastructure for tracing and debugging user processes"
+	depends on EXPERIMENTAL
+	depends on HAVE_ARCH_TRACEHOOK
+	help
+	  Enable the utrace process tracing interface.  This is an internal
+	  kernel interface exported to kernel modules, to track events in
+	  user threads, extract and change user thread state.
+
 source "block/Kconfig"
 
 config PREEMPT_NOTIFIERS
diff --git a/kernel/Makefile b/kernel/Makefile
index e4791b3..7bff724 100644  
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
+obj-$(CONFIG_UTRACE) += utrace.o
 obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
 obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c9cf48b..41e9542 100644  
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -16,6 +16,7 @@
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
 #include <linux/ptrace.h>
+#include <linux/utrace.h>
 #include <linux/security.h>
 #include <linux/signal.h>
 #include <linux/audit.h>
@@ -172,6 +173,14 @@ bool ptrace_may_access(struct task_struc
 	return (!err ? true : false);
 }
 
+/*
+ * For experimental use of utrace, exclude ptrace on the same task.
+ */
+static inline bool exclude_ptrace(struct task_struct *task)
+{
+	return unlikely(!!task_utrace_flags(task));
+}
+
 int ptrace_attach(struct task_struct *task)
 {
 	int retval;
@@ -210,6 +219,11 @@ repeat:
 		goto repeat;
 	}
 
+	if (exclude_ptrace(task)) {
+		retval = -EBUSY;
+		goto bad;
+	}
+
 	if (!task->mm)
 		goto bad;
 	/* the same process cannot be attached many times */
@@ -515,7 +529,9 @@ int ptrace_traceme(void)
 	 */
 repeat:
 	task_lock(current);
-	if (!(current->ptrace & PT_PTRACED)) {
+	if (exclude_ptrace(current)) {
+		ret = -EBUSY;
+	} else if (!(current->ptrace & PT_PTRACED)) {
 		/*
 		 * See ptrace_attach() comments about the locking here.
 		 */
diff --git a/kernel/utrace.c b/kernel/utrace.c
new file mode 100644
index ...3af06a6 100644  
--- /dev/null
+++ b/kernel/utrace.c
@@ -0,0 +1,2348 @@
+/*
+ * utrace infrastructure interface for debugging user processes
+ *
+ * Copyright (C) 2006-2009 Red Hat, Inc.  All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * Red Hat Author: Roland McGrath.
+ */
+
+#include <linux/utrace.h>
+#include <linux/tracehook.h>
+#include <linux/regset.h>
+#include <asm/syscall.h>
+#include <linux/ptrace.h>
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/freezer.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/seq_file.h>
+
+
+/*
+ * Rules for 'struct utrace', defined in <linux/utrace_struct.h>
+ * but used entirely privately in this file.
+ *
+ * The common event reporting loops are done by the task making the
+ * report without ever taking any locks.  To facilitate this, the two
+ * lists @attached and @attaching work together for smooth asynchronous
+ * attaching with low overhead.  Modifying either list requires @lock.
+ * The @attaching list can be modified any time while holding @lock.
+ * New engines being attached always go on this list.
+ *
+ * The @attached list is what the task itself uses for its reporting
+ * loops.  When the task itself is not quiescent, it can use the
+ * @attached list without taking any lock.  Nobody may modify the list
+ * when the task is not quiescent.  When it is quiescent, that means
+ * that it won't run again without taking @lock itself before using
+ * the list.
+ *
+ * At each place where we know the task is quiescent (or it's current),
+ * while holding @lock, we call splice_attaching(), below.  This moves
+ * the @attaching list members on to the end of the @attached list.
+ * Since this happens at the start of any reporting pass, any new
+ * engines attached asynchronously go on the stable @attached list
+ * in time to have their callbacks seen.
+ */
+
+static struct kmem_cache *utrace_engine_cachep;
+static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
+
+static int __init utrace_init(void)
+{
+	utrace_engine_cachep = KMEM_CACHE(utrace_engine, SLAB_PANIC);
+	return 0;
+}
+module_init(utrace_init);
+
+/*
+ * This is called with @utrace->lock held when the task is safely
+ * quiescent, i.e. it won't consult utrace->attached without the lock.
+ * Move any engines attached asynchronously from @utrace->attaching
+ * onto the @utrace->attached list.
+ */
+static void splice_attaching(struct utrace *utrace)
+{
+	list_splice_tail_init(&utrace->attaching, &utrace->attached);
+}
+
+/*
+ * This is the exported function used by the utrace_engine_put() inline.
+ */
+void __utrace_engine_release(struct kref *kref)
+{
+	struct utrace_engine *engine = container_of(kref, struct utrace_engine,
+						    kref);
+	BUG_ON(!list_empty(&engine->entry));
+	kmem_cache_free(utrace_engine_cachep, engine);
+}
+EXPORT_SYMBOL_GPL(__utrace_engine_release);
+
+static bool engine_matches(struct utrace_engine *engine, int flags,
+			   const struct utrace_engine_ops *ops, void *data)
+{
+	if ((flags & UTRACE_ATTACH_MATCH_OPS) && engine->ops != ops)
+		return false;
+	if ((flags & UTRACE_ATTACH_MATCH_DATA) && engine->data != data)
+		return false;
+	return engine->ops && engine->ops != &utrace_detached_ops;
+}
+
+static struct utrace_engine *matching_engine(
+	struct utrace *utrace, int flags,
+	const struct utrace_engine_ops *ops, void *data)
+{
+	struct utrace_engine *engine;
+	list_for_each_entry(engine, &utrace->attached, entry)
+		if (engine_matches(engine, flags, ops, data))
+			return engine;
+	list_for_each_entry(engine, &utrace->attaching, entry)
+		if (engine_matches(engine, flags, ops, data))
+			return engine;
+	return NULL;
+}
+
+/*
+ * For experimental use, utrace attach is mutually exclusive with ptrace.
+ */
+static inline bool exclude_utrace(struct task_struct *task)
+{
+	return unlikely(!!task->ptrace);
+}
+
+/*
+ * Called without locks, when we might be the first utrace engine to attach.
+ * If this is a newborn thread and we are not the creator, we have to wait
+ * for it.  The creator gets the first chance to attach.  The PF_STARTING
+ * flag is cleared after its report_clone hook has had a chance to run.
+ */
+static inline int utrace_attach_delay(struct task_struct *target)
+{
+	if ((target->flags & PF_STARTING) &&
+	    current->utrace.cloning != target)
+		do {
+			schedule_timeout_interruptible(1);
+			if (signal_pending(current))
+				return -ERESTARTNOINTR;
+		} while (target->flags & PF_STARTING);
+
+	return 0;
+}
+
+/*
+ * Enqueue @engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
+ */
+static int utrace_add_engine(struct task_struct *target,
+			     struct utrace *utrace,
+			     struct utrace_engine *engine,
+			     int flags,
+			     const struct utrace_engine_ops *ops,
+			     void *data)
+{
+	int ret;
+
+	spin_lock(&utrace->lock);
+
+	if (utrace->reap) {
+		/*
+		 * Already entered utrace_release_task(), cannot attach now.
+		 */
+		ret = -ESRCH;
+	} else if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
+	    unlikely(matching_engine(utrace, flags, ops, data))) {
+		ret = -EEXIST;
+	} else {
+		/*
+		 * Put the new engine on the pending ->attaching list.
+		 * Make sure it gets onto the ->attached list by the next
+		 * time it's examined.
+		 *
+		 * When target == current, it would be safe just to call
+		 * splice_attaching() right here.  But if we're inside a
+		 * callback, that would mean the new engine also gets
+		 * notified about the event that precipitated its own
+		 * creation.  This is not what the user wants.
+		 *
+		 * Setting ->report ensures that start_report() takes the
+		 * lock and does it next time.  Whenever setting ->report,
+		 * we must maintain the invariant that TIF_NOTIFY_RESUME is
+		 * also set.  Otherwise utrace_control() or utrace_do_stop()
+		 * might skip setting TIF_NOTIFY_RESUME upon seeing ->report
+		 * already set, and we'd miss a necessary callback.
+		 *
+		 * In case we had no engines before, make sure that
+		 * utrace_flags is not zero when tracehook_notify_resume()
+		 * checks.  A zero value would skip utrace reporting after
+		 * TIF_NOTIFY_RESUME is cleared, violating the same invariant.
+		 */
+		target->utrace_flags |= UTRACE_EVENT(REAP);
+		list_add_tail(&engine->entry, &utrace->attaching);
+		utrace->report = 1;
+		set_notify_resume(target);
+
+		ret = 0;
+	}
+
+	spin_unlock(&utrace->lock);
+
+	return ret;
+}
+
+/**
+ * utrace_attach_task - attach new engine, or look up an attached engine
+ * @target:	thread to attach to
+ * @flags:	flag bits combined with OR, see below
+ * @ops:	callback table for new engine
+ * @data:	engine private data pointer
+ *
+ * The caller must ensure that the @target thread does not get freed,
+ * i.e. hold a ref or be its parent.  It is always safe to call this
+ * on @current, or on the @child pointer in a @report_clone callback.
+ * For most other cases, it's easier to use utrace_attach_pid() instead.
+ *
+ * UTRACE_ATTACH_CREATE:
+ * Create a new engine.  If %UTRACE_ATTACH_CREATE is not specified, you
+ * only look up an existing engine already attached to the thread.
+ *
+ * UTRACE_ATTACH_EXCLUSIVE:
+ * Attempting to attach a second (matching) engine fails with -%EEXIST.
+ *
+ * UTRACE_ATTACH_MATCH_OPS: Only consider engines matching @ops.
+ * UTRACE_ATTACH_MATCH_DATA: Only consider engines matching @data.
+ */
+struct utrace_engine *utrace_attach_task(
+	struct task_struct *target, int flags,
+	const struct utrace_engine_ops *ops, void *data)
+{
+	struct utrace *utrace;
+	struct utrace_engine *engine;
+	int ret;
+
+	utrace = &target->utrace;
+
+	if (unlikely(target->exit_state == EXIT_DEAD)) {
+		/*
+		 * The target has already been reaped.
+		 * Check this early, though it's not synchronized.
+		 * utrace_add_engine() will do the final check.
+		 */
+		if (!(flags & UTRACE_ATTACH_CREATE))
+			return ERR_PTR(-ENOENT);
+		return ERR_PTR(-ESRCH);
+	}
+
+	if (!(flags & UTRACE_ATTACH_CREATE)) {
+		spin_lock(&utrace->lock);
+		engine = matching_engine(utrace, flags, ops, data);
+		if (engine)
+			utrace_engine_get(engine);
+		spin_unlock(&utrace->lock);
+		return engine ?: ERR_PTR(-ENOENT);
+	}
+
+	if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(target->flags & PF_KTHREAD))
+		/*
+		 * Silly kernel, utrace is for users!
+		 */
+		return ERR_PTR(-EPERM);
+
+	engine = kmem_cache_alloc(utrace_engine_cachep, GFP_KERNEL);
+	if (unlikely(!engine))
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Initialize the new engine structure.  It starts out with two
+	 * refs: one ref to return, and one ref for being attached.
+	 */
+	kref_set(&engine->kref, 2);
+	engine->flags = 0;
+	engine->ops = ops;
+	engine->data = data;
+
+	ret = utrace_attach_delay(target);
+	if (likely(!ret))
+		ret = utrace_add_engine(target, utrace, engine,
+					flags, ops, data);
+
+	if (unlikely(ret)) {
+		kmem_cache_free(utrace_engine_cachep, engine);
+		engine = ERR_PTR(ret);
+	}
+
+	return engine;
+}
+EXPORT_SYMBOL_GPL(utrace_attach_task);
+
+/**
+ * utrace_attach_pid - attach new engine, or look up an attached engine
+ * @pid:	&struct pid pointer representing thread to attach to
+ * @flags:	flag bits combined with OR, see utrace_attach_task()
+ * @ops:	callback table for new engine
+ * @data:	engine private data pointer
+ *
+ * This is the same as utrace_attach_task(), but takes a &struct pid
+ * pointer rather than a &struct task_struct pointer.  The caller must
+ * hold a ref on @pid, but does not need to worry about the task
+ * staying valid.  If it's been reaped so that @pid points nowhere,
+ * then this call returns -%ESRCH.
+ */
+struct utrace_engine *utrace_attach_pid(
+	struct pid *pid, int flags,
+	const struct utrace_engine_ops *ops, void *data)
+{
+	struct utrace_engine *engine = ERR_PTR(-ESRCH);
+	struct task_struct *task = get_pid_task(pid, PIDTYPE_PID);
+	if (task) {
+		engine = utrace_attach_task(task, flags, ops, data);
+		put_task_struct(task);
+	}
+	return engine;
+}
+EXPORT_SYMBOL_GPL(utrace_attach_pid);
+
+/*
+ * When an engine is detached, the target thread may still see it and
+ * make callbacks until it quiesces.  We install a special ops vector
+ * with these two callbacks.  When the target thread quiesces, it can
+ * safely free the engine itself.  For any event we will always get
+ * the report_quiesce() callback first, so we only need this one
+ * pointer to be set.  The only exception is report_reap(), so we
+ * supply that callback too.
+ */
+static u32 utrace_detached_quiesce(enum utrace_resume_action action,
+				   struct utrace_engine *engine,
+				   struct task_struct *task,
+				   unsigned long event)
+{
+	return UTRACE_DETACH;
+}
+
+static void utrace_detached_reap(struct utrace_engine *engine,
+				 struct task_struct *task)
+{
+}
+
+static const struct utrace_engine_ops utrace_detached_ops = {
+	.report_quiesce = &utrace_detached_quiesce,
+	.report_reap = &utrace_detached_reap
+};
+
+/*
+ * After waking up from TASK_TRACED, clear bookkeeping in @utrace.
+ * Returns true if we were woken up prematurely by SIGKILL.
+ */
+static inline bool finish_utrace_stop(struct task_struct *task,
+				      struct utrace *utrace)
+{
+	bool killed = false;
+
+	/*
+	 * utrace_wakeup() clears @utrace->stopped before waking us up.
+	 * We're officially awake if it's clear.
+	 */
+	spin_lock(&utrace->lock);
+	if (unlikely(utrace->stopped)) {
+		/*
+		 * If we're here with it still set, it must have been
+		 * signal_wake_up() instead, waking us up for a SIGKILL.
+		 */
+		spin_lock_irq(&task->sighand->siglock);
+		WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
+		spin_unlock_irq(&task->sighand->siglock);
+		utrace->stopped = 0;
+		killed = true;
+	}
+	spin_unlock(&utrace->lock);
+
+	return killed;
+}
+
+/*
+ * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
+ * @task == current, @utrace == &current->utrace, which is not locked.
+ * Return true if we were woken up by SIGKILL even though some utrace
+ * engine may still want us to stay stopped.
+ */
+static bool utrace_stop(struct task_struct *task, struct utrace *utrace,
+			bool report)
+{
+	bool killed;
+
+	/*
+	 * @utrace->stopped is the flag that says we are safely
+	 * inside this function.  It should never be set on entry.
+	 */
+	BUG_ON(utrace->stopped);
+
+	/*
+	 * The siglock protects us against signals.  As well as SIGKILL
+	 * waking us up, we must synchronize with the signal bookkeeping
+	 * for stop signals and SIGCONT.
+	 */
+	spin_lock(&utrace->lock);
+	spin_lock_irq(&task->sighand->siglock);
+
+	if (unlikely(sigismember(&task->pending.signal, SIGKILL))) {
+		spin_unlock_irq(&task->sighand->siglock);
+		spin_unlock(&utrace->lock);
+		return true;
+	}
+
+	if (report) {
+		/*
+		 * Ensure a reporting pass when we're resumed.
+		 */
+		utrace->report = 1;
+		set_thread_flag(TIF_NOTIFY_RESUME);
+	}
+
+	utrace->stopped = 1;
+	__set_current_state(TASK_TRACED);
+
+	/*
+	 * If there is a group stop in progress,
+	 * we must participate in the bookkeeping.
+	 */
+	if (task->signal->group_stop_count > 0)
+		--task->signal->group_stop_count;
+
+	spin_unlock_irq(&task->sighand->siglock);
+	spin_unlock(&utrace->lock);
+
+	schedule();
+
+	/*
+	 * While in TASK_TRACED, we were considered "frozen enough".
+	 * Now that we woke up, if we're supposed to be frozen, it's
+	 * crucial that we freeze now before running anything substantial.
+	 */
+	try_to_freeze();
+
+	killed = finish_utrace_stop(task, utrace);
+
+	/*
+	 * While we were in TASK_TRACED, complete_signal() considered
+	 * us "uninterested" in signal wakeups.  Now make sure our
+	 * TIF_SIGPENDING state is correct for normal running.
+	 */
+	spin_lock_irq(&task->sighand->siglock);
+	recalc_sigpending();
+	spin_unlock_irq(&task->sighand->siglock);
+
+	return killed;
+}
+
+/*
+ * The caller has to hold a ref on the engine.  If the attached flag is
+ * true (all but utrace_barrier() calls), the engine is supposed to be
+ * attached.  If the attached flag is false (utrace_barrier() only),
+ * then return -ERESTARTSYS for an engine marked for detach but not yet
+ * fully detached.  The task pointer can be invalid if the engine is
+ * detached.
+ *
+ * Get the utrace lock for the target task.
+ * Returns the struct if locked, or ERR_PTR(-errno).
+ *
+ * This has to be robust against races with:
+ *	utrace_control(target, UTRACE_DETACH) calls
+ *	UTRACE_DETACH after reports
+ *	utrace_report_death
+ *	utrace_release_task
+ */
+static struct utrace *get_utrace_lock(struct task_struct *target,
+				      struct utrace_engine *engine,
+				      bool attached)
+	__acquires(utrace->lock)
+{
+	struct utrace *utrace;
+
+	rcu_read_lock();
+
+	/*
+	 * If this engine was already detached, bail out before we look at
+	 * the task_struct pointer at all.  If it's detached after this
+	 * check, then RCU is still keeping this task_struct pointer valid.
+	 *
+	 * The ops pointer is NULL when the engine is fully detached.
+	 * It's &utrace_detached_ops when it's marked detached but still
+	 * on the list.  In the latter case, utrace_barrier() still works,
+	 * since the target might be in the middle of an old callback.
+	 */
+	if (unlikely(!engine->ops)) {
+		rcu_read_unlock();
+		return ERR_PTR(-ESRCH);
+	}
+
+	if (unlikely(engine->ops == &utrace_detached_ops)) {
+		rcu_read_unlock();
+		return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS);
+	}
+
+	utrace = &target->utrace;
+	if (unlikely(target->exit_state == EXIT_DEAD)) {
+		/*
+		 * If all engines detached already, utrace is clear.
+		 * Otherwise, we're called after utrace_release_task might
+		 * have started.  A call to this engine's report_reap
+		 * callback might already be in progress.
+		 */
+		utrace = ERR_PTR(-ESRCH);
+	} else {
+		spin_lock(&utrace->lock);
+		if (unlikely(!engine->ops) ||
+		    unlikely(engine->ops == &utrace_detached_ops)) {
+			/*
+			 * By the time we got the utrace lock,
+			 * it had been reaped or detached already.
+			 */
+			spin_unlock(&utrace->lock);
+			utrace = ERR_PTR(-ESRCH);
+			if (!attached && engine->ops == &utrace_detached_ops)
+				utrace = ERR_PTR(-ERESTARTSYS);
+		}
+	}
+	rcu_read_unlock();
+
+	return utrace;
+}
+
+/*
+ * Now that we don't hold any locks, run through any
+ * detached engines and free their references.  Each
+ * engine had one implicit ref while it was attached.
+ */
+static void put_detached_list(struct list_head *list)
+{
+	struct utrace_engine *engine, *next;
+	list_for_each_entry_safe(engine, next, list, entry) {
+		list_del_init(&engine->entry);
+		utrace_engine_put(engine);
+	}
+}
+
+/*
+ * Called with utrace->lock held.
+ * Notify and clean up all engines, then release the lock.
+ */
+static void utrace_reap(struct task_struct *target, struct utrace *utrace)
+	__releases(utrace->lock)
+{
+	struct utrace_engine *engine, *next;
+	const struct utrace_engine_ops *ops;
+	LIST_HEAD(detached);
+
+restart:
+	splice_attaching(utrace);
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		ops = engine->ops;
+		engine->ops = NULL;
+		list_move(&engine->entry, &detached);
+
+		/*
+		 * If it didn't need a callback, we don't need to drop
+		 * the lock.  Now nothing else refers to this engine.
+		 */
+		if (!(engine->flags & UTRACE_EVENT(REAP)))
+			continue;
+
+		/*
+		 * This synchronizes with utrace_barrier().  Since we
+		 * need the utrace->lock here anyway (unlike the other
+		 * reporting loops), we don't need any memory barrier
+		 * as utrace_barrier() holds the lock.
+		 */
+		utrace->reporting = engine;
+		spin_unlock(&utrace->lock);
+
+		(*ops->report_reap)(engine, target);
+
+		utrace->reporting = NULL;
+
+		put_detached_list(&detached);
+
+		spin_lock(&utrace->lock);
+		goto restart;
+	}
+
+	spin_unlock(&utrace->lock);
+
+	put_detached_list(&detached);
+}
+
+/*
+ * Called by release_task.  After this, target->utrace must be cleared.
+ */
+void utrace_release_task(struct task_struct *target)
+{
+	struct utrace *utrace;
+
+	utrace = &target->utrace;
+
+	spin_lock(&utrace->lock);
+
+	utrace->reap = 1;
+
+	if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS)) {
+		utrace_reap(target, utrace); /* Unlocks and frees.  */
+		return;
+	}
+
+	/*
+	 * The target will do some final callbacks but hasn't
+	 * finished them yet.  We know because it clears these
+	 * event bits after it's done.  Instead of cleaning up here
+	 * and requiring utrace_report_death to cope with it, we
+	 * delay the REAP report and the teardown until after the
+	 * target finishes its death reports.
+	 */
+
+	spin_unlock(&utrace->lock);
+}
+
+/*
+ * We use an extra bit in utrace_engine.flags past the event bits,
+ * to record whether the engine is keeping the target thread stopped.
+ */
+#define ENGINE_STOP		(1UL << _UTRACE_NEVENTS)
+
+static void mark_engine_wants_stop(struct utrace_engine *engine)
+{
+	engine->flags |= ENGINE_STOP;
+}
+
+static void clear_engine_wants_stop(struct utrace_engine *engine)
+{
+	engine->flags &= ~ENGINE_STOP;
+}
+
+static bool engine_wants_stop(struct utrace_engine *engine)
+{
+	return (engine->flags & ENGINE_STOP) != 0;
+}
+
+/**
+ * utrace_set_events - choose which event reports a tracing engine gets
+ * @target:		thread to affect
+ * @engine:		attached engine to affect
+ * @events:		new event mask
+ *
+ * This changes the set of events for which @engine wants callbacks made.
+ *
+ * This fails with -%EALREADY and does nothing if you try to clear
+ * %UTRACE_EVENT(%DEATH) when the @report_death callback may already have
+ * begun, if you try to clear %UTRACE_EVENT(%REAP) when the @report_reap
+ * callback may already have begun, or if you try to newly set
+ * %UTRACE_EVENT(%DEATH) or %UTRACE_EVENT(%QUIESCE) when @target is
+ * already dead or dying.
+ *
+ * This can fail with -%ESRCH when @engine has already been detached,
+ * including forcible detach on reaping.
+ *
+ * If @target was stopped before the call, then after a successful call,
+ * only event callbacks requested in @events will be made; if
+ * %UTRACE_EVENT(%QUIESCE) is included in @events, then a @report_quiesce
+ * callback will be made when @target resumes.  If @target was not stopped,
+ * and was about to make a callback to @engine, this returns -%EINPROGRESS.
+ * In this case, the callback in progress might be one excluded from the
+ * new @events setting.  When this returns zero, you can be sure that no
+ * event callbacks you've disabled in @events can be made.
+ *
+ * To synchronize after an -%EINPROGRESS return, see utrace_barrier().
+ *
+ * When @target is @current, -%EINPROGRESS is not returned.  But
+ * note that a newly-created engine will not receive any callbacks
+ * related to an event notification already in progress.  This call
+ * enables @events callbacks to be made as soon as @engine becomes
+ * eligible for any callbacks, see utrace_attach_task().
+ *
+ * These rules provide for coherent synchronization based on %UTRACE_STOP,
+ * even when %SIGKILL is breaking its normal simple rules.
+ */
+int utrace_set_events(struct task_struct *target,
+		      struct utrace_engine *engine,
+		      unsigned long events)
+{
+	struct utrace *utrace;
+	unsigned long old_flags, old_utrace_flags, set_utrace_flags;
+	int ret;
+
+	utrace = get_utrace_lock(target, engine, true);
+	if (unlikely(IS_ERR(utrace)))
+		return PTR_ERR(utrace);
+
+	old_utrace_flags = target->utrace_flags;
+	set_utrace_flags = events;
+	old_flags = engine->flags;
+
+	if (target->exit_state &&
+	    (((events & ~old_flags) & _UTRACE_DEATH_EVENTS) ||
+	     (utrace->death &&
+	      ((old_flags & ~events) & _UTRACE_DEATH_EVENTS)) ||
+	     (utrace->reap && ((old_flags & ~events) & UTRACE_EVENT(REAP))))) {
+		spin_unlock(&utrace->lock);
+		return -EALREADY;
+	}
+
+	/*
+	 * When setting these flags, it's essential that we really
+	 * synchronize with exit_notify().  They cannot be set after
+	 * exit_notify() takes the tasklist_lock.  By holding the read
+	 * lock here while setting the flags, we ensure that the calls
+	 * to tracehook_notify_death() and tracehook_report_death() will
+	 * see the new flags.  This ensures that utrace_release_task()
+	 * knows positively that utrace_report_death() will be called or
+	 * that it won't.
+	 */
+	if ((set_utrace_flags & ~old_utrace_flags) & _UTRACE_DEATH_EVENTS) {
+		read_lock(&tasklist_lock);
+		if (unlikely(target->exit_state)) {
+			read_unlock(&tasklist_lock);
+			spin_unlock(&utrace->lock);
+			return -EALREADY;
+		}
+		target->utrace_flags |= set_utrace_flags;
+		read_unlock(&tasklist_lock);
+	}
+
+	engine->flags = events | (engine->flags & ENGINE_STOP);
+	target->utrace_flags |= set_utrace_flags;
+
+	if ((set_utrace_flags & UTRACE_EVENT_SYSCALL) &&
+	    !(old_utrace_flags & UTRACE_EVENT_SYSCALL))
+		set_tsk_thread_flag(target, TIF_SYSCALL_TRACE);
+
+	ret = 0;
+	if (!utrace->stopped && target != current) {
+		/*
+		 * This barrier ensures that our engine->flags changes
+		 * have hit before we examine utrace->reporting,
+		 * pairing with the barrier in start_callback().  If
+		 * @target has not yet hit finish_callback() to clear
+		 * utrace->reporting, we might be in the middle of a
+		 * callback to @engine.
+		 */
+		smp_mb();
+		if (utrace->reporting == engine)
+			ret = -EINPROGRESS;
+	}
+
+	spin_unlock(&utrace->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_set_events);
+
+/*
+ * Asynchronously mark an engine as being detached.
+ *
+ * This must work while the target thread races with us doing
+ * start_callback(), defined below.  It uses smp_rmb() between checking
+ * @engine->flags and using @engine->ops.  Here we change @engine->ops
+ * first, then use smp_wmb() before changing @engine->flags.  This ensures
+ * it can check the old flags before using the old ops, or check the old
+ * flags before using the new ops, or check the new flags before using the
+ * new ops, but can never check the new flags before using the old ops.
+ * Hence, utrace_detached_ops might be used with any old flags in place.
+ * It has report_quiesce() and report_reap() callbacks to handle all cases.
+ */
+static void mark_engine_detached(struct utrace_engine *engine)
+{
+	engine->ops = &utrace_detached_ops;
+	smp_wmb();
+	engine->flags = UTRACE_EVENT(QUIESCE);
+}
+
+/*
+ * Get @target to stop and return true if it is already stopped now.
+ * If we return false, it will make some event callback soonish.
+ * Called with @utrace locked.
+ */
+static bool utrace_do_stop(struct task_struct *target, struct utrace *utrace)
+{
+	bool stopped = false;
+
+	spin_lock_irq(&target->sighand->siglock);
+	if (unlikely(target->exit_state)) {
+		/*
+		 * On the exit path, it's only truly quiescent
+		 * if it has already been through
+		 * utrace_report_death(), or never will.
+		 */
+		if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS))
+			utrace->stopped = stopped = true;
+	} else if (task_is_stopped(target)) {
+		/*
+		 * Stopped is considered quiescent; when it wakes up, it will
+		 * go through utrace_get_signal() before doing anything else.
+		 */
+		utrace->stopped = stopped = true;
+	} else if (!utrace->report && !utrace->interrupt) {
+		utrace->report = 1;
+		set_notify_resume(target);
+	}
+	spin_unlock_irq(&target->sighand->siglock);
+
+	return stopped;
+}
+
+/*
+ * If the target is not dead it should not be in tracing
+ * stop any more.  Wake it unless it's in job control stop.
+ *
+ * Called with @utrace->lock held and @utrace->stopped set.
+ */
+static void utrace_wakeup(struct task_struct *target, struct utrace *utrace)
+{
+	struct sighand_struct *sighand;
+	unsigned long irqflags;
+
+	utrace->stopped = 0;
+
+	sighand = lock_task_sighand(target, &irqflags);
+	if (unlikely(!sighand))
+		return;
+
+	if (likely(task_is_stopped_or_traced(target))) {
+		if (target->signal->flags & SIGNAL_STOP_STOPPED)
+			target->state = TASK_STOPPED;
+		else
+			wake_up_state(target, __TASK_STOPPED | __TASK_TRACED);
+	}
+
+	unlock_task_sighand(target, &irqflags);
+}
+
+/*
+ * This is called when there might be some detached engines on the list or
+ * some stale bits in @task->utrace_flags.  Clean them up and recompute the
+ * flags.
+ *
+ * @action is NULL when @task is stopped and @utrace->stopped is set; wake
+ * it up if it should not be.  @action is set when @task is current; if
+ * we're fully detached, reset *@action to UTRACE_RESUME.
+ *
+ * Called with @utrace->lock held, returns with it released.
+ * After this returns, @task may have no engines left attached.
+ */
+static void utrace_reset(struct task_struct *task, struct utrace *utrace,
+			 enum utrace_resume_action *action)
+	__releases(utrace->lock)
+{
+	struct utrace_engine *engine, *next;
+	unsigned long flags = 0;
+	LIST_HEAD(detached);
+	bool wake = !action;
+	BUG_ON(wake != (task != current));
+
+	splice_attaching(utrace);
+
+	/*
+	 * Update the set of events of interest from the union
+	 * of the interests of the remaining tracing engines.
+	 * For any engine marked detached, remove it from the list.
+	 * We'll collect them on the detached list.
+	 */
+	list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
+		if (engine->ops == &utrace_detached_ops) {
+			engine->ops = NULL;
+			list_move(&engine->entry, &detached);
+		} else {
+			flags |= engine->flags | UTRACE_EVENT(REAP);
+			wake = wake && !engine_wants_stop(engine);
+		}
+	}
+
+	if (task->exit_state) {
+		/*
+		 * Once it's already dead, we never install any flags
+		 * except REAP.  When ->exit_state is set and events
+		 * like DEATH are not set, then they never can be set.
+		 * This ensures that utrace_release_task() knows
+		 * positively that utrace_report_death() can never run.
+		 */
+		BUG_ON(utrace->death);
+		flags &= UTRACE_EVENT(REAP);
+		wake = false;
+	} else if (!(flags & UTRACE_EVENT_SYSCALL) &&
+		   test_tsk_thread_flag(task, TIF_SYSCALL_TRACE)) {
+		clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
+	}
+
+	task->utrace_flags = flags;
+
+	if (wake)
+		utrace_wakeup(task, utrace);
+
+	/*
+	 * If any engines are left, we're done.
+	 */
+	spin_unlock(&utrace->lock);
+	if (!flags) {
+		/*
+		 * No more engines, cleared out the utrace.
+		 */
+
+		if (action)
+			*action = UTRACE_RESUME;
+	}
+
+	put_detached_list(&detached);
+}
+
+/*
+ * You can't do anything to a dead task but detach it.
+ * If release_task() has been called, you can't do that.
+ *
+ * On the exit path, DEATH and QUIESCE event bits are set only
+ * before utrace_report_death() has taken the lock.  At that point,
+ * the death report will come soon, so disallow detach until it's
+ * done.  This prevents us from racing with it detaching itself.
+ *
+ * Called with utrace->lock held, when @target->exit_state is nonzero.
+ */
+static inline int utrace_control_dead(struct task_struct *target,
+				      struct utrace *utrace,
+				      enum utrace_resume_action action)
+{
+	if (action != UTRACE_DETACH || unlikely(utrace->reap))
+		return -ESRCH;
+
+	if (unlikely(utrace->death))
+		/*
+		 * We have already started the death report.  We can't
+		 * prevent the report_death and report_reap callbacks,
+		 * so tell the caller they will happen.
+		 */
+		return -EALREADY;
+
+	return 0;
+}
+
+/**
+ * utrace_control - control a thread being traced by a tracing engine
+ * @target:		thread to affect
+ * @engine:		attached engine to affect
+ * @action:		&enum utrace_resume_action for thread to do
+ *
+ * This is how a tracing engine asks a traced thread to do something.
+ * This call is controlled by the @action argument, which has the
+ * same meaning as the &enum utrace_resume_action value returned by
+ * event reporting callbacks.
+ *
+ * If @target is already dead (@target->exit_state nonzero),
+ * all actions except %UTRACE_DETACH fail with -%ESRCH.
+ *
+ * The following sections describe each option for the @action argument.
+ *
+ * UTRACE_DETACH:
+ *
+ * After this, the @engine data structure is no longer accessible,
+ * and the thread might be reaped.  The thread will start running
+ * again if it was stopped and no longer has any attached engines
+ * that want it stopped.
+ *
+ * If the @report_reap callback may already have begun, this fails
+ * with -%ESRCH.  If the @report_death callback may already have
+ * begun, this fails with -%EALREADY.
+ *
+ * If @target is not already stopped, then a callback to this engine
+ * might be in progress or about to start on another CPU.  If so,
+ * then this returns -%EINPROGRESS; the detach happens as soon as
+ * the pending callback is finished.  To synchronize after an
+ * -%EINPROGRESS return, see utrace_barrier().
+ *
+ * If @target is properly stopped before utrace_control() is called,
+ * then after successful return it's guaranteed that no more callbacks
+ * to the @engine->ops vector will be made.
+ *
+ * The only exception is %SIGKILL (and exec or group-exit by another
+ * thread in the group), which can cause asynchronous @report_death
+ * and/or @report_reap callbacks even when %UTRACE_STOP was used.
+ * (In that event, this fails with -%ESRCH or -%EALREADY, see above.)
+ *
+ * UTRACE_STOP:
+ * This asks that @target stop running.  This returns 0 only if
+ * @target is already stopped, either for tracing or for job
+ * control.  Then @target will remain stopped until another
+ * utrace_control() call is made on @engine; @target can be woken
+ * only by %SIGKILL (or equivalent, such as exec or termination by
+ * another thread in the same thread group).
+ *
+ * This returns -%EINPROGRESS if @target is not already stopped.
+ * Then the effect is like %UTRACE_REPORT.  A @report_quiesce or
+ * @report_signal callback will be made soon.  Your callback can
+ * then return %UTRACE_STOP to keep @target stopped.
+ *
+ * This does not interrupt system calls in progress, including ones
+ * that sleep for a long time.  For that, use %UTRACE_INTERRUPT.
+ * To interrupt system calls and then keep @target stopped, your
+ * @report_signal callback can return %UTRACE_STOP.
+ *
+ * UTRACE_RESUME:
+ *
+ * Just let @target continue running normally, reversing the effect
+ * of a previous %UTRACE_STOP.  If another engine is keeping @target
+ * stopped, then it remains stopped until all engines let it resume.
+ * If @target was not stopped, this has no effect.
+ *
+ * UTRACE_REPORT:
+ *
+ * This is like %UTRACE_RESUME, but also ensures that there will be
+ * a @report_quiesce or @report_signal callback made soon.  If
+ * @target had been stopped, then there will be a callback before it
+ * resumes running normally.  If another engine is keeping @target
+ * stopped, then there might be no callbacks until all engines let
+ * it resume.
+ *
+ * UTRACE_INTERRUPT:
+ *
+ * This is like %UTRACE_REPORT, but ensures that @target will make a
+ * @report_signal callback before it resumes or delivers signals.
+ * If @target was in a system call or about to enter one, work in
+ * progress will be interrupted as if by %SIGSTOP.  If another
+ * engine is keeping @target stopped, then there might be no
+ * callbacks until all engines let it resume.
+ *
+ * This gives @engine an opportunity to introduce a forced signal
+ * disposition via its @report_signal callback.
+ *
+ * UTRACE_SINGLESTEP:
+ *
+ * It's invalid to use this unless arch_has_single_step() returned true.
+ * This is like %UTRACE_RESUME, but resumes for one user instruction
+ * only.  It's invalid to use this in utrace_control() unless @target
+ * had been stopped by @engine previously.
+ *
+ * Note that passing %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP to
+ * utrace_control() or returning it from an event callback alone does
+ * not necessarily ensure that stepping will be enabled.  If there are
+ * more callbacks made to any engine before returning to user mode,
+ * then the resume action is chosen only by the last set of callbacks.
+ * To be sure, enable %UTRACE_EVENT(%QUIESCE) and look for the
+ * @report_quiesce callback with a zero event mask, or the
+ * @report_signal callback with %UTRACE_SIGNAL_REPORT.
+ *
+ * UTRACE_BLOCKSTEP:
+ *
+ * It's invalid to use this unless arch_has_block_step() returned true.
+ * This is like %UTRACE_SINGLESTEP, but resumes for one whole basic
+ * block of user instructions.
+ *
+ * %UTRACE_BLOCKSTEP devolves to %UTRACE_SINGLESTEP when another
+ * tracing engine is using %UTRACE_SINGLESTEP at the same time.
+ */
+int utrace_control(struct task_struct *target,
+		   struct utrace_engine *engine,
+		   enum utrace_resume_action action)
+{
+	struct utrace *utrace;
+	bool resume;
+	int ret;
+
+	if (unlikely(action > UTRACE_DETACH))
+		return -EINVAL;
+
+	utrace = get_utrace_lock(target, engine, true);
+	if (unlikely(IS_ERR(utrace)))
+		return PTR_ERR(utrace);
+
+	if (target->exit_state) {
+		ret = utrace_control_dead(target, utrace, action);
+		if (ret) {
+			spin_unlock(&utrace->lock);
+			return ret;
+		}
+	}
+
+	resume = utrace->stopped;
+	ret = 0;
+
+	clear_engine_wants_stop(engine);
+	switch (action) {
+	case UTRACE_STOP:
+		mark_engine_wants_stop(engine);
+		if (!resume && !utrace_do_stop(target, utrace))
+			ret = -EINPROGRESS;
+		resume = false;
+		break;
+
+	case UTRACE_DETACH:
+		mark_engine_detached(engine);
+		resume = resume || utrace_do_stop(target, utrace);
+		if (!resume) {
+			/*
+			 * As in utrace_set_events(), this barrier ensures
+			 * that our engine->flags changes have hit before we
+			 * examine utrace->reporting, pairing with the barrier
+			 * in start_callback().  If @target has not yet hit
+			 * finish_callback() to clear utrace->reporting, we
+			 * might be in the middle of a callback to @engine.
+			 */
+			smp_mb();
+			if (utrace->reporting == engine)
+				ret = -EINPROGRESS;
+			break;
+		}
+		/* Fall through.  */
+
+	case UTRACE_RESUME:
+		/*
+		 * This and all other cases imply resuming if stopped.
+		 * There might not be another report before it just
+		 * resumes, so make sure single-step is not left set.
+		 */
+		if (likely(resume))
+			user_disable_single_step(target);
+		break;
+
+	case UTRACE_REPORT:
+		/*
+		 * Make the thread call tracehook_notify_resume() soon.
+		 * But don't bother if it's already been interrupted.
+		 * In that case, utrace_get_signal() will be reporting soon.
+		 */
+		if (!utrace->report && !utrace->interrupt) {
+			utrace->report = 1;
+			set_notify_resume(target);
+		}
+		break;
+
+	case UTRACE_INTERRUPT:
+		/*
+		 * Make the thread call tracehook_get_signal() soon.
+		 */
+		if (utrace->interrupt)
+			break;
+		utrace->interrupt = 1;
+
+		/*
+		 * If it's not already stopped, interrupt it now.
+		 * We need the siglock here in case it calls
+		 * recalc_sigpending() and clears its own
+		 * TIF_SIGPENDING.  By taking the lock, we've
+		 * serialized any later recalc_sigpending() after
+		 * our setting of utrace->interrupt to force it on.
+		 */
+		if (resume) {
+			/*
+			 * This is really just to keep the invariant
+			 * that TIF_SIGPENDING is set with utrace->interrupt.
+			 * When it's stopped, we know it's always going
+			 * through utrace_get_signal and will recalculate.
+			 */
+			set_tsk_thread_flag(target, TIF_SIGPENDING);
+		} else {
+			struct sighand_struct *sighand;
+			unsigned long irqflags;
+			sighand = lock_task_sighand(target, &irqflags);
+			if (likely(sighand)) {
+				signal_wake_up(target, 0);
+				unlock_task_sighand(target, &irqflags);
+			}
+		}
+		break;
+
+	case UTRACE_BLOCKSTEP:
+		/*
+		 * Resume from stopped, step one block.
+		 */
+		if (unlikely(!arch_has_block_step())) {
+			WARN_ON(1);
+			/* Fall through to treat it as SINGLESTEP.  */
+		} else if (likely(resume)) {
+			user_enable_block_step(target);
+			break;
+		}
+
+	case UTRACE_SINGLESTEP:
+		/*
+		 * Resume from stopped, step one instruction.
+		 */
+		if (unlikely(!arch_has_single_step())) {
+			WARN_ON(1);
+			resume = false;
+			ret = -EOPNOTSUPP;
+			break;
+		}
+
+		if (likely(resume))
+			user_enable_single_step(target);
+		else
+			/*
+			 * You were supposed to stop it before asking
+			 * it to step.
+			 */
+			ret = -EAGAIN;
+		break;
+	}
+
+	/*
+	 * Let the thread resume running.  If it's not stopped now,
+	 * there is nothing more we need to do.
+	 */
+	if (resume)
+		utrace_reset(target, utrace, NULL);
+	else
+		spin_unlock(&utrace->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_control);
+
+/**
+ * utrace_barrier - synchronize with simultaneous tracing callbacks
+ * @target:		thread to affect
+ * @engine:		engine to affect (can be detached)
+ *
+ * This blocks while @target might be in the midst of making a callback to
+ * @engine.  It can be interrupted by signals and will return -%ERESTARTSYS.
+ * A return value of zero means no callback from @target to @engine was
+ * in progress.  Any effect of a completed callback's return value (such
+ * as %UTRACE_STOP) has already been applied to @engine.
+ *
+ * It's not necessary to keep the @target pointer alive for this call.
+ * It's only necessary to hold a ref on @engine.  This will return
+ * safely even if @target has been reaped and has no task refs.
+ *
+ * A successful return from utrace_barrier() guarantees its ordering
+ * with respect to utrace_set_events() and utrace_control() calls.  If
+ * @target was not properly stopped, event callbacks just disabled might
+ * still be in progress; utrace_barrier() waits until there is no chance
+ * an unwanted callback can be in progress.
+ */
+int utrace_barrier(struct task_struct *target, struct utrace_engine *engine)
+{
+	struct utrace *utrace;
+	int ret = -ERESTARTSYS;
+
+	if (unlikely(target == current))
+		return 0;
+
+	do {
+		utrace = get_utrace_lock(target, engine, false);
+		if (unlikely(IS_ERR(utrace))) {
+			ret = PTR_ERR(utrace);
+			if (ret != -ERESTARTSYS)
+				break;
+		} else {
+			/*
+			 * All engine state changes are done while
+			 * holding the lock, i.e. before we get here.
+			 * Since we have the lock, we only need to
+			 * worry about @target making a callback.
+			 * When it has entered start_callback() but
+			 * not yet gotten to finish_callback(), we
+			 * will see utrace->reporting == @engine.
+			 * When @target doesn't take the lock, it uses
+			 * barriers to order setting utrace->reporting
+			 * before it examines the engine state.
+			 */
+			if (utrace->reporting != engine)
+				ret = 0;
+			spin_unlock(&utrace->lock);
+			if (!ret)
+				break;
+		}
+		schedule_timeout_interruptible(1);
+	} while (!signal_pending(current));
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_barrier);
+
+/*
+ * This is local state used for reporting loops, perhaps optimized away.
+ */
+struct utrace_report {
+	enum utrace_resume_action action;
+	u32 result;
+	bool detaches;
+	bool reports;
+	bool takers;
+	bool killed;
+};
+
+#define INIT_REPORT(var) \
+	struct utrace_report var = { UTRACE_RESUME, 0, \
+				     false, false, false, false }
+
+/*
+ * We are now making the report, so clear the flag saying we need one.
+ */
+static void start_report(struct utrace *utrace)
+{
+	BUG_ON(utrace->stopped);
+	if (utrace->report) {
+		spin_lock(&utrace->lock);
+		utrace->report = 0;
+		splice_attaching(utrace);
+		spin_unlock(&utrace->lock);
+	}
+}
+
+/*
+ * Complete a normal reporting pass, pairing with a start_report() call.
+ * This handles any UTRACE_DETACH or UTRACE_REPORT or UTRACE_INTERRUPT
+ * returns from engine callbacks.  If any engine's last callback used
+ * UTRACE_STOP, we do UTRACE_REPORT here to ensure we stop before user
+ * mode.  If there were no callbacks made, it will recompute
+ * @task->utrace_flags to avoid another false-positive.
+ */
+static void finish_report(struct utrace_report *report,
+			  struct task_struct *task, struct utrace *utrace)
+{
+	bool clean = (report->takers && !report->detaches);
+
+	if (report->action <= UTRACE_REPORT && !utrace->report) {
+		spin_lock(&utrace->lock);
+		utrace->report = 1;
+		set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
+	} else if (report->action == UTRACE_INTERRUPT && !utrace->interrupt) {
+		spin_lock(&utrace->lock);
+		utrace->interrupt = 1;
+		set_tsk_thread_flag(task, TIF_SIGPENDING);
+	} else if (clean) {
+		return;
+	} else {
+		spin_lock(&utrace->lock);
+	}
+
+	if (clean)
+		spin_unlock(&utrace->lock);
+	else
+		utrace_reset(task, utrace, &report->action);
+}
+
+/*
+ * Apply the return value of one engine callback to @report.
+ * Returns true if @engine detached and should not get any more callbacks.
+ */
+static bool finish_callback(struct utrace *utrace,
+			    struct utrace_report *report,
+			    struct utrace_engine *engine,
+			    u32 ret)
+{
+	enum utrace_resume_action action = utrace_resume_action(ret);
+
+	report->result = ret & ~UTRACE_RESUME_MASK;
+
+	/*
+	 * If utrace_control() has detached @engine, treat it as UTRACE_DETACH.
+	 */
+	if (action == UTRACE_DETACH || engine->ops == &utrace_detached_ops) {
+		engine->ops = &utrace_detached_ops;
+		report->detaches = true;
+	} else {
+		if (action < report->action)
+			report->action = action;
+
+		if (action == UTRACE_STOP) {
+			if (!engine_wants_stop(engine)) {
+				spin_lock(&utrace->lock);
+				mark_engine_wants_stop(engine);
+				spin_unlock(&utrace->lock);
+			}
+		} else {
+			if (action == UTRACE_REPORT)
+				report->reports = true;
+
+			if (engine_wants_stop(engine)) {
+				spin_lock(&utrace->lock);
+				clear_engine_wants_stop(engine);
+				spin_unlock(&utrace->lock);
+			}
+		}
+	}
+
+	/*
+	 * Now that we have applied the effect of the return value,
+	 * clear this so that utrace_barrier() can stop waiting.
+	 * A subsequent utrace_control() can stop or resume @engine
+	 * and know this was ordered after its callback's action.
+	 *
+	 * We don't need any barriers here because utrace_barrier()
+	 * takes utrace->lock.  If we touched engine->flags above,
+	 * the lock guaranteed this change was before utrace_barrier()
+	 * examined utrace->reporting.
+	 */
+	utrace->reporting = NULL;
+
+	/*
+	 * This is a good place to make sure tracing engines don't
+	 * introduce too much latency under voluntary preemption.
+	 */
+	if (need_resched())
+		cond_resched();
+
+	return engine->ops == &utrace_detached_ops;
+}
+
+/*
+ * Start the callbacks for @engine to consider @event (a bit mask).
+ * This makes the report_quiesce() callback first.  If @engine wants
+ * a specific callback for @event, we return the ops vector to use.
+ * If not, we return NULL.  The return value from the ops->callback
+ * function called should be passed to finish_callback().
+ */
+static const struct utrace_engine_ops *start_callback(
+	struct utrace *utrace, struct utrace_report *report,
+	struct utrace_engine *engine, struct task_struct *task,
+	unsigned long event)
+{
+	const struct utrace_engine_ops *ops;
+	unsigned long want;
+
+	/*
+	 * This barrier ensures that we've set utrace->reporting before
+	 * we examine engine->flags or engine->ops.  utrace_barrier()
+	 * relies on this ordering to indicate that the effect of any
+	 * utrace_control() and utrace_set_events() calls is in place
+	 * by the time utrace->reporting can be seen to be NULL.
+	 */
+	utrace->reporting = engine;
+	smp_mb();
+
+	/*
+	 * This pairs with the barrier in mark_engine_detached().
+	 * It makes sure that we never see the old ops vector with
+	 * the new flags, in case the original vector had no report_quiesce.
+	 */
+	want = engine->flags;
+	smp_rmb();
+	ops = engine->ops;
+
+	if (want & UTRACE_EVENT(QUIESCE)) {
+		if (finish_callback(utrace, report, engine,
+				    (*ops->report_quiesce)(report->action,
+							   engine, task,
+							   event)))
+			return NULL;
+
+		/*
+		 * finish_callback() reset utrace->reporting after the
+		 * quiesce callback.  Now we set it again (as above)
+		 * before re-examining engine->flags, which could have
+		 * been changed synchronously by ->report_quiesce or
+		 * asynchronously by utrace_control() or utrace_set_events().
+		 */
+		utrace->reporting = engine;
+		smp_mb();
+		want = engine->flags;
+	}
+
+	if (want & ENGINE_STOP)
+		report->action = UTRACE_STOP;
+
+	if (want & event) {
+		report->takers = true;
+		return ops;
+	}
+
+	return NULL;
+}
+
+/*
+ * Do a normal reporting pass for engines interested in @event.
+ * @callback is the name of the member in the ops vector, and remaining
+ * args are the extras it takes after the standard three args.
+ */
+#define REPORT(task, utrace, report, event, callback, ...)		      \
+	do {								      \
+		start_report(utrace);					      \
+		REPORT_CALLBACKS(task, utrace, report, event, callback,	      \
+				 (report)->action, engine, current,	      \
+				 ## __VA_ARGS__);  	   		      \
+		finish_report(report, task, utrace);			      \
+	} while (0)
+#define REPORT_CALLBACKS(task, utrace, report, event, callback, ...)	      \
+	do {								      \
+		struct utrace_engine *engine;				      \
+		const struct utrace_engine_ops *ops;			      \
+		list_for_each_entry(engine, &utrace->attached, entry) {	      \
+			ops = start_callback(utrace, report, engine, task,    \
+					     event);			      \
+			if (!ops)					      \
+				continue;				      \
+			finish_callback(utrace, report, engine,		      \
+					(*ops->callback)(__VA_ARGS__));	      \
+		}							      \
+	} while (0)
+
+/*
+ * Called iff UTRACE_EVENT(EXEC) flag is set.
+ */
+void utrace_report_exec(struct linux_binfmt *fmt, struct linux_binprm *bprm,
+			struct pt_regs *regs)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+
+	REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
+	       report_exec, fmt, bprm, regs);
+}
+
+/*
+ * Called iff UTRACE_EVENT(SYSCALL_ENTRY) flag is set.
+ * Return true to prevent the system call.
+ */
+bool utrace_report_syscall_entry(struct pt_regs *regs)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+
+	start_report(utrace);
+	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(SYSCALL_ENTRY),
+			 report_syscall_entry, report.result | report.action,
+			 engine, current, regs);
+	finish_report(&report, task, utrace);
+
+	if (report.action == UTRACE_STOP &&
+	    unlikely(utrace_stop(task, utrace, false)))
+		/*
+		 * We are continuing despite UTRACE_STOP because of a
+		 * SIGKILL.  Don't let the system call actually proceed.
+		 */
+		return true;
+
+	if (unlikely(report.result == UTRACE_SYSCALL_ABORT))
+		return true;
+
+	if (signal_pending(task)) {
+		/*
+		 * Clear TIF_SIGPENDING if it no longer needs to be set.
+		 * It may have been set as part of quiescence, and won't
+		 * ever have been cleared by another thread.  For other
+		 * reports, we can just leave it set and will go through
+		 * utrace_get_signal() to reset things.  But here we are
+		 * about to enter a syscall, which might bail out with an
+		 * -ERESTART* error if it's set now.
+		 */
+		spin_lock_irq(&task->sighand->siglock);
+		recalc_sigpending();
+		spin_unlock_irq(&task->sighand->siglock);
+	}
+
+	return false;
+}
+
+/*
+ * Called iff UTRACE_EVENT(SYSCALL_EXIT) flag is set.
+ */
+void utrace_report_syscall_exit(struct pt_regs *regs)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+
+	REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
+	       report_syscall_exit, regs);
+}
+
+/*
+ * Called iff UTRACE_EVENT(CLONE) flag is set.
+ * This notification call blocks the wake_up_new_task call on the child.
+ * So we must not quiesce here.  tracehook_report_clone_complete will do
+ * a quiescence check momentarily.
+ */
+void utrace_report_clone(unsigned long clone_flags, struct task_struct *child)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+
+	/*
+	 * We don't use the REPORT() macro here, because we need
+	 * to clear utrace->cloning before finish_report().
+	 * After finish_report(), utrace can be a stale pointer
+	 * in cases when report.action is still UTRACE_RESUME.
+	 */
+	start_report(utrace);
+	utrace->cloning = child;
+
+	REPORT_CALLBACKS(task, utrace, &report,
+			 UTRACE_EVENT(CLONE), report_clone,
+			 report.action, engine, task, clone_flags, child);
+
+	utrace->cloning = NULL;
+	finish_report(&report, task, utrace);
+
+	/*
+	 * For a vfork, we will go into an uninterruptible block waiting
+	 * for the child.  We need UTRACE_STOP to happen before this, not
+	 * after.  For CLONE_VFORK, utrace_finish_vfork() will be called.
+	 */
+	if (report.action == UTRACE_STOP && (clone_flags & CLONE_VFORK)) {
+		spin_lock(&utrace->lock);
+		utrace->vfork_stop = 1;
+		spin_unlock(&utrace->lock);
+	}
+}
+
+/*
+ * We're called after utrace_report_clone() for a CLONE_VFORK.
+ * If UTRACE_STOP was left from the clone report, we stop here.
+ * After this, we'll enter the uninterruptible wait_for_completion()
+ * waiting for the child.
+ */
+void utrace_finish_vfork(struct task_struct *task)
+{
+	struct utrace *utrace = task_utrace_struct(task);
+
+	spin_lock(&utrace->lock);
+	if (!utrace->vfork_stop)
+		spin_unlock(&utrace->lock);
+	else {
+		utrace->vfork_stop = 0;
+		spin_unlock(&utrace->lock);
+		utrace_stop(task, utrace, false);
+	}
+}
+
+/*
+ * Called iff UTRACE_EVENT(JCTL) flag is set.
+ *
+ * Called with siglock held.
+ */
+void utrace_report_jctl(int notify, int what)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+	bool stop = task_is_stopped(task);
+
+	/*
+	 * We have to come out of TASK_STOPPED in case the event report
+	 * hooks might block.  Since we held the siglock throughout, it's
+	 * as if we were never in TASK_STOPPED yet at all.
+	 */
+	if (stop) {
+		__set_current_state(TASK_RUNNING);
+		task->signal->flags &= ~SIGNAL_STOP_STOPPED;
+		++task->signal->group_stop_count;
+	}
+	spin_unlock_irq(&task->sighand->siglock);
+
+	/*
+	 * We get here with CLD_STOPPED when we've just entered
+	 * TASK_STOPPED, or with CLD_CONTINUED when we've just come
+	 * out but not yet been through utrace_get_signal() again.
+	 *
+	 * While in TASK_STOPPED, we can be considered safely
+	 * stopped by utrace_do_stop() and detached asynchronously.
+	 * If we woke up and checked task->utrace_flags before that
+	 * was finished, we might be here with utrace already
+	 * removed or in the middle of being removed.
+	 *
+	 * If we are indeed attached, then make sure we are no
+	 * longer considered stopped while we run callbacks.
+	 */
+	spin_lock(&utrace->lock);
+	utrace->stopped = 0;
+	/*
+	 * Do start_report()'s work too since we already have the lock anyway.
+	 */
+	utrace->report = 0;
+	splice_attaching(utrace);
+	spin_unlock(&utrace->lock);
+
+	REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
+	       report_jctl, what, notify);
+
+	/*
+	 * Retake the lock, and go back into TASK_STOPPED
+	 * unless the stop was just cleared.
+	 */
+	spin_lock_irq(&task->sighand->siglock);
+	if (stop && task->signal->group_stop_count > 0) {
+		__set_current_state(TASK_STOPPED);
+		if (--task->signal->group_stop_count == 0)
+			task->signal->flags |= SIGNAL_STOP_STOPPED;
+	}
+}
+
+/*
+ * Called iff UTRACE_EVENT(EXIT) flag is set.
+ */
+void utrace_report_exit(long *exit_code)
+{
+	struct task_struct *task = current;
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+	long orig_code = *exit_code;
+
+	REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
+	       report_exit, orig_code, exit_code);
+
+	if (report.action == UTRACE_STOP)
+		utrace_stop(task, utrace, false);
+}
+
+/*
+ * Called iff UTRACE_EVENT(DEATH) or UTRACE_EVENT(QUIESCE) flag is set.
+ *
+ * It is always possible that we are racing with utrace_release_task here.
+ * For this reason, utrace_release_task checks for the event bits that get
+ * us here, and delays its cleanup for us to do.
+ */
+void utrace_report_death(struct task_struct *task, struct utrace *utrace,
+			 bool group_dead, int signal)
+{
+	INIT_REPORT(report);
+
+	BUG_ON(!task->exit_state);
+
+	/*
+	 * We are presently considered "quiescent"--which is accurate
+	 * inasmuch as we won't run any more user instructions ever again.
+	 * But for utrace_control and utrace_set_events to be robust, they
+	 * must be sure whether or not we will run any more callbacks.  If
+	 * a call comes in before we do, taking the lock here synchronizes
+	 * us so we don't run any callbacks just disabled.  Calls that come
+	 * in while we're running the callbacks will see the utrace->death
+	 * flag and know that we are not yet fully quiescent for purposes
+	 * of detach bookkeeping.
+	 */
+	spin_lock(&utrace->lock);
+	BUG_ON(utrace->death);
+	utrace->death = 1;
+	utrace->report = 0;
+	utrace->interrupt = 0;
+	spin_unlock(&utrace->lock);
+
+	REPORT_CALLBACKS(task, utrace, &report, UTRACE_EVENT(DEATH),
+			 report_death, engine, task, group_dead, signal);
+
+	spin_lock(&utrace->lock);
+
+	/*
+	 * After we unlock (possibly inside utrace_reap for callbacks) with
+	 * this flag clear, competing utrace_control/utrace_set_events calls
+	 * know that we've finished our callbacks and any detach bookkeeping.
+	 */
+	utrace->death = 0;
+
+	if (utrace->reap)
+		/*
+		 * utrace_release_task() was already called in parallel.
+		 * We must complete its work now.
+		 */
+		utrace_reap(task, utrace);
+	else
+		utrace_reset(task, utrace, &report.action);
+}
+
+/*
+ * Finish the last reporting pass before returning to user mode.
+ */
+static void finish_resume_report(struct utrace_report *report,
+				 struct task_struct *task,
+				 struct utrace *utrace)
+{
+	if (report->detaches || !report->takers) {
+		spin_lock(&utrace->lock);
+		utrace_reset(task, utrace, &report->action);
+	}
+
+	switch (report->action) {
+	case UTRACE_STOP:
+		report->killed = utrace_stop(task, utrace, report->reports);
+		break;
+
+	case UTRACE_INTERRUPT:
+		if (!signal_pending(task))
+			set_tsk_thread_flag(task, TIF_SIGPENDING);
+		break;
+
+	case UTRACE_SINGLESTEP:
+		user_enable_single_step(task);
+		break;
+
+	case UTRACE_BLOCKSTEP:
+		user_enable_block_step(task);
+		break;
+
+	case UTRACE_REPORT:
+	case UTRACE_RESUME:
+	default:
+		user_disable_single_step(task);
+		break;
+	}
+}
+
+/*
+ * This is called when TIF_NOTIFY_RESUME had been set (and is now clear).
+ * We are close to user mode, and this is the place to report or stop.
+ * When we return, we're going to user mode or into the signals code.
+ */
+void utrace_resume(struct task_struct *task, struct pt_regs *regs)
+{
+	struct utrace *utrace = task_utrace_struct(task);
+	INIT_REPORT(report);
+	struct utrace_engine *engine;
+
+	/*
+	 * Some machines get here with interrupts disabled.  The same arch
+	 * code path leads to calling into get_signal_to_deliver(), which
+	 * implicitly reenables them by virtue of spin_unlock_irq.
+	 */
+	local_irq_enable();
+
+	/*
+	 * If this flag is still set it's because there was a signal
+	 * handler setup done but no report_signal following it.  Clear
+	 * the flag before we get to user mode so it doesn't confuse us later.
+	 */
+	if (unlikely(utrace->signal_handler)) {
+		int skip;
+		spin_lock(&utrace->lock);
+		utrace->signal_handler = 0;
+		skip = !utrace->report;
+		spin_unlock(&utrace->lock);
+		if (skip)
+			return;
+	}
+
+	/*
+	 * If UTRACE_INTERRUPT was just used, we don't bother with a
+	 * report here.  We will report and stop in utrace_get_signal().
+	 */
+	if (unlikely(utrace->interrupt))
+		return;
+
+	/*
+	 * Do a simple reporting pass, with no callback after report_quiesce.
+	 */
+	start_report(utrace);
+
+	list_for_each_entry(engine, &utrace->attached, entry)
+		start_callback(utrace, &report, engine, task, 0);
+
+	/*
+	 * Finish the report and either stop or get ready to resume.
+	 */
+	finish_resume_report(&report, task, utrace);
+}
+
+/*
+ * Return true if current has forced signal_pending().
+ *
+ * This is called only when current->utrace_flags is nonzero, so we know
+ * that current->utrace must be set.  It's not inlined in tracehook.h
+ * just so that struct utrace can stay opaque outside this file.
+ */
+bool utrace_interrupt_pending(void)
+{
+	return task_utrace_struct(current)->interrupt;
+}
+
+/*
+ * Take the siglock and push @info back on our queue.
+ * Returns with @task->sighand->siglock held.
+ */
+static void push_back_signal(struct task_struct *task, siginfo_t *info)
+	__acquires(task->sighand->siglock)
+{
+	struct sigqueue *q;
+
+	if (unlikely(!info->si_signo)) { /* Oh, a wise guy! */
+		spin_lock_irq(&task->sighand->siglock);
+		return;
+	}
+
+	q = sigqueue_alloc();
+	if (likely(q)) {
+		q->flags = 0;
+		copy_siginfo(&q->info, info);
+	}
+
+	spin_lock_irq(&task->sighand->siglock);
+
+	sigaddset(&task->pending.signal, info->si_signo);
+	if (likely(q))
+		list_add(&q->list, &task->pending.list);
+
+	set_tsk_thread_flag(task, TIF_SIGPENDING);
+}
+
+/*
+ * This is the hook from the signals code, called with the siglock held.
+ * Here is the ideal place to stop.  We also dequeue and intercept signals.
+ */
+int utrace_get_signal(struct task_struct *task, struct pt_regs *regs,
+		      siginfo_t *info, struct k_sigaction *return_ka)
+	__releases(task->sighand->siglock)
+	__acquires(task->sighand->siglock)
+{
+	struct utrace *utrace;
+	struct k_sigaction *ka;
+	INIT_REPORT(report);
+	struct utrace_engine *engine;
+	const struct utrace_engine_ops *ops;
+	unsigned long event, want;
+	u32 ret;
+	int signr;
+
+	utrace = &task->utrace;
+	if (utrace->interrupt || utrace->report || utrace->signal_handler) {
+		/*
+		 * We've been asked for an explicit report before we
+		 * even check for pending signals.
+		 */
+
+		spin_unlock_irq(&task->sighand->siglock);
+
+		spin_lock(&utrace->lock);
+
+		splice_attaching(utrace);
+
+		if (unlikely(!utrace->interrupt) && unlikely(!utrace->report))
+			report.result = UTRACE_SIGNAL_IGN;
+		else if (utrace->signal_handler)
+			report.result = UTRACE_SIGNAL_HANDLER;
+		else
+			report.result = UTRACE_SIGNAL_REPORT;
+
+		/*
+		 * We are now making the report and it's on the
+		 * interrupt path, so clear the flags asking for those.
+		 */
+		utrace->interrupt = utrace->report = utrace->signal_handler = 0;
+		utrace->stopped = 0;
+
+		/*
+		 * Make sure signal_pending() only returns true
+		 * if there are real signals pending.
+		 */
+		if (signal_pending(task)) {
+			spin_lock_irq(&task->sighand->siglock);
+			recalc_sigpending();
+			spin_unlock_irq(&task->sighand->siglock);
+		}
+
+		spin_unlock(&utrace->lock);
+
+		if (unlikely(report.result == UTRACE_SIGNAL_IGN))
+			/*
+			 * We only got here to clear utrace->signal_handler.
+			 */
+			return -1;
+
+		/*
+		 * Do a reporting pass for no signal, just for EVENT(QUIESCE).
+		 * The engine callbacks can fill in *info and *return_ka.
+		 * We'll pass NULL for the @orig_ka argument to indicate
+		 * that there was no original signal.
+		 */
+		event = 0;
+		ka = NULL;
+		memset(return_ka, 0, sizeof *return_ka);
+	} else if ((task->utrace_flags & UTRACE_EVENT_SIGNAL_ALL) == 0 &&
+		   !utrace->stopped) {
+		/*
+		 * If no engine is interested in intercepting signals,
+		 * let the caller just dequeue them normally.
+		 */
+		return 0;
+	} else {
+		if (unlikely(utrace->stopped)) {
+			spin_unlock_irq(&task->sighand->siglock);
+			spin_lock(&utrace->lock);
+			utrace->stopped = 0;
+			spin_unlock(&utrace->lock);
+			spin_lock_irq(&task->sighand->siglock);
+		}
+
+		/*
+		 * Steal the next signal so we can let tracing engines
+		 * examine it.  From the signal number and sigaction,
+		 * determine what normal delivery would do.  If no
+		 * engine perturbs it, we'll do that by returning the
+		 * signal number after setting *return_ka.
+		 */
+		signr = dequeue_signal(task, &task->blocked, info);
+		if (signr == 0)
+			return signr;
+		BUG_ON(signr != info->si_signo);
+
+		ka = &task->sighand->action[signr - 1];
+		*return_ka = *ka;
+
+		/*
+		 * We are never allowed to interfere with SIGKILL.
+		 * Just punt after filling in *return_ka for our caller.
+		 */
+		if (signr == SIGKILL)
+			return signr;
+
+		if (ka->sa.sa_handler == SIG_IGN) {
+			event = UTRACE_EVENT(SIGNAL_IGN);
+			report.result = UTRACE_SIGNAL_IGN;
+		} else if (ka->sa.sa_handler != SIG_DFL) {
+			event = UTRACE_EVENT(SIGNAL);
+			report.result = UTRACE_SIGNAL_DELIVER;
+		} else if (sig_kernel_coredump(signr)) {
+			event = UTRACE_EVENT(SIGNAL_CORE);
+			report.result = UTRACE_SIGNAL_CORE;
+		} else if (sig_kernel_ignore(signr)) {
+			event = UTRACE_EVENT(SIGNAL_IGN);
+			report.result = UTRACE_SIGNAL_IGN;
+		} else if (signr == SIGSTOP) {
+			event = UTRACE_EVENT(SIGNAL_STOP);
+			report.result = UTRACE_SIGNAL_STOP;
+		} else if (sig_kernel_stop(signr)) {
+			event = UTRACE_EVENT(SIGNAL_STOP);
+			report.result = UTRACE_SIGNAL_TSTP;
+		} else {
+			event = UTRACE_EVENT(SIGNAL_TERM);
+			report.result = UTRACE_SIGNAL_TERM;
+		}
+
+		/*
+		 * Now that we know what event type this signal is, we
+		 * can short-circuit if no engines care about those.
+		 */
+		if ((task->utrace_flags & (event | UTRACE_EVENT(QUIESCE))) == 0)
+			return signr;
+
+		/*
+		 * We have some interested engines, so tell them about
+		 * the signal and let them change its disposition.
+		 */
+		spin_unlock_irq(&task->sighand->siglock);
+	}
+
+	/*
+	 * This reporting pass chooses what signal disposition we'll act on.
+	 */
+	list_for_each_entry(engine, &utrace->attached, entry) {
+		/*
+		 * See start_callback() comment about this barrier.
+		 */
+		utrace->reporting = engine;
+		smp_mb();
+
+		/*
+		 * This pairs with the barrier in mark_engine_detached(),
+		 * see start_callback() comments.
+		 */
+		want = engine->flags;
+		smp_rmb();
+		ops = engine->ops;
+
+		if ((want & (event | UTRACE_EVENT(QUIESCE))) == 0) {
+			utrace->reporting = NULL;
+			continue;
+		}
+
+		if (ops->report_signal)
+			ret = (*ops->report_signal)(
+				report.result | report.action, engine, task,
+				regs, info, ka, return_ka);
+		else
+			ret = (report.result | (*ops->report_quiesce)(
+				       report.action, engine, task, event));
+
+		/*
+		 * Avoid a tight loop reporting again and again if some
+		 * engine is too stupid.
+		 */
+		switch (utrace_resume_action(ret)) {
+		default:
+			break;
+		case UTRACE_INTERRUPT:
+		case UTRACE_REPORT:
+			ret = (ret & ~UTRACE_RESUME_MASK) | UTRACE_RESUME;
+			break;
+		}
+
+		finish_callback(utrace, &report, engine, ret);
+	}
+
+	/*
+	 * We express the chosen action to the signals code in terms
+	 * of a representative signal whose default action does it.
+	 * Our caller uses our return value (signr) to decide what to
+	 * do, but uses info->si_signo as the signal number to report.
+	 */
+	switch (utrace_signal_action(report.result)) {
+	case UTRACE_SIGNAL_TERM:
+		signr = SIGTERM;
+		break;
+
+	case UTRACE_SIGNAL_CORE:
+		signr = SIGQUIT;
+		break;
+
+	case UTRACE_SIGNAL_STOP:
+		signr = SIGSTOP;
+		break;
+
+	case UTRACE_SIGNAL_TSTP:
+		signr = SIGTSTP;
+		break;
+
+	case UTRACE_SIGNAL_DELIVER:
+		signr = info->si_signo;
+
+		if (return_ka->sa.sa_handler == SIG_DFL) {
+			/*
+			 * We'll do signr's normal default action.
+			 * For ignore, we'll fall through below.
+			 * For stop/death, the break below relocks and returns it.
+			 */
+			if (likely(signr) && !sig_kernel_ignore(signr))
+				break;
+		} else if (return_ka->sa.sa_handler != SIG_IGN &&
+			   likely(signr)) {
+			/*
+			 * Complete the bookkeeping after the report.
+			 * The handler will run.  If an engine wanted to
+			 * stop or step, then make sure we do another
+			 * report after signal handler setup.
+			 */
+			if (report.action != UTRACE_RESUME)
+				report.action = UTRACE_INTERRUPT;
+			finish_report(&report, task, utrace);
+
+			if (unlikely(report.result & UTRACE_SIGNAL_HOLD))
+				push_back_signal(task, info);
+			else
+				spin_lock_irq(&task->sighand->siglock);
+
+			/*
+			 * We do the SA_ONESHOT work here since the
+			 * normal path will only touch *return_ka now.
+			 */
+			if (unlikely(return_ka->sa.sa_flags & SA_ONESHOT)) {
+				return_ka->sa.sa_flags &= ~SA_ONESHOT;
+				if (likely(valid_signal(signr))) {
+					ka = &task->sighand->action[signr - 1];
+					ka->sa.sa_handler = SIG_DFL;
+				}
+			}
+
+			return signr;
+		}
+
+		/* Fall through for an ignored signal.  */
+
+	case UTRACE_SIGNAL_IGN:
+	case UTRACE_SIGNAL_REPORT:
+	default:
+		/*
+		 * If the signal is being ignored, then we are on the way
+		 * directly back to user mode.  We can stop here, or step,
+		 * as in utrace_resume(), above.  After we've dealt with that,
+		 * our caller will relock and come back through here.
+		 */
+		finish_resume_report(&report, task, utrace);
+
+		if (unlikely(report.killed)) {
+			/*
+			 * The only reason we woke up now was because of a
+			 * SIGKILL.  Don't do normal dequeuing in case it
+			 * might get a signal other than SIGKILL.  That would
+			 * perturb the death state so it might differ from
+			 * what the debugger would have allowed to happen.
+			 * Instead, pluck out just the SIGKILL to be sure
+			 * we'll die immediately with nothing else different
+			 * from the quiescent state the debugger wanted us in.
+			 */
+			sigset_t sigkill_only;
+			siginitsetinv(&sigkill_only, sigmask(SIGKILL));
+			spin_lock_irq(&task->sighand->siglock);
+			signr = dequeue_signal(task, &sigkill_only, info);
+			BUG_ON(signr != SIGKILL);
+			*return_ka = task->sighand->action[SIGKILL - 1];
+			return signr;
+		}
+
+		if (unlikely(report.result & UTRACE_SIGNAL_HOLD)) {
+			push_back_signal(task, info);
+			spin_unlock_irq(&task->sighand->siglock);
+		}
+
+		return -1;
+	}
+
+	/*
+	 * Complete the bookkeeping after the report.
+	 * This sets utrace->report if UTRACE_STOP was used.
+	 */
+	finish_report(&report, task, utrace);
+
+	return_ka->sa.sa_handler = SIG_DFL;
+
+	if (unlikely(report.result & UTRACE_SIGNAL_HOLD))
+		push_back_signal(task, info);
+	else
+		spin_lock_irq(&task->sighand->siglock);
+
+	if (sig_kernel_stop(signr))
+		task->signal->flags |= SIGNAL_STOP_DEQUEUED;
+
+	return signr;
+}
+
+/*
+ * This gets called after a signal handler has been set up.
+ * We set a flag so the next report knows it happened.
+ * If we're already stepping, make sure we do a report_signal.
+ * If not, make sure we get into utrace_resume() where we can
+ * clear the signal_handler flag before resuming.
+ */
+void utrace_signal_handler(struct task_struct *task, int stepping)
+{
+	struct utrace *utrace = task_utrace_struct(task);
+
+	spin_lock(&utrace->lock);
+
+	utrace->signal_handler = 1;
+	if (stepping) {
+		utrace->interrupt = 1;
+		set_tsk_thread_flag(task, TIF_SIGPENDING);
+	} else {
+		set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
+	}
+
+	spin_unlock(&utrace->lock);
+}
+
+/**
+ * utrace_prepare_examine - prepare to examine thread state
+ * @target:		thread of interest, a &struct task_struct pointer
+ * @engine:		engine pointer returned by utrace_attach_task()
+ * @exam:		temporary state, a &struct utrace_examiner pointer
+ *
+ * This call prepares to safely examine the thread @target using
+ * &struct user_regset calls, or direct access to thread-synchronous fields.
+ *
+ * When @target is current, this call is superfluous.  When @target is
+ * another thread, it must be held stopped via %UTRACE_STOP by @engine.
+ *
+ * This call may block the caller until @target stays stopped, so it must
+ * be called only after the caller is sure @target is about to unschedule.
+ * This means a zero return from a utrace_control() call on @engine giving
+ * %UTRACE_STOP, or a report_quiesce() or report_signal() callback to
+ * @engine that used %UTRACE_STOP in its return value.
+ *
+ * Returns -%ESRCH if @target is dead or -%EINVAL if %UTRACE_STOP was
+ * not used.  If @target has started running again despite %UTRACE_STOP
+ * (for %SIGKILL or a spurious wakeup), this call returns -%EAGAIN.
+ *
+ * When this call returns zero, it's safe to use &struct user_regset
+ * calls and task_user_regset_view() on @target and to examine some of
+ * its fields directly.  When the examination is complete, a
+ * utrace_finish_examine() call must follow to check whether it was
+ * completed safely.
+ */
+int utrace_prepare_examine(struct task_struct *target,
+			   struct utrace_engine *engine,
+			   struct utrace_examiner *exam)
+{
+	int ret = 0;
+
+	if (unlikely(target == current))
+		return 0;
+
+	rcu_read_lock();
+	if (unlikely(!engine_wants_stop(engine)))
+		ret = -EINVAL;
+	else if (unlikely(target->exit_state))
+		ret = -ESRCH;
+	else {
+		exam->state = target->state;
+		if (unlikely(exam->state == TASK_RUNNING))
+			ret = -EAGAIN;
+		else
+			get_task_struct(target);
+	}
+	rcu_read_unlock();
+
+	if (likely(!ret)) {
+		exam->ncsw = wait_task_inactive(target, exam->state);
+		put_task_struct(target);
+		if (unlikely(!exam->ncsw))
+			ret = -EAGAIN;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_prepare_examine);
+
+/**
+ * utrace_finish_examine - complete an examination of thread state
+ * @target:		thread of interest, a &struct task_struct pointer
+ * @engine:		engine pointer returned by utrace_attach_task()
+ * @exam:		pointer passed to utrace_prepare_examine() call
+ *
+ * This call completes an examination on the thread @target begun by a
+ * paired utrace_prepare_examine() call with the same arguments that
+ * returned success (zero).
+ *
+ * When @target is current, this call is superfluous.  When @target is
+ * another thread, this returns zero if @target has remained unscheduled
+ * since the paired utrace_prepare_examine() call returned zero.
+ *
+ * When this returns an error, any examination done since the paired
+ * utrace_prepare_examine() call is unreliable and the data extracted
+ * should be discarded.  The error is -%EINVAL if @engine is not
+ * keeping @target stopped, or -%EAGAIN if @target woke up unexpectedly.
+ */
+int utrace_finish_examine(struct task_struct *target,
+			  struct utrace_engine *engine,
+			  struct utrace_examiner *exam)
+{
+	int ret = 0;
+
+	if (unlikely(target == current))
+		return 0;
+
+	rcu_read_lock();
+	if (unlikely(!engine_wants_stop(engine)))
+		ret = -EINVAL;
+	else if (unlikely(target->state != exam->state))
+		ret = -EAGAIN;
+	else
+		get_task_struct(target);
+	rcu_read_unlock();
+
+	if (likely(!ret)) {
+		unsigned long ncsw = wait_task_inactive(target, exam->state);
+		if (unlikely(ncsw != exam->ncsw))
+			ret = -EAGAIN;
+		put_task_struct(target);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(utrace_finish_examine);
+
+/*
+ * This is declared in linux/regset.h and defined in machine-dependent
+ * code.  We put the export here to ensure no machine forgets it.
+ */
+EXPORT_SYMBOL_GPL(task_user_regset_view);
+
+/*
+ * Called with rcu_read_lock() held.
+ */
+void task_utrace_proc_status(struct seq_file *m, struct task_struct *p)
+{
+	struct utrace *utrace = &p->utrace;
+	seq_printf(m, "Utrace: %lx%s%s%s\n",
+		   p->utrace_flags,
+		   utrace->stopped ? " (stopped)" : "",
+		   utrace->report ? " (report)" : "",
+		   utrace->interrupt ? " (interrupt)" : "");
+}
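
The prepare/finish pairing documented above for utrace_prepare_examine()
and utrace_finish_examine() amounts to an optimistic read that is validated
afterwards against a context-switch count.  Below is a minimal userspace
sketch of that idea.  It is a model only, not kernel code: struct fake_task,
struct fake_examiner, fake_prepare_examine(), fake_finish_examine() and
examine_regs() are hypothetical names invented for illustration, and the
"running" flag stands in for the real task-state checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the task state the real calls look at. */
struct fake_task {
	int running;		/* nonzero: target is on a CPU */
	unsigned long ncsw;	/* context-switch count */
	long regs[4];		/* pretend register file */
};

/* Hypothetical stand-in for struct utrace_examiner. */
struct fake_examiner {
	unsigned long ncsw;
};

/* Record the switch count; refuse if the target is running. */
int fake_prepare_examine(const struct fake_task *t,
			 struct fake_examiner *exam)
{
	assert(t != NULL && exam != NULL);
	if (t->running)
		return -EAGAIN;
	exam->ncsw = t->ncsw;
	return 0;
}

/* Data read since prepare is valid only if the target never ran. */
int fake_finish_examine(const struct fake_task *t,
			const struct fake_examiner *exam)
{
	assert(t != NULL && exam != NULL);
	if (t->running || t->ncsw != exam->ncsw)
		return -EAGAIN;
	return 0;
}

/* Copy the registers out, discarding the copy if validation fails. */
int examine_regs(const struct fake_task *t, long out[4])
{
	struct fake_examiner exam;
	long tmp[4];
	int i, ret;

	ret = fake_prepare_examine(t, &exam);
	if (ret)
		return ret;
	for (i = 0; i < 4; i++)
		tmp[i] = t->regs[i];
	ret = fake_finish_examine(t, &exam);
	if (ret)
		return ret;	/* target woke up: tmp is unreliable */
	for (i = 0; i < 4; i++)
		out[i] = tmp[i];
	return 0;
}
```

A caller would treat -EAGAIN from either step as "the target moved, throw
the data away", which is the contract the kerneldoc comments describe.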


From roland at redhat.com  Sat Mar 21 01:42:44 2009
From: roland at redhat.com (Roland McGrath)
Date: Fri, 20 Mar 2009 18:42:44 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Roland McGrath's message of  Friday, 20 March 2009 18:39:46 -0700
	<20090321013946.890F4FC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
Message-ID: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>

From: Frank Ch. Eigler <fche at redhat.com>

This is v2 of the prototype utrace-ftrace interface.  This code is
based on Roland McGrath's utrace API, which provides programmatic
hooks to the in-tree tracehook layer.  This new patch interfaces many
of those events to ftrace, as configured by a small number of debugfs
controls.  Here's the /debugfs/tracing/process_trace_README:

process event tracer mini-HOWTO

1. Select process hierarchy to monitor.  Other processes will be
completely unaffected.  Leave at 0 for system-wide tracing.
%  echo NNN > process_follow_pid

2. Determine which process event traces are potentially desired.
syscall and signal tracing slow down monitored processes.
%  echo 0 > process_trace_{syscalls,signals,lifecycle}

3. Add any final uid- or taskcomm-based filtering.  Non-matching
processes will skip trace messages, but will still be slowed.
%  echo NNN > process_trace_uid_filter # -1: unrestricted
%  echo ls > process_trace_taskcomm_filter # empty: unrestricted

4. Start tracing.
%  echo process > current_tracer

5. Examine trace.
%  cat trace

6. Stop tracing.
%  echo nop > current_tracer
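
Steps 1 to 4 above can also be driven from a program rather than from the
shell.  The following is a rough sketch in C that performs the same writes
the echo commands would.  The directory is passed in by the caller, since
where debugfs is mounted is an assumption; write_control() and
start_process_trace() are hypothetical helper names, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one value to DIR/NAME, as "echo VALUE > NAME" would.
 * Returns 0 on success, -1 on failure.
 */
int write_control(const char *dir, const char *name, const char *value)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir != NULL && name != NULL && value != NULL);
	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Follow PID, trace signals and lifecycle events but not syscalls,
 * leave the uid and taskcomm filters at their defaults, then select
 * the "process" tracer.
 */
int start_process_trace(const char *dir, unsigned int pid)
{
	char buf[16];

	snprintf(buf, sizeof(buf), "%u", pid);
	if (write_control(dir, "process_follow_pid", buf) ||
	    write_control(dir, "process_trace_syscalls", "0") ||
	    write_control(dir, "process_trace_signals", "1") ||
	    write_control(dir, "process_trace_lifecycle", "1") ||
	    write_control(dir, "current_tracer", "process"))
		return -1;
	return 0;
}
```

Stopping the trace is then a single write_control(dir, "current_tracer",
"nop") call, mirroring step 6.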

Signed-off-by: Frank Ch. Eigler <fche at redhat.com>
---
 include/linux/processtrace.h |   41 +++
 kernel/trace/Kconfig         |    9 +
 kernel/trace/Makefile        |    1 +
 kernel/trace/trace.h         |    8 +
 kernel/trace/trace_process.c |  601 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 660 insertions(+), 0 deletions(-)

diff --git a/include/linux/processtrace.h b/include/linux/processtrace.h
new file mode 100644
index ...f2b7d94 100644  
--- /dev/null
+++ b/include/linux/processtrace.h
@@ -0,0 +1,41 @@
+#ifndef PROCESSTRACE_H
+#define PROCESSTRACE_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+struct process_trace_entry {
+	unsigned char opcode;	/* one of _UTRACE_EVENT_* */
+	char comm[TASK_COMM_LEN]; /* XXX: should be in/via trace_entry */
+	union {
+		struct {
+			pid_t child;
+			unsigned long flags;
+		} trace_clone;
+		struct {
+			long code;
+		} trace_exit;
+		struct {
+		} trace_exec;
+		struct {
+			int si_signo;
+			int si_errno;
+			int si_code;
+		} trace_signal;
+		struct {
+			long callno;
+			unsigned long args[6];
+		} trace_syscall_entry;
+		struct {
+			long rc;
+			long error;
+		} trace_syscall_exit;
+	};
+};
+
+/* in kernel/trace/trace_process.c */
+
+extern void enable_process_trace(void);
+extern void disable_process_trace(void);
+
+#endif /* PROCESSTRACE_H */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 34e707e..8a92d6f 100644  
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -150,6 +150,15 @@ config CONTEXT_SWITCH_TRACER
 	  This tracer gets called from the context switch and records
 	  all switching of tasks.
 
+config PROCESS_TRACER
+	bool "Trace process events via utrace"
+	depends on DEBUG_KERNEL
+	select TRACING
+	select UTRACE
+	help
+	  This tracer provides trace records from process events
+	  accessible to utrace: lifecycle, system calls, and signals.
+
 config BOOT_TRACER
 	bool "Trace boot initcalls"
 	depends on DEBUG_KERNEL
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 349d5a9..a774db2 100644  
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -33,5 +33,6 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += t
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_HW_BRANCH_TRACER) += trace_hw_branches.o
 obj-$(CONFIG_POWER_TRACER) += trace_power.o
+obj-$(CONFIG_PROCESS_TRACER) += trace_process.o
 
 libftrace-y := ftrace.o
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4d3d381..c4d2e7f 100644  
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -7,6 +7,7 @@
 #include <linux/clocksource.h>
 #include <linux/ring_buffer.h>
 #include <linux/mmiotrace.h>
+#include <linux/processtrace.h>
 #include <linux/ftrace.h>
 #include <trace/boot.h>
 
@@ -30,6 +31,7 @@ enum trace_type {
 	TRACE_USER_STACK,
 	TRACE_HW_BRANCHES,
 	TRACE_POWER,
+	TRACE_PROCESS,
 
 	__TRACE_LAST_TYPE
 };
@@ -170,6 +172,11 @@ struct trace_power {
 	struct power_trace	state_data;
 };
 
+struct trace_process {
+	struct trace_entry		ent;
+	struct process_trace_entry	event;
+};
+
 /*
  * trace_flag_type is an enumeration that holds different
  * states when a trace occurs. These are:
@@ -280,6 +287,7 @@ extern void __ftrace_bad_type(void);
 			  TRACE_GRAPH_RET);		\
 		IF_ASSIGN(var, ent, struct hw_branch_entry, TRACE_HW_BRANCHES);\
  		IF_ASSIGN(var, ent, struct trace_power, TRACE_POWER); \
+		IF_ASSIGN(var, ent, struct trace_process, TRACE_PROCESS); \
 		__ftrace_bad_type();					\
 	} while (0)
 
diff --git a/kernel/trace/trace_process.c b/kernel/trace/trace_process.c
new file mode 100644
index ...0820e56 100644  
--- /dev/null
+++ b/kernel/trace/trace_process.c
@@ -0,0 +1,601 @@
+/*
+ * utrace-based process event tracing
+ * Copyright (C) 2009 Red Hat Inc.
+ * By Frank Ch. Eigler <fche at redhat.com>
+ *
+ * Based on mmio ftrace engine by Pekka Paalanen
+ * and utrace-syscall-tracing prototype by Ananth Mavinakayanahalli
+ */
+
+/* #define DEBUG 1 */
+
+#include <linux/kernel.h>
+#include <linux/utrace.h>
+#include <linux/uaccess.h>
+#include <linux/debugfs.h>
+#include <asm/syscall.h>
+
+#include "trace.h"
+
+/* A process must match these filters in order to be traced. */
+static char trace_taskcomm_filter[TASK_COMM_LEN]; /* \0: unrestricted */
+static u32 trace_taskuid_filter = -1; /* -1: unrestricted */
+static u32 trace_lifecycle_p = 1;
+static u32 trace_syscalls_p = 1;
+static u32 trace_signals_p = 1;
+
+/* A process must be a direct child of given pid in order to be
+   followed. */
+static u32 process_follow_pid; /* 0: unrestricted/systemwide */
+
+/* XXX: lock the above? */
+
+
+/* trace data collection */
+
+static struct trace_array *process_trace_array;
+
+static void process_reset_data(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	tracing_reset_online_cpus(tr);
+}
+
+static int process_trace_init(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_trace_array = tr;
+	process_reset_data(tr);
+	enable_process_trace();
+	return 0;
+}
+
+static void process_trace_reset(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	disable_process_trace();
+	process_reset_data(tr);
+	process_trace_array = NULL;
+}
+
+static void process_trace_start(struct trace_array *tr)
+{
+	pr_debug("in %s\n", __func__);
+	process_reset_data(tr);
+}
+
+static void __trace_processtrace(struct trace_array *tr,
+				struct trace_array_cpu *data,
+				struct process_trace_entry *ent)
+{
+	struct ring_buffer_event *event;
+	struct trace_process *entry;
+	unsigned long irq_flags;
+
+	event	= ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+					   &irq_flags);
+	if (!event)
+		return;
+	entry	= ring_buffer_event_data(event);
+	tracing_generic_entry_update(&entry->ent, 0, preempt_count());
+	entry->ent.cpu			= raw_smp_processor_id();
+	entry->ent.type			= TRACE_PROCESS;
+	strlcpy(ent->comm, current->comm, TASK_COMM_LEN);
+	entry->event			= *ent;
+	ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+
+	trace_wake_up();
+}
+
+void process_trace(struct process_trace_entry *ent)
+{
+	struct trace_array *tr = process_trace_array;
+	struct trace_array_cpu *data;
+
+	preempt_disable();
+	data = tr->data[smp_processor_id()];
+	__trace_processtrace(tr, data, ent);
+	preempt_enable();
+}
+
+
+/* trace data rendering */
+
+static void process_pipe_open(struct trace_iterator *iter)
+{
+	struct trace_seq *s = &iter->seq;
+	pr_debug("in %s\n", __func__);
+	trace_seq_printf(s, "VERSION 200901\n");
+}
+
+static void process_close(struct trace_iterator *iter)
+{
+	iter->private = NULL;
+}
+
+static ssize_t process_read(struct trace_iterator *iter, struct file *filp,
+				char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	ssize_t ret;
+	struct trace_seq *s = &iter->seq;
+	ret = trace_seq_to_user(s, ubuf, cnt);
+	return (ret == -EBUSY) ? 0 : ret;
+}
+
+static enum print_line_t process_print(struct trace_iterator *iter)
+{
+	struct trace_entry *entry = iter->ent;
+	struct trace_process *field;
+	struct trace_seq *s	= &iter->seq;
+	unsigned long long t	= ns2usecs(iter->ts);
+	unsigned long usec_rem	= do_div(t, 1000000ULL);
+	unsigned long secs	= (unsigned long)t;
+	int ret = 1;
+
+	trace_assign_type(field, entry);
+
+	/* XXX: If print_lat_fmt() were not static, we wouldn't have
+	   to duplicate this. */
+	trace_seq_printf(s, "%16s %5d %3d %9lu.%06ld ",
+			 field->event.comm,
+			 entry->pid, entry->cpu,
+			 secs,
+			 usec_rem);
+
+	switch (field->event.opcode) {
+	case _UTRACE_EVENT_CLONE:
+		ret = trace_seq_printf(s, "fork %d flags 0x%lx\n",
+				       field->event.trace_clone.child,
+				       field->event.trace_clone.flags);
+		break;
+	case _UTRACE_EVENT_EXEC:
+		ret = trace_seq_printf(s, "exec\n");
+		break;
+	case _UTRACE_EVENT_EXIT:
+		ret = trace_seq_printf(s, "exit %ld\n",
+				       field->event.trace_exit.code);
+		break;
+	case _UTRACE_EVENT_SIGNAL:
+		ret = trace_seq_printf(s, "signal %d errno %d code 0x%x\n",
+				       field->event.trace_signal.si_signo,
+				       field->event.trace_signal.si_errno,
+				       field->event.trace_signal.si_code);
+		break;
+	case _UTRACE_EVENT_SYSCALL_ENTRY:
+		ret = trace_seq_printf(s, "syscall %ld [0x%lx 0x%lx 0x%lx"
+					  " 0x%lx 0x%lx 0x%lx]\n",
+				      field->event.trace_syscall_entry.callno,
+				      field->event.trace_syscall_entry.args[0],
+				      field->event.trace_syscall_entry.args[1],
+				      field->event.trace_syscall_entry.args[2],
+				      field->event.trace_syscall_entry.args[3],
+				      field->event.trace_syscall_entry.args[4],
+				      field->event.trace_syscall_entry.args[5]);
+		break;
+	case _UTRACE_EVENT_SYSCALL_EXIT:
+		ret = trace_seq_printf(s, "syscall rc %ld error %ld\n",
+				       field->event.trace_syscall_exit.rc,
+				       field->event.trace_syscall_exit.error);
+		break;
+	default:
+		ret = trace_seq_printf(s, "process code %d?\n",
+				       field->event.opcode);
+		break;
+	}
+	if (ret)
+		return TRACE_TYPE_HANDLED;
+	return TRACE_TYPE_PARTIAL_LINE;
+}
+
+
+static enum print_line_t process_print_line(struct trace_iterator *iter)
+{
+	switch (iter->ent->type) {
+	case TRACE_PROCESS:
+		return process_print(iter);
+	default:
+		return TRACE_TYPE_HANDLED; /* ignore unknown entries */
+	}
+}
+
+static struct tracer process_tracer = {
+	.name		= "process",
+	.init		= process_trace_init,
+	.reset		= process_trace_reset,
+	.start		= process_trace_start,
+	.pipe_open	= process_pipe_open,
+	.close		= process_close,
+	.read		= process_read,
+	.print_line	= process_print_line,
+};
+
+
+
+/* utrace backend */
+
+/* Should tracing apply to given task?	Compare against filter
+   values. */
+static int trace_test(struct task_struct *tsk)
+{
+	if (trace_taskcomm_filter[0]
+	    && strncmp(trace_taskcomm_filter, tsk->comm, TASK_COMM_LEN))
+		return 0;
+
+	if (trace_taskuid_filter != (u32)-1
+	    && trace_taskuid_filter != task_uid(tsk))
+		return 0;
+
+	return 1;
+}
+
+
+static const struct utrace_engine_ops process_trace_ops;
+
+static void process_trace_tryattach(struct task_struct *tsk)
+{
+	struct utrace_engine *engine;
+
+	pr_debug("in %s\n", __func__);
+	engine = utrace_attach_task(tsk,
+				    UTRACE_ATTACH_CREATE |
+				    UTRACE_ATTACH_EXCLUSIVE,
+				    &process_trace_ops, NULL);
+	if (IS_ERR(engine) || (engine == NULL)) {
+		pr_warning("utrace_attach_task %d (rc %p)\n",
+			   tsk->pid, engine);
+	} else {
+		int rc;
+
+		/* We always hook cost-free events. */
+		unsigned long events =
+			UTRACE_EVENT(CLONE) |
+			UTRACE_EVENT(EXEC) |
+			UTRACE_EVENT(EXIT);
+
+		/* Penalizing events are individually controlled, so that
+		   utrace doesn't even take the monitored threads off their
+		   fast paths, nor bother calling our callbacks. */
+		if (trace_syscalls_p)
+			events |= UTRACE_EVENT_SYSCALL;
+		if (trace_signals_p)
+			events |= UTRACE_EVENT_SIGNAL_ALL;
+
+		rc = utrace_set_events(tsk, engine, events);
+		if (rc == -EINPROGRESS)
+			rc = utrace_barrier(tsk, engine);
+		if (rc)
+			pr_warning("utrace_set_events/barrier rc %d\n", rc);
+
+		utrace_engine_put(engine);
+		pr_debug("attached in %s to %s(%d)\n", __func__,
+			 tsk->comm, tsk->pid);
+	}
+}
+
+
+u32 process_trace_report_clone(enum utrace_resume_action action,
+			       struct utrace_engine *engine,
+			       struct task_struct *parent,
+			       unsigned long clone_flags,
+			       struct task_struct *child)
+{
+	if (trace_lifecycle_p && trace_test(parent)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_CLONE;
+		ent.trace_clone.child = child->pid;
+		ent.trace_clone.flags = clone_flags;
+		process_trace(&ent);
+	}
+
+	process_trace_tryattach(child);
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_syscall_entry(u32 action,
+				       struct utrace_engine *engine,
+				       struct task_struct *task,
+				       struct pt_regs *regs)
+{
+	if (trace_syscalls_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SYSCALL_ENTRY;
+		ent.trace_syscall_entry.callno = syscall_get_nr(task, regs);
+		syscall_get_arguments(task, regs, 0, 6,
+				      ent.trace_syscall_entry.args);
+		process_trace(&ent);
+	}
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_syscall_exit(enum utrace_resume_action action,
+				   struct utrace_engine *engine,
+				   struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	if (trace_syscalls_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SYSCALL_EXIT;
+		ent.trace_syscall_exit.rc =
+			syscall_get_return_value(task, regs);
+		ent.trace_syscall_exit.error = syscall_get_error(task, regs);
+		process_trace(&ent);
+	}
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_exec(enum utrace_resume_action action,
+			      struct utrace_engine *engine,
+			      struct task_struct *task,
+			      const struct linux_binfmt *fmt,
+			      const struct linux_binprm *bprm,
+			      struct pt_regs *regs)
+{
+	if (trace_lifecycle_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXEC;
+		process_trace(&ent);
+	}
+
+	/* We're already attached; no need for a new tryattach. */
+
+	return UTRACE_RESUME;
+}
+
+
+u32 process_trace_report_signal(u32 action,
+				struct utrace_engine *engine,
+				struct task_struct *task,
+				struct pt_regs *regs,
+				siginfo_t *info,
+				const struct k_sigaction *orig_ka,
+				struct k_sigaction *return_ka)
+{
+	if (trace_signals_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_SIGNAL;
+		ent.trace_signal.si_signo = info->si_signo;
+		ent.trace_signal.si_errno = info->si_errno;
+		ent.trace_signal.si_code = info->si_code;
+		process_trace(&ent);
+	}
+
+	/* We're already attached, so no need for a new tryattach. */
+
+	return UTRACE_RESUME | utrace_signal_action(action);
+}
+
+
+u32 process_trace_report_exit(enum utrace_resume_action action,
+			      struct utrace_engine *engine,
+			      struct task_struct *task,
+			      long orig_code, long *code)
+{
+	if (trace_lifecycle_p && trace_test(task)) {
+		struct process_trace_entry ent;
+		ent.opcode = _UTRACE_EVENT_EXIT;
+		ent.trace_exit.code = orig_code;
+		process_trace(&ent);
+	}
+
+	/* There is no need to explicitly attach or detach here. */
+
+	return UTRACE_RESUME;
+}
+
+
+void enable_process_trace(void)
+{
+	struct task_struct *grp, *tsk;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		/* Skip over kernel threads. */
+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
+		if (process_follow_pid) {
+			if (tsk->tgid == process_follow_pid ||
+			    tsk->parent->tgid == process_follow_pid)
+				process_trace_tryattach(tsk);
+		} else {
+			process_trace_tryattach(tsk);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+void disable_process_trace(void)
+{
+	struct utrace_engine *engine;
+	struct task_struct *grp, *tsk;
+	int rc;
+
+	pr_debug("in %s\n", __func__);
+	rcu_read_lock();
+	do_each_thread(grp, tsk) {
+		/* Find matching engine, if any.  Returns -ENOENT for
+		   unattached threads. */
+		engine = utrace_attach_task(tsk, UTRACE_ATTACH_MATCH_OPS,
+					    &process_trace_ops, 0);
+		if (IS_ERR(engine)) {
+			if (PTR_ERR(engine) != -ENOENT)
+				pr_warning("utrace_attach_task %d (rc %ld)\n",
+					   tsk->pid, -PTR_ERR(engine));
+		} else if (engine == NULL) {
+			pr_warning("utrace_attach_task %d (null engine)\n",
+				   tsk->pid);
+		} else {
+			/* Found one of our own engines.  Detach.  */
+			rc = utrace_control(tsk, engine, UTRACE_DETACH);
+			switch (rc) {
+			case 0:		    /* success */
+				break;
+			case -ESRCH:	    /* REAP callback already begun */
+			case -EALREADY:	    /* DEATH callback already begun */
+				break;
+			default:
+				rc = -rc;
+				pr_warning("utrace_detach %d (rc %d)\n",
+					   tsk->pid, rc);
+				break;
+			}
+			utrace_engine_put(engine);
+			pr_debug("detached in %s from %s(%d)\n", __func__,
+				 tsk->comm, tsk->pid);
+		}
+	} while_each_thread(grp, tsk);
+	rcu_read_unlock();
+}
+
+
+static const struct utrace_engine_ops process_trace_ops = {
+	.report_clone = process_trace_report_clone,
+	.report_exec = process_trace_report_exec,
+	.report_exit = process_trace_report_exit,
+	.report_signal = process_trace_report_signal,
+	.report_syscall_entry = process_trace_report_syscall_entry,
+	.report_syscall_exit = process_trace_report_syscall_exit,
+};
+
+
+
+/* control interfaces */
+
+
+static ssize_t
+trace_taskcomm_filter_read(struct file *filp, char __user *ubuf,
+			   size_t cnt, loff_t *ppos)
+{
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+				       trace_taskcomm_filter, TASK_COMM_LEN);
+}
+
+
+static ssize_t
+trace_taskcomm_filter_write(struct file *filp, const char __user *ubuf,
+			    size_t cnt, loff_t *fpos)
+{
+	char *end;
+
+	if (cnt > TASK_COMM_LEN - 1)
+		cnt = TASK_COMM_LEN - 1;
+
+	if (copy_from_user(trace_taskcomm_filter, ubuf, cnt))
+		return -EFAULT;
+
+	/* Cut from the first nil or newline. */
+	trace_taskcomm_filter[cnt] = '\0';
+	end = strchr(trace_taskcomm_filter, '\n');
+	if (end)
+		*end = '\0';
+
+	*fpos += cnt;
+	return cnt;
+}
+
+
+static const struct file_operations trace_taskcomm_filter_fops = {
+	.open		= tracing_open_generic,
+	.read		= trace_taskcomm_filter_read,
+	.write		= trace_taskcomm_filter_write,
+};
+
+
+
+static char README_text[] =
+	"process event tracer mini-HOWTO\n"
+	"\n"
+	"1. Select process hierarchy to monitor.  Other processes will be\n"
+	"   completely unaffected.  Leave at 0 for system-wide tracing.\n"
+	"#  echo NNN > process_follow_pid\n"
+	"\n"
+	"2. Determine which process event traces are potentially desired.\n"
+	"   syscall and signal tracing slow down monitored processes.\n"
+	"#  echo 0 > process_trace_{syscalls,signals,lifecycle}\n"
+	"\n"
+	"3. Add any final uid- or taskcomm-based filtering.  Non-matching\n"
+	"   processes will skip trace messages, but will still be slowed.\n"
+	"#  echo NNN > process_trace_uid_filter # -1: unrestricted \n"
+	"#  echo ls > process_trace_taskcomm_filter # empty: unrestricted\n"
+	"\n"
+	"4. Start tracing.\n"
+	"#  echo process > current_tracer\n"
+	"\n"
+	"5. Examine trace.\n"
+	"#  cat trace\n"
+	"\n"
+	"6. Stop tracing.\n"
+	"#  echo nop > current_tracer\n"
+	;
+
+static struct debugfs_blob_wrapper README_blob = {
+	.data = README_text,
+	.size = sizeof(README_text),
+};
+
+
+static __init int init_process_trace(void)
+{
+	struct dentry *d_tracer;
+	struct dentry *entry;
+
+	d_tracer = tracing_init_dentry();
+
+	entry = debugfs_create_blob("process_trace_README", 0444, d_tracer,
+				    &README_blob);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_README' entry\n");
+
+	/* Control for scoping process following. */
+	entry = debugfs_create_u32("process_follow_pid", 0644, d_tracer,
+				   &process_follow_pid);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_follow_pid' entry\n");
+
+	/* Process-level filters */
+	entry = debugfs_create_file("process_trace_taskcomm_filter", 0644,
+				    d_tracer, NULL,
+				    &trace_taskcomm_filter_fops);
+	/* XXX: it'd be nice to have a read/write debugfs_create_blob. */
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_taskcomm_filter' entry\n");
+
+	entry = debugfs_create_u32("process_trace_uid_filter", 0644, d_tracer,
+				   &trace_taskuid_filter);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_uid_filter' entry\n");
+
+	/* Event-level filters. */
+	entry = debugfs_create_u32("process_trace_lifecycle", 0644, d_tracer,
+				   &trace_lifecycle_p);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_lifecycle' entry\n");
+
+	entry = debugfs_create_u32("process_trace_syscalls", 0644, d_tracer,
+				   &trace_syscalls_p);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_syscalls' entry\n");
+
+	entry = debugfs_create_u32("process_trace_signals", 0644, d_tracer,
+				   &trace_signals_p);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'process_trace_signals' entry\n");
+
+	return register_tracer(&process_tracer);
+}
+
+device_initcall(init_process_trace);
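
The taskcomm filter write handler in this patch copies user data into a
fixed TASK_COMM_LEN buffer, terminates it, and cuts it at the first newline.
Here is a userspace sketch of the same parsing with one byte explicitly
reserved for the terminator.  set_comm_filter() and COMM_LEN are
hypothetical names for illustration, and memcpy() stands in for
copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN 16	/* stand-in for TASK_COMM_LEN */

/*
 * Copy at most COMM_LEN - 1 bytes of SRC into FILTER, NUL-terminate,
 * and cut at the first newline.  Returns the number of bytes consumed.
 */
size_t set_comm_filter(char filter[COMM_LEN], const char *src, size_t cnt)
{
	const size_t max = COMM_LEN - 1;	/* room for the terminator */
	char *end;

	assert(filter != NULL && src != NULL);
	if (cnt > max)
		cnt = max;
	memcpy(filter, src, cnt);	/* copy_from_user() in the kernel */
	filter[cnt] = '\0';
	end = strchr(filter, '\n');
	if (end)
		*end = '\0';
	return cnt;
}
```

With this shape, input from "echo ls" (the bytes "ls\n") leaves "ls" in the
filter, and overlong input is truncated rather than written past the end
of the buffer.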


From mingo at elte.hu  Sat Mar 21 07:43:01 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 21 Mar 2009 08:43:01 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
Message-ID: <20090321074301.GA19384@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> From: Frank Ch. Eigler <fche at redhat.com>
> 
> This is v2 of the prototype utrace-ftrace interface.  This code is 
> based on Roland McGrath's utrace API, which provides programmatic 
> hooks to the in-tree tracehook layer.  This new patch interfaces 
> many of those events to ftrace, as configured by a small number of 
> debugfs controls.  Here's the 
> /debugfs/tracing/process_trace_README:

Please submit changes/enhancements to kernel/trace/* to the tracing 
tree maintainers (Steve and me) for review, testing and integration.

Please also post patches against the latest tracing tree:

   http://people.redhat.com/mingo/tip.git/README

As this patch does not apply:

 Applying patch patches/utrace-based-ftrace-process-engine-v2.patch
 patching file include/linux/processtrace.h
 patching file kernel/trace/Kconfig
 Hunk #1 succeeded at 186 with fuzz 2 (offset 36 lines).
 patching file kernel/trace/Makefile
 Hunk #1 FAILED at 33.
 1 out of 1 hunk FAILED -- rejects in file kernel/trace/Makefile
 patching file kernel/trace/trace.h
 Hunk #1 succeeded at 7 with fuzz 1.
 Hunk #2 FAILED at 31.
 Hunk #3 succeeded at 215 with fuzz 2 (offset 43 lines).
 Hunk #4 FAILED at 330.
 2 out of 4 hunks FAILED -- rejects in file kernel/trace/trace.h
 patching file kernel/trace/trace_process.c
 Patch patches/utrace-based-ftrace-process-engine-v2.patch does not apply (enforce with -f)

Thanks,

	Ingo


From akpm at linux-foundation.org  Sat Mar 21 08:39:12 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Sat, 21 Mar 2009 01:39:12 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321074301.GA19384@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
Message-ID: <20090321013912.ed6039c9.akpm@linux-foundation.org>

On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar <mingo at elte.hu> wrote:

> 
> * Roland McGrath <roland at redhat.com> wrote:
> 
> > From: Frank Ch. Eigler <fche at redhat.com>
> > 
> > This is v2 of the prototype utrace-ftrace interface.  This code is 
> > based on Roland McGrath's utrace API, which provides programmatic 
> > hooks to the in-tree tracehook layer.  This new patch interfaces 
> > many of those events to ftrace, as configured by a small number of 
> > debugfs controls.  Here's the 
> > /debugfs/tracing/process_trace_README:
> 
> Please submit changes/enhancements to kernel/trace/* to the tracing 
> tree maintainers (Steve and me) for review, testing and integration.
> 
> Please also post patches against the latest tracing tree:
> 
>    http://people.redhat.com/mingo/tip.git/README

uhm, this patch depends on the (large) utrace patch, which is not kernel/trace
material.

> As this patch does not apply:
> 
>  Applying patch patches/utrace-based-ftrace-process-engine-v2.patch
>  patching file include/linux/processtrace.h
>  patching file kernel/trace/Kconfig
>  Hunk #1 succeeded at 186 with fuzz 2 (offset 36 lines).
>  patching file kernel/trace/Makefile
>  Hunk #1 FAILED at 33.
>  1 out of 1 hunk FAILED -- rejects in file kernel/trace/Makefile
>  patching file kernel/trace/trace.h
>  Hunk #1 succeeded at 7 with fuzz 1.
>  Hunk #2 FAILED at 31.
>  Hunk #3 succeeded at 215 with fuzz 2 (offset 43 lines).
>  Hunk #4 FAILED at 330.
>  2 out of 4 hunks FAILED -- rejects in file kernel/trace/trace.h
>  patching file kernel/trace/trace_process.c
>  Patch patches/utrace-based-ftrace-process-engine-v2.patch does not apply (enforce with -f)

The rejects are trivial.


From akpm at linux-foundation.org  Sat Mar 21 08:49:09 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Sat, 21 Mar 2009 01:49:09 -0700
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
Message-ID: <20090321014909.6b654f55.akpm@linux-foundation.org>

On Fri, 20 Mar 2009 18:41:40 -0700 (PDT) Roland McGrath <roland at redhat.com> wrote:

> This adds the utrace facility, a new modular interface in the kernel for
> implementing user thread tracing and debugging.  This fits on top of the
> tracehook_* layer, so the new code is well-isolated.
> 
> The new interface is in <linux/utrace.h> and the DocBook utrace book
> describes it.  It allows for multiple separate tracing engines to work in
> parallel without interfering with each other.  Higher-level tracing
> facilities can be implemented as loadable kernel modules using this layer.
> 
> The new facility is made optional under CONFIG_UTRACE.
> When this is not enabled, no new code is added.
> It can only be enabled on machines that have all the
> prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK.
> 
> In this initial version, utrace and ptrace do not play together at all.
> If ptrace is attached to a thread, the attach calls in the utrace kernel
> API return -EBUSY.  If utrace is attached to a thread, the PTRACE_ATTACH
> or PTRACE_TRACEME request will return EBUSY to userland.  The old ptrace
> code is otherwise unchanged and nothing using ptrace should be affected
> by this patch as long as utrace is not used at the same time.  In the
> future we can clean up the ptrace implementation and rework it to use
> the utrace API.

I'd be interested in seeing a bit of discussion regarding the overall value
of utrace - it has been quite a while since it floated past.

I assume that redoing ptrace to be a client of utrace _will_ happen, and
that this is merely a cleanup exercise with no new user-visible features?

The "prototype utrace-ftrace interface" seems to be more a cool toy rather
than a serious new kernel feature (yes?)

If so, what are the new killer utrace clients which would justify all these
changes?


Also, is it still the case that RH are shipping utrace?  If so, for what
reasons and what benefits are users seeing from it?


And I recall that there were real problems wiring up the Feb 2007 version
of utrace to the ARM architecture.  Have those issues been resolved?  Are
any problems expected for any architectures?

Thanks.


From mingo at elte.hu  Sat Mar 21 09:12:35 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 21 Mar 2009 10:12:35 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321013912.ed6039c9.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
Message-ID: <20090321091235.GA29678@elte.hu>


* Andrew Morton <akpm at linux-foundation.org> wrote:

> On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar <mingo at elte.hu> wrote:
> 
> > 
> > * Roland McGrath <roland at redhat.com> wrote:
> > 
> > > From: Frank Ch. Eigler <fche at redhat.com>
> > > 
> > > This is v2 of the prototype utrace-ftrace interface.  This code is 
> > > based on Roland McGrath's utrace API, which provides programmatic 
> > > hooks to the in-tree tracehook layer.  This new patch interfaces 
> > > many of those events to ftrace, as configured by a small number of 
> > > debugfs controls.  Here's the 
> > > /debugfs/tracing/process_trace_README:
> > 
> > Please submit changes/enhancements to kernel/trace/* to the tracing 
> > tree maintainers (Steve and me) for review, testing and integration.
> > 
> > Please also post patches against the latest tracing tree:
> > 
> >    http://people.redhat.com/mingo/tip.git/README
> 
> uhm, this patch depends on the (large) utrace patch, which is not 
> kernel/trace material.

The thing is, utrace crashes in Fedora have dominated kerneloops.org 
for many months, so i'm not sure what to make of the idea of posting 
a 4000+ lines of core kernel code patchset on the last day of the 
development cycle, a posting that has carefully avoided the Cc:-ing 
of affected maintainers ;-)

Utrace is very much tracing material - without the ftrace plugin the 
whole utrace machinery is just something that provides a _ton_ of 
hooks to something entirely external: SystemTap mainly.

kernel/utrace.c should probably be introduced as 
kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending 
work in the tracing tree and cooperation would be nice and desired.

The ftrace/utrace plugin is the only real connection utrace has to 
the mainline kernel, so proper review by the tracing folks and 
cooperation with the tracing folks is very much needed for the whole 
thing.

	Ingo


From akpm at linux-foundation.org  Sat Mar 21 11:19:54 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Sat, 21 Mar 2009 04:19:54 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321091235.GA29678@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
Message-ID: <20090321041954.72b99e69.akpm@linux-foundation.org>

On Sat, 21 Mar 2009 10:12:35 +0100 Ingo Molnar <mingo at elte.hu> wrote:

> 
> * Andrew Morton <akpm at linux-foundation.org> wrote:
> 
> > On Sat, 21 Mar 2009 08:43:01 +0100 Ingo Molnar <mingo at elte.hu> wrote:
> > 
> > > 
> > > * Roland McGrath <roland at redhat.com> wrote:
> > > 
> > > > From: Frank Ch. Eigler <fche at redhat.com>
> > > > 
> > > > This is v2 of the prototype utrace-ftrace interface.  This code is 
> > > > based on Roland McGrath's utrace API, which provides programmatic 
> > > > hooks to the in-tree tracehook layer.  This new patch interfaces 
> > > > many of those events to ftrace, as configured by a small number of 
> > > > debugfs controls.  Here's the 
> > > > /debugfs/tracing/process_trace_README:
> > > 
> > > Please submit changes/enhancements to kernel/trace/* to the tracing 
> > > tree maintainers (Steve and me) for review, testing and integration.
> > > 
> > > Please also post patches against the latest tracing tree:
> > > 
> > >    http://people.redhat.com/mingo/tip.git/README
> > 
> > uhm, this patch depends on the (large) utrace patch, which is not 
> > kernel/trace material.
> 
> The thing is, utrace crashes in Fedora have dominated kerneloops.org 
> for many months, so i'm not sure what to make of the idea of posting 
> a 4000+ lines of core kernel code patchset on the last day of the 
> development cycle, a posting that has carefully avoided the Cc:-ing 
> of affected maintainers ;-)
> 
> Utrace is very much tracing material - without the ftrace plugin the 
> whole utrace machinery is just something that provides a _ton_ of 
> hooks to something entirely external: SystemTap mainly.

Roland's changelogs don't mention systemtap at all afaict.

That was, umm, major information lossage.

> kernel/utrace.c should probably be introduced as 
> kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending 
> work in the tracing tree and cooperation would be nice and desired.
> 
> The ftrace/utrace plugin is the only real connection utrace has to 
> the mainline kernel, so proper review by the tracing folks and 
> cooperation with the tracing folks is very much needed for the whole 
> thing.

Actually it seems that the whole utrace-ftrace thing is a big distraction and
could/should just be omitted.  This is a systemtap feature and should be viewed as
such.

This is all a bit weird.


From fche at redhat.com  Sat Mar 21 11:51:41 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 07:51:41 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321041954.72b99e69.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
Message-ID: <20090321115141.GA3566@redhat.com>

Hi -

On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
> [...]
> > Utrace is very much tracing material - without the ftrace plugin the 
> > whole utrace machinery is just something that provides a _ton_ of 
> > hooks to something entirely external: SystemTap mainly.
> 
> Roland's changelogs don't mention systemtap at all afaict.
> That was, umm, major information lossage.

There have been many mixed messages from LKML on the topic - sometimes
mentioning systemtap is forbidden, other times necessary.  Sorry about
that.

There are several non-systemtap clients in existence or under
development.  You may have heard of the ptrace cleanup, a
multi-client ptrace replacement, an on-the-fly core dumper, the ftrace
widget, user-space probes.  All of these should have somewhat
compelling non-systemtap uses, if that's an important criterion.


> Actually it seems that the whole utrace-ftrace thing is a big
> distraction and could/should just be omitted.  This is a systemtap
> feature and should be viewed as such. [...]

utrace is a better way to perform user thread management than what is
there now, and the utrace-ftrace widget shows how to *hook* thread
events such as syscalls in a lighter weight / more managed way than
the first one proposed.  (That's one reason we've been participating
in the ftrace discussions.)  Of course it can be made to use the fine
syscall pretty-printing code recently added.


- FChE


From akpm at linux-foundation.org  Sat Mar 21 12:04:22 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Sat, 21 Mar 2009 05:04:22 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321115141.GA3566@redhat.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
Message-ID: <20090321050422.d1d99eec.akpm@linux-foundation.org>

On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" <fche at redhat.com> wrote:

> Hi -
> 
> On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
> > [...]
> > > Utrace is very much tracing material - without the ftrace plugin the 
> > > whole utrace machinery is just something that provides a _ton_ of 
> > > hooks to something entirely external: SystemTap mainly.
> > 
> > Roland's changelogs don't mention systemtap at all afaict.
> > That was, umm, major information lossage.
> 
> There have been many mixed messages from LKML on the topic - sometimes
> mentioning systemtap is forbidden, other times necessary.  Sorry about
> that.

heh.  We all love systemtap and want it to get better.

> There are several non-systemtap clients in existence or under
> development.  You may have heard of the ptrace cleanup, a
> multi-client ptrace replacement, an on-the-fly core dumper, the ftrace
> widget, user-space probes.  All of these should have somewhat
> compelling non-systemtap uses, if that's an important criterion.

Well I dunno.  You guys are closer to this than I am, but I'd have thought
that systemtap is the main game here, and most/all of the above is just
fluff.

IOW, "this helps systemtap" is sufficient reason for merging a kernel
change.  For sufficiently large values of "help", and sufficiently small
values of "eww", of course.


I have strong memories of being traumatised by reading the uprobes code. 
What's the story on all of that nowadays?


> 
> > Actually it seems that the whole utrace-ftrace thing is a big
> > distraction and could/should just be omitted.  This is a systemtap
> > feature and should be viewed as such. [...]
> 
> utrace is a better way to perform user thread management than what is
> there now, and the utrace-ftrace widget shows how to *hook* thread
> events such as syscalls in a lighter weight / more managed way than
> the first one proposed.  (That's one reason we've been participating
> in the ftrace discussions.)  Of course it can be made to use the fine
> syscall pretty-printing code recently added.

eh.  Boring.  Let's fix systemtap?


From fche at redhat.com  Sat Mar 21 12:57:06 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 08:57:06 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
Message-ID: <20090321125706.GB3566@redhat.com>

Hi -

On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
> [...]
> > There have been many mixed messages from LKML on the topic - sometimes
> > mentioning systemtap is forbidden, other times necessary.  Sorry about
> > that.
> 
> heh.  We all love systemtap and want it to get better.

Great!


> [...]
> I have strong memories of being traumatised by reading the uprobes
> code.  What's the story on all of that nowadays?

uprobes, being a layer upon utrace that provides a kprobes-like
breakpointing API for user threads, is being refactored into several
parts.  I don't know about the aesthetics of it all, but I believe the
general future plan is this:

One piece would perform machine code analysis (to classify
instructions for ideal/safe placement of breakpoints or for code
patching), and another thin layer that uses this and utrace to manage
user-space breakpoints.  (Systemtap would interface at this point.)
Then a user-space syscallish interface could come along to expose this
to a super-ptrace client (to speed up gdb; perhaps to allow multiple
debuggers).  Plus one might as well add an ftrace-engine for it
(directly analogous to the recent kprobe-based one that ftrace people
found "cool".)


> > > Actually it seems that the whole utrace-ftrace thing is a big
> > > distraction and could/should just be omitted.  This is a systemtap
> > > feature and should be viewed as such. [...]
> > 
> > utrace is a better way to perform user thread management than what is
> > there now, and the utrace-ftrace widget shows how to *hook* thread
> > events such as syscalls in a lighter weight / more managed way than
> > the first one proposed.  (That's one reason we've been participating
> > in the ftrace discussions.)  Of course it can be made to use the fine
> > syscall pretty-printing code recently added.
> 
> eh.  Boring.  Let's fix systemtap?

There are several constituencies here, some of which find the above
exciting.  That's OK and we'd like to help them too.


- FChE


From renzo at cs.unibo.it  Sat Mar 21 14:08:22 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Sat, 21 Mar 2009 15:08:22 +0100
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321014909.6b654f55.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
Message-ID: <20090321140822.GE18690@cs.unibo.it>

Tracing does not mean only debug. Some tracing facilities can be used for virtualization.
For example User-Mode Linux is based on ptrace.

I have a prototype of a kernel module for virtualization (kmview) based on utrace.
Using kmview (module+VMM), a user (not root) can mount a filesystem just for
a process (or a hierarchy of processes), and some processes can
use different networking stacks or virtual devices. It is something like user-mode containers.
kmview provides the same features as umview, which is based on ptrace, but much faster.
(umview is in Debian lenny, squeeze and sid if you want to test it.)

*Utrace is really what I wanted* to support kmview (apart from
some minor issues about the support of nested virtualizations).
Other virtualizations now based on ptrace could move part of their implementation
to kernel level via utrace, which makes several speedups possible.
For example kmview is a partial virtual machine monitor: some system calls are forwarded
to the kernel, some others virtualized.
When a user mounts a filesystem, all the system calls which use pathnames inside the mountpoint 
subtree get virtualized while the others are forwarded to the kernel.
With utrace the kmview kernel module handles many system calls at kernel level.
I mean, if an "open" system call was sent to the kernel because the path is outside
the virtualized part of the file system, all the system calls on the same file descriptors 
can be forwarded to the kernel without any request to the VMM at user level.
This is just one example of speedup, several others are possible.
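
To make the file descriptor example concrete, here is a minimal user-space
sketch of that decision. It is not the kmview code: it assumes a single
virtualized mountpoint, and the names mountpoint, fd_virtual, on_open and
fd_needs_vmm are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_FD 64

/* Hypothetical: root of the one virtualized subtree. */
static const char *mountpoint = "/mnt/virt";

/* fd_virtual[fd] is true when fd was opened inside the virtual subtree. */
static bool fd_virtual[MAX_FD];

/* True if path is the mountpoint itself or lies below it. */
static bool path_is_virtual(const char *path)
{
    size_t n = strlen(mountpoint);

    return strncmp(path, mountpoint, n) == 0 &&
           (path[n] == '\0' || path[n] == '/');
}

/*
 * Decide once, at open() time, whether the call is virtualized,
 * and remember the decision for the resulting file descriptor.
 */
static bool on_open(int fd, const char *path)
{
    bool v = path_is_virtual(path);

    if (fd >= 0 && fd < MAX_FD)
        fd_virtual[fd] = v;
    return v;
}

/*
 * Later fd-based calls (read, write, ...) consult only the table:
 * a false result means the call can go straight to the kernel
 * without any request to the user-level VMM.
 */
static bool fd_needs_vmm(int fd)
{
    return fd >= 0 && fd < MAX_FD && fd_virtual[fd];
}
```

With this table, an fd opened on /etc/passwd is forwarded to the kernel on
every later call, while an fd opened on /mnt/virt/data is still handed to
the VMM. The per-fd lookup is what removes the round trip to user level.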

Other virtualizations like user-mode linux or fakeroot-ng could use utrace to
speed up their virtualization, too.

As far as I have seen, systemtap is a wonderful tool for debugging, especially for 
kernel debugging, but it has not been designed for virtualization.
Ptrace provides a standard set of features, and all the implementations of a VMM must be 
in userland. Utrace provides the flexibility to split a VMM and move part of it to a 
kernel module.

Utrace provides a unified interface to kernel modules for tracing/virtualization.
kmview can be implemented either as a client of utrace or by spreading code around the kernel.
Like kmview, other virtualizations based on ptrace may need to move some of their
logic into the kernel to speed up their execution.
These VMMs will use utrace-based modules instead of kernel patches.

renzo

On Sat, Mar 21, 2009 at 01:49:09AM -0700, Andrew Morton wrote:
> I'd be interested in seeing a bit of discussion regarding the overall value
> of utrace - it has been quite a while since it floated past.
> 
> I assume that redoing ptrace to be a client of utrace _will_ happen, and
> that this is merely a cleanup exercise with no new user-visible features?
> 
> The "prototype utrace-ftrace interface" seems to be more a cool toy rather
> than a serious new kernel feature (yes?)
> 
> If so, what are the new killer utrace clients which would justify all these
> changes?
> 
> Also, is it still the case that RH are shipping utrace?  If so, for what
> reasons and what benefits are users seeing from it?
> 
> And I recall that there were real problems wiring up the Feb 2007 version
> of utrace to the ARM architecture.  Have those issues been resolved?  Are
> any problems expected for any architectures?


From mingo at elte.hu  Sat Mar 21 14:34:57 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 21 Mar 2009 15:34:57 +0100
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321140822.GE18690@cs.unibo.it>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
	<20090321140822.GE18690@cs.unibo.it>
Message-ID: <20090321143457.GA24254@elte.hu>


* Renzo Davoli <renzo at cs.unibo.it> wrote:

> Tracing does not mean only debug. Some tracing facilities can be 
> used for virtualization. For example User-Mode Linux is based on 
> ptrace.
> 
> I have a prototype of kernel module for virtualization (kmview) 
> based on utrace. [...]

Hm, i cannot find the source code. Can it be downloaded from 
somewhere?

	Ingo


From mingo at elte.hu  Sat Mar 21 15:45:01 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 21 Mar 2009 16:45:01 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
Message-ID: <20090321154501.GA2707@elte.hu>


* Andrew Morton <akpm at linux-foundation.org> wrote:

> [...]  Let's fix systemtap?

Yes, it needs to be fixed.

The main issue i see is that no kernel developer i work with on a 
daily basis uses SystemTap - and i work with a lot of people. Yes, i 
could perhaps name two or three people from lkml using it, but its 
average penetration amongst kernel folks is essentially zero.

Was any critical analysis done of why that penetration is so abysmally 
low for a tool with such a promise and with years of availability, 
and what are the measures planned to address those problems?

To me personally there are two big direct usability issues with 
SystemTap:

 1) It relies on DEBUG_INFO for any reasonable level of utility.
    Yes, it will limp along otherwise as well, but most of the
    actual novel capabilities depend on debuginfo. Which is an
    acceptable constraint for enterprise usage where kernels are
    switched every few months and having a debuginfo package is not
    a big issue. Not acceptable for upstream kernel development. It 
    also puts way too much trust into the compiler generating 1GB+ of 
    debuginfo correctly. I want to be able to rely on tools all the 
    time and thus i want tools to have some really simple and 
    predictable foundations.

 2) It's not upstream and folks using it seem to insist on not 
    having it upstream ;-) This 'distance' to upstream seems to have 
    grown during the past few years - instead of shrinking. As a 
    result it simply does not matter and there's no know-how and no 
    visibility of it upstream.

If these fundamental problems are addressed then i'd even argue for 
the totality of SystemTap to be aimed upstream (including the 
scripting language, etc.), because for something this fundamental 
there's just no good reason not to have a turn-key solution there.

Plus then there should be a (steadily growing) library of utility 
scripts in the kernel proper as well.

Anything less does not make much sense IMO. Having a separate tool 
will reduce efficiency, increase the latency of fixes and 
enhancements and creates ABI-like expectations - which are all 
counter-productive to good instrumentation.

These are the aspects of SystemTap that i have to say were never 
done right, and these are the aspects of SystemTap that need to 
change most. Putting utrace upstream now will just make it more 
convenient to have SystemTap as a separate entity - without any of 
the benefits. Do we want to do that? Maybe, but we could do better i 
think.

	Ingo


From renzo at cs.unibo.it  Sat Mar 21 16:37:00 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Sat, 21 Mar 2009 17:37:00 +0100
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321143457.GA24254@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
	<20090321140822.GE18690@cs.unibo.it>
	<20090321143457.GA24254@elte.hu>
Message-ID: <20090321163700.GA22292@cs.unibo.it>

On Sat, Mar 21, 2009 at 03:34:57PM +0100, Ingo Molnar wrote:
> 
> * Renzo Davoli <renzo at cs.unibo.it> wrote:
> 
> > Tracing does not mean only debug. Some tracing facilities can be 
> > used for virtualization. For example User-Mode Linux is based on 
> > ptrace.
> > 
> > I have a prototype of kernel module for virtualization (kmview) 
> > based on utrace. [...]
> 
> Hm, i cannot find the source code. Can it be downloaded from 
> somewhere?
Sure! kmview is not included in our Debian packages yet as it relies on 
(still) non mainstream features (utrace), but the code is available on 
our view-os svn repository.

Check out:
svn co https://view-os.svn.sourceforge.net/svnroot/view-os view-os 

More specifically to browse the code/specifications:
The kmview device protocol is here:
http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications
The kernel module itself is here:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/
The VMM userland application shares most of its code with
umview; the source code for both is here:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/xmview-os/xmview/

kmview kernel module (current version) needs the following patches:
utrace
http://www.mail-archive.com/utrace-devel at redhat.com/msg00654.html
http://www.mail-archive.com/utrace-devel at redhat.com/msg00655.html
I am trying to keep everything up to date, but the whole thing is
evolving quite fast.

Everything has been released under GPLv2.

	renzo


From mingo at elte.hu  Sat Mar 21 16:44:31 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sat, 21 Mar 2009 17:44:31 +0100
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321163700.GA22292@cs.unibo.it>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
	<20090321140822.GE18690@cs.unibo.it>
	<20090321143457.GA24254@elte.hu>
	<20090321163700.GA22292@cs.unibo.it>
Message-ID: <20090321164431.GK11183@elte.hu>


* Renzo Davoli <renzo at cs.unibo.it> wrote:

> On Sat, Mar 21, 2009 at 03:34:57PM +0100, Ingo Molnar wrote:
> > 
> > * Renzo Davoli <renzo at cs.unibo.it> wrote:
> > 
> > > Tracing does not mean only debug. Some tracing facilities can be 
> > > used for virtualization. For example User-Mode Linux is based on 
> > > ptrace.
> > > 
> > > I have a prototype of kernel module for virtualization (kmview) 
> > > based on utrace. [...]
> > 
> > Hm, i cannot find the source code. Can it be downloaded from 
> > somewhere?
>
> Sure! kmview is not included in our Debian packages yet as it 
> relies on (still) non mainstream features (utrace), but the code 
> is available on our view-os svn repository.
> 
> Check out:
> svn co https://view-os.svn.sourceforge.net/svnroot/view-os view-os 
> 
> More specifically to browse the code/specifications:
> The kmview device protocol is here:
> http://wiki.virtualsquare.org/index.php/KMview_module_interface_specifications
> The kernel module itself is here:
> http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/

Looks really interesting.

That's btw. what i see as the biggest value of utrace: it's a 
comprehensive, all-encompassing framework all around process state 
events and process state manipulation.

Utrace came from Frysk (generic debugger), but the fact that you 
were able to build a completely unanticipated usecase and 
virtualization module on top of it is a very good sign of a robust 
and complete design. I'm impressed.

	Ingo


From diegocg at gmail.com  Sat Mar 21 20:35:21 2009
From: diegocg at gmail.com (Diego Calleja)
Date: Sat, 21 Mar 2009 21:35:21 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321154501.GA2707@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
Message-ID: <200903212135.21457.diegocg@gmail.com>

On Saturday 21 March 2009 16:45:01 Ingo Molnar wrote:

> The main issue i see is that no kernel developer i work with on a 
> daily basis uses SystemTap - and i work with a lot of people. Yes, i 
> could perhaps name two or three people from lkml using it, but its 
> average penetration amongst kernel folks is essentially zero.

What about userspace developers? People always talk of systemtap as
a kernel thing, but my (humble) impression is that kernel hackers don't
seem to need it that much (maybe for the same reasons they didn't need a
kernel debugger ;), but userspace developers do. There're many
userspace projects that offer optional compile options to enable dtrace
probes (some people like apple even ship executables of python, perl
and ruby with probes by default). There're several firefox hackers that
switched to dtrace-capable systems just because the dtrace-javascript
probes enabled them to debug javashit code in ways they weren't able
in linux or windows.

In my humble opinion a better development environment for linux
userspace programmers is way more important than whether kernel
hackers like systemtap or not. So maybe the discussion should be less
about "does it help kernel hackers?" and more about "does it help
userspace hackers?". My 2¢...


From akpm at linux-foundation.org  Sat Mar 21 21:34:13 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Sat, 21 Mar 2009 14:34:13 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321154501.GA2707@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
Message-ID: <20090321143413.75ead1aa.akpm@linux-foundation.org>

On Sat, 21 Mar 2009 16:45:01 +0100 Ingo Molnar <mingo at elte.hu> wrote:

> 
> [...]
>

useful, thanks.

> Putting utrace upstream now will just make it more 
> convenient to have SystemTap as a separate entity - without any of 
> the benefits. Do we want to do that? Maybe, but we could do better i 
> think.

It would not be good to merge a large kernel feature which kernel
developers and testers cannot test, and regression test.

If testing utrace against its main application requires installation of a
complete enterprise distro from a distro which the particular developer
might not prefer to use then that's quite a problem.

So it is desirable for this reason (and, I suspect, for other reasons) that
systemtap (or a part thereof) be dragged out in some standalone form which
is usable by random mortals.

IOW: I agree.


From fche at redhat.com  Sat Mar 21 21:48:52 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 17:48:52 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321154501.GA2707@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
Message-ID: <20090321214852.GA5262@redhat.com>

Hi -

On Sat, Mar 21, 2009 at 04:45:01PM +0100, Ingo Molnar wrote:
> [...]
> To me personally there are two big direct usability issues with 
> SystemTap:
> 
>  1) It relies on DEBUG_INFO for any reasonable level of utility.
>     Yes, it will limp along otherwise as well, but most of the
>     actual novel capabilities depend on debuginfo. Which is an
>     acceptable constraint for enterprise usage where kernels are
>     switched every few months and having a debuginfo package is not
>     a big issue. Not acceptable for upstream kernel development. 

In my own limited kernel-building experience, I find the debuginfo
data conveniently and instantly available after every "make".  Can you
elaborate on how it is harder for you to incidentally make it than for
someone to download it?


>     It also puts way too much trust into the compiler generating 1GB+ of
>     debuginfo correctly. I want to be able to rely on tools all the
>     time and thus i want tools to have some really simple and
>     predictable foundations.

Well, the data has to come from *somewhere*.  We know several
shortcomings (and have staff working on gcc debuginfo improvements),
but there is little alternative.  If not from the compiler, where are
you going to get detailed type/structure layouts?  Stack slot to
variable mappings?  Statement-level PC addresses?  Unwind data?


>  2) It's not upstream and folks using it seem to insist on not 
>     having it upstream ;-) This 'distance' to upstream seems to have 
>     grown during the past few years - instead of shrinking. [...]

Considering our upstream-bound assistance with foundation technologies
like markers, tracepoints, kprobes, utrace, and several other bits,
this does not seem entirely fair.


> If these fundamental problems are addressed then i'd even argue for
> the totality of SystemTap to be aimed upstream (including the
> scripting language, etc.), [...]

If consensus on this were plausible, we could seriously discuss it.

But I don't buy the package-deal that utrace must not attempt merging
on its own merits, just because it makes systemtap (as it is today)
useful to more people.


- FChE


From fche at redhat.com  Sat Mar 21 21:51:45 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 17:51:45 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321143413.75ead1aa.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
Message-ID: <20090321215145.GB5262@redhat.com>

Hi -


On Sat, Mar 21, 2009 at 02:34:13PM -0700, Andrew Morton wrote:
> [...]
> It would not be good to merge a large kernel feature which kernel
> developers and testers cannot test, and regression test.

It does not.  Other kernel self-sufficient utrace clients are on their
way, and of course one was just (re)posted.

> If testing utrace against its main application requires installation
> of a complete enterprise distro from a distro [...]

This has *never* been a requirement.


- FChE


From torvalds at linux-foundation.org  Sat Mar 21 22:02:59 2009
From: torvalds at linux-foundation.org (Linus Torvalds)
Date: Sat, 21 Mar 2009 15:02:59 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321215145.GB5262@redhat.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
Message-ID: <alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>


On Sat, 21 Mar 2009, Frank Ch. Eigler wrote:
> 
> > If testing utrace against its main application requires installation
> > of a complete enterprise distro from a distro [...]
> 
> This has *never* been a requirement.

You guys are getting off on a tangent.

Let's go back to the post that started this all.

> The thing is, utrace crashes in Fedora have dominated kerneloops.org 
> for many months, so i'm not sure what to make of the idea of posting 
> a 4000+ lines of core kernel code patchset on the last day of the 
> development cycle, a posting that has carefully avoided the Cc:-ing 
> of affected maintainers ;-)

.. and dammit, I agree 100%. If utrace really shows up in _any_ way on 
kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. 

I'm not going to take known-bad crap. It's that simple. Don't bother 
posting it, don't bother discussing it, don't bother making excuses for 
it.

			Linus


From fche at redhat.com  Sat Mar 21 22:20:30 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 18:20:30 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
References: <20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
Message-ID: <20090321222030.GA5157@redhat.com>

Hi -

On Sat, Mar 21, 2009 at 03:02:59PM -0700, Linus Torvalds wrote:
> [...]
> > The thing is, utrace crashes in Fedora have dominated kerneloops.org 
> > for many months [...]
> 
> .. and dammit, I agree 100%. If utrace really shows up in _any_ way on 
> kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. 

There was a short span of time during last fall, when Roland was on
vacation.  That bug (in 2.6.26.3) was fixed during the kernel summit.
So this is a six-month obsolete grievance.

- FChE


From adobriyan at gmail.com  Sat Mar 21 22:37:59 2009
From: adobriyan at gmail.com (Alexey Dobriyan)
Date: Sun, 22 Mar 2009 01:37:59 +0300
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321222030.GA5157@redhat.com>
References: <20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090321222030.GA5157@redhat.com>
Message-ID: <20090321223759.GA22770@x200.localdomain>

On Sat, Mar 21, 2009 at 06:20:30PM -0400, Frank Ch. Eigler wrote:
> On Sat, Mar 21, 2009 at 03:02:59PM -0700, Linus Torvalds wrote:
> > [...]
> > > The thing is, utrace crashes in Fedora have dominated kerneloops.org 
> > > for many months [...]
> > 
> > .. and dammit, I agree 100%. If utrace really shows up in _any_ way on 
> > kerneloops.org, then I think THE ENTIRE DISCUSSION in this thread is moot. 
> 
> There was a short span of time during last fall, when Roland was on
> vacation.  That bug (in 2.6.26.3) was fixed during the kernel summit.
> So this is a six-month obsolete grievance.

struct task_struct::utrace became an embedded struct. This is good and
should remove quite a few utrace bugs. Better late than never.

However, the "rewrite-ptrace-via-utrace" patch was omitted, so almost no one
can easily see by how much the situation improved.

I see this patch was dropped in Fedora.

Will ptrace(2) be rewritten through utrace?


From fche at redhat.com  Sat Mar 21 23:38:39 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Sat, 21 Mar 2009 19:38:39 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321223759.GA22770@x200.localdomain>
References: <20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090321222030.GA5157@redhat.com>
	<20090321223759.GA22770@x200.localdomain>
Message-ID: <20090321233839.GB5157@redhat.com>

Hi -

On Sun, Mar 22, 2009 at 01:37:59AM +0300, Alexey Dobriyan wrote:
> [...]
> struct task_struct::utrace became an embedded struct. This is good and
> should remove quite a few utrace bugs. Better late than never.

Yeah.

> However, the "rewrite-ptrace-via-utrace" patch was omitted, so almost
> no one can easily see by how much the situation improved. [...]  Will
> ptrace(2) be rewritten through utrace?

Yes, I believe that is Roland's intent.  I believe it was separated
from the current suite of patches for staging purposes, to merge the
most solid code up first.  The code is available from the utrace git
tree in the utrace-ptrace branch.

- FChE


From mingo at elte.hu  Sun Mar 22 10:25:34 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sun, 22 Mar 2009 11:25:34 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321233839.GB5157@redhat.com>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090321222030.GA5157@redhat.com>
	<20090321223759.GA22770@x200.localdomain>
	<20090321233839.GB5157@redhat.com>
Message-ID: <20090322102534.GC19826@elte.hu>


* Frank Ch. Eigler <fche at redhat.com> wrote:

> Hi -
> 
> On Sun, Mar 22, 2009 at 01:37:59AM +0300, Alexey Dobriyan wrote:
> > [...]
> > struct task_struct::utrace became an embedded struct. This is good and
> > should remove quite a few utrace bugs. Better late than never.
> 
> Yeah.
> 
> > However, the "rewrite-ptrace-via-utrace" patch was omitted, so
> > almost no one can easily see by how much the situation improved.
> > [...]  Will ptrace(2) be rewritten through utrace?
> 
> Yes, I believe that is Roland's intent.  I believe it was 
> separated from the current suite of patches for staging purposes, 
> to merge the most solid code up first.  The code is available from 
> the utrace git tree in the utrace-ptrace branch.

i think they should be submitted together.

Here's the histogram of utrace bugs on kerneloops.org:

  2.6.27.5          1 x
  2.6.27.15         1 x
  2.6.27.12         2 x
  2.6.27-rc4        2 x
  2.6.26.6          1 x
  2.6.26.5         43 x
  2.6.26.3       1102 x
  2.6.26.2          2 x
  2.6.26.1          3 x
  2.6.26            1 x
  2.6.25            3 x

That peak in 2.6.26.3 is what i referred to. The latest F10 kernel 
rpm is kernel-2.6.27.12-170.2.5.fc10, and it does include the 
utrace-ptrace engine as well:

  # grep UTRACE /boot/config-2.6.27.19-170.2.35.fc10.i686
  CONFIG_UTRACE=y
  CONFIG_UTRACE_PTRACE=y

So the bug i referred to was fixed and the bug count has gone down - 
but still we have the utrace core submission here without any 
(tested) mainline kernel usage of the core code.
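
To put the peak in proportion, the histogram above can be tallied in a few lines. This is only a sketch of the arithmetic: the counts are copied from the table, while the dictionary and function names are made up for illustration.

```python
# Oops report counts per kernel version, copied from the histogram above.
utrace_oopses = {
    "2.6.27.5": 1,
    "2.6.27.15": 1,
    "2.6.27.12": 2,
    "2.6.27-rc4": 2,
    "2.6.26.6": 1,
    "2.6.26.5": 43,
    "2.6.26.3": 1102,
    "2.6.26.2": 2,
    "2.6.26.1": 3,
    "2.6.26": 1,
    "2.6.25": 3,
}


def peak_share(counts):
    """Return (version, fraction of all reports) for the most-reported version."""
    total = sum(counts.values())
    version = max(counts, key=counts.get)
    return version, counts[version] / total


total = sum(utrace_oopses.values())
version, share = peak_share(utrace_oopses)
print(f"{total} reports in total, {share:.1%} of them from {version}")
```

On these numbers a single release accounts for nearly all of the reports, which is consistent with the reading that one bug was responsible and has since been fixed.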

My suggestion would be to:

 - submit the ptrace-on-utrace engine as well (with Oleg's signoff?)

 - perhaps also submit with a well-tested ftrace plugin that tries 
   to utilize _all_ aspects of utrace and ftrace (and hence gives 
   good and continuous burn-in testing via the ftrace bootup 
   self-tests, etc.)

ideally we want both, because:

 - tracing corner-case bugs tend to be found much faster than ptrace
   corner case bugs - partly because tracing is much more invasive
   when activated system-wide.

 - ptrace-over-utrace on the other hand utilizes utrace more deeply
   than passive tracing ever can. (for example UML does full,
   active virtualization via ptrace - this depth of functional
   utrace usage is not possible via a tracing plugin.)

And i think the ptrace-via-utrace engine is actually fully ready, 
just perhaps it was not submitted out of caution to keep the 
logistics simple.

So i do think we've still got a shot at merging it, in this merge 
window.

        Ingo


From mingo at elte.hu  Sun Mar 22 12:08:11 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sun, 22 Mar 2009 13:08:11 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321214852.GA5262@redhat.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com>
Message-ID: <20090322120811.GD19826@elte.hu>


* Frank Ch. Eigler <fche at redhat.com> wrote:

> Hi -
> 
> On Sat, Mar 21, 2009 at 04:45:01PM +0100, Ingo Molnar wrote:
> > [...]
> > To me personally there are two big direct usability issues with 
> > SystemTap:
> > 
> >  1) It relies on DEBUG_INFO for any reasonable level of utility.
> >     Yes, it will limp along otherwise as well, but most of the
> >     actual novel capabilities depend on debuginfo. Which is an
> >     acceptable constraint for enterprise usage where kernels are
> >     switched every few months and having a debuginfo package is not
> >     a big issue. Not acceptable for upstream kernel development. 
> 
> In my own limited kernel-building experience, I find the debuginfo 
> data conveniently and instantly available after every "make".  Can 
> you elaborate on how it is harder for you to incidentally make it
> than for someone to download it?

Four reasons:

1)

I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, 
because it slows down the kernel build times significantly:

  without:   4343.31 user 416.39 system 6:09.97 elapsed 1286%CPU 
  with:      4871.07 user 501.90 system 7:43.22 elapsed 1159%CPU

( x86 allyesconfig. On an obscenely overpowered Nehalem box
  with 12 GB of RAM. )

2)

When the kernel build becomes IO-bound, for example when i build 
over a distcc cluster (which is how i generally build my kernels) - 
or when others with less RAM build a debuginfo kernel, the ratio 
becomes even worse:

  without:   870.36 user 292.79 system 3:32.10 elapsed  548% CPU
  with:      929.65 user 384.55 system 8:28.70 elapsed  258% CPU

3)

Another metric. Here's an x86 defconfig (i.e. fairly regular config 
- not allyesconfig) build's size:

  with:     1645 MB
  without:   211 MB

Try to build 1.6 GB of dirty data on ext3 and run into an fsync() in 
your editor ... you'll sit there twiddling thumbs for a minute or 
more.

4)

Or yet another metric - Linux distro package overhead. Try 
installing a debuginfo package:

 # yum install kernel-debuginfo

 ==========================================
  Package                  Arch    Version
 ==========================================
 Installing:
  kernel-debuginfo         x86_64  2.6.29-0.258.rc8.git2.fc11   
 rawhide-debuginfo  294 M
 Installing for dependencies:
  kernel-debuginfo-common  x86_64  2.6.29-0.258.rc8.git2.fc11   
 rawhide-debuginfo   35 M

 Total download size: 329 M

That size of a _compressed_ debuginfo kernel package is obscene. We 
can fit 4 years of full Linux kernel Git history into that size - 
60,000+ commits, full metadata and full 20 million lines of code 
flux included!

Uncompressed it blows up to gigabytes of on-disk data.

And this download has to be repeated for _every_ minor kernel 
upgrade.

So when i come into a situation where i could use some debugging 
help ... i'd have to rebuild the kernel with DEBUG_INFO=y and i'll 
always notice when i have a debuginfo kernel because it's 
inconvenient.
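
The overhead in points 1) to 3) is easier to compare as ratios. The sketch below only redoes the arithmetic on the figures quoted above; the helper names are hypothetical, and the elapsed stamps are assumed to be in the `M:SS.ss` form that time(1) prints.

```python
def elapsed_seconds(stamp):
    """Convert a time(1)-style 'M:SS.ss' elapsed stamp into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def overhead(without, with_debuginfo):
    """Ratio of the debuginfo measurement to the plain one."""
    return with_debuginfo / without


# Elapsed wall-clock times and build sizes as quoted in the message.
local = overhead(elapsed_seconds("6:09.97"), elapsed_seconds("7:43.22"))
distcc = overhead(elapsed_seconds("3:32.10"), elapsed_seconds("8:28.70"))
size = overhead(211, 1645)  # MB of build output, x86 defconfig

print(f"local build:  {local:.2f}x elapsed time")
print(f"distcc build: {distcc:.2f}x elapsed time")
print(f"build output: {size:.2f}x size")
```

The ratios show the same thing the raw numbers do: the penalty is moderate on a CPU-bound local build and much larger once the build is IO-bound.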

The solution?)

Dunno - but i definitely think we should think bigger:

The fundamental disconnect i believe seems to come from the fact 
that most user-space projects are relatively small, so debuginfo 
bloat is a secondary issue there.

But for a project with the size of the kernel, even for moderate 
builds (not allyesconfig), it's a _much_ bigger deal. This has been 
known for a long time and the situation has become worse over the 
last two years, not better. (last time i checked the debuginfo 
package overhead it was below 150 MB)

A few random ideas:

Instead of trying to cache 2+GB of debuginfo for a 50 MB kernel 
source repo (+50 MB of genuine .o output) - just to be able to debug 
one or two source files [which is the typical scope of a debugging 
session], why not build debuginfo on the fly, when a debugging 
session requires it? Rarely do we need debuginfo for more than a 
fraction of the whole kernel.

( Yes, it needs a few smarts like knowing the SHA1 of the source
  code module that a particular kernel portion got built with, to 
  make sure the debuginfo is fresh and relevant - but nothing major. )

I mean, lets _use_ the fact that we have source code available, more 
intelligently. It takes zero time to build detailed debuginfo for a 
portion of a tree.

If 'download debuginfo' can be replaced with: 'have a recent Git 
repository of the distro kernel source', we'll have a _much_ more 
efficient use of resources all around.
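
The freshness check mentioned above could look roughly like the following. It is a minimal sketch under stated assumptions, not an existing tool: it assumes a SHA1 of each source file is recorded when the kernel is built, and every name in it is hypothetical.

```python
import hashlib


def source_sha1(source_bytes):
    """SHA1 hex digest of a source file's contents."""
    return hashlib.sha1(source_bytes).hexdigest()


def debuginfo_is_fresh(recorded_sha1, current_source):
    """True if the source still matches the digest recorded at build time."""
    return recorded_sha1 == source_sha1(current_source)


# Hypothetical record: the digest stored when the running kernel was built.
built_from = b"int answer(void) { return 42; }\n"
recorded = source_sha1(built_from)

# Unchanged source: debuginfo rebuilt from it would match the running kernel.
print(debuginfo_is_fresh(recorded, built_from))
# Edited source: on-the-fly debuginfo would be stale and must not be trusted.
print(debuginfo_is_fresh(recorded, built_from + b"/* edit */\n"))
```

Only when the check passes would it make sense to rebuild debuginfo for that one file on demand; otherwise the matching revision has to be fetched from the Git history first.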

	Ingo


From mingo at elte.hu  Sun Mar 22 12:17:48 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sun, 22 Mar 2009 13:17:48 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <200903212135.21457.diegocg@gmail.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<200903212135.21457.diegocg@gmail.com>
Message-ID: <20090322121748.GE19826@elte.hu>


* Diego Calleja <diegocg at gmail.com> wrote:

> On Saturday 21 March 2009 16:45:01 Ingo Molnar wrote:
> 
> > The main issue i see is that no kernel developer i work with on a 
> > daily basis uses SystemTap - and i work with a lot of people. Yes, i 
> > could perhaps name two or three people from lkml using it, but its 
> > average penetration amongst kernel folks is essentially zero.
> 
> What about userspace developers? People always talk of systemtap
> as a kernel thing, but my (humble) impression is that kernel
> hackers don't seem to need it that much (maybe for the same
> reasons they didn't need a kernel debugger ;), but userspace developers
> do. There're many userspace projects that offer optional compile 
> options to enable dtrace probes (some people like apple even ship 
> executables of python, perl and ruby with probes by default). 
> There're several firefox hackers that switched to dtrace-capable 
> systems just because the dtrace-javascript probes enabled them to 
> debug javashit code in ways they weren't able in linux or windows.
> 
> In my humble opinion a better development environment for linux 
> userspace programmers is way more important than whether kernel 
> hackers like systemtap or not. So maybe the discussion should be 
> less about "does it help kernel hackers?" and more about "does it 
> help userspace hackers?". My 2¢...

Well, i consider kernel development to be just another form of 
software development, so i dont subscribe to the view that it is 
intrinsically different. (Yes, the kernel has many unique aspects - 
but most software projects have unique aspects.)

In terms of development methodology and tools, in fact i claim that 
the kernel workflow and style of development can be applied to most 
user-space software projects with great success.

So ... if a new development tool is apparently not (yet?) suited to 
a very large and sanely developed software project like the Linux 
kernel, i dont take that as an encouraging sign.

Also, there's practical aspects: the kernel is what we know best so 
if we can make it work well for the kernel, hopes are that other 
large projects can use it too. If we _only_ make the tool good for 
non-kernel purposes, who else will fix it for the kernel? The 
incentive to fix it for the kernel will be significantly lower.

	Ingo


From mingo at elte.hu  Sun Mar 22 12:37:49 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sun, 22 Mar 2009 13:37:49 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
References: <20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
Message-ID: <20090322123749.GF19826@elte.hu>


* Linus Torvalds <torvalds at linux-foundation.org> wrote:

> On Sat, 21 Mar 2009, Frank Ch. Eigler wrote:
> > 
> > > If testing utrace against its main application requires installation
> > > of a complete enterprise distro from a distro [...]
> > 
> > This has *never* been a requirement.
> 
> You guys are getting off on a tangent.
> 
> Let's go back to the post that started this all.
> 
> > The thing is, utrace crashes in Fedora have dominated kerneloops.org 
> > for many months, so i'm not sure what to make of the idea of posting 
> > a 4000+ lines of core kernel code patchset on the last day of the 
> > development cycle, a posting that has carefully avoided the Cc:-ing 
> > of affected maintainers ;-)
> 
> .. and dammit, I agree 100%. If utrace really shows up in _any_ 
> way on kerneloops.org, then I think THE ENTIRE DISCUSSION in this 
> thread is moot.
> 
> I'm not going to take known-bad crap. It's that simple. Don't 
> bother posting it, don't bother discussing it, don't bother making 
> excuses for it.

The kerneloops stats on utrace crashes are way down currently,
after that peak last fall. So i didnt want to suggest that it's 
known-broken now - i only wanted to point out that it's a 
known-risky area and that the submission of it should involve
the affected maintainers/developers.

Regarding current stability, Roland, Frank, is the utrace patch in 
latest (today's) Fedora rawhide:

 -rw-r--r-- 1 root root 176555 2009-01-08 05:42 linux-2.6-utrace.patch

a bug-fixed equivalent of the utrace bits that crashed in the
2.6.26.3 kernel? In that case it is certainly known-good.

Or is it a slimmed-down version?

The ptrace bits and signoffs from Oleg and Alexey would certainly 
help (me) in trusting it. (I've Cc:-ed Oleg and Alexey)

The ftrace bits could certainly be staged to go in via the tracing 
tree (in .31 or so) after the utrace-core+ptrace bits went upstream.

	Ingo


From mingo at elte.hu  Sun Mar 22 12:53:20 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Sun, 22 Mar 2009 13:53:20 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090322120811.GD19826@elte.hu>
References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com>
	<20090322120811.GD19826@elte.hu>
Message-ID: <20090322125320.GA14171@elte.hu>


* Ingo Molnar <mingo at elte.hu> wrote:

>  Total download size: 329 M
> 
> That size of a _compressed_ debuginfo kernel package is obscene. 
> We can fit 4 years of full Linux kernel Git history into that size 
> - 60,000+ commits, full metadata and full 20 million lines of code 
> flux included!
> 
> Uncompressed it blows up to gigabytes of on-disk data.
> 
> And this download has to be repeated for _every_ minor kernel 
> upgrade.

Have to correct my memories about how many commits the kernel repo 
has: 132,019 commits. That massive history fits into 298 MB. (!)
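
For scale, the corrected figures work out as below. This is just the arithmetic on the numbers quoted in this thread, assuming "MB" and "M" both mean 2^20 bytes; the variable names are illustrative.

```python
MIB = 1024 * 1024

repo_bytes = 298 * MIB       # full kernel Git history, per this message
commits = 132_019            # corrected commit count
debuginfo_bytes = 329 * MIB  # compressed kernel-debuginfo download, as quoted

bytes_per_commit = repo_bytes / commits
ratio = debuginfo_bytes / repo_bytes

print(f"{bytes_per_commit:.0f} bytes of repository per commit")
print(f"debuginfo download is {ratio:.2f}x the size of the whole history")
```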

	Ingo


From roland at redhat.com  Mon Mar 23 04:34:48 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 21:34:48 -0700 (PDT)
Subject: [PATCH 2/3] utrace core
In-Reply-To: Ingo Molnar's message of  Saturday,
	21 March 2009 17:44:31 +0100 <20090321164431.GK11183@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
	<20090321140822.GE18690@cs.unibo.it>
	<20090321143457.GA24254@elte.hu>
	<20090321163700.GA22292@cs.unibo.it>
	<20090321164431.GK11183@elte.hu>
Message-ID: <20090323043448.B07D1FC3AB@magilla.sf.frob.com>

> That's btw. what i see as the biggest value of utrace: it's a 
> comprehensive, all-encompassing framework all around process state
> events and process state manipulation.

Me too!  

And while we're on the btw's, I want to let everyone know that Ingo is the
one who came up with the name "utrace".  I had only completely dismal ideas
for names, and nothing but the philosophy, "For the love of God, anything
but [a-z]trace!"  So that's one tiny piece of the whole mess that you can't
blame on me.  (Yes, I do believe I would be killed if we changed it again now.)
;-) ;-) ;-)

> Utrace came from Frysk (generic debugger), but the fact that you 
> were able to build a completely unanticipated usecase and 
> virtualization module on top of it is a very good sign of a robust 
> and complete design. I'm impressed.

Um, thanks, I guess.  The antecedents of your statement are not really
accurate, but I'll take the consequent as a compliment! :-)

In fact, utrace came from my experience of maintaining the old ptrace code.
Nor was this particular use "completely unanticipated".  

I was not aware of Renzo or his work before he got in touch about making
use of utrace.  But my imagined list of vaporware always included
"specialized engines for UML or other syscall-interception type things".
(e.g. seccomp is trivial with no need for per-arch asm work.)  I swear,
a third of the people who ever came to me complaining about ptrace being
so hard to work with were doing things that to me are all "syscall
interception and/or tracking", whether for some security-minded purpose
or something more virtualization-like.  Surely for many of those cases,
it was really the wrong way to solve the problem they were tackling.
Seems it's just the next stop after someone talks you out of LD_PRELOAD.
But who am I to say?  It was quite clear that people really wanted
easier ways to experiment with doing this sort of thing.

That said, I certainly have always hoped for completely unanticipated
uses.  (I will readily admit to succumbing to "Build it and they will
come" mentality.  I'm sure flames about my deep character flaws, moral
turpitude, and dubious lineage will follow.  The history of my career
will show that I was not striving for the appearance of cogent planning.)

I hatched the essential design of utrace when I'd recently spent a whole
lot of time fixing the innards of ptrace and a whole lot of time helping
userland implementors of debuggers and the like figure out how to work
with ptrace (and hearing their complaints about it).  At the same time,
the group I'm in (still) was contemplating both the implementation
issues of a generic debugger, how to make it tractable to work up to far
smarter debuggers, and also the design of what became systemtap.

It was clear to me that this whole space of problems and potential
features would be an open-ended area where different approaches would
need to be hashed out, and that there would not be one "ptrace killer"
feature that would be the right fit for all uses.  It has long been
clear that the threshold of effort was far too high for people to
experiment and innovate in this area.  Hence the plan to make a new
platform that lowered that threshold at least closer to "pretty easy"
from "intractable", while staying about as simple as anything can be that
both brings that threshold down far enough and lets unrelated developments
in this area coexist well on the system.


Thanks,
Roland


From roland at redhat.com  Mon Mar 23 04:35:20 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 21:35:20 -0700 (PDT)
Subject: [PATCH 2/3] utrace core
In-Reply-To: Andrew Morton's message of  Saturday, 21 March 2009 01:49:09 -0700
	<20090321014909.6b654f55.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
Message-ID: <20090323043520.0B447FC3AB@magilla.sf.frob.com>

> I'd be interested in seeing a bit of discussion regarding the overall value
> of utrace - it has been quite a while since it floated past.

Me too!

> I assume that redoing ptrace to be a client of utrace _will_ happen, and
> that this is merely a cleanup exercise with no new user-visible features?

Yes.  It's my expectation that Oleg and I will do that clean-up in
several small stages, in the not-too-distant future.  I think more of
that work has to do with making the ptrace data structures clean and
sane than with utrace details.

> The "prototype utrace-ftrace interface" seems to be more a cool toy rather
> than a serious new kernel feature (yes?)

I don't personally have any experience with either Frank's utrace-ftrace
widget or with using any ftrace-based things to debug user programs.
I would guess it is more of a demonstration than a tool people will be
using in the long run.

> If so, what are the new killer utrace clients which would justify all these
> changes?

I hope I can leave those examples to the people who will write them.
utrace will be a failure if it only serves to underlie the things I want
to implement or can think up.  My intent is to open up this area of new
feature generation to the people who are killer hackers, but have been
daunted or turned off by the prospect of becoming killer ptrace innards
hackers.  (Only Oleg seems to have taken to that opportunity, or perhaps
he expected to wind up doing it as little as I did.)

> Also, is it still the case that RH are shipping utrace?  If so, for what
> reasons and what benefits are users seeing from it?

Fedora Rawhide has this current code, yes.  The people trying to
develop new features using utrace certainly like having it there.
(There really truly are people who like to build novel kernel modules
without compiling their own kernels from scratch.)  I won't try to
speak for them or their users.

> And I recall that there were real problems wiring up the Feb 2007 version
> of utrace to the ARM architecture.  Have those issues been resolved?  Are
> any problems expected for any architectures?

That was a misimpression.  There were never real problems for ARM,
only misunderstandings.  I'm sure the way I tried to stage the changes
at that time contributed to those misunderstandings arising as they
did.  Since then, all the arch requirements have been distilled into
the HAVE_ARCH_TRACEHOOK set that is already merged for several
architectures.  It is in the hands of each arch maintainer to update
their code to meet the HAVE_ARCH_TRACEHOOK requirements (I'm glad to
give advice when asked), and there is no porting work that is actually
specific to utrace itself.  (You just can't turn it on without
HAVE_ARCH_TRACEHOOK.)  Of course it is never all that unlikely that
some bits of the generic code will get some new tweaks brought to
light by making it work with a particular arch.  To my knowledge, the
strangest arch for cleaning up any of this stuff has always been ia64,
and sparc second; those arch maintainers have already done the
HAVE_ARCH_TRACEHOOK work.


Thanks,
Roland


From roland at redhat.com  Mon Mar 23 05:09:26 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 22:09:26 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Andrew Morton's message of  Saturday, 21 March 2009 05:04:22 -0700
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
Message-ID: <20090323050926.1ED1EFC3AB@magilla.sf.frob.com>

> Well I dunno.  You guys are closer to this than I am, but I'd have thought
> that systemtap is the main game here, and most/all of the above is just
> fluff.

That is certainly not true for me.  It is true that I hear plenty from
systemtap developers, users, and boosters wanting utrace to be merged.
But all that "fluff" you dismiss out of hand is what I would really like
to see become reality.  Pretty much the only people who ever tell me
they would hack on those things are the ones who say, "I'm looking
forward to utrace getting merged in so I can try to write something."

> eh.  Boring.  [...]

Since it's boring to you, it must be so boring to everyone that they
have no interest in a platform they can use to do exciting things with.
Great.  Silly me trying to enable collaboration to produce things less
boring than I'm capable of myself.  Clearly there is no need for any
such thing.  Sorry I'm so out of touch, but I just thought it was cool.


Thanks,
Roland


From roland at redhat.com  Mon Mar 23 05:20:50 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 22:20:50 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Frank Ch. Eigler's message of  Saturday,
	21 March 2009 19:38:39 -0400 <20090321233839.GB5157@redhat.com>
References: <20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090321222030.GA5157@redhat.com>
	<20090321223759.GA22770@x200.localdomain>
	<20090321233839.GB5157@redhat.com>
Message-ID: <20090323052050.D3F61FC3AB@magilla.sf.frob.com>

> Yes, I believe that is Roland's intent.  I believe it was separated
> from the current suite of patches for staging purposes, to merge the
> most solid code up first.  The code is available from the utrace git
> tree in the utrace-ptrace branch.

More than just "staging".  The utrace-ptrace code there today is really not
very nice to look at, and it's not ready for prime time.  As has been
mentioned, it is a "pure clean-up exercise".  As such, it's not the top
priority.  It also didn't seem to me like much of an argument for merging
utrace: "Look, more code and now it still does the same thing!"

Instead, I figured we should merge utrace in a way that lets everybody beat
on it for new features and hash out details, without immediate risk of
regressions in ptrace (which are severely annoying to everyone when they
happen).  The proper clean-ups for ptrace can proceed in parallel with work
using utrace for things that are actually new and interesting, and at least
the first half of that clean-up work is orthogonal to utrace.

This seems like the normal way that new optional CONFIG_FOOBAR features
(marked EXPERIMENTAL, even) are handled.  We don't jump over ourselves to
make existing crucial code paths subject to a new subsystem that is getting
its first rounds of shake-out.


Thanks,
Roland


From roland at redhat.com  Mon Mar 23 04:49:40 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 21:49:40 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Ingo Molnar's message of  Saturday,
	21 March 2009 10:12:35 +0100 <20090321091235.GA29678@elte.hu>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
Message-ID: <20090323044940.870ECFC3AB@magilla.sf.frob.com>

> kernel/utrace.c should probably be introduced as 
> kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending 
> work in the tracing tree and cooperation would be nice and desired.

Of course I would like to cooperate with everyone.  And of course it does
not really matter much where a source file lives.  But IMHO utrace really
does not fit in with the kernel/trace/ code much at all.  Sure, its hooks
can be used by tracer implementations that use CONFIG_TRACING stuff.  But
it is a general API about user thread state.  It belongs in kernel/trace/
"naturally" far less than, say, kprobes.  utrace will in future be used to
implement userland features (ptrace et al) that are just aspects of the
basics of what an operating system does: mediate userland for userland.
Those uses will have nothing to do with "kernel tracing".


Thanks,
Roland


From roland at redhat.com  Mon Mar 23 05:33:23 2009
From: roland at redhat.com (Roland McGrath)
Date: Sun, 22 Mar 2009 22:33:23 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Ingo Molnar's message of  Sunday,
	22 March 2009 11:25:34 +0100 <20090322102534.GC19826@elte.hu>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090321222030.GA5157@redhat.com>
	<20090321223759.GA22770@x200.localdomain>
	<20090321233839.GB5157@redhat.com> <20090322102534.GC19826@elte.hu>
Message-ID: <20090323053323.2F28DFC3AB@magilla.sf.frob.com>

> And i think the ptrace-via-utrace engine is actually fully ready, 
> just perhaps it was not submitted out of caution to keep the 
> logistics simple.

That's not so.  There is a clumsy prototype version.  Much of the work to
do it properly is really just plain ptrace clean-up and not specifically
about using utrace.  Oleg and I are ready to work on it as soon as our time
is not monopolized by trying to get the core utrace code to be accepted.

This ptrace work really buys nothing with immediate pay-off at all.  It's a
real shame if its lack keeps people from actually looking at utrace itself.
(This has been a long conversation so far with zero discussion of the code.)
A collaboration with focus on what new things can be built, rather than on
reasons not to let the foundations be poured, would be a lovely thing.


Thanks,
Roland


From mingo at elte.hu  Mon Mar 23 06:34:56 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Mon, 23 Mar 2009 07:34:56 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323044940.870ECFC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090323044940.870ECFC3AB@magilla.sf.frob.com>
Message-ID: <20090323063456.GA7752@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> > kernel/utrace.c should probably be introduced as 
> > kernel/trace/utrace.c not kernel/utrace.c. It also overlaps pending 
> > work in the tracing tree and cooperation would be nice and desired.
> 
> Of course I would like to cooperate with everyone.  And of course 
> it does not really matter much where a source file lives.  But 
> IMHO utrace really does not fit in with the kernel/trace/ code 
> much at all.  Sure, its hooks can be used by tracer 
> implementations that use CONFIG_TRACING stuff.  But it is a 
> general API about user thread state.  It belongs in kernel/trace/ 
> "naturally" far less than, say, kprobes.  utrace will in future be 
> used to implement userland features (ptrace et al) that are just 
> aspects of the basics of what an operating system does: mediate 
> userland for userland. Those uses will have nothing to do with 
> "kernel tracing".

But it is fitting if you think of kernel/trace/ as 
kernel/instrumentation/.

The virtualization-alike uses for utrace are in essence using system 
call instrumentation callbacks to inject extra functionality into 
the system. That's possible not because it's primarily geared at 
doing that, but because the instrumentation callbacks are generic 
and complete enough. It's still correct to think of it as an 
instrumentation tool and to maintain it as such. That also makes it 
clear that none of these APIs are to be regarded permanent ABIs.

Anyway ... placement is no big deal, and kernel/utrace.c is 
certainly a good way of avoiding the tracing tree ;-)

	Ingo


From dvlasenk at redhat.com  Mon Mar 23 09:25:04 2009
From: dvlasenk at redhat.com (Denys Vlasenko)
Date: Mon, 23 Mar 2009 10:25:04 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
Message-ID: <1237800304.3716.3.camel@localhost>

On Fri, 2009-03-20 at 18:42 -0700, Roland McGrath wrote:
> From: Frank Ch. Eigler <fche at redhat.com>
> Here's the /debugfs/tracing/process_trace_README:
> 
> process event tracer mini-HOWTO
> 
> 1. Select process hierarchy to monitor.  Other processes will be
> completely unaffected.  Leave at 0 for system-wide tracing.
> %  echo NNN > process_follow_pid
> 
> 2. Determine which process event traces are potentially desired.
> syscall and signal tracing slow down monitored processes.
> %  echo 0 > process_trace_{syscalls,signals,lifecycle}
> 
> 3. Add any final uid- or taskcomm-based filtering.  Non-matching
> processes will skip trace messages, but will still be slowed.
> %  echo NNN > process_trace_uid_filter # -1: unrestricted
> %  echo ls > process_trace_taskcomm_filter # empty: unrestricted
> 
> 4. Start tracing.
> %  echo process > current_tracer
> 
> 5. Examine trace.
> %  cat trace
> 
> 6. Stop tracing.
> %  echo nop > current_tracer
> 
> Signed-off-by: Frank Ch. Eigler <fche at redhat.com>

...

> +static char README_text[] =
> +	"process event tracer mini-HOWTO\n"
> +	"\n"
> +	"1. Select process hierarchy to monitor.  Other processes will be\n"
> +	"   completely unaffected.  Leave at 0 for system-wide tracing.\n"
> +	"#  echo NNN > process_follow_pid\n"
> +	"\n"
> +	"2. Determine which process event traces are potentially desired.\n"
> +	"   syscall and signal tracing slow down monitored processes.\n"
> +	"#  echo 0 > process_trace_{syscalls,signals,lifecycle}\n"
> +	"\n"
> +	"3. Add any final uid- or taskcomm-based filtering.  Non-matching\n"
> +	"   processes will skip trace messages, but will still be slowed.\n"
> +	"#  echo NNN > process_trace_uid_filter # -1: unrestricted \n"
> +	"#  echo ls > process_trace_taskcomm_filter # empty: unrestricted\n"
> +	"\n"
> +	"4. Start tracing.\n"
> +	"#  echo process > current_tracer\n"
> +	"\n"
> +	"5. Examine trace.\n"
> +	"#  cat trace\n"
> +	"\n"
> +	"6. Stop tracing.\n"
> +	"#  echo nop > current_tracer\n"
> +	;

A HOWTO text in the kernel binary? Shouldn't it be in Documentation/*
instead? But then, I am a well known miniaturization freak...
--
vda


From will.newton at gmail.com  Mon Mar 23 10:57:11 2009
From: will.newton at gmail.com (Will Newton)
Date: Mon, 23 Mar 2009 10:57:11 +0000
Subject: [PATCH 2/3] utrace core
In-Reply-To: <20090321014909.6b654f55.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014140.AA4F5FC3AB@magilla.sf.frob.com>
	<20090321014909.6b654f55.akpm@linux-foundation.org>
Message-ID: <87a5b0800903230357n3eedaac1u6c70c22fedea5ffc@mail.gmail.com>

On Sat, Mar 21, 2009 at 8:49 AM, Andrew Morton
<akpm at linux-foundation.org> wrote:
> On Fri, 20 Mar 2009 18:41:40 -0700 (PDT) Roland McGrath <roland at redhat.com> wrote:
>
>> This adds the utrace facility, a new modular interface in the kernel for
>> implementing user thread tracing and debugging.  This fits on top of the
>> tracehook_* layer, so the new code is well-isolated.
>>
>> The new interface is in <linux/utrace.h> and the DocBook utrace book
>> describes it.  It allows for multiple separate tracing engines to work in
>> parallel without interfering with each other.  Higher-level tracing
>> facilities can be implemented as loadable kernel modules using this layer.
>>
>> The new facility is made optional under CONFIG_UTRACE.
>> When this is not enabled, no new code is added.
>> It can only be enabled on machines that have all the
>> prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK.
>>
>> In this initial version, utrace and ptrace do not play together at all.
>> If ptrace is attached to a thread, the attach calls in the utrace kernel
>> API return -EBUSY.  If utrace is attached to a thread, the PTRACE_ATTACH
>> or PTRACE_TRACEME request will return EBUSY to userland.  The old ptrace
>> code is otherwise unchanged and nothing using ptrace should be affected
>> by this patch as long as utrace is not used at the same time.  In the
>> future we can clean up the ptrace implementation and rework it to use
>> the utrace API.
>
> I'd be interested in seeing a bit of discussion regarding the overall value
> of utrace - it has been quite a while since it floated past.
>
> I assume that redoing ptrace to be a client of utrace _will_ happen, and
> that this is merely a cleanup exercise with no new user-visible features?
>
> The "prototype utrace-ftrace interface" seems to be more a cool toy rather
> than a serious new kernel feature (yes?)
>
> If so, what are the new killer utrace clients which would justify all these
> changes?

It looks like utrace could provide a nice way to do low latency
tracing of userspace processes via a hardware interface (e.g. JTAG or
custom trace hardware). The only way to do that at present is to
scatter bits of instrumentation throughout the kernel.

I would like to see utrace merged so I can work on that type of feature.


From adobriyan at gmail.com  Mon Mar 23 13:48:13 2009
From: adobriyan at gmail.com (Alexey Dobriyan)
Date: Mon, 23 Mar 2009 16:48:13 +0300
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090322123749.GF19826@elte.hu>
References: <20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
Message-ID: <20090323134813.GA18219@x200.localdomain>

On Sun, Mar 22, 2009 at 01:37:49PM +0100, Ingo Molnar wrote:
> The ptrace bits and signoffs from Oleg and Alexey would certainly 
> help (me) in trusting it. (I've Cc:-ed Oleg and Alexey)

The further utrace stays away from mainline, the better.
That's from my experience with this code.

But let's see what the ptrace(2) rewrite will look like, because this is
the biggest thing that matters. All those cool virtual machines,
fancy tracers and what not aren't even comparable.

Right now with ptrace(2) rewrite the following is easily triggerable:

WARNING: at kernel/ptrace.c:515 ptrace_report_signal+0x2c1/0x2d0()
Pid: 4814, comm: exe Not tainted 2.6.29-rc8-utrace #1
Call Trace:
 [<c0126df1>] warn_slowpath+0x81/0xa0
 [<c014c359>] ? validate_chain+0xe9/0x1150
 [<c014d606>] ? __lock_acquire+0x246/0xa50
 [<c0232959>] ? __delay+0x9/0x10
 [<c014b8eb>] ? mark_held_locks+0x6b/0x80
 [<c03d3dd2>] ? _spin_unlock_irq+0x22/0x50
 [<c012fdd1>] ptrace_report_signal+0x2c1/0x2d0
 [<c012fb10>] ? ptrace_report_signal+0x0/0x2d0
 [<c0154a79>] utrace_get_signal+0x1c9/0x660
 [<c0135478>] get_signal_to_deliver+0x288/0x330
 [<c01029e9>] do_notify_resume+0xb9/0x890
 [<c017edd2>] ? cache_free_debugcheck+0x232/0x2f0
 [<c014957b>] ? trace_hardirqs_off+0xb/0x10
 [<c03d3d79>] ? _spin_unlock_irqrestore+0x39/0x70
 [<c01015a0>] ? sys_execve+0x40/0x60
 [<c017f139>] ? kmem_cache_free+0x89/0xc0
 [<c014baad>] ? trace_hardirqs_on_caller+0xfd/0x190
 [<c014bb4b>] ? trace_hardirqs_on+0xb/0x10
 [<c010340a>] work_notifysig+0x13/0x19

It looks like WARN_ON is just bogus, but who knows.


From fche at redhat.com  Mon Mar 23 14:31:43 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 23 Mar 2009 10:31:43 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1237800304.3716.3.camel@localhost> (Denys Vlasenko's message of
	"Mon, 23 Mar 2009 10:25:04 +0100")
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<1237800304.3716.3.camel@localhost>
Message-ID: <y0m4oxk71kg.fsf@ton.toronto.redhat.com>

Denys Vlasenko <dvlasenk at redhat.com> writes:

> [...]
>> Here's the /debugfs/tracing/process_trace_README:
>> process event tracer mini-HOWTO [...]
>
> A HOWTO text in the kernel binary? Shouldn't it be in
> Documentation/* instead? [...]

It parallels the debugfs/tracing/README file.

- FChE


From oleg at redhat.com  Mon Mar 23 15:14:00 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Mon, 23 Mar 2009 16:14:00 +0100
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323134813.GA18219@x200.localdomain>
References: <20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
Message-ID: <20090323151400.GA3413@redhat.com>

On 03/23, Alexey Dobriyan wrote:
>
> Right now with ptrace(2) rewrite the following is easily triggerable:
>
> WARNING: at kernel/ptrace.c:515 ptrace_report_signal+0x2c1/0x2d0()

Yes, ptrace-over-utrace needs more work. But your message looks as if
utrace core is buggy, imho this is a bit unfair ;)

As Roland said, ptrace-over-utrace is not ready yet. If you mean that
utrace core should not be merged alone - this is another story.

But personally I understand why Roland sends utrace core before changing
ptrace.

Oleg.


From mathieu.desnoyers at polymtl.ca  Mon Mar 23 16:42:08 2009
From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers)
Date: Mon, 23 Mar 2009 12:42:08 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090316221800.GE12974@redhat.com>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090316221800.GE12974@redhat.com>
Message-ID: <20090323164208.GB22501@Krystal>

* Frank Ch. Eigler (fche at redhat.com) wrote:
> Hi -
> 
> 
> On Mon, Mar 16, 2009 at 05:45:26PM -0400, Mathieu Desnoyers wrote:
> 
> > [...]
> > > As far as I know, utrace supports multiple trace-engines on a process.
> > > Since ptrace is just an engine of utrace, you can add another engine on utrace.
> > > 
> > > utrace-+-ptrace_engine---owner_process
> > >        |
> > >        +-systemtap_module
> > >        |
> > >        +-ftrace_plugin
> 
> Right.  In this way, utrace is simply a multiplexing intermediary.
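The multiplexing arrangement in the diagram above can be sketched in plain
user-space C.  This is only an illustration under stated assumptions: the
struct names, the EV_* bits, attach_engine() and report_event() are
hypothetical and are not the real utrace API, which lives in
<linux/utrace.h> and involves locking that is ignored here.

```c
#include <stddef.h>

/* Hypothetical event bits, for illustration only (not utrace's constants). */
#define EV_SYSCALL_ENTRY (1UL << 0)
#define EV_SIGNAL        (1UL << 1)
#define EV_EXIT          (1UL << 2)

struct engine;

/* Callbacks supplied by a tracing engine (ptrace, systemtap, ftrace, ...). */
struct engine_ops {
	void (*report)(struct engine *engine, unsigned long event);
};

/* One attached engine: its ops vector, its private data, its event mask. */
struct engine {
	const struct engine_ops *ops;
	void *data;
	unsigned long events;
	struct engine *next;
};

/* In this model a traced thread just carries a list of attached engines. */
struct traced_thread {
	struct engine *engines;
};

/* Attach an engine by pushing it onto the front of the thread's list. */
static void attach_engine(struct traced_thread *thread, struct engine *engine)
{
	engine->next = thread->engines;
	thread->engines = engine;
}

/*
 * Deliver one event to every engine whose mask includes it.
 * Returns the number of engines that were called.
 */
static int report_event(struct traced_thread *thread, unsigned long event)
{
	int called = 0;
	struct engine *e;

	for (e = thread->engines; e != NULL; e = e->next) {
		if ((e->events & event) && e->ops && e->ops->report) {
			e->ops->report(e, event);
			called++;
		}
	}
	return called;
}

/* Example callback: count reports in the int that engine->data points to. */
static void count_report(struct engine *engine, unsigned long event)
{
	(void)event;
	++*(int *)engine->data;
}

static const struct engine_ops counting_ops = { count_report };
```

Two engines attached to the same thread each see only the events in their
own mask, and neither needs to know the other exists, which is the sense in
which the core acts as a multiplexer in this sketch.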
> 
> 
> > > Here, Frank had posted an example of utrace->ftrace engine.
> > > http://lkml.org/lkml/2009/1/27/294
> > > 
> > > And here is his latest patch (which seems to support syscall tracing...)
> > > http://git.kernel.org/?p=linux/kernel/git/frob/linux-2.6-utrace.git;a=blob;f=kernel/trace/trace_process.c;h=619815f6c2543d0d82824139773deb4ca460a280;hb=ab20efa8d8b5ded96e8f8c3663dda3b4cb532124
> > > 
> > 
> > Reminder : we are looking at system-wide tracing here. Here are some
> > comments about the current utrace implementation.
> > 
> > Looking at include/linux/utrace.h from the tree
> > 
> >  * A tracing engine starts by calling utrace_attach_task() or
> >  * utrace_attach_pid() on the chosen thread, passing in a set of hooks
> >  * (&struct utrace_engine_ops), and some associated data.  This produces a
> >  * &struct utrace_engine, which is the handle used for all other
> >  * operations.  An attached engine has its ops vector, its data, and an
> >  * event mask controlled by utrace_set_events().
> > 
> > So if the system has, say, 3000 threads, then we have 3000 struct
> > utrace_engine created ? I wonder what effect this could have on
> > cachelines if this is used to trace hot paths like system call
> > entry/exit. Have you benchmarked this kind of scenario under tbench ?
> 
> It has not been a problem, since utrace_engines are designed to be
> lightweight.  Starting or stopping a systemtap script of the form
> 
>     probe process.syscall {}
> 
> appears to have no noticeable impact on a tbench suite.
> 

Do you mean starting this script for a single process or for _all_ the
processes and threads running on the system ?

> 
> >  * For each event bit that is set, that engine will get the
> >  * appropriate ops->report_*() callback when the event occurs.  The
> >  * &struct utrace_engine_ops need not provide callbacks for an event
> >  * unless the engine sets one of the associated event bits.
> > 
> > Looking at utrace_set_events(), we seem to be limited to 32 events on
> > 32-bit architectures because it uses a bitmask ? Isn't it a bit small?
> 
> There are only a few types of thread events that involve different
> classes of treatment, or different degrees of freedom in terms of
> interference with the uninstrumented fast path of the threads.
> 
> For example, it does not make sense to have different flag bits for
> different system calls, since choosing to trace *any* system call
> involves taking the thread off of the fast path with the TIF_ flag.
> Once it's off the fast path, it doesn't matter whether the utrace core
> or some client performs syscall discrimination, so it is left to the
> client.
> 

If we limit ourselves to thread-interaction events, I agree that they are
limited. But in the system-wide tracing scenario, the criteria for
filtering can apply to many more event categories.
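
To make the bitmask point concrete, here is a rough user-space sketch; it is
not taken from the utrace sources.  EVENT_BIT(), mask_capacity(),
wants_event() and struct syscall_filter are names made up for this
illustration.  It shows that an unsigned long mask holds one bit per event
class (at least 32), and how a client could do its own per-syscall filtering
behind a single event bit, along the lines Frank describes.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical: bit n of the mask stands for event class n. */
#define EVENT_BIT(n) (1UL << (n))

/* How many event classes fit in one unsigned long (32 or 64 on common ABIs). */
static unsigned int mask_capacity(void)
{
	return (unsigned int)(sizeof(unsigned long) * CHAR_BIT);
}

/* Nonzero if event class n is enabled in mask; out-of-range n is never set. */
static int wants_event(unsigned long mask, unsigned int n)
{
	return n < mask_capacity() && (mask & EVENT_BIT(n)) != 0;
}

/*
 * Client-side discrimination: the core would only know "syscall events are
 * wanted"; the client keeps its own list of interesting syscall numbers.
 */
struct syscall_filter {
	const long *numbers;
	size_t count;
};

/* Nonzero if syscall number nr is in the client's list. */
static int filter_matches(const struct syscall_filter *filter, long nr)
{
	size_t i;

	for (i = 0; i < filter->count; i++)
		if (filter->numbers[i] == nr)
			return 1;
	return 0;
}
```

In this model the mask width bounds the number of event classes, not the
number of things a client can distinguish once it has been called.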

Referring to Roland's reply, I think using utrace to enable system-wide
collection of data would just be a waste of resources. Going through a
more lightweight system-wide activation seems more appropriate to me.
Utrace is still a very promising tool to trace process-specific activity
though.

Mathieu

> 
> > /**
> >  * utrace_set_events_pid - choose which event reports a tracing engine gets
> >  * @pid:                thread to affect
> >  * @engine:             attached engine to affect
> >  * @eventmask:          new event mask
> >  *
> >  * This is the same as utrace_set_events(), but takes a &struct pid
> >  * pointer rather than a &struct task_struct pointer.  The caller must
> >  * hold a ref on @pid, but does not need to worry about the task
> >  * staying valid.  If it's been reaped so that @pid points nowhere,
> >  * then this call returns -%ESRCH.
> > 
> > 
> > Comments like "but does not need to worry about the task staying valid"
> > do not make me feel safe and comfortable at all, could you explain
> > how you can assume that dereferencing an "invalid" pointer will return
> > NULL ?
> 
> (We're doing a final round of "internal" (pre-LKML) reviews of the
> utrace implementation right now on utrace-devel at redhat.com, where such
> comments get fastest attention from the experts.)
> 
> For this particular issue, the utrace documentation file explains the
> liveness rules for the various pointers that can be fed to or received
> from utrace functions.  This is not about "feeling" safe, it's about
> what the mechanism is deliberately designed to permit.
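
The liveness rule quoted from the kerneldoc comment can be modelled roughly
as follows.  This is a single-threaded user-space sketch with hypothetical
names (struct pid_ref, reap_task(), set_events_by_pid()); the real code
relies on struct pid reference counting and kernel synchronization that are
not shown here.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the traced task; only the event mask matters here. */
struct task {
	unsigned long eventmask;
};

/*
 * Stand-in for a held pid reference.  The handle itself stays valid while
 * the caller holds a ref; the task behind it may go away at any time.
 */
struct pid_ref {
	int refcount;
	struct task *task;	/* NULL once the task has been reaped */
};

/* Model of reaping: the handle survives but now points nowhere. */
static void reap_task(struct pid_ref *pid)
{
	pid->task = NULL;
}

/*
 * Set the event mask through the handle.  The caller never dereferences a
 * task pointer itself, so a reaped task shows up as -ESRCH, not as a
 * dangling pointer.
 */
static int set_events_by_pid(struct pid_ref *pid, unsigned long eventmask)
{
	struct task *task = pid->task;

	if (task == NULL)
		return -ESRCH;
	task->eventmask = eventmask;
	return 0;
}
```

The point of the sketch is only that the lookup from handle to task is where
"points nowhere" is detected, so no invalid pointer is ever dereferenced.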
> 
> 
> > About the utrace_attach_task() :
> > 
> >         if (unlikely(target->flags & PF_KTHREAD))
> >                 /*
> >                  * Silly kernel, utrace is for users!
> >                  */
> >                 return ERR_PTR(-EPERM);
> > 
> > So we cannot trace kernel threads ?
> 
> I'm not quite sure about all the reasons for this, but I believe that
> kernel threads don't tend to engage in job control / signal /
> system-call activities the same way as normal user threads do.
> 
> 
> > /*
> >  * Called without locks, when we might be the first utrace engine to attach.
> >  * If this is a newborn thread and we are not the creator, we have to wait
> >  * for it.  The creator gets the first chance to attach.  The PF_STARTING
> >  * flag is cleared after its report_clone hook has had a chance to run.
> >  */
> > static inline int utrace_attach_delay(struct task_struct *target)
> > {
> >         if ((target->flags & PF_STARTING) && target->real_parent != current)
> >                 do {
> >                         schedule_timeout_interruptible(1);
> >                         if (signal_pending(current))
> >                                 return -ERESTARTNOINTR;
> >                 } while (target->flags & PF_STARTING);
> >
> >         return 0;
> > }
> > 
> > Why do we absolutely have to poll until the thread has started ?
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> > utrace_add_engine()
> >   set_notify_resume(target);
> > 
> > ok, so this is where the TIF_NOTIFY_RESUME thread flag is set. I notice
> > that it is set asynchronously with the execution of the target thread
> > (as I do with my TIF_KERNEL_TRACE thread flag).
> > 
> > However, on x86_64, _TIF_DO_NOTIFY_MASK is only tested in 
> > entry_64.S
> > 
> > int_signal:
> > and
> > retint_signal:
> > 
> > code paths. However, if there is no syscall tracing to do upon syscall
> > entry, the thread flags are not re-read at syscall exit and you will
> > miss the syscall exit returning from your target thread if this thread
> > was blocked while you set its TIF_NOTIFY_RESUME. Or is it handled in
> > some subtle way I did not figure out ? BTW re-reading the TIF flags from
> > the thread_info at syscall exit on the fast path is out of question
> > because it considerably degrades the kernel performances. entry_*.S is
> > a very, very critical path.
> 
> (I don't know off the top of my head - Roland?)
> 
> 
> - FChE

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


From fche at redhat.com  Mon Mar 23 16:52:42 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 23 Mar 2009 12:52:42 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090323164208.GB22501@Krystal>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090316221800.GE12974@redhat.com>
	<20090323164208.GB22501@Krystal>
Message-ID: <20090323165242.GB18774@redhat.com>

Hi -

On Mon, Mar 23, 2009 at 12:42:08PM -0400, Mathieu Desnoyers wrote:
> [...]

(Please trim emails you're responding to.)

> [...]
> > > So if the system has, say 3000 threads, then we have 3000 struct
> > > utrace_engine created ? I wonder what effect this could have on
> > > cachelines if this is used to trace hot paths like system call
> > > entry/exit. Have you benchmarked this kind of scenario under tbench ?
> > 
> > It has not been a problem, since utrace_engines are designed to be
> > lightweight.  Starting or stopping a systemtap script of the form
> > 
> >     probe process.syscall {}
> > 
> > appears to have no noticeable impact on a tbench suite.
> 
> Do you mean starting this script for a single process or for _all_ the
> processes and threads running on the system ?

The script above usually applies to all threads.


> > > Looking at utrace_set_events(), we seem to be limited to 32 events on a
> > > 32-bit architecture because it uses a bitmask ? Isn't it a bit small?
> > 
> > There are only a few types of thread events that involve different
> > classes of treatment, or different degrees of freedom in terms of
> > interference with the uninstrumented fast path of the threads. [...]
> 
> If we limit ourselves to thread-interaction events, I agree that they are
> limited. But in the system-wide tracing scenario, the criteria for
> filtering can apply to many more event categories.

If those different criteria have equivalent impact on running threads,
there is no need to differentiate them at the low (utrace event flag)
level.  Could you offer an example to clarify?
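
A minimal user-space sketch of this layering, under stated assumptions: it
is not utrace code, and the model_* names, the two event classes and the
512-entry filter table are all hypothetical. The low-level mask holds one
bit per class of treatment; any finer criteria live in the tracer itself.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical low-level event classes: one bit each in an unsigned long. */
enum model_event {
	MODEL_EV_SYSCALL_ENTRY,
	MODEL_EV_SYSCALL_EXIT,
	MODEL_EV_NR
};

#define MODEL_EVENT(ev)   (1UL << (ev))
/* Upper bound on distinct classes: the bit width of the mask type. */
#define MODEL_MAX_EVENTS  (sizeof(unsigned long) * CHAR_BIT)

#define MODEL_NR_SYSCALLS 512	/* assumed table size, for illustration only */

struct model_tracer {
	unsigned long event_mask;		/* coarse: which classes to report */
	bool want_syscall[MODEL_NR_SYSCALLS];	/* fine: per-syscall filter */
};

/* Coarse check on the event class first, then the tracer's own filter. */
static bool model_should_record(const struct model_tracer *t,
				enum model_event ev, unsigned int nr)
{
	if (!(t->event_mask & MODEL_EVENT(ev)))
		return false;
	if (nr >= MODEL_NR_SYSCALLS)
		return false;
	return t->want_syscall[nr];
}
```

With this shape, adding another filtering criterion grows the tracer's
table, not the number of bits in the event mask.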


> Referring to Roland's reply, I think using utrace to enable
> system-wide collection of data would just be a waste of
> resources. Going through a more lightweight system-wide activation
> seems more appropriate to me.  [...]

Perhaps.  OTOH it also makes sense to me to use (and improve) one
general facility, if it can do the right thing almost as fast as a
wholly separate facility that's specialized for one small purpose.
The decision would probably rest with a more data-based comparison of
performance & code size.


- FChE


From mathieu.desnoyers at polymtl.ca  Mon Mar 23 17:03:56 2009
From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers)
Date: Mon, 23 Mar 2009 13:03:56 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090323165242.GB18774@redhat.com>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090316221800.GE12974@redhat.com>
	<20090323164208.GB22501@Krystal>
	<20090323165242.GB18774@redhat.com>
Message-ID: <20090323170356.GD24084@Krystal>

* Frank Ch. Eigler (fche at redhat.com) wrote:
> Hi -
> 
> On Mon, Mar 23, 2009 at 12:42:08PM -0400, Mathieu Desnoyers wrote:
> > [...]
> 
> (Please trim emails you're responding to.)
> 
> > [...]
> > > > So if the system has, say 3000 threads, then we have 3000 struct
> > > > utrace_engine created ? I wonder what effect this could have on
> > > > cachelines if this is used to trace hot paths like system call
> > > > entry/exit. Have you benchmarked this kind of scenario under tbench ?
> > > 
> > > It has not been a problem, since utrace_engines are designed to be
> > > lightweight.  Starting or stopping a systemtap script of the form
> > > 
> > >     probe process.syscall {}
> > > 
> > > appears to have no noticeable impact on a tbench suite.
> > 
> > Do you mean starting this script for a single process or for _all_ the
> > processes and threads running on the system ?
> 
> The script above usually applies to all threads.
> 

Hrm, I have already spent more time installing and benchmarking systemtap
than I should have, so I don't currently have time to run further systemtap
benchmarks, but I seriously doubt this. Have you run the following
benchmark ?

Baseline :
vanilla kernel, without utrace

Comparison with :
utrace-enabled kernel, with the syscall probe activated

?

If you are comparing a utrace-enabled kernel with and without the
syscall probes activated, then you are probably missing some performance
impact. Also make sure AUDIT SYSCALL, secure computing and
frame pointers are disabled in your baseline kernel too.

If this is what you did, I would really like to see the numbers.

> 
> > > > Looking at utrace_set_events(), we seem to be limited to 32 events on a
> > > > 32-bit architecture because it uses a bitmask ? Isn't it a bit small?
> > > 
> > > There are only a few types of thread events that involve different
> > > classes of treatment, or different degrees of freedom in terms of
> > > interference with the uninstrumented fast path of the threads. [...]
> > 
> > If we limit ourselves to thread-interaction events, I agree that they are
> > limited. But in the system-wide tracing scenario, the criteria for
> > filtering can apply to many more event categories.
> 
> If those different criteria have equivalent impact on running threads,
> there is no need to differentiate them at the low (utrace event flag)
> level.  Could you offer an example to clarify?
> 
> 
> > Referring to Roland's reply, I think using utrace to enable
> > system-wide collection of data would just be a waste of
> > resources. Going through a more lightweight system-wide activation
> > seems more appropriate to me.  [...]
> 
> Perhaps.  OTOH it also makes sense to me to use (and improve) one
> general facility, if it can do the right thing almost as fast as a
> wholly separate facility that's specialized for one small purpose.
> The decision would probably rest with a more data-based comparison of
> performance & code size.
> 

Sure.

Mathieu

> 
> - FChE

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


From mathieu.desnoyers at polymtl.ca  Mon Mar 23 17:33:15 2009
From: mathieu.desnoyers at polymtl.ca (Mathieu Desnoyers)
Date: Mon, 23 Mar 2009 13:33:15 -0400
Subject: [RFC][PATCH 1/2] tracing/ftrace: syscall tracing infrastructure
In-Reply-To: <20090319103434.CBE69FC3AB@magilla.sf.frob.com>
References: <1236401580-5758-1-git-send-email-fweisbec@gmail.com>
	<1236401580-5758-2-git-send-email-fweisbec@gmail.com>
	<49BEAA5A.4030708@redhat.com> <20090316201526.GE8393@nowhere>
	<49BEB8C4.2010606@redhat.com> <20090316214526.GA15119@Krystal>
	<20090317052442.GA32674@redhat.com>
	<20090317160029.GD10092@Krystal>
	<20090319103434.CBE69FC3AB@magilla.sf.frob.com>
Message-ID: <20090323173315.GG24084@Krystal>

Hi Roland,

* Roland McGrath (roland at redhat.com) wrote:
> The utrace API itself is not a good fit for global tracing, since its
> purpose is tracing and control of individual user threads.  There is
> no reason to allocate its per-task data structures when you are going
> to treat all tasks the same anyway.  The points that I think are being
> missed are about the possibilities of overloading TIF_SYSCALL_TRACE.
> 
> It's true that ptrace uses TIF_SYSCALL_TRACE as a flag for whether you are
> in the middle of a PTRACE_SYSCALL, so it can be confused by setting it for 
> other purposes on a task that is also ptrace'd (but not with PTRACE_SYSCALL).
> Until we are able to do away with these parts of the old ptrace innards,
> you can't overload TIF_SYSCALL_TRACE without perturbing ptrace behavior.
> 

Yes, this is why I went with a different thread flag in my
TIF_KERNEL_TRACE implementation.

> The utrace code does not have this problem.  It keeps its own state bits,
> so for it, TIF_SYSCALL_TRACE means exactly "the task will call
> tracehook_report_syscall_*" and no more.  To use TIF_SYSCALL_TRACE for
> another purpose, just set it on all the tasks you like (and/or set it on
> new tasks in fork.c) and add your code (tracepoints, whatever) to
> tracehook_report_syscall_* alongside the calls there into utrace.  There is
> exactly one place in utrace code that clears TIF_SYSCALL_TRACE, and you
> just add "&& !global_syscall_tracing_enabled" to the condition there.  You
> don't need to bother clearing TIF_SYSCALL_TRACE on any task when you're
> done.  If your "global_syscall_tracing_enabled" (or whatever it is) is
> clear, each task will lazily fall into utrace at its next syscall
> entry/exit and then utrace will reset TIF_SYSCALL_TRACE when it finds no
> reason left to have it on.

I wonder how racy enabling system-wide tracing and disabling utrace
tracing on a specific thread would be ? How do you ensure that the
global tracing flag and per-thread flags are updated consistently ?

I also wonder about added performance impact caused by the
tracehook_report_syscall_* call. Ideally, system-wide syscall tracing
should call directly into a tracing callback, write to the trace
buffers, and return. With utrace, we would have to call an intermediate
callback, which would then call our tracer, then test utrace flags to
check if utrace should be called, and then return. Function calls are
quite costly nowadays :(

> 
> I'm not really going to delve into utrace internals in this thread.  Please
> raise those questions in review of the utrace patches when current code is
> actually posted, where they belong.  Here I'll just mention the relevant
> things that relate to the underlying issue you raised about synchronization.
> As thoroughly documented, utrace_set_events() is a quick, asynchronous call
> that itself makes no guarantees about how quickly a running task will start
> to report the newly-requested events.  For purposes relevant here, it just
> sets TIF_SYSCALL_TRACE and nothing else.  In utrace, if you want synchronous
> assurance that a task misses no events you ask for, then you must first use
> utrace_control (et al) to stop it and synchronize.  That is not something
> that makes much sense at all for a "flip on global tracing" operation, which
> is not generally especially synchronous with anything else.  If you want
> best effort that a task will pick up newly-requested events Real Soon Now,
> you can use utrace_control with just UTRACE_REPORT.  For purposes relevant
> here, this just does set_notify_resume().  That will send an IPI if the task
> is running, and then it will start noticing before it returns to user mode.
> So:
> 	set_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
> 	set_notify_resume(task);
> is what I would expect you to do on each task if you want to quickly get it
> to start hitting tracehook_report_syscall_*.  (I'm a bit dubious that there
> is really any need to speed it up with set_notify_resume, but that's just me.)


Ideally, when we start tracing, setting the flag can be asynchronous,
but we need to have a way to figure out when tracing is actually active
(e.g. rcu quiescent state). So this can be seen as synchronous
activation. Stopping all tasks does not really make much sense for
system-wide tracing, especially if there are alternatives.

> 
> Finally, some broader points about TIF_SYSCALL_TRACE that I think
> have been overlooked.  The key special feature of TIF_SYSCALL_TRACE
> is that it gets you to a place where full user_regset access is
> available.  Debuggers need this to read (and write) the full user
> register state arbitrarily, which they also need to do user
> backtraces and the like.  If you do not need user_regset to work,
> then you don't need to be on this (slowest) code path.

LTTng had userspace backtraces on syscall entry and irq entry a while
ago, and this was particularly useful. But I agree that if this is not
needed, we should go for the warm path.

> 
> If you are only interested in reading syscall arguments and results
> (or even in changing syscall results in exit tracing) then you do
> not need user_regset and you do not need to take the slowest syscall
> path.  (If you are doing backtraces but already rely on full kernel
> stack unwinding to do it, you also do not need user_regset.)  From
> anywhere inside the kernel, you can use the asm/syscall.h calls to
> read syscall args, whichever entry path the task took.
> 
> The other mechanism to hook into every syscall entry/exit is
> TIF_SYSCALL_AUDIT.  On some machines (like x86), this takes a third,
> "warm" code path that is faster than the TIF_SYSCALL_TRACE path
> (though obviously still off the fastest direct code path).  It can
> be faster precisely because it doesn't need to allow for user_regset
> access, nor for modification of syscall arguments in entry tracing.
> For normal read-only tracing of just the actual syscall details,
> it has all you need.
> 
> Unfortunately the arch code all looks like:
> 
> 	if (unlikely(current->audit_context))
> 		 audit_syscall_{entry,exit}(...);
> 
> So we need to change that to:
> 
> 	if (unlikely(test_thread_flag(TIF_SYSCALL_AUDIT)))
> 		 audit_syscall_{entry,exit}(...);
> 
> But that is pretty easy to get right, even doing it blind on arch's
> you can't test.  Far better than adding new asm hackery for each arch
> that's almost identical to TIF_SYSCALL_TRACE or TIF_SYSCALL_AUDIT (and
> finding out that some are fresh out of TIF bits in the range that
> their asm code can handle).
> 
> TIF_SYSCALL_AUDIT is only set when allocating audit_context, and its
> paths already have !context tests so overloading is harmless today.
> (Whereas with TIF_SYSCALL_TRACE, you have to wait for later ptrace
> cleanups or write off using ptrace simultaneously.)
> 
> Then you can do the lazy disable in audit_syscall_{entry,exit} with:
> 
> 	if (unlikely(!context)) {
> 		if (unlikely(!global_syscall_tracing_enabled))
> 			clear_thread_flag(TIF_SYSCALL_AUDIT);
> 		return;
> 	}
> 
> Plus add there your tracepoint or whatnot.
> 
> Unless you really plan to use user_regset in your tracepoints, then
> I think this is a better plan for global syscall tracing than either
> fiddling with TIF_SYSCALL_TRACE or adding new arch asm requirements.
> (IMHO, the latter is the worst idea on the table.)
> 

Thanks for this thorough review of TIF flags. Hrm, racing with other
pieces of infrastructure is never fun, and given we might want to save
the userspace stack in some probes, I think it could be a good idea to
go with our own flag.
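
For reference, the lazy-disable scheme quoted above can be modelled in user
space roughly as follows. This is only a sketch under assumptions: struct
model_task, model_syscall_entry() and the traced counter are hypothetical
stand-ins, not kernel APIs; the real code would use test_thread_flag() and
clear_thread_flag() on the current task.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-task state; not a kernel structure. */
struct model_task {
	bool tif_syscall_audit;	/* models TIF_SYSCALL_AUDIT */
	void *audit_context;	/* NULL when the task is not being audited */
	unsigned long traced;	/* syscalls seen by the global tracer */
};

/* Models the global on/off switch named in the quoted text. */
static bool global_syscall_tracing_enabled;

/*
 * Models one syscall entry. Returns true if the task left the fast path,
 * false if the flag was clear and nothing extra was done.
 */
static bool model_syscall_entry(struct model_task *t)
{
	if (!t->tif_syscall_audit)
		return false;		/* fast path */

	if (global_syscall_tracing_enabled)
		t->traced++;		/* the "tracepoint or whatnot" */

	if (!t->audit_context) {
		/* Lazy disable: no user of the flag is left, so drop it. */
		if (!global_syscall_tracing_enabled)
			t->tif_syscall_audit = false;
		return true;
	}

	/* Real audit work would go here. */
	return true;
}
```

In this model, after global tracing is switched off each task takes the
slower path exactly once more, clears its own flag there, and is back on
the fast path from the following syscall onwards.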

Mathieu

> 
> Thanks,
> Roland

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


From fche at redhat.com  Mon Mar 23 20:25:03 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 23 Mar 2009 16:25:03 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090322120811.GD19826@elte.hu>
References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com>
	<20090322120811.GD19826@elte.hu>
Message-ID: <20090323202503.GD18774@redhat.com>

Hi -

On Sun, Mar 22, 2009 at 01:08:11PM +0100, Ingo Molnar wrote:
> [...]
> > In my own limited kernel-building experience, I find the debuginfo 
> > data conveniently and instantly available after every "make".  Can 
> > you elaborate how is it harder for you to incidentally make it 
> > than for someone to download it?
> 
> Four reasons:
> 
> 1)
> 
> I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, 
> because it slows down the kernel build times significantly: [...]

OK, 15% longer time.

> 2)
> 
> When the kernel build becomes IO-bound [...]
>   without:   870.36 user 292.79 system 3:32.10 elapsed  548% CPU
>   with:      929.65 user 384.55 system 8:28.70 elapsed  258% CPU

OK, lots of network traffic.
 
> 3) [...]
> Try to build 1.6 GB of dirty data on ext3 and run into an fsync() in 
> your editor ... you'll sit there twiddling thumbs for a minute or 
> more.

Now don't go blaming us for ext3 & fsync & not having a low enough
/proc/sys/vm/dirty_background_ratio.


> 4)
> Or yet another metric - Linux distro package overhead. Try 
> installing a debuginfo package: [...]

Same as 3).


> And this download has to be repeated for _every_ minor kernel 
> upgrade.

Actually, no.  If you just want to run the newly built kernel
somewhere else on your network, you can run a systemtap compile server
on your build machine, and let the systemtap network client do ~RPCs
to get at the data.


> The solution?)
> 
> Dunno - but i definitely think we should think bigger:
> 
> The fundamental disconnect i believe seems to come from the fact 
> that most user-space projects are relatively small, so debuginfo 
> bloat is a secondary issue there.

(Well, the fedora debuginfo archive shows a couple of packages of
similar or greater heft than the kernel - openoffice.org, qt3, ...)


> A few random ideas:
> 
> [...] why not build debuginfo on the fly, when a debugging 
> session requires it? Rarely do we need debuginfo for more than a 
> fraction of the whole kernel. [...]
> I mean, lets _use_ the fact that we have source code available, more 
> intelligently. It takes zero time to build detailed debuginfo for a 
> portion of a tree. [...]

Something like that might be made to work.

Note that this backs away from earlier claims that we can make do
without debuginfo, or that the compiler can't be trusted to build the
stuff.  We all agree it'd be nice if we made it better and made a little
less of it.


- FChE


From torvalds at linux-foundation.org  Mon Mar 23 20:39:22 2009
From: torvalds at linux-foundation.org (Linus Torvalds)
Date: Mon, 23 Mar 2009 13:39:22 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323202503.GD18774@redhat.com>
References: <20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu> <20090321214852.GA5262@redhat.com>
	<20090322120811.GD19826@elte.hu>
	<20090323202503.GD18774@redhat.com>
Message-ID: <alpine.LFD.2.00.0903231334500.3030@localhost.localdomain>


On Mon, 23 Mar 2009, Frank Ch. Eigler wrote:
> > I have CONFIG_DEBUG_INFO turned off in 99.9% of my kernel builds, 
> > because it slows down the kernel build times significantly: [...]
> 
> OK, 15% longer time.

It's way more than that if you don't have tons of memory and excessive 
amounts of CPU power.

> > 2)
> > 
> > When the kernel build becomes IO-bound [...]
> >   without:   870.36 user 292.79 system 3:32.10 elapsed  548% CPU
> >   with:      929.65 user 384.55 system 8:28.70 elapsed  258% CPU
> 
> OK, lots of network traffic.

This is the _normal_ situation for a debug info build.  If it's not 
network traffic (distcc), it's just disk IO. Have you tried it on a 
laptop? Ingo is not the only one that turns off DEBUG_INFO in disgust. 
It's totally useless for any sane kernel developer - the costs are 
excessive.

And that's totally ignoring the disk usage of multiple debug info kernels.

> Note that this backs away from earlier claims that we can make do
> without debuginfo, or that the compiler can't be trusted to build the
> stuff.  We all agree it'd be nice if we made it better and made a little
> less of it.

Gaah. I wish you all agreed that DEBUG_INFO is just TOTALLY UNREALISTIC.

			Linus


From tytso at mit.edu  Mon Mar 23 21:44:17 2009
From: tytso at mit.edu (Theodore Tso)
Date: Mon, 23 Mar 2009 17:44:17 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323151400.GA3413@redhat.com>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com>
Message-ID: <20090323214417.GD5814@mit.edu>

On Mon, Mar 23, 2009 at 04:14:00PM +0100, Oleg Nesterov wrote:
> 
> Yes, ptrace-over-utrace needs more work. But your message looks as if
> utrace core is buggy, imho this is a bit unfair ;)
> 
> As Roland said, ptrace-over-utrace is not ready yet. If you mean that
> utrace core should not be merged alone - this is another story.
> 
> But personally I understand why Roland sends utrace core before changing
> ptrace.

Yes, but if it's going to be merged during this 2.6.x cycle, we need
to have a user for the kernel interface along with the new kernel
interface.  This is true for anybody trying to add some new
infrastructure to the kernel; you have to have an in-tree user of said
interface.

I mean, if some device manufacturer were to go to Red Hat's kernel
team, and say, "we need this interface for our uber expensive RDMA
interface card", and there was no in-kernel user for the interface, we
know what Red Hat would tell that device manufacturer, right?  So why
is the SystemTap team trying to get an exception for utrace?  It just
seems a little hypocritical.

So what about the ftrace user of utrace?  Is that ready to be merged?

   	      	  	      	 	     - Ted


From renzo at cs.unibo.it  Mon Mar 23 23:59:24 2009
From: renzo at cs.unibo.it (Renzo Davoli)
Date: Tue, 24 Mar 2009 00:59:24 +0100
Subject: utrace-kmview contract
Message-ID: <20090323235924.GD23807@cs.unibo.it>

Dear Roland,

You are right when you say that the interface specification is a contract
between utrace and the module writers.
My goal is to use utrace for my virtual machines, your goal is to
design utrace as a support for a wide range of applications.
I hope your "wide range of applications" will include kmview.

In my view, utrace's support for multiple engines needs further
investigation.
I do not need my patches to enter the utrace code, provided there is another
fast/clean/easy-to-code way to reach the same results.
This is not for kmview alone; I think it is an example for a range
of virtualization applications based on utrace.
When utrace is used for debugging, "the faster, the better" invariant holds,
but when you are dealing with virtualization the rule changes to
"the slower, the useless!".
Debugging is a temporary state of an application, while virtualization must be
designed to be used as a standard environment.

Sometimes a picture is worth a thousand words.
http://www.cs.unibo.it/~renzo/4roland20090323.pdf
I have drawn some examples. This is actually a simplified view
just to show the problems.
The module unreal is a test module for kmview that virtualizes the /unreal
subtree as a "copy" of the file system ("/unreal/x/y/z" is the
file /x/y/z).
I know that such a simple transformation could have been implemented directly
inside the report_syscall function, but kmview is a general support
for virtualization; unreal is just a simple test for it.
kmview is composed of a kernel module and the "agent" in user space.

In the first slide a user runs kmview and inside the vm he/she loads the
unreal module and runs a cat command. When cat tries to open 
"/unreal/etc/passwd", unreal rewrites the path to /etc/passwd, the kernel
runs an "open" system call but the arguments have been modified.
The report_syscall_entry routine must send the path to kmview in userland
and wait for the answer.
The numbers on the arrows show the sequence of actions.

The second slide shows a tracing/debugging tool used with virtualization.
This is an example of multiple engines working on the same process.
strace must read its data before the virtualization for report_syscall_entry.
On the contrary the return value shown by strace must be the one returned
by the kmview virtualization engine, thus the order for report_syscall_entry
is the reverse of that used by report_syscall_exit.
Note that if instead of "strace cat /unreal/etc/passwd" our user had written
"strace -f -o /tmp/xxx kmview bash" as the first command, the order of the
engines would have been inverted. strace in fact should show the system calls
as they appear "outside the virtualization", as one may expect
from the command.

The third slide shows a nested virtualization and the fourth a debug tool
running inside a nested virtualization.
In all these examples I'd use UTRACE_STOP.

Now let us discuss the details of the contract ;-)

I set up two different implementations of the kmview kernel module.
In the standard one (#undef KMVIEW_NEWSTOP) the report_syscall
function returns UTRACE_STOP, waiting for the answer from the kmview
application.
The new one (#define KMVIEW_NEWSTOP) uses a semaphore to stop the execution
inside the report_syscall function, which always returns UTRACE_RESUME.

--------------------------------------------------------------------
If you decide that the right implementation is the former
(#undef KMVIEW_NEWSTOP):
- please tell me how to implement the example on page 3 if the handling of
syscall_entry for kmview2 does not stop before kmview1 is called.
Okay, you say kmview1's module receives a notification that another engine
wants to stop by reading its @action argument, but it needs the state as
modified by kmview2.
- I could set up some kind of synchronization among kmview machines but the
solution would be extremely weak. What if kmview runs nested with another
virtualization/tracing application based on utrace, e.g. strace?
- You say "use UTRACE_REPORT" to wait until the other machines are done
fiddling with it.
The comment you wrote about UTRACE_REPORT says:
* This is like %UTRACE_RESUME, but also ensures that there will be
* a @report_quiesce or @report_signal callback made soon.  If
* @target had been stopped, then there will be a callback before it
* resumes running normally.  If another engine is keeping @target
* stopped, then there might be no callbacks until all engines let
* it resume.
But if kmview1 and kmview2 have both stopped in report_syscall, no callback
will be called until both finish.
Otherwise you may mean that kmview1 returns UTRACE_RESUME and, when
kmview1's report_quiesce gets called, it returns UTRACE_STOP. In this way
the handling of the system call would be moved from
report_syscall_entry to report_quiesce, but just for kmview1.
Which one is the cleaner way to implement a service on utrace in your opinion?
In my opinion the possibility of having the process blocked before
calling the next report function leads to simpler code.
Was this design chosen for efficiency? I feel that this
long sequence of report callbacks ends up slowing down the virtualization.
"The slower, the useless", as I said...
Are you sure that each engine should examine by itself what the other
engines do, given that utrace provides almost no synchronization rules
between them?
Surely you have in mind examples where engines have to run concurrently
when one or more return UTRACE_STOP.
But there are other cases in which you need to stop a process before calling
the next engine's report function. Instead of changing the semantics of
UTRACE_STOP you could add a UTRACE_STOP_NOW return value to stop the engine
before calling the next engine's report function.
-------------------------------------------------------------------
If you decide that the latter implementation is the right one (#define
KMVIEW_NEWSTOP)

- This means that I am not using UTRACE_STOP at all. I have implemented
support for stopping a process in another way.
I don't think it is a good idea to stay in the report function for a
long time; UTRACE_STOP was designed for that purpose.
- The management of asynchronous events is harder, as the process can be
stopped at many "levels" of the architecture.
- If you say that this is the right way to do it, I'll keep this
code, but I'll be wondering what UTRACE_STOP is for.
-------------------------------------------------------------------
In both cases the order of the report_syscall_entry report functions must be
reversed (with respect to all the other report functions), otherwise all
the nested engine examples fail.
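
The ordering requirement can be illustrated with a small user-space model.
This is a sketch under assumptions, not utrace or kmview code: the engine
table, the model_* types, the two toy engines and the fixed return value 3
are all hypothetical. Engines are stored in attach order; entry callbacks
run from the most recently attached engine to the first, exit callbacks
run in attach order.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENGINES 4
#define PATH_LEN    64

struct model_syscall {
	char path[PATH_LEN];	/* the path argument as engines see it */
	long retval;		/* the return value as engines see it */
};

struct model_engine {
	void (*entry)(struct model_syscall *sc);
	void (*exit)(struct model_syscall *sc);
};

static struct model_engine engines[MAX_ENGINES];	/* attach order */
static int nr_engines;

static char strace_saw_path[PATH_LEN];
static long strace_saw_retval;

/* Toy virtualizer: strip a leading "/unreal" component on entry. */
static void kmview_entry(struct model_syscall *sc)
{
	const char *prefix = "/unreal";
	size_t n = strlen(prefix);

	if (strncmp(sc->path, prefix, n) == 0 && sc->path[n] == '/')
		memmove(sc->path, sc->path + n, strlen(sc->path + n) + 1);
}

/* Toy virtualizer: supply the (made-up) virtualized result on exit. */
static void kmview_exit(struct model_syscall *sc)
{
	sc->retval = 3;
}

/* Toy tracer: just record what it observes. */
static void strace_entry(struct model_syscall *sc)
{
	snprintf(strace_saw_path, sizeof(strace_saw_path), "%s", sc->path);
}

static void strace_exit(struct model_syscall *sc)
{
	strace_saw_retval = sc->retval;
}

static void attach(struct model_engine e)
{
	if (nr_engines < MAX_ENGINES)
		engines[nr_engines++] = e;
}

static void run_syscall(struct model_syscall *sc)
{
	int i;

	/* Entry: innermost (last attached) engine reports first. */
	for (i = nr_engines - 1; i >= 0; i--)
		engines[i].entry(sc);

	sc->retval = -1;	/* placeholder for the real syscall result */

	/* Exit: attach order, so inner engines see the outer result. */
	for (i = 0; i < nr_engines; i++)
		engines[i].exit(sc);
}
```

With kmview attached first and strace second, the tracer in this model
records the path as the traced program wrote it (/unreal/etc/passwd) and
the return value as produced by the virtualization engine.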

ciao
	renzo

Note: the actual kmview and unreal work in a slightly different way.
This final note is useful if you want to read the code or run the examples
otherwise it can be safely skipped.
1- The kmview VMM (the agent) does not rewrite the path but opens the file
itself.
2- the nested kmview VMM itself runs in the space virtualized by the
outer kmview. The drawings would have been more complex but the 
problem is the same, a process running in a nested kmview has one utrace
engine for each kmview.
3- actual unreal provides two levels of /unreal. 
kmview + unreal provide /unreal and /unreal/unreal as copies of the file
system
kmview+unreal+kmview+unreal (nested) provide /unreal /unreal/unreal
/unreal/unreal/unreal and /unreal/unreal/unreal/unreal.


From ananth at in.ibm.com  Tue Mar 24 05:29:26 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 24 Mar 2009 10:59:26 +0530
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090321050422.d1d99eec.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
Message-ID: <20090324052926.GC24018@in.ibm.com>

On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
> On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" <fche at redhat.com> wrote:
> > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
> 
> I have strong memories of being traumatised by reading the uprobes code. 

That was a long time ago wasn't it? :-)

That approach was a carry-over from a dprobes implementation that
used readdir hooks. Yes, that was not the most elegant approach, and as
such it has long been shelved.

> What's the story on all of that nowadays?

Utrace makes implementing uprobes much cleaner. We have a prototype that
implements uprobes over utrace. It's per-process, doesn't use any
in-kernel hooks, etc. It currently has a kprobes-like interface (needs a
kernel module), but it shouldn't be difficult to adapt it to use
utrace's user interfaces (syscalls?) when those come around. The current
generation of uprobes that has all the bells and whistles can be found at
http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2

However, there are aspects of the current uprobes that can be useful to
any other userspace tracer: instruction analysis, breakpoint insertion
and removal, single-stepping support. With these layered on top of
utrace, building userspace debug/trace tools that depend on utrace
should be easier, outside of ptrace.

Work is currently under way to factor these layers out. The intention is to
upstream all the bits required for userspace tracing once utrace gets
in, along with an easy to use interface for userspace developers
(a /proc or /debugfs interface?) -- one should be able to use it on
its own or with SystemTap, whatever they prefer. Details are still hazy
at the moment.

But, utrace is the foundation to do all of that.

Ananth


From akpm at linux-foundation.org  Tue Mar 24 05:54:09 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Mon, 23 Mar 2009 22:54:09 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090324052926.GC24018@in.ibm.com>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090324052926.GC24018@in.ibm.com>
Message-ID: <20090323225409.07bdcbf7.akpm@linux-foundation.org>

On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli <ananth at in.ibm.com> wrote:

> On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
> > On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" <fche at redhat.com> wrote:
> > > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
> > 
> > I have strong memories of being traumatised by reading the uprobes code. 
> 
> That was a long time ago wasn't it? :-)
> 
> That approach was a carry-over from a dprobes implementation that
> used readdir hooks. Yes, that was not the most elegant approach, and as
> such it has long been shelved.
> 
> > What's the story on all of that nowadays?
> 
> Utrace makes implementing uprobes much cleaner. We have a prototype that
> implements uprobes over utrace. It's per-process, doesn't use any
> in-kernel hooks, etc. It currently has a kprobes-like interface (needs a
> kernel module), but it shouldn't be difficult to adapt it to use
> utrace's user interfaces (syscalls?) when those come around. The current
> generation of uprobes that has all the bells and whistles can be found at
> http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2
> 
> However, there are aspects of the current uprobes that can be useful to
> any other userspace tracer: instruction analysis, breakpoint insertion
> and removal, single-stepping support. With these layered on top of
> utrace, building userspace debug/trace tools that depend on utrace
> should be easier, outside of ptrace.
> 
> Work is currently under way to factor these layers out. The intention is to
> upstream all the bits required for userspace tracing once utrace gets
> in, along with an easy to use interface for userspace developers
> (a /proc or /debugfs interface?) -- one should be able to use it on
> its own or with SystemTap, whatever they prefer. Details are still hazy
> at the moment.
> 
> But, utrace is the foundation to do all of that.
> 

The sticking point was uprobes's modification of live pagecache.  We said
"ick, COW the pages" and you said "too expensive".  And there things
remained.

Is that all going to happen again?


From ananth at in.ibm.com  Tue Mar 24 06:10:24 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 24 Mar 2009 11:40:24 +0530
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323225409.07bdcbf7.akpm@linux-foundation.org>
References: <20090321013946.890F4FC3AB@magilla.sf.frob.com>
	<20090321014244.9ADF1FC3AB@magilla.sf.frob.com>
	<20090321074301.GA19384@elte.hu>
	<20090321013912.ed6039c9.akpm@linux-foundation.org>
	<20090321091235.GA29678@elte.hu>
	<20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090324052926.GC24018@in.ibm.com>
	<20090323225409.07bdcbf7.akpm@linux-foundation.org>
Message-ID: <20090324061024.GD24018@in.ibm.com>

On Mon, Mar 23, 2009 at 10:54:09PM -0700, Andrew Morton wrote:
> On Tue, 24 Mar 2009 10:59:26 +0530 Ananth N Mavinakayanahalli <ananth at in.ibm.com> wrote:
> 
> > On Sat, Mar 21, 2009 at 05:04:22AM -0700, Andrew Morton wrote:
> > > On Sat, 21 Mar 2009 07:51:41 -0400 "Frank Ch. Eigler" <fche at redhat.com> wrote:
> > > > On Sat, Mar 21, 2009 at 04:19:54AM -0700, Andrew Morton wrote:
> > > 
> > > I have strong memories of being traumatised by reading the uprobes code. 
> > 
> > That was a long time ago wasn't it? :-)
> > 
> > That approach was a carry-over from a dprobes implementation that
> > used readdir hooks. Yes, that was not the most elegant approach, and as
> > such it has long been shelved.
> > 
> > > What's the story on all of that nowadays?
> > 
> > Utrace makes implementing uprobes much cleaner. We have a prototype that
> > implements uprobes over utrace. It's per-process, doesn't use any
> > in-kernel hooks, etc. It currently has a kprobes-like interface (needs a
> > kernel module), but it shouldn't be difficult to adapt it to use
> > utrace's user interfaces (syscalls?) when those come around. The current
> > generation of uprobes that has all the bells and whistles can be found at
> > http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=tree;f=runtime/uprobes2
> > 
> > However, there are aspects of the current uprobes that can be useful to
> > any other userspace tracer: instruction analysis, breakpoint insertion
> > and removal, single-stepping support. With these layered on top of
> > utrace, building userspace debug/trace tools that depend on utrace
> > should be easier, outside of ptrace.
> > 
> > Work is currently under way to factor these layers out. The intention is to
> > upstream all the bits required for userspace tracing once utrace gets
> > in, along with an easy to use interface for userspace developers
> > (a /proc or /debugfs interface?) -- one should be able to use it on
> > its own or with SystemTap, whatever they prefer. Details are still hazy
> > at the moment.
> > 
> > But, utrace is the foundation to do all of that.
> > 
> 
> The sticking point was uprobes's modification of live pagecache.  We said
> "ick, COW the pages" and you said "too expensive".  And there things
> remained.
> 
> Is that all going to happen again?

No. All modifications are via access_process_vm().
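
For readers unfamiliar with the COW point above, here is a rough userspace
analogy in C. It is not the uprobes code and does not call
access_process_vm(); the function name private_patch_leaves_file_intact() and
the temporary file path are made up for illustration. It patches one byte
through a MAP_PRIVATE mapping and then rereads the file, which shows the
general idea that a write to a privately mapped page lands in a per-process
copy rather than in the shared file contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if the file contents are unchanged after patching a private
 * mapping, 1 if the file was modified, -1 on any setup error.
 */
int private_patch_leaves_file_intact(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* hypothetical scratch file */
	const char text[] = "original";
	char readback[sizeof(text)];
	char *map;
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives only as long as fd is open */

	if (write(fd, text, sizeof(text)) != (ssize_t)sizeof(text))
		goto out;

	map = mmap(NULL, sizeof(text), PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = 'X';		/* triggers a private copy of the page */
	assert(map[0] == 'X');	/* this process sees its own patched copy */

	/* Reread through the file descriptor, bypassing our mapping. */
	if (pread(fd, readback, sizeof(text), 0) == (ssize_t)sizeof(text))
		result = memcmp(readback, text, sizeof(text)) == 0 ? 0 : 1;

	munmap(map, sizeof(text));
out:
	close(fd);
	return result;
}
```

On a system with a writable /tmp this should return 0: the mapping shows the
patched byte while the file itself still reads back as "original".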

Ananth


From roland at redhat.com  Tue Mar 24 10:34:16 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 24 Mar 2009 03:34:16 -0700 (PDT)
Subject: seccomp via utrace
Message-ID: <20090324103416.26687FC3AB@magilla.sf.frob.com>

Here is a trivial module to implement the seccomp guts via utrace.  I
haven't tested it at all.  (AFAIK it was only ever used by cpushare,
and that project might be defunct now.)

I'm not sure what Ingo had in mind for integrating this.  If it's just
to reimplement the existing prctl interface, then this is about all
you need--just s/_xxx// and fiddle the config et al to build this and
not the old stuff.

If the approach would be incremental, to leave the old stuff in place,
then it might make more sense just to do a fresh new thing not
providing that prctl interface at all.  A new thing could be a module,
and define some /sys files or whatnot for its "constrain me now" hook.
I think a sensible thing would not require asm/seccomp.h at all, and
instead just let the userland setup feed in a set of syscall numbers.
It could be that flexible while still being quite simple so that one
could audit that setup code and be confident it has no holes.  Then
future versions of cpushare (or whatever) would not need any special
kernel support for new arch's nor to change the syscall set it wants
to allow.
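
As a rough illustration of the "feed in a set of syscall numbers" idea, here
is a userspace sketch in C of an allow-list kept as a bitmap. Nothing here is
an existing kernel interface: struct syscall_allowlist, allowlist_init(),
allowlist_permits(), and the ALLOW_MAX_NR bound are all hypothetical names and
values chosen for the example. The point is only that the setup code can stay
small enough to audit: range-check every number once, then the per-syscall
check is a single bit test.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on syscall numbers; not from any kernel header. */
#define ALLOW_MAX_NR	512
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define ALLOW_WORDS	((ALLOW_MAX_NR + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct syscall_allowlist {
	unsigned long bits[ALLOW_WORDS];
};

/*
 * Build the allow-list from an array of syscall numbers.
 * Returns 0 on success.  On an out-of-range number it returns -1 and
 * leaves the list empty, so a bad setup denies everything.
 */
int allowlist_init(struct syscall_allowlist *al, const int *nrs, size_t count)
{
	assert(al != NULL);
	memset(al, 0, sizeof(*al));
	for (size_t i = 0; i < count; i++) {
		if (nrs[i] < 0 || nrs[i] >= ALLOW_MAX_NR) {
			memset(al, 0, sizeof(*al));
			return -1;
		}
		al->bits[nrs[i] / BITS_PER_WORD] |=
			1UL << (nrs[i] % BITS_PER_WORD);
	}
	return 0;
}

/* The check a syscall-entry hook would make: one range test, one bit test. */
bool allowlist_permits(const struct syscall_allowlist *al, int nr)
{
	if (nr < 0 || nr >= ALLOW_MAX_NR)
		return false;
	return (al->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

A syscall-entry callback in this style would call allowlist_permits() with the
current syscall number instead of switching on the fixed __NR_seccomp_*
constants, which is what would remove the dependency on asm/seccomp.h.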


Thanks,
Roland

=====
#include <linux/sched.h>
#include <linux/utrace.h>
#include <linux/signal.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/compat.h>
#include <linux/prctl.h>
#include <asm/seccomp.h>
#include <asm/syscall.h>

MODULE_DESCRIPTION("secure computing");
MODULE_LICENSE("GPL");

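/* Signal sent on a forbidden syscall; SIGKILL unless overridden at load. */
static int insecure_signal = SIGKILL;
/* The fourth argument is the sysfs permission mode, not a default value. */
module_param_named(signal, insecure_signal, int, 0444);

/*
 * If it's an accepted syscall, run it normally.
 * If not, send ourselves insecure_signal (SIGKILL by default) and abort
 * the syscall.
 */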
static u32 secure_syscall_entry(u32 action,
				struct utrace_engine *engine,
				struct task_struct *task,
				struct pt_regs *regs)
{
	int callno = syscall_get_nr(task, regs);

#ifdef CONFIG_COMPAT
	if (is_compat_task())
		switch (callno) {
		case __NR_seccomp_read_32:
		case __NR_seccomp_write_32:
		case __NR_seccomp_exit_32:
		case __NR_seccomp_sigreturn_32:
			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
		}
	else
#endif
		switch (callno) {
		case __NR_seccomp_read:
		case __NR_seccomp_write:
		case __NR_seccomp_exit:
		case __NR_seccomp_sigreturn:
			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
		}

	force_sig(insecure_signal, task);
	return UTRACE_RESUME | UTRACE_SYSCALL_ABORT;
}

static const struct utrace_engine_ops secure_syscall_ops =
{
	.report_syscall_entry = secure_syscall_entry
};

/*
 * Set up a utrace engine to call secure_syscall_entry() for each system call.
 * Also act like prctl(PR_SET_TSC, PR_TSC_SIGSEGV).
 */
static int enable_secure_syscall(void)
{
	struct utrace_engine *engine;
	int ret;

	engine = utrace_attach_task(current,
				    UTRACE_ATTACH_CREATE |
				    UTRACE_ATTACH_EXCLUSIVE |
				    UTRACE_ATTACH_MATCH_OPS,
				    &secure_syscall_ops, NULL);
	if (IS_ERR(engine)) {
		ret = PTR_ERR(engine);
		return ret == -EEXIST ? -EPERM : ret;
	}

	ret = utrace_set_events(current, engine, UTRACE_EVENT(SYSCALL_ENTRY));
	WARN_ON(ret);		/* Should never happen on current.  */

	/*
	 * This is the only outside ref on the engine.
	 * The engine dies automatically when this task gets reaped.
	 */
	utrace_engine_put(engine);

#ifdef SET_TSC_CTL
	if (!ret)
		SET_TSC_CTL(PR_TSC_SIGSEGV);
#endif

	return ret;
}

long prctl_get_seccomp_xxx(void)
{
	struct utrace_engine *engine = utrace_attach_task(
		current, UTRACE_ATTACH_MATCH_OPS, &secure_syscall_ops, NULL);

	if (engine == ERR_PTR(-ENOENT))
		return 0;

	if (!IS_ERR(engine)) {
		/*
		 * I wonder how he managed to call prctl() with it enabled.
		 * That should be impossible.
		 */
		utrace_engine_put(engine);	/* Drop the ref the lookup took.  */
		return 1;
	}

	return PTR_ERR(engine);
}

long prctl_set_seccomp_xxx(unsigned long seccomp_mode)
{
	if (seccomp_mode != 1)
		return -EINVAL;

	return enable_secure_syscall();
}


From roland at redhat.com  Tue Mar 24 10:38:42 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 24 Mar 2009 03:38:42 -0700 (PDT)
Subject: seccomp via utrace
In-Reply-To: Roland McGrath's message of  Tuesday, 24 March 2009 03:34:16 -0700
	<20090324103416.26687FC3AB@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
Message-ID: <20090324103843.376AAFC3AB@magilla.sf.frob.com>

Here is the "one swell foop" patch to cut out the old seccomp stuff, clean
up the config, and replace it with the utrace-based one.  The
kernel/seccomp.c patch looks like a patch because it found some trivia in
common, but actually it's wholly replaced with the file I posted before.

I still haven't tested it in the slightest, and only compiled it on x86-64.

This presumably actually ought to be done in several smaller patches.
If it should even be done this way at all.  (That is, eagerly cutting out
the old seccomp and leaving no seccomp option without utrace.)

But here's a completeish proof of concept.  
Maybe someone wants to pick it up.


Thanks,
Roland
---
[PATCH] utraceify seccomp

Signed-off-by: Roland McGrath <roland at redhat.com>
---
 arch/Kconfig                            |    4 +
 arch/mips/Kconfig                       |   18 +----
 arch/mips/kernel/ptrace.c               |    5 -
 arch/powerpc/Kconfig                    |   18 +----
 arch/powerpc/include/asm/thread_info.h  |    4 +-
 arch/powerpc/kernel/ptrace.c            |    3 -
 arch/sh/Kconfig                         |   17 +----
 arch/sh/include/asm/thread_info.h       |    4 +-
 arch/sh/kernel/ptrace_32.c              |    3 -
 arch/sh/kernel/ptrace_64.c              |    3 -
 arch/sparc/include/asm/thread_info_64.h |    3 +-
 arch/x86/Kconfig                        |   17 +----
 arch/x86/kernel/entry_32.S              |    8 +-
 arch/x86/kernel/ptrace.c                |    4 -
 include/linux/sched.h                   |    2 -
 include/linux/seccomp.h                 |   14 ---
 init/Kconfig                            |   18 ++++
 kernel/seccomp.c                        |  146 ++++++++++++++++++-------------
 18 files changed, 116 insertions(+), 175 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 550dab2..f809f07 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -78,6 +78,10 @@ config HAVE_KPROBES
 config HAVE_KRETPROBES
 	bool
 
+# select this if the arch has the asm/seccomp.h file.
+config HAVE_SECCOMP
+	bool
+
 #
 # An arch should select this if it provides all these things:
 #
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 206cb79..b7c124e 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,6 +4,7 @@ config MIPS
 	select HAVE_IDE
 	select HAVE_OPROFILE
 	select HAVE_ARCH_KGDB
+	select HAVE_SECCOMP
 	# Horrible source of confusion.  Die, die, die ...
 	select EMBEDDED
 	select RTC_LIB
@@ -1949,23 +1950,6 @@ config KEXEC
 	  support.  As of this writing the exact hardware interface is
 	  strongly in flux, so no good recommendation can be made.
 
-config SECCOMP
-	bool "Enable seccomp to safely compute untrusted bytecode"
-	depends on PROC_FS
-	default y
-	help
-	  This kernel feature is useful for number crunching applications
-	  that may need to compute untrusted bytecode during their
-	  execution. By using pipes or other transports made available to
-	  the process as file descriptors supporting the read/write
-	  syscalls, it's possible to isolate those applications in
-	  their own address space using seccomp. Once seccomp is
-	  enabled via /proc/<pid>/seccomp, it cannot be disabled
-	  and the task is only allowed to execute a few safe syscalls
-	  defined by each seccomp mode.
-
-	  If unsure, say Y. Only embedded should say N here.
-
 endmenu
 
 config RWSEM_GENERIC_SPINLOCK
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 054861c..2c19cfd 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -24,7 +24,6 @@
 #include <linux/user.h>
 #include <linux/security.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
 
 #include <asm/byteorder.h>
 #include <asm/cpu.h>
@@ -564,10 +563,6 @@ static inline int audit_arch(void)
  */
 asmlinkage void do_syscall_trace(struct pt_regs *regs, int entryexit)
 {
-	/* do the secure computing check first */
-	if (!entryexit)
-		secure_computing(regs->regs[0]);
-
 	if (unlikely(current->audit_context) && entryexit)
 		audit_syscall_exit(AUDITSC_RESULT(regs->regs[2]),
 		                   regs->regs[2]);
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 74cc312..c71ac02 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -119,6 +119,7 @@ config PPC
 	select HAVE_ARCH_KGDB
 	select HAVE_KRETPROBES
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_SECCOMP
 	select HAVE_LMB
 	select HAVE_DMA_ATTRS if PPC64
 	select USE_GENERIC_SMP_HELPERS if SMP
@@ -531,23 +532,6 @@ config ARCH_WANTS_FREEZER_CONTROL
 source kernel/power/Kconfig
 endif
 
-config SECCOMP
-	bool "Enable seccomp to safely compute untrusted bytecode"
-	depends on PROC_FS
-	default y
-	help
-	  This kernel feature is useful for number crunching applications
-	  that may need to compute untrusted bytecode during their
-	  execution. By using pipes or other transports made available to
-	  the process as file descriptors supporting the read/write
-	  syscalls, it's possible to isolate those applications in
-	  their own address space using seccomp. Once seccomp is
-	  enabled via /proc/<pid>/seccomp, it cannot be disabled
-	  and the task is only allowed to execute a few safe syscalls
-	  defined by each seccomp mode.
-
-	  If unsure, say Y. Only embedded should say N here.
-
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 9665a26..4d30be8 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -105,7 +105,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SINGLESTEP		8	/* singlestepping active */
 #define TIF_MEMDIE		9
-#define TIF_SECCOMP		10	/* secure computing */
 #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR		12	/* Force successful syscall return */
 #define TIF_NOTIFY_RESUME	13	/* callback before returning to user */
@@ -123,14 +122,13 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_PERFMON_CTXSW	(1<<TIF_PERFMON_CTXSW)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_SINGLESTEP		(1<<TIF_SINGLESTEP)
-#define _TIF_SECCOMP		(1<<TIF_SECCOMP)
 #define _TIF_RESTOREALL		(1<<TIF_RESTOREALL)
 #define _TIF_NOERROR		(1<<TIF_NOERROR)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
 #define _TIF_FREEZE		(1<<TIF_FREEZE)
 #define _TIF_RUNLATCH		(1<<TIF_RUNLATCH)
 #define _TIF_ABI_PENDING	(1<<TIF_ABI_PENDING)
-#define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP)
+#define _TIF_SYSCALL_T_OR_A	(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT)
 
 #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
 				 _TIF_NOTIFY_RESUME)
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3635be6..f5657c3 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -27,7 +27,6 @@
 #include <linux/user.h>
 #include <linux/security.h>
 #include <linux/signal.h>
-#include <linux/seccomp.h>
 #include <linux/audit.h>
 #ifdef CONFIG_PPC32
 #include <linux/module.h>
@@ -1021,8 +1020,6 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing(regs->gpr[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index ebabe51..5786e77 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -14,6 +14,7 @@ config SUPERH
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_IOREMAP_PROT if MMU
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_SECCOMP
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
@@ -521,22 +522,6 @@ config CRASH_DUMP
 
 	  For more details see Documentation/kdump/kdump.txt
 
-config SECCOMP
-	bool "Enable seccomp to safely compute untrusted bytecode"
-	depends on PROC_FS
-	help
-	  This kernel feature is useful for number crunching applications
-	  that may need to compute untrusted bytecode during their
-	  execution. By using pipes or other transports made available to
-	  the process as file descriptors supporting the read/write
-	  syscalls, it's possible to isolate those applications in
-	  their own address space using seccomp. Once seccomp is
-	  enabled via prctl, it cannot be disabled and the task is only
-	  allowed to execute a few safe syscalls defined by each seccomp
-	  mode.
-
-	  If unsure, say N.
-
 config SMP
 	bool "Symmetric multi-processing support"
 	depends on SYS_SUPPORTS_SMP
diff --git a/arch/sh/include/asm/thread_info.h b/arch/sh/include/asm/thread_info.h
index f09ac48..e1da51a 100644
--- a/arch/sh/include/asm/thread_info.h
+++ b/arch/sh/include/asm/thread_info.h
@@ -114,7 +114,6 @@ extern void free_thread_info(struct thread_info *ti);
 #define TIF_RESTORE_SIGMASK	3	/* restore signal mask in do_signal() */
 #define TIF_SINGLESTEP		4	/* singlestepping active */
 #define TIF_SYSCALL_AUDIT	5	/* syscall auditing active */
-#define TIF_SECCOMP		6	/* secure computing */
 #define TIF_NOTIFY_RESUME	7	/* callback before returning to user */
 #define TIF_USEDFPU		16	/* FPU was used by this task this quantum (SMP) */
 #define TIF_POLLING_NRFLAG	17	/* true if poll_idle() is polling TIF_NEED_RESCHED */
@@ -127,7 +126,6 @@ extern void free_thread_info(struct thread_info *ti);
 #define _TIF_RESTORE_SIGMASK	(1 << TIF_RESTORE_SIGMASK)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
-#define _TIF_SECCOMP		(1 << TIF_SECCOMP)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
 #define _TIF_USEDFPU		(1 << TIF_USEDFPU)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
@@ -141,7 +139,7 @@ extern void free_thread_info(struct thread_info *ti);
 
 /* work to do in syscall trace */
 #define _TIF_WORK_SYSCALL_MASK	(_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP | \
-				 _TIF_SYSCALL_AUDIT | _TIF_SECCOMP)
+				 _TIF_SYSCALL_AUDIT)
 
 /* work to do on any return to u-space */
 #define _TIF_ALLWORK_MASK	(_TIF_SYSCALL_TRACE | _TIF_SIGPENDING      | \
diff --git a/arch/sh/kernel/ptrace_32.c b/arch/sh/kernel/ptrace_32.c
index 29ca09d..c83d0fe 100644
--- a/arch/sh/kernel/ptrace_32.c
+++ b/arch/sh/kernel/ptrace_32.c
@@ -22,7 +22,6 @@
 #include <linux/signal.h>
 #include <linux/io.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
 #include <linux/tracehook.h>
 #include <linux/elf.h>
 #include <linux/regset.h>
@@ -438,8 +437,6 @@ asmlinkage long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;
 
-	secure_computing(regs->regs[0]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
diff --git a/arch/sh/kernel/ptrace_64.c b/arch/sh/kernel/ptrace_64.c
index 6950974..e65dbe0 100644
--- a/arch/sh/kernel/ptrace_64.c
+++ b/arch/sh/kernel/ptrace_64.c
@@ -27,7 +27,6 @@
 #include <linux/signal.h>
 #include <linux/syscalls.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
 #include <linux/tracehook.h>
 #include <linux/elf.h>
 #include <linux/regset.h>
@@ -427,8 +426,6 @@ asmlinkage long long do_syscall_trace_enter(struct pt_regs *regs)
 {
 	long long ret = 0;
 
-	secure_computing(regs->regs[9]);
-
 	if (test_thread_flag(TIF_SYSCALL_TRACE) &&
 	    tracehook_report_syscall_entry(regs))
 		/*
diff --git a/arch/sparc/include/asm/thread_info_64.h b/arch/sparc/include/asm/thread_info_64.h
index 639ac80..b303b93 100644
--- a/arch/sparc/include/asm/thread_info_64.h
+++ b/arch/sparc/include/asm/thread_info_64.h
@@ -227,7 +227,7 @@ register struct thread_info *current_thread_info_reg asm("g6");
 /* flag bit 6 is available */
 #define TIF_32BIT		7	/* 32-bit binary */
 /* flag bit 8 is available */
-#define TIF_SECCOMP		9	/* secure computing */
+/* flag bit 9 is available */
 #define TIF_SYSCALL_AUDIT	10	/* syscall auditing active */
 /* flag bit 11 is available */
 /* NOTE: Thread flags >= 12 should be ones we have no interest
@@ -246,7 +246,6 @@ register struct thread_info *current_thread_info_reg asm("g6");
 #define _TIF_PERFCTR		(1<<TIF_PERFCTR)
 #define _TIF_UNALIGNED		(1<<TIF_UNALIGNED)
 #define _TIF_32BIT		(1<<TIF_32BIT)
-#define _TIF_SECCOMP		(1<<TIF_SECCOMP)
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_ABI_PENDING	(1<<TIF_ABI_PENDING)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bc2fbad..25ec433 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -37,6 +37,7 @@ config X86
 	select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
 	select HAVE_ARCH_KGDB if !X86_VOYAGER
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_SECCOMP
 	select HAVE_GENERIC_DMA_COHERENT if X86_32
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select USER_STACKTRACE_SUPPORT
@@ -1324,22 +1325,6 @@ config EFI
   	resultant kernel should continue to boot on existing non-EFI
   	platforms.
 
-config SECCOMP
-	def_bool y
-	prompt "Enable seccomp to safely compute untrusted bytecode"
-	help
-	  This kernel feature is useful for number crunching applications
-	  that may need to compute untrusted bytecode during their
-	  execution. By using pipes or other transports made available to
-	  the process as file descriptors supporting the read/write
-	  syscalls, it's possible to isolate those applications in
-	  their own address space using seccomp. Once seccomp is
-	  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
-	  and the task is only allowed to execute a few safe syscalls
-	  defined by each seccomp mode.
-
-	  If unsure, say Y. Only embedded should say N here.
-
 config CC_STACKPROTECTOR
 	bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
 	depends on X86_64 && EXPERIMENTAL && BROKEN
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 4646902..21e7046 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -341,8 +341,7 @@ sysenter_past_esp:
 
 	GET_THREAD_INFO(%ebp)
 
-	/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
-	testw $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
+	testb $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
 	jnz sysenter_audit
 sysenter_do_call:
 	cmpl $(nr_syscalls), %eax
@@ -366,7 +365,7 @@ sysenter_exit:
 
 #ifdef CONFIG_AUDITSYSCALL
 sysenter_audit:
-	testw $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
+	testb $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
 	jnz syscall_trace_entry
 	addl $4,%esp
 	CFI_ADJUST_CFA_OFFSET -4
@@ -420,8 +419,7 @@ ENTRY(system_call)
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
 					# system call tracing in operation / emulation
-	/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
-	testw $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
+	testb $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
 	jnz syscall_trace_entry
 	cmpl $(nr_syscalls), %eax
 	jae syscall_badsys
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 06ca07f..0d6bcff 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -19,7 +19,6 @@
 #include <linux/elf.h>
 #include <linux/security.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
 #include <linux/signal.h>
 
 #include <asm/uaccess.h>
@@ -1411,9 +1410,6 @@ asmregparm long syscall_trace_enter(struct pt_regs *regs)
 	if (test_thread_flag(TIF_SINGLESTEP))
 		regs->flags |= X86_EFLAGS_TF;
 
-	/* do the secure computing check first */
-	secure_computing(regs->orig_ax);
-
 	if (unlikely(test_thread_flag(TIF_SYSCALL_EMU)))
 		ret = -1L;
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 786ef2d..4a22d98 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -76,7 +76,6 @@ struct sched_param {
 #include <linux/percpu.h>
 #include <linux/topology.h>
 #include <linux/proportions.h>
-#include <linux/seccomp.h>
 #include <linux/rcupdate.h>
 #include <linux/rtmutex.h>
 
@@ -1286,7 +1285,6 @@ struct task_struct {
 	uid_t loginuid;
 	unsigned int sessionid;
 #endif
-	seccomp_t seccomp;
 
 #ifdef CONFIG_UTRACE
 	struct utrace utrace;
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 262a8dc..02d7adb 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -4,27 +4,13 @@
 
 #ifdef CONFIG_SECCOMP
 
-#include <linux/thread_info.h>
 #include <asm/seccomp.h>
 
-typedef struct { int mode; } seccomp_t;
-
-extern void __secure_computing(int);
-static inline void secure_computing(int this_syscall)
-{
-	if (unlikely(test_thread_flag(TIF_SECCOMP)))
-		__secure_computing(this_syscall);
-}
-
 extern long prctl_get_seccomp(void);
 extern long prctl_set_seccomp(unsigned long);
 
 #else /* CONFIG_SECCOMP */
 
-typedef struct { } seccomp_t;
-
-#define secure_computing(x) do { } while (0)
-
 static inline long prctl_get_seccomp(void)
 {
 	return -EINVAL;
diff --git a/init/Kconfig b/init/Kconfig
index 4b5ab3e..bc90ad3 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1069,6 +1069,24 @@ menuconfig UTRACE
 	  kernel interface exported to kernel modules, to track events in
 	  user threads, extract and change user thread state.
 
+config SECCOMP
+	bool "Enable seccomp to safely compute untrusted bytecode"
+	default y if UTRACE
+	depends on UTRACE
+	depends on HAVE_SECCOMP
+	help
+	  This kernel feature is useful for number crunching applications
+	  that may need to compute untrusted bytecode during their
+	  execution. By using pipes or other transports made available to
+	  the process as file descriptors supporting the read/write
+	  syscalls, it's possible to isolate those applications in
+	  their own address space using seccomp. Once seccomp is
+	  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+	  and the task is only allowed to execute a few safe syscalls
+	  defined by each seccomp mode.
+
+	  If unsure, say Y. Only embedded should say N here.
+
 source "block/Kconfig"
 
 config PREEMPT_NOTIFIERS
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 57d4b13..f14d1fd 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1,86 +1,108 @@
-/*
- * linux/kernel/seccomp.c
- *
- * Copyright 2004-2005  Andrea Arcangeli <andrea at cpushare.com>
- *
- * This defines a simple but solid secure-computing mode.
- */
-
 #include <linux/seccomp.h>
-#include <linux/sched.h>
+#include <linux/utrace.h>
+#include <linux/signal.h>
+#include <linux/err.h>
 #include <linux/compat.h>
-
-/* #define SECCOMP_DEBUG 1 */
-#define NR_SECCOMP_MODES 1
+#include <linux/prctl.h>
+#include <asm/syscall.h>
 
 /*
- * Secure computing mode 1 allows only read/write/exit/sigreturn.
- * To be fully secure this must be combined with rlimit
- * to limit the stack allocations too.
+ * If it's an accepted syscall, run it normally.
+ * If not, send ourselves a SIGKILL and abort the syscall.
  */
-static int mode1_syscalls[] = {
-	__NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
-	0, /* null terminated */
-};
+static u32 secure_syscall_entry(u32 action,
+				struct utrace_engine *engine,
+				struct task_struct *task,
+				struct pt_regs *regs)
+{
+	int callno = syscall_get_nr(task, regs);
 
 #ifdef CONFIG_COMPAT
-static int mode1_syscalls_32[] = {
-	__NR_seccomp_read_32, __NR_seccomp_write_32, __NR_seccomp_exit_32, __NR_seccomp_sigreturn_32,
-	0, /* null terminated */
-};
+	if (is_compat_task())
+		switch (callno) {
+		case __NR_seccomp_read_32:
+		case __NR_seccomp_write_32:
+		case __NR_seccomp_exit_32:
+		case __NR_seccomp_sigreturn_32:
+			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
+		}
+	else
 #endif
+		switch (callno) {
+		case __NR_seccomp_read:
+		case __NR_seccomp_write:
+		case __NR_seccomp_exit:
+		case __NR_seccomp_sigreturn:
+			return UTRACE_RESUME | UTRACE_SYSCALL_RUN;
+		}
 
-void __secure_computing(int this_syscall)
+	force_sig(SIGKILL, task);
+	return UTRACE_RESUME | UTRACE_SYSCALL_ABORT;
+}
+
+static const struct utrace_engine_ops secure_syscall_ops =
 {
-	int mode = current->seccomp.mode;
-	int * syscall;
+	.report_syscall_entry = secure_syscall_entry
+};
 
-	switch (mode) {
-	case 1:
-		syscall = mode1_syscalls;
-#ifdef CONFIG_COMPAT
-		if (is_compat_task())
-			syscall = mode1_syscalls_32;
-#endif
-		do {
-			if (*syscall == this_syscall)
-				return;
-		} while (*++syscall);
-		break;
-	default:
-		BUG();
+/*
+ * Set up a utrace engine to call secure_syscall_entry() for each system call.
+ * Also act like prctl(PR_SET_TSC, PR_TSC_SIGSEGV).
+ */
+static int enable_secure_syscall(void)
+{
+	struct utrace_engine *engine;
+	int ret;
+
+	engine = utrace_attach_task(current,
+				    UTRACE_ATTACH_CREATE |
+				    UTRACE_ATTACH_EXCLUSIVE |
+				    UTRACE_ATTACH_MATCH_OPS,
+				    &secure_syscall_ops, NULL);
+	if (IS_ERR(engine)) {
+		ret = PTR_ERR(engine);
+		return ret == -EEXIST ? -EPERM : ret;
 	}
 
-#ifdef SECCOMP_DEBUG
-	dump_stack();
+	ret = utrace_set_events(current, engine, UTRACE_EVENT(SYSCALL_ENTRY));
+	WARN_ON(ret);		/* Should never happen on current.  */
+
+	/*
+	 * This is the only outside ref on the engine.
+	 * The engine dies automatically when this task gets reaped.
+	 */
+	utrace_engine_put(engine);
+
+#ifdef SET_TSC_CTL
+	if (!ret)
+		SET_TSC_CTL(PR_TSC_SIGSEGV);
 #endif
-	do_exit(SIGKILL);
+
+	return ret;
 }
 
 long prctl_get_seccomp(void)
 {
-	return current->seccomp.mode;
+	struct utrace_engine *engine = utrace_attach_task(
+		current, UTRACE_ATTACH_MATCH_OPS, &secure_syscall_ops, NULL);
+
+	if (engine == ERR_PTR(-ENOENT))
+		return 0;
+
+	if (!IS_ERR(engine))
+		/*
+		 * I wonder how he managed to call prctl() with it enabled.
+		 * That should be impossible.
+		 */
+		return 1;
+
+	return PTR_ERR(engine);
 }
 
 long prctl_set_seccomp(unsigned long seccomp_mode)
 {
-	long ret;
-
-	/* can set it only once to be even more secure */
-	ret = -EPERM;
-	if (unlikely(current->seccomp.mode))
-		goto out;
-
-	ret = -EINVAL;
-	if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) {
-		current->seccomp.mode = seccomp_mode;
-		set_thread_flag(TIF_SECCOMP);
-#ifdef TIF_NOTSC
-		disable_TSC();
-#endif
-		ret = 0;
-	}
+	if (seccomp_mode != 1)
+		return -EINVAL;
 
- out:
-	return ret;
+	return enable_secure_syscall();
 }
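
The allow-list check in secure_syscall_entry() above can be sketched
outside the kernel. This is only an illustration under stated
assumptions, not part of the patch: the DEMO_NR_* numbers and the
demo_* names are hypothetical stand-ins for the __NR_seccomp_*
constants and for the UTRACE_SYSCALL_RUN / UTRACE_SYSCALL_ABORT
results.

```c
#include <assert.h>

/* Hypothetical syscall numbers, chosen only for this illustration. */
enum demo_syscall_nr {
	DEMO_NR_READ = 0,
	DEMO_NR_WRITE = 1,
	DEMO_NR_OPEN = 2,
	DEMO_NR_SIGRETURN = 15,
	DEMO_NR_EXIT = 60,
};

/* Stand-ins for "run the syscall" versus "abort it and SIGKILL". */
enum demo_verdict {
	DEMO_RUN,
	DEMO_ABORT_AND_KILL,
};

/*
 * Same shape as the switch in the patch: the four accepted calls
 * return early, and everything else falls through to the kill path.
 */
enum demo_verdict demo_syscall_entry(int callno)
{
	switch (callno) {
	case DEMO_NR_READ:
	case DEMO_NR_WRITE:
	case DEMO_NR_EXIT:
	case DEMO_NR_SIGRETURN:
		return DEMO_RUN;
	}

	return DEMO_ABORT_AND_KILL;
}

/* Small self-check showing the expected verdicts. */
void demo_entry_selfcheck(void)
{
	assert(demo_syscall_entry(DEMO_NR_WRITE) == DEMO_RUN);
	assert(demo_syscall_entry(DEMO_NR_OPEN) == DEMO_ABORT_AND_KILL);
}
```

A switch keeps the accepted set visible in one place and needs no
terminator entry, unlike the null-terminated arrays it replaces.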


From mingo at elte.hu  Tue Mar 24 10:48:49 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 24 Mar 2009 11:48:49 +0100
Subject: seccomp via utrace
In-Reply-To: <20090324103416.26687FC3AB@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
Message-ID: <20090324104849.GA32357@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> Here is a trivial module to implement the seccomp guts via utrace.  
> I haven't tested it at all.  (AFAIK it was only ever used by 
> cpushare, and that project might be defunct now.)
> 
> I'm not sure what Ingo had in mind for integrating this.  If it's 
> just to reimplement the existing prctl interface, then this is 
> about all you need--just s/_xxx// and fiddle the config et al to 
> build this and not the old stuff.
>
> If the approach would be incremental, to leave the old stuff in 
> place, then it might make more sense just to do a fresh new thing 
> not providing that prctl interface at all.  A new thing could be a 
> module, and define some /sys files or whatnot for its "constrain 
> me now" hook. I think a sensible thing would not require 
> asm/seccomp.h at all, and instead just let the userland setup feed 
> in a set of syscall numbers. It could be that flexible while still 
> being quite simple so that one could audit that setup code and be 
> confident it has no holes.  Then future versions of cpushare (or 
> whatever) would not need any special kernel support for new arch's 
> nor to change the syscall set it wants to allow.
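
Roland's suggestion quoted above, letting the userland setup feed in a
set of syscall numbers, could look roughly like the sketch below. It
is a guess at one possible shape and not code from any posted patch:
the bitmap layout, the DEMO_MAX_SYSCALLS limit, and all demo_* names
are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed upper bound on syscall numbers, for illustration only. */
#define DEMO_MAX_SYSCALLS 512
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
	((DEMO_MAX_SYSCALLS + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* One bit per syscall number; a set bit means "allowed". */
struct demo_syscall_set {
	unsigned long bits[DEMO_WORDS];
};

/* Add nr to the set. Returns 0 on success, -1 if nr is out of range. */
int demo_set_allow(struct demo_syscall_set *set, long nr)
{
	if (nr < 0 || nr >= DEMO_MAX_SYSCALLS)
		return -1;
	set->bits[nr / DEMO_WORD_BITS] |= 1UL << (nr % DEMO_WORD_BITS);
	return 0;
}

/* Out-of-range numbers are never allowed. */
bool demo_set_allows(const struct demo_syscall_set *set, long nr)
{
	if (nr < 0 || nr >= DEMO_MAX_SYSCALLS)
		return false;
	return (set->bits[nr / DEMO_WORD_BITS] >> (nr % DEMO_WORD_BITS)) & 1UL;
}

/* Small self-check: only what was fed in is allowed. */
void demo_set_selfcheck(void)
{
	struct demo_syscall_set set = { { 0 } };

	assert(demo_set_allow(&set, 1) == 0);
	assert(demo_set_allows(&set, 1));
	assert(!demo_set_allows(&set, 2));
}
```

The setup code stays small enough to audit: two range checks and two
bit operations, with no per-architecture table.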

nice! The simplification factor is already significant:

  18 files changed, 116 insertions(+), 175 deletions(-)

That is what we want - to remove special TIF flag uses and replace 
them with utrace driven machinery.

Another future target could be to replace TIF_SYSCALL_FTRACE [in the 
latest tracing tree] with a similar utrace driven solution.

Regarding ptrace-via-utrace. What is the plan there? Am i looking at
the right branch:

| earth4:~/linux.trees.git> git diff --stat 
| linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
|  kernel/ptrace.c |  803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
|  1 files changed, 794 insertions(+), 9 deletions(-)

 dc43527: Merge branch 'utrace' into utrace-ptrace

I'd have (perhaps foolishly) expected ptrace.c to get reduced in 
size and arch/x86/kernel/ptrace.c eliminated - but that does not 
seem to be the direction of movement. What am i missing?

	Ingo


From ananth at in.ibm.com  Tue Mar 24 11:00:00 2009
From: ananth at in.ibm.com (Ananth N Mavinakayanahalli)
Date: Tue, 24 Mar 2009 16:30:00 +0530
Subject: seccomp via utrace
In-Reply-To: <20090324104849.GA32357@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
Message-ID: <20090324110000.GA12841@in.ibm.com>

On Tue, Mar 24, 2009 at 11:48:49AM +0100, Ingo Molnar wrote:
> 
> * Roland McGrath <roland at redhat.com> wrote:
> 
> > Here is a trivial module to implement the seccomp guts via utrace.  
> > I haven't tested it at all.  (AFAIK it was only ever used by 
> > cpushare, and that project might be defunct now.)
> > 
> > I'm not sure what Ingo had in mind for integrating this.  If it's 
> > just to reimplement the existing prctl interface, then this is 
> > about all you need--just s/_xxx// and fiddle the config et al to 
> > build this and not the old stuff.
> >
> > If the approach would be incremental, to leave the old stuff in 
> > place, then it might make more sense just to do a fresh new thing 
> > not providing that prctl interface at all.  A new thing could be a 
> > module, and define some /sys files or whatnot for its "constrain 
> > me now" hook. I think a sensible thing would not require 
> > asm/seccomp.h at all, and instead just let the userland setup feed 
> > in a set of syscall numbers. It could be that flexible while still 
> > being quite simple so that one could audit that setup code and be 
> > confident it has no holes.  Then future versions of cpushare (or 
> > whatever) would not need any special kernel support for new arch's 
> > nor to change the syscall set it wants to allow.
> 
> nice! The simplification factor is already significant:
> 
>   18 files changed, 116 insertions(+), 175 deletions(-)
> 
> That is what we want - to remove special TIF flag uses and replace 
> them with utrace driven machinery.
> 
> Another future target could be to replace TIF_SYSCALL_FTRACE [in the 
> latest tracing tree] with a similar utrace driven solution.
> 
> Regarding ptrace-via-utrace. What is the plan there? Am i looking 
> the right branch:
> 
> | earth4:~/linux.trees.git> git diff --stat 
> | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> |  kernel/ptrace.c |  803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> |  1 files changed, 794 insertions(+), 9 deletions(-)
> 
>  dc43527: Merge branch 'utrace' into utrace-ptrace
> 
> I'd have (perhaps foolishly) expected ptrace.c to get reduced in 
> size and arch/x86/kernel/ptrace.c eliminated - but that does not 
> seem to be direction of movement. What am i missing?

That's because the version of ptrace.c you are looking at has both the
legacy implementation and the ptrace over utrace implementation with
#ifdefs to separate them out. I guess Roland wanted to keep the legacy
stuff around till the ptrace/utrace becomes stable enough.

Ananth


From roland at redhat.com  Tue Mar 24 11:05:34 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 24 Mar 2009 04:05:34 -0700 (PDT)
Subject: seccomp via utrace
In-Reply-To: Ingo Molnar's message of  Tuesday,
	24 March 2009 11:48:49 +0100 <20090324104849.GA32357@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
Message-ID: <20090324110534.BF76DFC3AB@magilla.sf.frob.com>

> Regarding ptrace-via-utrace. What is the plan there? Am i looking 
> the right branch:
> 
> | earth4:~/linux.trees.git> git diff --stat 
> | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> |  kernel/ptrace.c |  803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> |  1 files changed, 794 insertions(+), 9 deletions(-)
> 
>  dc43527: Merge branch 'utrace' into utrace-ptrace

That is the branch that there is, yes.  Its comparison vs its baseline is:

 include/linux/ptrace.h    |   21 ++
 include/linux/sched.h     |    1 +
 include/linux/tracehook.h |   19 +-
 init/Kconfig              |   18 +
 kernel/ptrace.c           |  785 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/signal.c           |   14 +-
 kernel/utrace.c           |   23 ++
 7 files changed, 870 insertions(+), 11 deletions(-)

> I'd have (perhaps foolishly) expected ptrace.c to get reduced in 
> size and arch/x86/kernel/ptrace.c eliminated - but that does not 
> seem to be direction of movement. What am i missing?

Expecting that arch file to go away is just a complete misunderstanding on
your part.  Look at what is actually in that file.  arch_ptrace() and
compat_arch_ptrace() are the only things there that are actually part of
ptrace per se.  I'm not sure how much smaller you expect those to get.

Firstly, this branch now is hack-and-slash code.  As I've said a few times,
the bulk of the work is ptrace clean-up that is not directly related to
utrace.  (It's necessary stuff to do the utrace version sanely, but it's
independent clean-up that will go in ahead of any ptrace changes involving
utrace.)  That will make it cleaner, but probably not smaller in line
counts.  You get some more lines when you start using sane data structures
instead of all kludges.

Moreover, that branch does not remove any code at all.  
Everything is left the same with CONFIG_UTRACE turned off.
All the utrace-based ptrace code is new code on the other
side of an #else from some old code.
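
The arrangement described here, old code untouched with the new
implementation on the other side of an #else, follows the usual
conditional-compilation pattern. A minimal sketch, assuming a
hypothetical DEMO_CONFIG_UTRACE_PTRACE symbol and a made-up
demo_ptrace_backend() function; neither name comes from the branch:

```c
#include <assert.h>
#include <string.h>

/*
 * DEMO_CONFIG_UTRACE_PTRACE stands in for a Kconfig symbol.
 * Build with -DDEMO_CONFIG_UTRACE_PTRACE to select the new side.
 */
#ifndef DEMO_CONFIG_UTRACE_PTRACE

/* Old code: compiled exactly as before when the option is off. */
const char *demo_ptrace_backend(void)
{
	return "legacy";
}

#else

/* New code: exists only when the option is on. */
const char *demo_ptrace_backend(void)
{
	return "utrace";
}

#endif

/* Small self-check: exactly one side is ever built. */
void demo_backend_selfcheck(void)
{
#ifdef DEMO_CONFIG_UTRACE_PTRACE
	assert(strcmp(demo_ptrace_backend(), "utrace") == 0);
#else
	assert(strcmp(demo_ptrace_backend(), "legacy") == 0);
#endif
}
```

Both sides stay in the file, so a diffstat taken against the baseline
shows only additions even though a given build compiles just one side.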

None of this, of course, has anything whatsoever to do with the seccomp
thread.  I don't know why so many people insist on hijacking every thread
for every other thing instead of posting a proper thread on a new subject
they raise.  I suppose it goes along with verbosely reviewing the diffstats
while never looking at the actual code, which also seems to be popular.


Thanks,
Roland


From mingo at elte.hu  Tue Mar 24 11:10:56 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 24 Mar 2009 12:10:56 +0100
Subject: seccomp via utrace
In-Reply-To: <20090324110000.GA12841@in.ibm.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110000.GA12841@in.ibm.com>
Message-ID: <20090324111056.GA6386@elte.hu>

* Ananth N Mavinakayanahalli <ananth at in.ibm.com> wrote:

> > nice! The simplification factor is already significant:
> > 
> >   18 files changed, 116 insertions(+), 175 deletions(-)
> > 
> > That is what we want - to remove special TIF flag uses and replace 
> > them with utrace driven machinery.
> > 
> > Another future target could be to replace TIF_SYSCALL_FTRACE [in the 
> > latest tracing tree] with a similar utrace driven solution.
> > 
> > Regarding ptrace-via-utrace. What is the plan there? Am i looking 
> > the right branch:
> > 
> > | earth4:~/linux.trees.git> git diff --stat 
> > | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> > |  kernel/ptrace.c |  803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > |  1 files changed, 794 insertions(+), 9 deletions(-)
> > 
> >  dc43527: Merge branch 'utrace' into utrace-ptrace
> > 
> > I'd have (perhaps foolishly) expected ptrace.c to get reduced in 
> > size and arch/x86/kernel/ptrace.c eliminated - but that does not 
> > seem to be direction of movement. What am i missing?
> 
> That's because the version of ptrace.c you are looking at has both the
> legacy implementation and the ptrace over utrace implementation with
> #ifdefs to separate them out. I guess Roland wanted to keep the 
> legacy stuff around till the ptrace/utrace becomes stable enough.

But this makes it hard to judge how upstream-worthy that change is - 
or could be. I realize that it's incomplete, so i'm guessing. 

kernel/ptrace.c is 739 lines currently, arch/x86/kernel/ptrace.c is 
1467 lines. The +794 lines via ptrace/utrace suggest that it got a 
bit larger - or at least has roughly the same size.

Can arch/x86/kernel/ptrace.c be eliminated altogether? If yes then 
that would make it a clear net win, with just a single architecture 
covered. With every additional arch the win (==complexity reduction) 
would be larger.

	Ingo


From mingo at elte.hu  Tue Mar 24 11:16:19 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Tue, 24 Mar 2009 12:16:19 +0100
Subject: seccomp via utrace
In-Reply-To: <20090324110534.BF76DFC3AB@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
Message-ID: <20090324111619.GB6386@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> > Regarding ptrace-via-utrace. What is the plan there? Am i looking 
> > the right branch:
> > 
> > | earth4:~/linux.trees.git> git diff --stat 
> > | linus/master..utrace/utrace-ptrace kernel/ptrace.c arch/x86/kernel/ptrace.c
> > |  kernel/ptrace.c |  803 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > |  1 files changed, 794 insertions(+), 9 deletions(-)
> > 
> >  dc43527: Merge branch 'utrace' into utrace-ptrace
> 
> That is the branch that there is, yes.  Its comparison vs its baseline is:
> 
>  include/linux/ptrace.h    |   21 ++
>  include/linux/sched.h     |    1 +
>  include/linux/tracehook.h |   19 +-
>  init/Kconfig              |   18 +
>  kernel/ptrace.c           |  785 ++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/signal.c           |   14 +-
>  kernel/utrace.c           |   23 ++
>  7 files changed, 870 insertions(+), 11 deletions(-)
> 
> > I'd have (perhaps foolishly) expected ptrace.c to get reduced in 
> > size and arch/x86/kernel/ptrace.c eliminated - but that does not 
> > seem to be direction of movement. What am i missing?
> 
> Expecting that arch file to go away is just a complete 
> misunderstanding on your part. [...]

Sorry - it's what 30 seconds of looking gives me while trying to 
prepare for a really busy merge window :-)

This kind of info should have been 1) emitted a month ago, in the 
middle of the development window, 2) have been part of the 
submission ('why do we want it' 'what will be the future benefit?').
 
I'm asking trivial and stupid looking followup questions, to help 
construct that kind of high level information. If it annoys you i 
can stop.

> [...] Look at what is actually in that file.  arch_ptrace() and 
> compat_arch_ptrace() are the only things there that are actually 
> part of ptrace per se.  I'm not sure how much smaller you expect 
> those to get.

yeah, no big reduction potential there.

	Ingo


From roland at redhat.com  Wed Mar 25 10:31:22 2009
From: roland at redhat.com (Roland McGrath)
Date: Wed, 25 Mar 2009 03:31:22 -0700 (PDT)
Subject: utrace merging, ptrace
In-Reply-To: Ingo Molnar's message of  Tuesday,
	24 March 2009 12:16:19 +0100 <20090324111619.GB6386@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
	<20090324111619.GB6386@elte.hu>
Message-ID: <20090325103122.6ED56FC336@magilla.sf.frob.com>

> This kind of info should have been 1) emitted a month ago, in the 
> middle of the development window, 2) have been part of the 
> submission ('why do we want it' 'what will be the future benefit?').

Well, we are where we are.  I don't really know what kind of lack
you see in having said what its future benefits will be.  We have
talked out the wazoo about what utrace is for.

I also really don't understand the resistance to a new thing in a
new config option that depends on EXPERIMENTAL, and having the
smaller users bang on it and fix it in the tree for a while.  

You seem now to be saying that the gating event would be rewriting
ptrace unconditionally to require utrace, and do that way early
before any other hashing out of utrace in the tree.  That just seems
wildly nuts to me and I am confused about why you like the idea.
We have a new thing to shake out, so let's break a crucial feature 
so that people uninterested in the new stuff can be stuck with new
bugs and regressions as early as possible!  What?  Did I miss a memo?
How is that the prized incrementalism that we hear so much about?
Isn't "ptrace works, we need ptrace, don't break ptrace until you're
sure you won't be breaking ptrace" what every sane user wants?

You know damn well that I am 198% for the wholesale replacement of
ptrace.  (We hates the ptrace!)  But that is a big lump to put in
first, and to delay every other line of development behind.  Why
doesn't utrace deserve a period as EXPERIMENTAL before we force it
onto everyone's critical path?

If rewriting everything early on to use the new thing is such the
great plan, why didn't you rewrite dmesg to use ftrace ring buffers
before putting them in?  (It's not a serious question, but I hope
you recognize that the ptrace question sounds about as ludicrous to
me as that one does to you.)

Why is it OK to have kprobes with no in-tree users, but not utrace?

I think you get the gist of the sort of mismatch I'm perceiving
between your remarks about utrace and the rest of reality.  I don't
need the answers that would reconcile my experiences of reality.
We just need to find the way forward that is actually going to happen.

> I'm asking trivial and stupid looking followup questions, to help 
> construct that kind of high level information. If it annoys you i 
> can stop.

Keep asking stupid questions and I'll keep giving stupid answers.
The only thing that would annoy me is progress being prevented by
mutual lacks of understanding.

> yeah, no big reduction potential there.

Look, you shouldn't expect size reduction from cleaning up the generic
ptrace code either.  The old ptrace is "simple", deceptively simple,
because it just relies on ruining all sorts of things to deliver what was
easy to kludge ages ago.  We're going to have something cleaner, better,
less intrusive, and not so limited (in ways like preventing any possibility
of other user debugging facilities being implemented)--not smaller.


Thanks,
Roland


From mingo at elte.hu  Wed Mar 25 11:21:04 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Wed, 25 Mar 2009 12:21:04 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090325103122.6ED56FC336@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
	<20090324111619.GB6386@elte.hu>
	<20090325103122.6ED56FC336@magilla.sf.frob.com>
Message-ID: <20090325112104.GA6041@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> > This kind of info should have been 1) emitted a month ago, in 
> > the middle of the development window, 2) have been part of the 
> > submission ('why do we want it' 'what will be the future 
> > benefit?').
> 
> Well, we are where we are.  I don't really know what kind of lack 
> you see in having said what its future benefits will be.  We have 
> talked out the wazoo about what utrace is for.
> 
> I also really don't understand the resistance to a new thing in a 
> new config option that depends on EXPERIMENTAL, and having the 
> smaller users bang on it and fix it in the tree for a while.

This has been the upstream merging principle for the past 15 years: 
95% of the mainline features go there with good and immediate uses, 
not with "future uses".

> You seem now to be saying that the gating event would be rewriting 
> ptrace unconditionally to require utrace, and do that way early 
> before any other hashing out of utrace in the tree.  That just 
> seems wildly nuts to me and I am confused about why you like the 
> idea. We have a new thing to shake out, so let's break a crucial 
> feature so that people uninterested in the new stuff can be stuck 
> with new bugs and regressions as early as possible!  What?  Did I 
> miss a memo? How is that the prized incrementalism that we hear so 
> much about? Isn't "ptrace works, we need ptrace, don't break 
> ptrace until you're sure you won't be breaking ptrace" what every 
> sane user wants?

I think you misunderstood my point. I never advocated the wholesale, 
unconditional rewriting of ptrace. A gradual approach there seems a 
must - and your approach of CONFIG_UTRACE_PTRACE seems like the way 
to go, initially.

What i tried to get at is the "how will the end result look like" 
question - because arguably a ptrace replacement will be the end 
goal.

( Note, Linus might still insist on a total replacement, if he
  finds the #ifdef approach too ugly. I don't talk for him and he is 
  usually much pickier than me. )

> You know damn well that I am 198% for the wholesale replacement of 
> ptrace.  (We hates the ptrace!)  But that is a big lump to put in 
> first, and to delay every other line of development behind.  Why 
> doesn't utrace deserve a period as EXPERIMENTAL before we force it 
> onto everyone's critical path?
> 
> If rewriting everything early on to use the new thing is such the 
> great plan, why didn't you rewrite dmesg to use ftrace ring 
> buffers before putting them in?  (It's not a serious question, but 
> I hope you recognize that the ptrace question sounds about as 
> ludicrous to me as that one does to you.)
> 
> Why is it OK to have kprobes with no in-tree users, but not 
> utrace?

Kprobes is amongst the 5% exception that proves the rule. We got 
burned by kprobes somewhat - it was merged and went nowhere for 
years and has maintenance overhead. (Btw., there are some in-tree 
users of kprobes meanwhile - but it's still largely stale.)

Kprobes is also arguably probing the kernel purely externally - so 
having it as a separate, isolated entity is somewhat understandable 
- even though it's still not ideal and if it were submitted today we 
would probably not merge it without actual, substantial in-tree 
uses.

But utrace is not a passive probe - it is an active, functional part 
of the kernel that gets built in. Utrace without a real user is like 
trying to get CONFIG_SECURITY upstream without a real user. It's 
generally an upstream non-starter.

	Ingo


From oleg at redhat.com  Thu Mar 26 23:20:11 2009
From: oleg at redhat.com (Oleg Nesterov)
Date: Fri, 27 Mar 2009 00:20:11 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090325103122.6ED56FC336@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
	<20090324111619.GB6386@elte.hu>
	<20090325103122.6ED56FC336@magilla.sf.frob.com>
Message-ID: <20090326232011.GA3970@redhat.com>

On 03/25, Roland McGrath wrote:
>
> I also really don't understand the resistance to a new thing in a
> new config option that depends on EXPERIMENTAL, and having the
> smaller users bang on it and fix it in the tree for a while.

And, just in case...

Without CONFIG_UTRACE, the patch does not change the code at all.

With CONFIG_UTRACE, the patch adds a few "if (unlikely(tsk->utrace_flags))"
checks, none of these checks lives in the hot path.
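
The check pattern mentioned above can be sketched in plain C. This is
an illustration only: demo_unlikely() approximates the kernel's
unlikely() with the GCC/Clang builtin __builtin_expect, and the
demo_task structure, DEMO_CONFIG_UTRACE switch, and demo_* functions
are hypothetical names rather than the real utrace hooks.

```c
#include <assert.h>

/* Stand-in for CONFIG_UTRACE; set to 0 to compile the hook away. */
#define DEMO_CONFIG_UTRACE 1

#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define demo_unlikely(x) (x)
#endif

struct demo_task {
	unsigned long utrace_flags;	/* nonzero only with an engine attached */
	int reports;			/* counts slow-path entries */
};

/* Slow path: reached only for tasks that are being traced. */
void demo_utrace_report(struct demo_task *tsk)
{
	tsk->reports++;
}

/* Call site: one predicted-not-taken branch when nothing is attached. */
void demo_hook(struct demo_task *tsk)
{
#if DEMO_CONFIG_UTRACE
	if (demo_unlikely(tsk->utrace_flags))
		demo_utrace_report(tsk);
#else
	(void)tsk;
#endif
}

/* Small self-check: only the traced task reaches the slow path. */
void demo_hook_selfcheck(void)
{
	struct demo_task idle = { 0UL, 0 };
	struct demo_task traced = { 1UL, 0 };

	demo_hook(&idle);
	demo_hook(&traced);
	assert(idle.reports == 0);
	assert(traced.reports == 1);
}
```

With DEMO_CONFIG_UTRACE set to 0 the hook body compiles to nothing,
which mirrors the claim that the code is unchanged without the option.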

Oleg.


From roland at redhat.com  Fri Mar 27 00:48:24 2009
From: roland at redhat.com (Roland McGrath)
Date: Thu, 26 Mar 2009 17:48:24 -0700 (PDT)
Subject: utrace merging, ptrace
In-Reply-To: Ingo Molnar's message of  Wednesday,
	25 March 2009 12:21:04 +0100 <20090325112104.GA6041@elte.hu>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
	<20090324111619.GB6386@elte.hu>
	<20090325103122.6ED56FC336@magilla.sf.frob.com>
	<20090325112104.GA6041@elte.hu>
Message-ID: <20090327004824.9F8F5FC1F8@magilla.sf.frob.com>

> I think you misunderstood my point. I never advocated the wholesale, 
> unconditional rewriting of ptrace. A gradual approach there seems a 
> must - and your approach of CONFIG_UTRACE_PTRACE seems like the way 
> to go, initially.

Ok, good.  I was confused by your focus on the diffstat and your apparent
expectation that these changes should make all ptrace source files smaller.
Thanks for clearing that up.

I will note again here that a bunch of ptrace clean-ups I anticipate
will be purely in reorganizing its own data structures independent of
the utrace issue.  Those will be incremental changes in many bisectable
baby steps, but they won't be conditional.

> What i tried to get at is the "how will the end result look like" 
> question - because arguably a ptrace replacement will be the end 
> goal.

Right.

> ( Note, Linus might still insist on a total replacement, if he
>   finds the #ifdef approach too ugly. I don't talk for him and he is 
>   usually much pickier than me. )

In a previous round of review, hch objected to CONFIG_UTRACE_PTRACE.
I think we are all in agreement that the eventual right place will
be only one ptrace implementation, and that being the one based on a
clean framework.  It's not very clear to me which different
incremental paths to get there different people have in mind or why.

Everyone agrees #ifdef for two implementations is ugly.  It's a
transitional stage, so to me it seems quite tolerable knowing that it
will be cleaned up eventually.  It buys two things: 1. getting utrace
in sooner, worked on faster, and made better soon; 2. given that, risk
mitigation for everyone not interested in working with utrace.


Thanks,
Roland


From mingo at elte.hu  Fri Mar 27 00:59:17 2009
From: mingo at elte.hu (Ingo Molnar)
Date: Fri, 27 Mar 2009 01:59:17 +0100
Subject: utrace merging, ptrace
In-Reply-To: <20090327004824.9F8F5FC1F8@magilla.sf.frob.com>
References: <20090324103416.26687FC3AB@magilla.sf.frob.com>
	<20090324104849.GA32357@elte.hu>
	<20090324110534.BF76DFC3AB@magilla.sf.frob.com>
	<20090324111619.GB6386@elte.hu>
	<20090325103122.6ED56FC336@magilla.sf.frob.com>
	<20090325112104.GA6041@elte.hu>
	<20090327004824.9F8F5FC1F8@magilla.sf.frob.com>
Message-ID: <20090327005917.GA2077@elte.hu>


* Roland McGrath <roland at redhat.com> wrote:

> > ( Note, Linus might still insist on a total replacement, if he
> >   finds the #ifdef approach too ugly. I don't talk for him and he 
> >   is usually much pickier than me. )
> 
> In a previous round of review, hch objected to 
> CONFIG_UTRACE_PTRACE. I think we are all in agreement that the 
> eventual right place will be only one ptrace implementation, and 
> that being the one based on a clean framework.  It's not very 
> clear to me which different incremental paths to get there 
> different people have in mind or why.
> 
> Everyone agrees #ifdef for two implementations is ugly.  It's a 
> transitional stage, so to me it seems quite tolerable knowing that 
> it will be cleaned up eventually.  It buys two things: 1. getting 
> utrace in sooner, worked on faster, and made better soon; 2. given 
> that, risk mitigation for everyone not interested in working with 
> utrace.

The problem for upstream is, if it goes in ugly and everyone gets 
what they wanted they often go and chase other targets. Especially 
if it's such an external-looking and external-thinking project as 
SystemTap. Such incidents have happened to upstream frequently enough 
to become a primary worry.

[ For example: you promised proper x86 CFI annotations macro design 
  one year ago to Linus and me, in exchange for me not removing the 
  ugly ones. I already had the removal patches done and committed at 
  that stage and reverted them after that. The ugly CFI stuff is 
  still there today and it's all bitrotting nicely ;-) ]

And there's a slam-dunk counter argument: "ptrace is ugly enough 
already, we don't need another 'temporary' layer".

So 'temporary ugliness' is being frowned upon. Ugliness might be 
taken from trusted parties in well-argued cases but it is still 
exceedingly rare.

	Ingo


From cornel at upload-ro.ro  Fri Mar 27 17:29:58 2009
From: cornel at upload-ro.ro (cornel)
Date: Fri, 27 Mar 2009 19:29:58 +0200
Subject: Untitled-1
Message-ID: <20090327.BZYPGBLCKHHPWECW@upload-ro.ro>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090327/b78235c4/attachment.htm>

From sigurdur at fe.navy.mil  Fri Mar 27 20:14:37 2009
From: sigurdur at fe.navy.mil (Jonathan)
Date: Fri, 27 Mar 2009 17:14:37 -0300
Subject: Are you and your friends fine?
Message-ID: <000c01c9af19$fe68e420$a3e9bb64@homeknpvu>

Haven't you been there? http://ngiij.mobilephotoblog.com/main.php


From mldireto at tudoemoferta.com.br  Fri Mar 27 22:52:09 2009
From: mldireto at tudoemoferta.com.br (Englobe Sistemas e E-Commerce)
Date: Fri, 27 Mar 2009 19:52:09 -0300
Subject: Oportunidade para se tornar um grande empresario
Message-ID: <d3adf4ed1cc905e6dd4d393e0015cf4f@tudoemoferta.com.br>

An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/utrace-devel/attachments/20090327/0d3148d3/attachment.htm>

From rev at rev2009bridgeport.org  Sat Mar 28 02:52:47 2009
From: rev at rev2009bridgeport.org (REV 2009)
Date: Fri, 27 Mar 2009 19:52:47 -0700
Subject: CFP: Sixth International Conference on Remote Engineering and
	Virtual Instrumentation (REV 2009)
Message-ID: <200903280254.n2S2rjfJ032483@mx2.redhat.com>

Dear Colleagues,


If you received this email in error, please forward it to  the appropriate
department at your institution. If you wish to unsubscribe please follow
the unsubscribe link at bottom of the email.

Please do not reply to this message. If you need to contact us please email
us at info at rev2009bridgeport.org


*********************************************************************
*            International Association of Online Engineering        *
*                                                                   *
*       Sixth International Conference on Remote Engineering and    *
*                  Virtual Instrumentation (REV 2009)               *
*                                                                   *
*                                                                   *
*                       University of Bridgeport                    *
*                                                                   *
*                                                                   *
*                   http://www.rev2009bridgeport.org                *
*                                                                   *
*                                                                   *
*                            June 22-25, 2009                       *
*                                                                   *
*********************************************************************


---------------------------------------------------------------------
CONFERENCE  OVERVIEW
---------------------------------------------------------------------

The Sixth International Conference on Remote Engineering and Virtual
Instrumentation (REV 2009) will be held on June 22-25, 2009 at the
University of Bridgeport, Bridgeport, Connecticut, U.S.A.

REV 2009 is the sixth in a series of annual events addressing the area of
remote engineering and virtual instrumentation. Previous editions of REV
were organized in the form of an international symposium, and evolved in
2007 to be the annual conference of the International Association of Online
Engineering. The general objective of this conference is to discuss
fundamentals, applications and experiences within the field of online
engineering, both in industry and academia. REV 2009 offers an exciting
technical program as well as academic networking opportunities during the
social events.


Scope of the conference:

Remote Engineering and Virtual Instrumentation are emerging trends in
engineering and science, driven by:

o The increasing complexity of engineering tasks
o The availability of specialized and expensive equipment as well as
software tools and simulators
o The need for highly qualified staff to control equipment
o The demands of globalization

The general objective of this conference is to discuss fundamentals,
applications and experiences in the field of remote engineering and virtual
instrumentation. It is becoming increasingly necessary to allow the shared
use of equipment and specialized software. The use of virtual and remote
laboratories is one of the future directions for advanced teleworking,
remote services, collaborative research and e-working environments.

Another objective of the conference is to discuss guidelines for education
in university level courses. The organizers encourage industry personnel to
present their experiences and applications of remote engineering and
virtual instruments.

This conference will be organized by the School of Engineering at the
University of Bridgeport.

Topics of interest include (but are not limited to):

o Virtual and remote laboratories
o Remote process visualization and virtual Instrumentation
o Remote control and measurement technologies
o Online engineering
o Networking and grid technologies
o Mixed Reality environments for education and training
o Demands in education and training, e-learning, b-learning, m-learning and
 ODL
o Teleservice and telediagnosis
o Telerobotics and telepresence
o Support of collaborative work in virtual engineering environments
o Teleworking environments
o Telecommunities and their social impact
o Present and future trends including social and educational aspects
o Human computer interfaces, usability, reusability, accessibility
o Applications and experiences
o Standards and standardization proposals
o Innovative organizational and educational concepts for remote 
engineering

The REV 2009 Conference is soliciting manuscripts which address the various
challenges and paradigms in this technological world through research and
instructional programs in Remote Engineering and Virtual Instrumentation.
Suggested conference session topics are listed above. Other innovations in
course and laboratory experiences are also most welcome for submission. To
submit your paper abstract, please visit the conference website at
http://www.rev2009bridgeport.org

If you are interested in submitting a special paper session, panel,
tutorial, or workshop proposal, the contact information is also available
on the conference website at http://www.rev2009bridgeport.org. If your
company or institution would like to exhibit at, or co-sponsor, the
conference, the sponsorship and exhibit forms are also available on the
conference website.


Paper and other Proposal Submissions

======================================

Prospective authors are invited to submit their abstracts online in
Microsoft Word or Adobe PDF format through the website of the conference at
 http://www.rev2009bridgeport.org. Proposals for special sessions,
tutorials, panels, workshops, co-sponsorship and exhibitions are also
welcome. Please check the conference website regarding instructions for
these proposal submissions. 


Important Dates
===============

Abstracts due                        21st April, 2009
Acceptance notification              8th May, 2009
Final manuscript & Registration due  29th May, 2009


------------------------------------------------------------------------
N. Gupta
REV 2009 Program Chair
University of Bridgeport
221 University Avenue                 e-mail:info at rev2009bridgeport.org
Bridgeport, CT 06604, U.S.A.           http://www.rev2009bridgeport.org
------------------------------------------------------------------------


From akpm at linux-foundation.org  Mon Mar 30 22:18:44 2009
From: akpm at linux-foundation.org (Andrew Morton)
Date: Mon, 30 Mar 2009 15:18:44 -0700
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090323214417.GD5814@mit.edu>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
Message-ID: <20090330151844.8b4eed0f.akpm@linux-foundation.org>


So we need to work out what to do about utrace and I feel a need to hit
the reset button on all this.  Largely because I've forgotten
everything and it was all confusing anyway.

Could those who object to utrace please pipe up and summarise their
reasons?


Just to kick the can down the road a bit I merged the first two
patches.  The ftrace patch merged about as (un)successfully as one would
expect.


From fche at redhat.com  Mon Mar 30 22:52:06 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Mon, 30 Mar 2009 18:52:06 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090330151844.8b4eed0f.akpm@linux-foundation.org>
References: <20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
Message-ID: <20090330225206.GD16170@redhat.com>

Hi -

On Mon, Mar 30, 2009 at 03:18:44PM -0700, Andrew Morton wrote:
> So we need to work out what to do about utrace and I feel a need to hit
> the reset button on all this.  [...]

Thanks.

> [...]  The ftrace patch merged about as (un)successfully as one would

A new version against -tip is coming by in a few days.

- FChE


From a.p.zijlstra at chello.nl  Tue Mar 31 09:17:42 2009
From: a.p.zijlstra at chello.nl (Peter Zijlstra)
Date: Tue, 31 Mar 2009 11:17:42 +0200
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <20090330151844.8b4eed0f.akpm@linux-foundation.org>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
Message-ID: <1238491062.28248.2046.camel@twins>

On Mon, 2009-03-30 at 15:18 -0700, Andrew Morton wrote:
> So we need to work out what to do about utrace and I feel a need to hit
> the reset button on all this.  Largely because I've forgotten
> everything and it was all confusing anyway.

Right, from my POV something like utrace is desirable, since it's
basically a huge multiplexer for the debugger state, eventually allowing
us to have multiple debuggers attached to the same process.

So in that respect it's a very nice feature.
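
[ Illustration: the "multiplexer" idea above can be sketched as a toy
  userspace model in which a task holds a list of attached observers
  rather than a single tracer slot, and every event is delivered to all
  of them. Every name here (toy_task, toy_engine, toy_attach, toy_report)
  is hypothetical and chosen for illustration; this is not the utrace
  API, only a sketch of the concept under that assumption. ]

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_ENGINES 4

/* Hypothetical event type; not the real utrace event set. */
enum toy_event { TOY_EVENT_SYSCALL, TOY_EVENT_SIGNAL };

/* One attached observer ("engine"): a name, a callback, a hit counter. */
struct toy_engine {
    const char *name;
    void (*report)(struct toy_engine *self, enum toy_event ev);
    int seen;
};

/* The traced task keeps a list of engines instead of one tracer slot. */
struct toy_task {
    struct toy_engine *engines[MAX_ENGINES];
    size_t nr_engines;
};

/* Attach an engine; returns 0 on success, -1 if the table is full. */
int toy_attach(struct toy_task *task, struct toy_engine *engine)
{
    if (task->nr_engines == MAX_ENGINES)
        return -1;
    task->engines[task->nr_engines++] = engine;
    return 0;
}

/* Deliver one event to every attached engine, in attach order. */
void toy_report(struct toy_task *task, enum toy_event ev)
{
    for (size_t i = 0; i < task->nr_engines; i++)
        task->engines[i]->report(task->engines[i], ev);
}

/* Example callback: count the event and log it. */
void toy_count(struct toy_engine *self, enum toy_event ev)
{
    self->seen++;
    printf("%s saw event %d\n", self->name, (int)ev);
}
```

[ With two engines attached to the same toy_task, a single toy_report()
  call reaches both, which is the property a single-tracer ptrace slot
  cannot offer. ]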

> Could those who object to utrace please pipe up and summarise their
> reasons?

Christoph used to have an opinion on this matter, so I've added him to
the CC.

Last time when I looked at the code, it needed a bit more care and
comments wrt lifetimes and such. I know Roland has done a lot on that
front -- so I'll need to re-inspect.

As to in-kernel users, currently we only have ptrace, and no full
conversion to utrace is in a mergeable shape afaik.

UML (Jeff CC'ed) might want to use this.

I know the Systemtap people need this (fche). But that isn't really
moving towards mainline any time soon afaict.

Then there is this little thing called frysk which uses it, no idea what
kind of kernel space that needs, nor where it lives -- or for that
matter, wth it really does ;-)


Anyway, long story short, once people have had a little time to go over
the code, and a few in-kernel users are lined-up, I think we should
consider merging it.


From peterz at infradead.org  Tue Mar 31 11:27:56 2009
From: peterz at infradead.org (Peter Zijlstra)
Date: Tue, 31 Mar 2009 13:27:56 +0200
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321041954.72b99e69.akpm@linux-foundation.org>
	<20090321115141.GA3566@redhat.com>
	<20090321050422.d1d99eec.akpm@linux-foundation.org>
	<20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
	<1238491062.28248.2046.camel@twins>
Message-ID: <1238498876.27156.9.camel@twins>

On Tue, 2009-03-31 at 11:17 +0200, Peter Zijlstra wrote:
> On Mon, 2009-03-30 at 15:18 -0700, Andrew Morton wrote:
> > So we need to work out what to do about utrace and I feel a need to hit
> > the reset button on all this.  Largely because I've forgotten
> > everything and it was all confusing anyway.
> 
> Right, from my POV something like utrace is desirable, since it's
> basically a huge multiplexer for the debugger state, eventually allowing
> us to have multiple debuggers attached to the same process.
> 
> So in that respect it's a very nice feature.
> 
> > Could those who object to utrace please pipe up and summarise their
> > reasons?
> 
> Christoph used to have an opinion on this matter, so I've added him to
> the CC.
> 
> Last time when I looked at the code, it needed a bit more care and
> comments wrt lifetimes and such. I know Roland has done a lot on that
> front -- so I'll need to re-inspect.
> 
> As to in-kernel users, currently we only have ptrace, and no full
> conversion to utrace is in a mergeable shape afaik.
> 
> UML (Jeff CC'ed) might want to use this.
> 
> I know the Systemtap people need this (fche). But that isn't really
> moving towards mainline any time soon afaict.
> 
> Then there is this little thing called frysk which uses it, no idea what
> kind of kernel space that needs, nor where it lives -- or for that
> matter, wth it really does ;-)

And Frank reminded me we have an ftrace tracer that utilizes utrace.

> Anyway, long story short, once people have had a little time to go over
> the code, and a few in-kernel users are lined-up, I think we should
> consider merging it.


From fche at redhat.com  Tue Mar 31 11:38:32 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 31 Mar 2009 07:38:32 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
	<1238491062.28248.2046.camel@twins>
Message-ID: <20090331113832.GG16170@redhat.com>

Hi -

On Tue, Mar 31, 2009 at 11:17:42AM +0200, Peter Zijlstra wrote:

> [...]  Right, from my POV something like utrace is desirable, since
> it's basically a huge multiplexer for the debugger state, eventually
> allowing us to have multiple debuggers attached to the same process.
> [...]

Right.

> Then there is this little thing called frysk which uses it, no idea
> what kind of kernel space that needs, nor where it lives -- or for
> that matter, wth it really does ;-)

Frysk was to be a first user of such an improved ptrace(2) API in
order to do the sort of background / multiply-connected debugging, but
that project has been on indefinite hold for about a year.  Instead,
there are experiments under way to extend gdb's backend for that
capability.

- FChE


From fche at redhat.com  Tue Mar 31 14:09:26 2009
From: fche at redhat.com (Frank Ch. Eigler)
Date: Tue, 31 Mar 2009 10:09:26 -0400
Subject: Need more information on uProbes .
In-Reply-To: <20090331130632.GA6358@in.ibm.com> (Ananth N. Mavinakayanahalli's
	message of "Tue, 31 Mar 2009 18:36:32 +0530")
References: <bd97640b0903310525h5144974fr6daaf32e82aeb38b@mail.gmail.com>
	<20090331130632.GA6358@in.ibm.com>
Message-ID: <y0mzlf13ht5.fsf@ton.toronto.redhat.com>


ananth wrote:

> Uprobes is implemented only for architectures that have utrace support
> (x86-32, x86_64, powerpc, s390, but not IA64). [...]

(HAVE_ARCH_TRACEHOOK is on for ia64, sparc, sh also, so utrace per se
should work there.)

> [...]  For ARM though, the utrace layer needs to be implemented and
> uprobes ported over. [...]

Roland et al., has there been any recent report on
regset/tracehook-on-arm porting?

- FChE


From hch at infradead.org  Tue Mar 31 16:25:04 2009
From: hch at infradead.org (Christoph Hellwig)
Date: Tue, 31 Mar 2009 12:25:04 -0400
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: <1238491062.28248.2046.camel@twins>
References: <20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
	<1238491062.28248.2046.camel@twins>
Message-ID: <20090331162504.GA28442@infradead.org>

On Tue, Mar 31, 2009 at 11:17:42AM +0200, Peter Zijlstra wrote:
> > Could those who object to utrace please pipe up and summarise their
> > reasons?
> 
> Christoph used to have an opinion on this matter, so I've added him to
> the CC.

I've never objected to utrace per se; quite the contrary, I think it's a
useful abstraction.  I did have objections to various implementation
details, which should be sorted out now (I have to take another look to
make sure).

I do have a really large objection to merging the current messy double
ptrace implementation.  If the current utrace-based ptrace isn't 100%
ready, there's absolutely no point in merging it.  Another user would be
even better, e.g. the seccomp rewrite.


From jkenisto at us.ibm.com  Tue Mar 31 17:05:41 2009
From: jkenisto at us.ibm.com (Jim Keniston)
Date: Tue, 31 Mar 2009 10:05:41 -0700
Subject: Need more information on uProbes .
In-Reply-To: <y0mzlf13ht5.fsf@ton.toronto.redhat.com>
References: <bd97640b0903310525h5144974fr6daaf32e82aeb38b@mail.gmail.com>
	<20090331130632.GA6358@in.ibm.com>
	<y0mzlf13ht5.fsf@ton.toronto.redhat.com>
Message-ID: <1238519141.3636.8.camel@dyn9047018139.beaverton.ibm.com>

On Tue, 2009-03-31 at 10:09 -0400, Frank Ch. Eigler wrote:
> ananth wrote:
> 
> > Uprobes is implemented only for architectures that have utrace support
> > (x86-32, x86_64, powerpc, s390, but not IA64). [...]
> 
> (HAVE_ARCH_TRACEHOOK is on for ia64, sparc, sh also, so utrace per se
> should work there.)
> 

FWIW, Intel did an ia64 port of uprobes as well, but there wasn't
sufficient followup to get it tucked into systemtap/runtime/uprobes.

Jim


From roland at redhat.com  Tue Mar 31 19:25:21 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 31 Mar 2009 12:25:21 -0700 (PDT)
Subject: Need more information on uProbes .
In-Reply-To: Frank Ch. Eigler's message of  Tuesday,
	31 March 2009 10:09:26 -0400
	<y0mzlf13ht5.fsf@ton.toronto.redhat.com>
References: <bd97640b0903310525h5144974fr6daaf32e82aeb38b@mail.gmail.com>
	<20090331130632.GA6358@in.ibm.com>
	<y0mzlf13ht5.fsf@ton.toronto.redhat.com>
Message-ID: <20090331192522.04B5EFC2A8@magilla.sf.frob.com>

> Roland et al., has there been any recent report on
> regset/tracehook-on-arm porting?

I haven't heard anything.  There are no difficulties in that port AFAIK.
If an ARM arch maintainer (or someone who wants to send them patches) wants
to do it, I'm happy to give advice.


Thanks,
Roland


From roland at redhat.com  Tue Mar 31 20:54:13 2009
From: roland at redhat.com (Roland McGrath)
Date: Tue, 31 Mar 2009 13:54:13 -0700 (PDT)
Subject: [PATCH 3/3] utrace-based ftrace "process" engine, v2
In-Reply-To: Christoph Hellwig's message of  Tuesday,
	31 March 2009 12:25:04 -0400
	<20090331162504.GA28442@infradead.org>
References: <20090321154501.GA2707@elte.hu>
	<20090321143413.75ead1aa.akpm@linux-foundation.org>
	<20090321215145.GB5262@redhat.com>
	<alpine.LFD.2.00.0903211501060.3030@localhost.localdomain>
	<20090322123749.GF19826@elte.hu>
	<20090323134813.GA18219@x200.localdomain>
	<20090323151400.GA3413@redhat.com> <20090323214417.GD5814@mit.edu>
	<20090330151844.8b4eed0f.akpm@linux-foundation.org>
	<1238491062.28248.2046.camel@twins>
	<20090331162504.GA28442@infradead.org>
Message-ID: <20090331205413.EDEFFFC2A8@magilla.sf.frob.com>

> I do have a really large objection to merging the current messy double
> ptrace implementation.  If the current utrace-based ptrace isn't 100%
> ready, there's absolutely no point in merging it.

There is no "current" utrace-ptrace implementation.  I haven't proposed
one for merging.  When one is ready and working, we can discuss its actual
technical details then.

> Another user would be even better, e.g. the seccomp rewrite.

The seccomp rewrite is a very simple user for which I have a prototype
patch.  (It needs testing, but that should be easy enough.)  The only
real complexity there is in deciding how to merge those changes.
Its components are:

* clean up Kconfig
* remove old arch/asm hooks
** mips
** powerpc
** sh
** sparc
** x86
* replace kernel/seccomp.c with utrace-based one

Except for the first one, doing it in small incremental changes would
leave some intermediate states with no seccomp feature usable in the
tree.  (And, of course, CONFIG_SECCOMP will require CONFIG_UTRACE
thereafter.)  Please advise on how many pieces to slice it into and
how to stage the merging.


Thanks,
Roland


From maynardj at us.ibm.com  Tue Mar 31 23:56:34 2009
From: maynardj at us.ibm.com (Maynard Johnson)
Date: Tue, 31 Mar 2009 18:56:34 -0500
Subject: Testing insn.block probe point uncovers possible utrace bug
Message-ID: <49D2ADB2.3030304@us.ibm.com>

Hi,
With regard to the instruction tracing probe points that were added to
SystemTap last year, Frank had asked whether the block-trace
functionality (.insn.block) is working.  I tested this on x86_64/Fedora 10
and, indeed, it does work.  However, when testing on a ppc64 system, it
failed badly with "kernel BUG at include/linux/ptrace.h:299!".  Here's
the stack trace from the system log:

	finish_resume_report
	utrace_resume
	do_signal
	do_work

In finish_resume_report, user_enable_block_step() is called if utrace_report->action==UTRACE_BLOCKSTEP.  user_enable_block_step() is defined in include/linux/ptrace.h, and if arch_has_block_step is not defined, its implementation is a simple call to BUG().

Apparently, arch_has_block_step is not defined on ppc64, although the
hardware is physically capable of branch exceptions using the MSR_BE bit.
Is there a reason why this has not been defined on the ppc64 architecture,
or is it simply that no one has gotten around to it yet?  Either way, the
utrace code should handle this case more gracefully, if possible.  Can we
check for action==UTRACE_BLOCKSTEP earlier and bail out gracefully instead
of blindly calling user_enable_block_step()?
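
[ Illustration: one possible shape for such an early check, written as a
  self-contained userspace sketch rather than a kernel patch. The enum,
  the toy_arch struct and toy_validate_action() are hypothetical names;
  the has_block_step flag stands in for what arch_has_block_step() would
  report. Downgrading an unsupported block-step request to single-step
  is an assumption made for this sketch; rejecting the request with an
  error would be an equally plausible policy. ]

```c
#include <stdbool.h>

/* Hypothetical resume actions, loosely modelled on the names above. */
enum toy_resume_action {
    TOY_RESUME,
    TOY_SINGLESTEP,
    TOY_BLOCKSTEP,
};

/* Stand-in for per-architecture capabilities (assumed for this sketch). */
struct toy_arch {
    bool has_block_step;   /* what arch_has_block_step() would report */
    bool has_single_step;
};

/*
 * Validate the requested action before acting on it, so that an
 * unsupported request degrades to a weaker action instead of reaching
 * a code path that would BUG().
 */
enum toy_resume_action toy_validate_action(const struct toy_arch *arch,
                                           enum toy_resume_action requested)
{
    /* No block-step support: fall back to single-step. */
    if (requested == TOY_BLOCKSTEP && !arch->has_block_step)
        requested = TOY_SINGLESTEP;

    /* No single-step support either: just resume normally. */
    if (requested == TOY_SINGLESTEP && !arch->has_single_step)
        requested = TOY_RESUME;

    return requested;
}
```

[ The point of the sketch is only the ordering: the capability test runs
  before the action is applied, so the caller never reaches the
  unconditional BUG() path on an architecture without block-step. ]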

Once this issue is resolved, I will add a testcase to the itrace.exp in the testsuite to test the insn.block probe.

Thanks.
-Maynard Johnson